- Jun 04, 2015
Jungshik Shin authored
1. Add a one-way (encoding-only, i.e. fromUnicode) mapping for U+2212 to Shift_JIS, EUC-JP and ISO-2022-JP. The last one (ISO-2022-JP) just uses the Shift_JIS converter. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28661
2. Make the GBK alias list compliant with the encoding spec.
3. Add "\xA3\xA0 => U+3000" to GBK (windows-936) and gb18030. This makes it possible to remove the corresponding override in Blink.
4. Modify GBK (windows-936) as follows. See [1]
   - Add U+01F9 <=> \xA8\xBF
   - Drop U+E7C8 <=> \xA8\xBF
5. The following change is on hold (NOT included in this CL) until [1] is resolved:
   - Add U+1E3F <=> \xA8\xBC
   - Drop U+E7C7 <=> \xA8\xBC

The corresponding Blink CL is https://codereview.chromium.org/1167523003/

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

BUG=425417,493824
TEST=Once ICU is rolled to this CL, run the Blink layout tests fast/encoding/*.
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/1162723008
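In ICU .ucm mapping files, a one-way fromUnicode mapping is marked with the precision indicator |1, while |0 marks a round-trip mapping and |3 a toUnicode-only one. The Python sketch below is illustrative only: it parses two hypothetical .ucm-style lines and shows that U+2212 encodes to the same bytes as U+FF0D, while those bytes still decode to U+FF0D. The byte value \x81\x7C is an assumption (what we believe the encoding spec's Shift_JIS encoder emits for U+FF0D, which it substitutes for U+2212); the actual lines in this CL may differ.

```python
import re

# Hypothetical excerpt in ICU .ucm syntax (NOT copied from the CL).
# Precision indicators: |0 round-trip, |1 fromUnicode-only, |3 toUnicode-only.
UCM_LINES = r"""
<UFF0D> \x81\x7C |0
<U2212> \x81\x7C |1
"""

MAPPING = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])")


def build_tables(text):
    """Split .ucm mapping lines into separate fromUnicode and toUnicode tables."""
    from_unicode, to_unicode = {}, {}
    for match in MAPPING.finditer(text):
        code_point = chr(int(match.group(1), 16))
        raw = bytes(int(h, 16) for h in match.group(2).split("\\x")[1:])
        flag = int(match.group(3))
        if flag in (0, 1):  # mapping is used when encoding
            from_unicode[code_point] = raw
        if flag in (0, 3):  # mapping is used when decoding
            to_unicode[raw] = code_point
    return from_unicode, to_unicode


FROM_UNICODE, TO_UNICODE = build_tables(UCM_LINES)


def encode(text):
    return b"".join(FROM_UNICODE[ch] for ch in text)


def decode(data):
    # Toy decoder: assumes every mapping in the table is exactly two bytes long.
    return "".join(TO_UNICODE[data[i:i + 2]] for i in range(0, len(data), 2))
```

Encoding U+2212 and decoding the result therefore yields U+FF0D, which is exactly the asymmetry a one-way mapping is meant to have.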
- Mar 19, 2015
Jungshik Shin (jungshik at google) authored
1. Update ucmlocal.mk and convrtrs.txt to refer to euc-kr-html.ucm instead of windows-949.ucm.
2. Tighten up the valid code range for the following converters: EUC-KR, Shift_JIS, Big5.
   This puts an ASCII-range byte back into the input stream, per the encoding spec, when it is either illegal as a trail byte or the "lead + trail" sequence has no assigned code point. For instance, with this change, '0xF3 0x41' in EUC-KR is converted to 'U+FFFD U+0041' instead of 'U+FFFD'.
   This change requires adding 2 to 8 new states to the conversion table of each converter mentioned above, leading to a 6.5 kB net increase in the final data size.
3. Tighten the trail byte range for 2-byte sequences starting with 0x8E from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating script.
4. Change the substitution characters for EUC-JP and Shift_JIS to match other converters, i.e. make them produce U+FFFD when encountering invalid input. Before this change, they emitted U+001A.
5. Enable the 'U_CHARSET_IS_UTF8' configuration flag. Chromium/Blink does not rely on ICU for code conversion between the 'system native encoding' (if it's a legacy encoding) and Unicode, so with this configuration we can cut down the code size a bit.
6. Update icudtl.dat (all platforms), the assembly files (Mac, Linux) and the icudata DLL (Windows).

See https://codereview.chromium.org/1026453002 for a new Blink test (fast/encoding/char-decoding-invalid-trail.html).

BUG=450312,430823
TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html
TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding*
R=jsbell@chromium.org, mark@chromium.org

Review URL: https://codereview.chromium.org/984233002
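The trail-byte rule in item 2 can be illustrated with a minimal Python sketch of the encoding spec's EUC-KR decoder steps. It is not the ICU state-table implementation: the one-entry index (0xB0 0xA1 to U+AC00) is a hypothetical stand-in for the full EUC-KR index, so every other lead/trail pair is treated as unassigned. The key step is pushing an ASCII trail byte back so it is decoded on its own.

```python
def euc_kr_pointer(lead, trail):
    """Index pointer for a lead/trail pair, as defined for the EUC-KR decoder."""
    return (lead - 0x81) * 190 + (trail - 0x41)


# Hypothetical one-entry excerpt of the EUC-KR index, keyed by pointer.
TOY_INDEX = {euc_kr_pointer(0xB0, 0xA1): 0xAC00}


def decode_euc_kr(data, index=TOY_INDEX):
    """Decode bytes following the encoding spec's EUC-KR decoder steps."""
    out = []
    lead = 0
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if lead:
            pointer = euc_kr_pointer(lead, byte) if 0x41 <= byte <= 0xFE else None
            lead = 0
            code_point = index.get(pointer) if pointer is not None else None
            if code_point is not None:
                out.append(chr(code_point))
                continue
            if byte < 0x80:
                i -= 1  # ASCII trail byte: put it back so it is decoded on its own
            out.append("\ufffd")
        elif byte < 0x80:
            out.append(chr(byte))
        elif 0x81 <= byte <= 0xFE:
            lead = byte
        else:
            out.append("\ufffd")
    if lead:  # truncated two-byte sequence at end of input
        out.append("\ufffd")
    return "".join(out)
```

With the pushback, b"\xF3\x41" decodes to "\ufffdA"; without it, the 0x41 would be swallowed and only U+FFFD would be emitted, which is the old behavior described above.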
- Jan 21, 2015
Jungshik Shin (jungshik at google) authored
A. Converter update per the HTML encoding spec, along with changes in the encoding name alias table.
B. Remove all the code for converters that Blink and Chromium do not need (SCSU, Lotus, ISO-2022-xx other than JP, BOCU, UTF-7, etc.).

This reapplies the following CLs (which we used for ICU 52.1) to ICU 54.1:
https://codereview.chromium.org/598383002
https://codereview.chromium.org/654153002

We have two upstream bugs filed for A and B above:
http://www.icu-project.org/trac/ticket/11296
http://www.icu-project.org/trac/ticket/10303

In addition to A and B, we unified Big5 and Big5-HKSCS per the encoding spec (bug 277868). That also includes properly supporting the four 2-character sequences (see http://crbug.com/277868#c3). big5_gen.sh deviates from the current spec to work around a bug in the spec (see https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878).

Moreover, ucmlocal.mk is added to list only the encodings we want to support. Also, tighten the state table for windows-949-2000.ucm, which we use for EUC-KR for now, and drop the 'base' map for windows-{936,949}-2000.ucm. Finally, add euc-kr-html.ucm along with scripts/euckr_gen.sh; it is not yet used, pending the resolution of bug 450312.

Data size checkpoint: 20,566,864 bytes (the original ICU 54: 25,343,024 bytes)

BUG=277868, 428145, 450312
TEST=net_unittests --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=Blink: fast/encoding/*
R=jsbell@chromium.org, mark@chromium.org

Review URL: https://codereview.chromium.org/839713003
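The "four 2-character sequences" are, as we read the encoding spec's Big5 decoder, byte pairs whose index pointers (1133, 1135, 1164, 1166) decode to a base letter plus a combining mark instead of a single code point; we assume these are the ones crbug.com/277868#c3 refers to. The Python sketch below shows only that special case. It is not how big5_gen.sh or the ICU converter implements it, and the `index` parameter is a hypothetical stand-in for the full Big5 index.

```python
# Pointers that the spec's Big5 decoder maps to two code points each.
TWO_CODE_POINT_POINTERS = {
    1133: "\u00ca\u0304",  # Ê + COMBINING MACRON
    1135: "\u00ca\u030c",  # Ê + COMBINING CARON
    1164: "\u00ea\u0304",  # ê + COMBINING MACRON
    1166: "\u00ea\u030c",  # ê + COMBINING CARON
}


def big5_pointer(lead, trail):
    """Return the Big5 index pointer for a lead/trail pair, or None for an invalid trail."""
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


def decode_big5_pair(lead, trail, index):
    """Decode one lead/trail pair; returns None when the pair has no mapping."""
    pointer = big5_pointer(lead, trail)
    if pointer in TWO_CODE_POINT_POINTERS:
        return TWO_CODE_POINT_POINTERS[pointer]
    code_point = index.get(pointer)
    return chr(code_point) if code_point is not None else None
```

For example, the pair 0x88 0x62 gives pointer (0x88 - 0x81) * 157 + (0x62 - 0x40) = 1133 and decodes to U+00CA U+0304, which a table mapping one byte sequence to one code point cannot express directly.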
- Sep 02, 2014
jshin@chromium.org authored
1. Update the timezone data files (4 of them) in source/data/misc to 2014f (the latest) to prepare for an upcoming Russian timezone change.
2. Add a Shift_JIS converter compliant with the WHATWG encoding spec.
3. Update convrtrs.txt and ucmlocal.mk accordingly.
4. Update the pre-built data files for Linux/Mac/Android/Windows. (icudt.dll is not updated in this CL. It's not used in the default configuration and will be updated in a separate CL.)
5. Fix a typo in ibm866_gen.sh. The actual table used does not need a change.

BUG=277062,404445
TEST=After rolling ICU to this revision, the following tests should pass.
TEST=Blink: fast/encoding/* all pass except for fast/encoding/api/ascii-supersets.html, which should fail because its Shift_JIS check now *passes* where the expected results say it fails. The Blink layout tests need to be updated.
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=In the JS console, run the following to check that Europe/Moscow is 3 hours ahead of UTC after Oct 26 and 4 hours ahead before that, and that Asia/Kamchatka remains 12 hours ahead of UTC.
  nov1_2014_1500=new Date("11/01/2014 15:00Z")
  nov1_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
  nov1_2014_1500.toLocaleString("en", {timeZone: "UTC"})
  nov1_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
  oct24_2014_1500=new Date("10/24/2014 15:00Z")
  oct24_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
  oct24_2014_1500.toLocaleString("en", {timeZone: "UTC"})
  oct24_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
TEST=net_unittests --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/497543003
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@291774 4ff67af0-8c30-449e-8e8b-ad334ec8d88c