- Jan 29, 2016
Jungshik Shin authored
* Update the pre-built ICU data files for all platforms
    source/data/in/icudtl.dat for non-Android platforms
    {linux,mac}/icudt*.S for linux/mac
    android/icudtl.dat and android/icudt*.S for Android
    windows/icudt.dll for Windows
* Update Android data trimming script
    1. Make sure that 'default' calendar is kept in locales where it's relevant: root, th, fa, ar_SA, etc.
    2. Add a minimal region data to work around a bug in ICU with pool.res handling
* Update gn and gyp files
* And add a TODO comment to update.sh to automate the build file update.
* Add it_CH to the locale list.
* Add sr_Latn to unit/reslocal.mk (required by sh) and line_normal_fi to brkitr/brklocal.mk (referred to in brkitr/fi.txt) in place of line_fi.
* Update and add scripts for data building
* Completely rewrite README.chromium
* Check-in the prebuilt ICU data files/assembly sources for Linux,Mac,Windows,Chrome OS and Android.

BUG=575007
TEST=Blink layout tests, webkit unittests
TEST=All bots can build successfully
TEST=net_unittests --gtest_filter="*ilenameUtil*"
TEST=net_unittests --gtest_filter="*IDN*" (pending bug 336973)
TEST=base_unittests --gtest_filter="*Conv*"
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=base_unittests --gtest_filter="*essage*"
TEST=ui_base_unittests --gtest_filter="*ormat*"
TEST=ui_base_unittests --gtest_filter="L10n*"
R=mark@chromium.org

Review URL: https://codereview.chromium.org/1639543006 .
-
- Dec 14, 2015
Jungshik Shin authored
* Big5 : https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878
    Special case the following four more code points in addition to U+5341, U+5345 that are already special cased.
      U+2550, U+255E, U+2561, U+256A
    For those 6 code points, the last pointer instead of the first pointer in index-big5.txt is used for round-trip. The first pointer is for decoding-only.
* KOI8-U ( https://www.w3.org/Bugs/Public/show_bug.cgi?id=17053 )
    - 0xAE and 0xBE are mapped to U+04[50]E instead of U+255[DC].
    - Add an alias KOI8-RU

BUG=544228
TEST=1. http://goo.gl/reGQPU : encoding(form) test
     2. Layout test: fast/encoding/*
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/1514253003 .
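
The Big5 rule in the commit message above (last pointer for six code points, first pointer for everything else) can be illustrated with a minimal Python sketch. It is not the script used in this change: the function name `build_encode_index` and the toy pointer values are hypothetical, and the entries are assumed to arrive in ascending pointer order, as they would when reading index-big5.txt from top to bottom.

```python
# Code points listed in the commit message: for these, the LAST pointer
# in the index is used when encoding (round-trip).
LAST_POINTER_CODE_POINTS = {0x2550, 0x255E, 0x2561, 0x256A, 0x5341, 0x5345}


def build_encode_index(entries):
    """Build a code point -> pointer map for encoding.

    `entries` is an iterable of (pointer, code_point) pairs, assumed to be
    in ascending pointer order.
    """
    index = {}
    for pointer, code_point in entries:
        if code_point in LAST_POINTER_CODE_POINTS:
            # Later entries overwrite earlier ones, so the last pointer wins.
            index[code_point] = pointer
        else:
            # Only the first pointer seen for a code point is kept.
            index.setdefault(code_point, pointer)
    return index


# Toy data with made-up pointers; these are NOT real index-big5.txt values.
toy_entries = [
    (100, 0x5341),
    (101, 0x4E00),
    (200, 0x5341),
    (201, 0x4E00),
]
toy_index = build_encode_index(toy_entries)
```

In this toy index, U+5341 resolves to the later pointer (200) while U+4E00 keeps its first pointer (101); the earlier pointer for U+5341 would remain usable for decoding only.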
-
- Mar 19, 2015
Jungshik Shin (jungshik at google) authored
1. Update ucmlocal.mk and convertrs.txt to refer to euc-kr-html.ucm instead of windows-949.ucm

2. Tighten up the valid code range for the following converters: EUC-KR, Shift_JIS, Big5
   This is to add back an ASCII range byte to the stream per the encoding spec when it is either illegal as a 'trail byte' or there's no assigned code point for a "lead + trail" sequence. For instance, with this change, '0xF3 0x41' in EUC-KR is converted to 'U+FFFD U+0041' instead of 'U+FFFD'.
   This change requires adding 2 ~ 8 new states to the conversion table of each converter mentioned above, leading to a 6.5kB net increase in the final data size.

3. Tighten the trail byte range for 2-byte sequences starting with 0x8E from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating script.

4. Change the substitution characters for EUC-JP and Shift_JIS to match other converters, i.e. make them produce U+FFFD when encountering an invalid input. Before this change, they emitted U+001A.

5. Enable 'U_CHARSET_IS_UTF8' configuration flag. Chromium/Blink does not rely on ICU for the code conversion between the 'system native encoding' (if it's one of legacy encodings) and Unicode. With this configuration, we can cut down the code size a bit.

6. Update the icudtl.dat (all platforms) and assembly files (mac,linux) and the icudata dll (windows)

See https://codereview.chromium.org/1026453002 for a new blink test added ( fast/encoding/char-decoding-invalid-trail.html )

BUG=450312,430823
TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html
TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding*
R=jsbell@chromium.org, mark@chromium.org

Review URL: https://codereview.chromium.org/984233002
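
The "add back an ASCII range byte" behaviour described in the commit message above can be sketched as a toy two-byte decoder in Python. This is only an illustration of the idea, not the ICU state table: the names `decode_two_byte` and `decode_toy`, the lead byte range 0x81-0xFE, and the one-entry mapping table are assumptions made for the example.

```python
REPLACEMENT = "\uFFFD"

# Toy mapping table with a single assigned sequence; a real converter
# would carry the full table for the encoding.
TOY_TABLE = {(0xB0, 0xA1): "\uAC00"}


def decode_two_byte(data, table, lead_min=0x81, lead_max=0xFE):
    """Decode a toy ASCII-compatible two-byte encoding."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte < 0x80:
            out.append(chr(byte))  # ASCII passes through unchanged
            continue
        if not lead_min <= byte <= lead_max:
            out.append(REPLACEMENT)  # not a valid lead byte
            continue
        if i >= len(data):
            out.append(REPLACEMENT)  # truncated sequence at end of input
            break
        trail = data[i]
        i += 1
        ch = table.get((byte, trail))
        if ch is not None:
            out.append(ch)
            continue
        out.append(REPLACEMENT)
        if trail < 0x80:
            # Put the ASCII byte back so it is decoded on its own
            # instead of being swallowed by the invalid sequence.
            i -= 1
    return "".join(out)


def decode_toy(data):
    return decode_two_byte(data, TOY_TABLE)
```

With this sketch, `decode_toy(bytes([0xF3, 0x41]))` yields U+FFFD followed by U+0041, matching the EUC-KR example in the commit message, whereas a decoder without the put-back step would yield U+FFFD alone.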
-
- Jan 21, 2015
Jungshik Shin (jungshik at google) authored
A. Converter update per HTML encoding spec along with changes in the encoding name alias table.

B. Remove all the code for converters Blink and Chromium do not need (SCSU, Lotus, ISO-2022-xx other than JP, BOCU, UTF-7, etc).

This is reapplying the following CLs (that we used for ICU 52.1) to ICU 54.1:
  https://codereview.chromium.org/598383002
  https://codereview.chromium.org/654153002

We have two upstream bugs filed for A and B above:
  http://www.icu-project.org/trac/ticket/11296
  http://www.icu-project.org/trac/ticket/10303

In addition to A and B, we unified Big5 and Big5-HKSCS per the encoding spec (bug 277868). That also includes properly supporting the four 2-character sequences ( see http://crbug.com/277868#c3 ). big5_gen.sh deviates from the current spec to work around a bug in the spec. (see https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878)

Moreover, ucmlocal.mk is added to list only encodings we want to support.

Also, tighten the state table for windows-946-2000.ucm that we use for EUC-KR for now. And, drop 'base' map for windows-{936,949}-2000.ucm.

Finally, add euc-kr-html.ucm along with scripts/euckr_gen.sh, but it is not yet used pending the resolution of bug 450312.

Data size checkpoint: 20,566,864 bytes (the original ICU 54=25,343,024)

BUG=277868, 428145, 450312
TEST=net_unittests --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=Blink: fast/encoding/*
R=jsbell@chromium.org, mark@chromium.org

Review URL: https://codereview.chromium.org/839713003
-