- Jul 17, 2017
Jungshik Shin authored
They're supported by Google Translate, and their names need to be shown in the Translate UI: Hmong (hmn), Javanese (jv), Luxembourgish (lb), Samoan (sm). The size impact for desktop Chrome is about 3 kB (10,175,056 => 10,178,576). BUG=733398 TEST=See the bug TBR=yyushkina@chromium.org Change-Id: I9ca232343847a9a95c1dabacb6336bd43ba39849 Reviewed-on: https://chromium-review.googlesource.com/574771 Reviewed-by: Jungshik Shin <jshin@chromium.org>
-
- May 14, 2017
Jungshik Shin authored
* Highlights:
  - Emoji 5.0 data (partial; the Emoji_Component property is not included)
  - CLDR 31.0.1 (http://blog.unicode.org/2017/03/cldr-version-31-released.html); UTC and GMT are treated as distinct
  - New case-mapping API for styled text
  - C++11 is required
  - char16_t for UChar (UTF-16)
  - Source code is in UTF-8
* Size changes
  - common: 10,130,560 => 10,175,056
  - android: 6,573,872 => 6,616,864
  - iOS: 6,562,352 => 6,605,152
On top of ICU 59.1 from the upstream, the following changes were applied. See https://chromium.googlesource.com/chromium/deps/icu/+log/chromium/59staging
  - Fix C++11 string literal assignment issue (upstream bug: 13192)
  - Fix C4229 warning by MSVC
  - Apply utf32.patch and include unistr.h in fuzzer_util
  - Update ICU data files
  - Fix wpo.patch
  - Apply Google locale patch and locale1.patch
  - Update README
  - Apply breakiterator-related patches
  - Apply and update wpo.patch
  - Drop unused patch, apply data.build.win.patch, update README.chromium
  - Add /utf-8 flag for Windows/Visual Studio
  - Update BUILD.gn for UChar, stubdata and apply data_sym.patch
  - Use stubdata.cpp instead of stubdata.c in icu.gyp
  - Update icu.gyp* files for v8
  - Update BUILD.gn, apply data.build.patch and vscomp.patch
  - Add new files in ICU 59.1
  - Get a fresh copy of ICU 59.1 from the upstream
  - Update update.sh script
TBR=drott@chromium.org, yangguo@chromium.org
Bug:699469
TEST: layout tests, all unittests, browser tests
Change-Id: Ie1e77323aa0c7f872153680c4deca6471a771a5c
Reviewed-on: https://chromium-review.googlesource.com/505173
Reviewed-by: Jungshik Shin <jshin@chromium.org>
-
- May 05, 2017
Jungshik Shin authored
Add ios/icudtl.dat and ios/patch_locale.sh. Update README.chromium and BUILD.gn accordingly. Update scripts/copy_data.sh to take (ios|common|android). At the moment, iOS data is almost identical to that of Android, but in the future more cuts may be made (e.g. dictionary data for breakiterator). Bug: 718955 TEST: iOS Chrome works as before. Review-Url: https://codereview.chromium.org/2743123002 .
-
- Apr 11, 2017
Jungshik Shin authored
Delete empty units, units{Narrow,Short} blocks after trimming units data. Empty units* blocks in en_GB and a few other locales after trimming cause ICU to fail to fall back when looking up the duration data for those locales. In addition, fix source/data/translit/root_subset.txt. The Rule*Ids block has to be present even though it's empty. When dropping Hans-Hant transform rules, root_subset.txt was changed to be completely empty, which broke "components_unittests --gtest_filter=AutofillProfileComparato*". With these changes, regenerate ICU data files. The size is slightly smaller: android/icudtl.dat 6573872 => 6573792; common/icudt*dat 10130560 => 10130480. BUG=707515,677043,684609 TEST=components_unittests --gtest_filter=AutofillProfileComparato* TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm* R=derat@chromium.org Review-Url: https://codereview.chromium.org/2812943003 .
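The trimming rule described above (drop a table only when its body is empty, and only for the table names you list) can be sketched as follows. This is a hypothetical JavaScript illustration, not the actual logic of scripts/trim_data.sh; the function name `removeEmptyBlocks` and the regex-based approach are assumptions. A table that is not on the list, such as an empty Rule*Ids block, is left in place.

```javascript
// Hypothetical sketch: remove empty tables with the given names from ICU
// resource-bundle source text, e.g. "    units{\n    }\n".
// Only tables whose body is pure whitespace are removed; non-empty tables and
// tables whose names are not listed are left alone.
function removeEmptyBlocks(text, names) {
  // Escape regex metacharacters in the names before joining them.
  const alternatives = names
    .map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  // With the "m" flag, ^ matches at the start of every line.
  const pattern = new RegExp(
    `^[ \\t]*(?:${alternatives})[ \\t]*\\{\\s*\\}[ \\t]*\\r?\\n?`,
    "gm"
  );
  return text.replace(pattern, "");
}

const enGB = [
  "en_GB{",
  "    units{",
  "    }",
  "    unitsShort{}",
  '    Version{"1"}',
  "}",
  "",
].join("\n");

console.log(removeEmptyBlocks(enGB, ["units", "unitsNarrow", "unitsShort"]));
```

Listing the block names explicitly, rather than deleting every empty table, is what keeps required-but-empty blocks intact.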
-
- Mar 07, 2017
Jungshik Shin authored
Fix the path for Khmer dictionary and word_ja in data_files_to_preserve.txt. Add icu4c to the upstream repository path. BUG=None Review-Url: https://codereview.chromium.org/2732393002 .
-
- Feb 21, 2017
Jungshik Shin authored
Size increase for affected data files is as follows: android/icudtl.dat 6573776 -> 6610128 bytes; common/icudtb.dat 10130464 -> 10166816 bytes; common/icudtl.dat 10130464 -> 10166816 bytes. This CL supersedes CL 2328013002. CL by riesa@chromium.org. BUG=684609 R=jshin@chromium.org Review-Url: https://codereview.chromium.org/2652023002 .
-
- Oct 28, 2016
Jungshik Shin authored
Follow-up to https://chromium.googlesource.com/chromium/deps/icu/+/5feb9ad5 (due to a Rietveld issue, part 1 was manually pushed). Update ICU from 56.1 to the 58.1 release (part 2).
Listed below is a tiny subset of what's new in 58.1:
1. Unicode 9.0 (from Unicode 8.0)
  - Updated character properties, including Emoji data up to 4.0beta.
  - Updated grapheme/word/line breaking rules for Emoji sequences and others.
2. CLDR 30.0.2 (from CLDR 28)
  - Numerous locale data updates/improvements
3. Spoofing API changes
4. Greek uppercasing support as a part of the regular case-mapping API.
5. Line breaking rule file format optimization. This change enables me to add the CJ loose line breaking rules back (previously, they were dropped to save space) so that Blink can use them for CJ.
See http://site.icu-project.org/download/58 for more details on ICU 58.1 and http://site.icu-project.org/download/57 for more details on ICU 57.1. For CLDR 30, see http://cldr.unicode.org/index/downloads/cldr-30 .
The size impact:
  - Non-Android: 10,127,200 => 10,128,624 (delta = 1,424 / 0.014%)
  - Android: 6,563,152 => 6,571,936 (delta = 8,784 / 0.13%)
Below is the list of changes made on top of the upstream ICU 58.1, in reverse order. Most of these changes were made in the 58staging branch to run trybots and cherry-picked back for this CL. See https://chromium.googlesource.com/chromium/deps/icu/+/log/chromium/58staging
https://codereview.chromium.org/2447513002/ : cr+blink update CL with 58staging branch head.
* Fix a build on Win without std::string (v8)
* Add ms932 alias to Shift_JIS
* Apply Google-specific locale data patches
* Fix a bug in scriptset
* Update windows-1255 mapping
* Disable C4333 warning by MSVC (harmless)
* Apply and update utf32.patch and README.chromium
* Update and apply vscomp.patch
  stringpiece patch removed. VS2015 seems to be fine with a redefinition.
* Update pre-built ICU data files
  Update *local.mk with a new copyright line
* Apply more patches
  The following patches were applied and updated: data_symb, vscomp, wpo
  The unnecessary part was dropped from vscomp
* Update BUILD.gn and icu.gyp* files
* Update android/brkitr.patch
* Update and apply more patches
* Update and apply cjdict.patch
  Apply data.build.patch
* Delete obsolete patches: cmemory, regex
* Update README.chromium and apply brkitr patches
  - Update README.chromium
  - Remove obsolete patches
  - Update linebrk.patch and apply it: add back line_loose_cj
* Update wordbrk.patch and apply it
* Update and apply khmer-dictbe.patch
* Update data trimming
  - android/patch_locale.sh
  - scripts/trim_data.sh
    ExemplarCh* removed
    charac*Label removed
    relative/relativeTime removed for daysOfWeek and quarter
* Update the following patches
  android/brkitr.patch
  patches/linebrk.patch
  patches/data.build.patch
* Update cjdict.patch and linebrk.patch
BUG=637001
TEST=Layout tests, all unittests, browser tests, ui tests.
R=jsbell@chromium.org, mark@chromium.org
Review URL: https://codereview.chromium.org/2442923002 .
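The delta and percentage figures in the size impact above can be checked with a tiny helper. This is a hypothetical JavaScript sketch; it takes the percentage relative to the old size, which reproduces the quoted numbers (the Android figure computes to about 0.134%, which the message rounds to 0.13%).

```javascript
// Hypothetical helper: absolute and relative change between two file sizes.
// The percentage is relative to the old ("before") size.
function sizeDelta(before, after) {
  const delta = after - before;
  const percent = (delta / before) * 100;
  return { delta, percent };
}

for (const [label, before, after] of [
  ["Non-Android", 10127200, 10128624],
  ["Android", 6563152, 6571936],
]) {
  const { delta, percent } = sizeDelta(before, after);
  console.log(`${label}: delta = ${delta.toLocaleString("en-US")} / ${percent.toFixed(3)}%`);
}
```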
-
- Oct 23, 2016
Jungshik Shin authored
* Note that this CL will be followed by CLs with local changes. Until then, ICU should not be rolled in DEPS. See READ_THIS_FIRST for details. * Adjust scripts/update.sh and scripts/data_files_to_preserve.txt - CLDR/ICU added ckb/ast locale data. Drop them from the list to preserve. - source/layout does not exist in 58.1 any more. * Update the tree to ICU 58.1 from the upstream by running scripts/update.sh * Update README.chromium and add READ_THIS_FIRST to warn about the status of the tree. BUG=637001 TEST=None
-
- Oct 21, 2016
Jungshik Shin authored
There's no need for VS build files. In addition, update scripts/update.sh to post-edit source/configure for the missing test/ directory. This cleanup is necessary to get 'git cl upload/rietveld' to work smoothly in an upcoming ICU update. BUG=637001 TEST=source/runConfigureICU Linux --disable-tests --disable-layout Review URL: https://codereview.chromium.org/2443653002 .
-
Jungshik Shin authored
We don't use source/test. It's kept to give API usage examples, but it got in the way of a version update (git cl upload keeps timing out). Also, update update.sh to delete source/test after downloading a new version from the upstream. BUG=637001 TEST=None Review URL: https://codereview.chromium.org/2435373002 .
-
- Jul 27, 2016
Jungshik Shin authored
Delete three pre-built assembly source files because they're now generated at build time. Update data build scripts and README.chromium accordingly. Update copy_data.sh and copy_data_android.sh so that the assembly source files are not copied. In addition, convert the little-endian data bundle to the big-endian data bundle for non-Android platforms. BUG=v8:4828 TEST=Rebuild icu data following the procedure in README.chromium TEST='gn args <builddir>' with icu_use_data_file set to true or false TEST=build base_unittests and run with --gtest_filter=ICU* TEST=build base_unittests and run with --gtest_filter=Message*ormat* TEST=build 'd8' (v8) and try `(new Date()).toLocaleString("de")` Review URL: https://codereview.chromium.org/2182883004 .
-
- Jul 22, 2016
Jungshik Shin authored
Follow-up CL to https://codereview.chromium.org/2162393003 1. make_data_assembly now accepts '--mac' to generate assembly source for Mac 2. Fix icu.gyp to support all platforms BUG=v8:4828 TEST='d8' is built correctly with icu_use_data_file set to either 0 or 1 on Mac/Linux TEST=run `GYP_DEFINES="target_arch=mips" ./gypfiles/gyp_v8` and make sure that ninja files use the 'b' data/assembly file for Big Endian on Mac/Linux R=machenbach@chromium.org Review URL: https://codereview.chromium.org/2165403003 .
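make_data_assembly embeds the binary data file in an assembly source under a global symbol. One plausible reason a separate Mac mode is needed is that Mach-O prefixes C-visible symbol names with an underscore and uses different section directives. The following is a hypothetical JavaScript sketch of that idea, not the actual script: the symbol name `icudt59_dat`, the section directives (`.section .rodata` for ELF, `.const` for Mach-O) and the 16-byte alignment are assumptions.

```javascript
// Hypothetical sketch: emit GNU-assembler source that embeds `bytes`
// as read-only data under a global symbol.
function makeDataAssembly(bytes, symbol, { mac = false } = {}) {
  // Mach-O prefixes C-visible symbol names with an underscore.
  const name = mac ? `_${symbol}` : symbol;
  const lines = [
    `.globl ${name}`,
    mac ? ".const" : ".section .rodata", // read-only data section
    ".balign 16",
    `${name}:`,
  ];
  // Emit 16 bytes per .byte directive to keep lines short.
  for (let i = 0; i < bytes.length; i += 16) {
    const chunk = Array.from(bytes.subarray(i, i + 16), (b) =>
      "0x" + b.toString(16).padStart(2, "0")
    );
    lines.push(`.byte ${chunk.join(",")}`);
  }
  return lines.join("\n") + "\n";
}

const sample = Uint8Array.from([0x20, 0x00, 0xda, 0x27]);
console.log(makeDataAssembly(sample, "icudt59_dat", { mac: true }));
```

Because the bytes are copied verbatim, the byte order of the embedded data is whatever the input file has, which is why a big-endian target needs a big-endian input file.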
-
- Jul 21, 2016
Miran Karic authored
Add a script that generates an assembly file from a .dat file. This is needed for generating the big-endian assembly file after using icupkg to convert the little-endian icudtl.dat to the big-endian icudtb.dat. Also, the icu.gyp file is modified so that big-endian architectures use the appropriate files. Patch by miran.karic@ ( https://codereview.chromium.org/1967523002/) with a couple of fixes: 1. Two errors mentioned against PS#9 in the above CL. 2. Support copying the icu data file for Big Endian targets. In addition, icudtb.dat was added to common. icudtb.dat was created by running 'icupkg -tb icudt56l.dat icudt56b.dat' and renaming icudt56b.dat to icudtb.dat. BUG=v8:4828 TEST='d8' is built correctly with icu_use_data_file set to either 0 or 1. TEST=run `GYP_DEFINES="target_arch=mips" ./gypfiles/gyp_v8` and make sure that ninja files use the 'b' data/assembly file for Big Endian. Review URL: https://codereview.chromium.org/2162393003 . Patch from Miran Karic <miran.karic@imgtec.com>.
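`icupkg -tb` rewrites the package in big-endian byte order, and that byte order is recorded in the data header. Per ICU's udata.h, an ICU data file starts with a `MappedData` header (a 16-bit `headerSize`, then the magic bytes 0xDA 0x27) followed by a `UDataInfo` struct whose `isBigEndian` byte sits at offset 8 of the file. A minimal JavaScript sketch that reads this flag, for example to confirm that icudtb.dat really is big-endian, might look like this; the function name is hypothetical.

```javascript
// Hypothetical sketch: report the byte order recorded in an ICU data header.
// Layout (from ICU's udata.h):
//   MappedData { uint16 headerSize; uint8 magic1 = 0xda; uint8 magic2 = 0x27; }
//   UDataInfo  { uint16 size; uint16 reservedWord; uint8 isBigEndian;
//                uint8 charsetFamily; uint8 sizeofUChar; ... }
// so isBigEndian is the byte at file offset 4 + 4 = 8.
function icuDataByteOrder(bytes) {
  if (bytes.length < 12 || bytes[2] !== 0xda || bytes[3] !== 0x27) {
    throw new Error("not an ICU data file (bad magic bytes)");
  }
  return bytes[8] === 1 ? "big-endian" : "little-endian";
}

// Synthetic big-endian header (headerSize and size stored big-endian).
const header = Uint8Array.from([0x00, 0x20, 0xda, 0x27, 0x00, 0x14, 0x00, 0x00, 0x01, 0x00, 0x02, 0x00]);
console.log(icuDataByteOrder(header));

// With Node.js and a local copy of the file, one could instead run:
//   console.log(icuDataByteOrder(require("fs").readFileSync("common/icudtb.dat")));
```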
-
- May 20, 2016
Jungshik Shin authored
See http://mm.icann.org/pipermail/tz-announce/2016-April/000038.html for what's new in 2016d. Rebuilt ICU data/assembly files are checked in (not shown in the codereview due to their sizes). While I'm at it, add a scripts/LICENSE file that is identical to LICENSE at the top of the Chromium tree, because LICENSE in third_party/icu is for ICU and does not apply to files in scripts/. BUG=473288 TBR=mark TEST=In JavaScript console, run the following. apr30_2016_1200 = new Date("04/30/2016 12:00Z") may01_2016_1200 = new Date("05/01/2016 12:00Z") apr30_2016_1200.toLocaleString("en", {timeZone: "America/Caracas"}) may01_2016_1200.toLocaleString("en", {timeZone: "America/Caracas"}) On April 30, 2016, Caracas is 4:30 behind UTC. On May 1, it's 4:00 behind. Review URL: https://codereview.chromium.org/1985243002 .
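The manual check above compares formatted strings by eye. A minimal sketch that turns it into a numeric check is below; it assumes a JavaScript engine whose `Intl.DateTimeFormat` is backed by ICU time zone data (as in V8 and Node.js), and the helper name `utcOffsetMinutes` is hypothetical. It formats the instant in the target zone, reads the wall-clock fields back as if they were UTC, and subtracts the real instant.

```javascript
// Hypothetical helper: UTC offset (in minutes) of `timeZone` at instant `date`,
// computed from the wall-clock fields Intl.DateTimeFormat produces for that zone.
function utcOffsetMinutes(date, timeZone) {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23", // 0-23, so midnight is never rendered as "24"
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  });
  const f = {};
  for (const { type, value } of fmt.formatToParts(date)) {
    f[type] = Number(value); // literal parts become NaN and are ignored
  }
  // Read the local wall-clock time as if it were UTC, then compare it with the
  // real instant (truncated to whole seconds, like the formatted fields).
  const wallClockAsUtc = Date.UTC(f.year, f.month - 1, f.day, f.hour, f.minute, f.second);
  const instant = Math.floor(date.getTime() / 1000) * 1000;
  return (wallClockAsUtc - instant) / 60000;
}

const apr30 = new Date("2016-04-30T12:00Z");
const may01 = new Date("2016-05-01T12:00Z");
console.log(utcOffsetMinutes(apr30, "America/Caracas")); // -270 (UTC-4:30)
console.log(utcOffsetMinutes(may01, "America/Caracas")); // -240 (UTC-4:00)
```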
-
- Mar 25, 2016
Jungshik Shin authored
1. Update the IANA tz data to 2016c. What's new in 2016b and 2016c is described at http://mm.icann.org/pipermail/tz-announce/2016-March/000036.html (2016b) http://mm.icann.org/pipermail/tz-announce/2016-March/000037.html (2016c) 2. Locale data fixes - en-AU date format fix from the upstream - ar and fa: Prefix the percent sign with an RTL mark (U+200F). From Android. - tr: Use ₺ (U+20BA; Turkish Lira Sign) instead of 'TL'. This is to revert a locale patch picked up from Google's internal build of ICU. (Android also uses U+20BA). In addition, icudtl.dat (the prebuilt ICU data file for platforms other than Android) is moved out of source/data/in to common/. This way, the data build steps for non-Android and Android can be unified and a bit more streamlined. icu.gyp and BUILD.gn are updated accordingly, as well as README.chromium. BUG=598000 TEST=See bug comment 0 and comment 1 R=mark@chromium.org Review URL: https://codereview.chromium.org/1823293002 .
-
- Feb 04, 2016
Jungshik Shin authored
1. ast, an, wa locale data (minimal) - Make up the minimal locale data for 3 languages. - Update source/data/{locale,lang}/reslocal.mk to have 3 languages. - Update scripts/data_files_to_preserve.txt to have files for 3 locales listed. - In Chromium, these languages have to be added to the A-L list to show up in the A-L pull-down in settings. 2. IANA timezone db update to 2016a See http://mm.icann.org/pipermail/tz-announce/2016-January/000035.html for the change. 3. Pre-built data files are updated. Data (icudtl.dat) size changes between 54.1 and 56.1: non-Android platforms: 10,124,096 bytes (net change: -83,840); Android: 6,560,080 bytes (net change: +291,840 / 4.66%) BUG=575007,474333 R=jsbell@chromium.org Review URL: https://codereview.chromium.org/1665113004 .
-
- Jan 29, 2016
Jungshik Shin authored
* Update the pre-built ICU data files for all platforms: source/data/in/icudtl.dat for non-Android platforms; {linux,mac}/icudt*.S for Linux/Mac; android/icudtl.dat and android/icudt*.S for Android; windows/icudt.dll for Windows * Update Android data trimming script 1. Make sure that the 'default' calendar is kept in locales where it's relevant: root, th, fa, ar_SA, etc. 2. Add minimal region data to work around a bug in ICU with pool.res handling * Update gn and gyp files * And add a TODO comment to update.sh to automate the build file update. * Add it_CH to the locale list. * Add sr_Latn to unit/reslocal.mk (required by sh) and line_normal_fi to brkitr/brklocal.mk (referred to in brkitr/fi.txt) in place of line_fi. * Update and add scripts for data building * Completely rewrite README.chromium * Check in the prebuilt ICU data files/assembly sources for Linux, Mac, Windows, Chrome OS and Android. BUG=575007 TEST=Blink layout tests, webkit unittests TEST=All bots can build successfully TEST=net_unittests --gtest_filter="*ilenameUtil*" TEST=net_unittests --gtest_filter="*IDN*" (pending bug 336973) TEST=base_unittests --gtest_filter="*Conv*" TEST=browser_tests --gtest_filter="*ncoding*" TEST=base_unittests --gtest_filter="*essage*" TEST=ui_base_unittests --gtest_filter="*ormat*" TEST=ui_base_unittests --gtest_filter="L10n*" R=mark@chromium.org Review URL: https://codereview.chromium.org/1639543006 .
-
Jungshik Shin authored
1. Apply post-56 patches from the trunk for measure/date format http://bugs.icu-project.org/trac/ticket/11986 http://bugs.icu-project.org/trac/ticket/12031 http://bugs.icu-project.org/trac/ticket/12030 http://bugs.icu-project.org/trac/ticket/12041 2. Generate a combined patch (measure_format.patch) for the above. 3. Split locale_google.patch into 'locale_google.patch' and 'relative_date.patch'. The latter is taken from Android. 4. Update README.chromium In addition, apply two local patches: {tzdetect,xlit..}.patch and adjust gb18030.ucm and the corresponding patch. Also, remove obsolete patches and update README.chromium BUG=575007 R=mark@chromium.org Review URL: https://codereview.chromium.org/1621943002 .
-
Jungshik Shin authored
Make the tree ready for the application of Google's and Chrome's data and post-56 code patches. 1. Fix trim_data.sh to run from anywhere. 2. Update patch_locale.sh for Android and add en_IN to the locale list 3. Apply data.build.patch 4. Exclude non-UI locale data for the unit locale category 5. Add some regional variant locales to locale, unit, zone and coll. 6. Update locale lists for locale, unit, zone, and coll BUG=575007 TEST=None R=mark@chromium.org Review URL: https://codereview.chromium.org/1624643003 .
-
- Jan 07, 2016
Jungshik Shin authored
Add scripts/update.sh that automates the initial check-out of a new version of ICU. BUG=575007 TEST=None R=mark@chromium.org Review URL: https://codereview.chromium.org/1566043002 .
-
- Dec 14, 2015
Jungshik Shin authored
* Big5: https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878 Special-case four more code points (U+2550, U+255E, U+2561, U+256A) in addition to U+5341 and U+5345, which are already special-cased. For those 6 code points, the last pointer in index-big5.txt, instead of the first, is used for round-trip; the first pointer is for decoding only. * KOI8-U (https://www.w3.org/Bugs/Public/show_bug.cgi?id=17053) - 0xAE and 0xBE are mapped to U+04[50]E instead of U+255[DC]. - Add the alias KOI8-RU BUG=544228 TEST=1. http://goo.gl/reGQPU : encoding(form) test 2. Layout test: fast/encoding/* R=jsbell@chromium.org Review URL: https://codereview.chromium.org/1514253003 .
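The encoder-side rule for Big5 can be sketched as follows: when a code point appears at more than one pointer in index-big5.txt, the encoder normally uses the first (lowest) pointer, but for the six special-cased code points it uses the last one. This is a hypothetical JavaScript illustration; the small index in the usage example is made up for illustration rather than copied from index-big5.txt. The pointer-to-bytes conversion follows the Big5 encoder in the WHATWG Encoding Standard.

```javascript
// Code points whose encoder mapping uses the *last* pointer in the index.
const LAST_POINTER_CODE_POINTS = new Set([0x2550, 0x255e, 0x2561, 0x256a, 0x5341, 0x5345]);

// index: array of [pointer, codePoint] pairs, in any order.
function big5EncoderPointer(index, codePoint) {
  const pointers = index.filter(([, cp]) => cp === codePoint).map(([p]) => p);
  if (pointers.length === 0) return null;
  return LAST_POINTER_CODE_POINTS.has(codePoint)
    ? Math.max(...pointers) // encode to the last pointer; earlier ones are decode-only
    : Math.min(...pointers); // default: first pointer
}

// Big5 byte pair for a pointer (lead byte, then trail byte).
function pointerToBytes(pointer) {
  const lead = Math.floor(pointer / 157) + 0x81;
  const trail = pointer % 157;
  const offset = trail < 0x3f ? 0x40 : 0x62;
  return [lead, trail + offset];
}

// Illustrative index entries (not real index-big5.txt data).
const sampleIndex = [[5100, 0x4e00], [5247, 0x2550], [6000, 0x4e00], [18991, 0x2550]];
console.log(pointerToBytes(big5EncoderPointer(sampleIndex, 0x2550)).map((b) => b.toString(16)));
console.log(pointerToBytes(big5EncoderPointer(sampleIndex, 0x4e00)).map((b) => b.toString(16)));
```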
-
- Jun 04, 2015
Jungshik Shin authored
1. Add a one-way (encoding-only/fromUnicode) mapping for U+2212 to Shift_JIS, EUC-JP and ISO-2022-JP. The last one just uses Shift_JIS. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28661 2. Make the GBK alias list compliant with the encoding spec. 3. Add "\xA3\xA0 => U+3000" to GBK (windows-936) and gb18030. This makes it possible to remove the corresponding override in Blink. 4. Make the following changes to GBK (windows-936). See [1] - Add U+01F9 <=> \xA8\xBF - Drop U+E7C8 <=> \xA8\xBF 5. The following change is put on hold (NOT included in the CL) until the resolution of [1] - Add U+1E3F <=> \xA8\xBC - Drop U+E7C7 <=> \xA8\xBC The corresponding Blink CL is https://codereview.chromium.org/1167523003/ [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 BUG=425417,493824 TEST=Once ICU is rolled to this CL, Blink layout test fast/encoding/*. R=jsbell@chromium.org Review URL: https://codereview.chromium.org/1162723008
-
- Apr 02, 2015
Jungshik Shin (jungshik at google) authored
1. Update the IANA tz db to 2015b. - http://mm.icann.org/pipermail/tz-announce/2015-March/000029.html - Mongolia decided to observe DST again in 2015, starting on the last Sunday in March. - Palestine's DST start date is corrected to March 28 instead of March 27. 2. Add a script to download the tz database files (update_tz.sh) 3. Check in scripts/make_n_copy_data.sh that I've been using to build ICU data/assembly files and update README.chromium. 4. Update android/patch_locale.sh to apply android/brkitr.patch as well. BUG=473288 TEST=1. In JavaScript console, run the following. mar27_2015_1200 = new Date("03/27/2015 12:00Z") mar28_2015_1200 = new Date("03/28/2015 12:00Z") mar27_2015_1200.toLocaleString("en", {timeZone: "Asia/Gaza"}) mar28_2015_1200.toLocaleString("en", {timeZone: "Asia/Gaza"}) apr15_2014_1200 = new Date("04/15/2014 12:00Z") apr15_2015_1200 = new Date("04/15/2015 12:00Z") apr15_2014_1200.toLocaleString("en", {timeZone: "Asia/Ulan_Bator"}) apr15_2015_1200.toLocaleString("en", {timeZone: "Asia/Ulan_Bator"}) In Asia/Gaza, Mar 27 12:00Z should be 2PM and Mar 28 12:00Z should be 3PM. In Asia/Ulan_Bator, April 15 12:00Z should be 8PM in 2014 and 9PM in 2015. Ulan_Bator does not work due to http://crbug.com/364374. R=mark@chromium.org Review URL: https://codereview.chromium.org/1051193002
-
- Mar 19, 2015
Jungshik Shin (jungshik at google) authored
1. Update ucmlocal.mk and convrtrs.txt to refer to euc-kr-html.ucm instead of windows-949.ucm 2. Tighten up the valid code range for the following converters: EUC-KR, Shift_JIS, Big5. This is to add back an ASCII-range byte to the stream, per the encoding spec, when it is either illegal as a 'trail byte' or there's no assigned code point for a "lead + trail" sequence. For instance, with this change, '0xF3 0x41' in EUC-KR is converted to 'U+FFFD U+0041' instead of 'U+FFFD'. This change requires adding 2 ~ 8 new states to the conversion table of each converter mentioned above, leading to a 6.5kB net increase in the final data size. 3. Tighten the trail byte range for 2-byte sequences starting with 0x8E from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating script. 4. Change the substitution characters for EUC-JP and Shift_JIS to match other converters, i.e. make them produce U+FFFD when encountering an invalid input. Before this change, they emitted U+001A. 5. Enable the 'U_CHARSET_IS_UTF8' configuration flag. Chromium/Blink does not rely on ICU for the code conversion between the 'system native encoding' (if it's one of the legacy encodings) and Unicode. With this configuration, we can cut down the code size a bit. 6. Update the icudtl.dat (all platforms), the assembly files (Mac, Linux) and the icudata DLL (Windows). See https://codereview.chromium.org/1026453002 for a new Blink test (fast/encoding/char-decoding-invalid-trail.html). BUG=450312,430823 TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding* R=jsbell@chromium.org, mark@chromium.org Review URL: https://codereview.chromium.org/984233002
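The error handling in item 2 follows the EUC-KR decoder in the WHATWG Encoding Standard: a lead byte and the following byte form pointer (lead − 0x81) × 190 + (byte − 0x41); if that pointer has no mapping, the decoder emits U+FFFD and, when the second byte is in the ASCII range, pushes it back so it is decoded on its own. A minimal JavaScript sketch of that rule is below. It is not ICU's converter code, and its one-entry table (bytes 0xB0 0xA1 => U+AC00) is a hypothetical stand-in for the full index EUC-KR.

```javascript
// Hypothetical stand-in for the WHATWG "index EUC-KR": one entry only,
// pointer 9026 (bytes 0xB0 0xA1) -> U+AC00 HANGUL SYLLABLE GA.
const INDEX_EUC_KR = new Map([[9026, 0xac00]]);

function decodeEucKr(bytes) {
  const out = [];
  let lead = 0x00;
  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i++];
    if (lead !== 0x00) {
      const l = lead;
      lead = 0x00;
      const pointer = byte >= 0x41 && byte <= 0xfe ? (l - 0x81) * 190 + (byte - 0x41) : null;
      const cp = pointer === null ? undefined : INDEX_EUC_KR.get(pointer);
      if (cp !== undefined) {
        out.push(String.fromCodePoint(cp));
        continue;
      }
      // Unmapped pair: an ASCII trail byte is pushed back and decoded on its
      // own instead of being swallowed by the error.
      if (byte < 0x80) i--;
      out.push("\uFFFD");
      continue;
    }
    if (byte < 0x80) out.push(String.fromCharCode(byte));
    else if (byte >= 0x81 && byte <= 0xfe) lead = byte;
    else out.push("\uFFFD");
  }
  if (lead !== 0x00) out.push("\uFFFD"); // truncated sequence at end of input
  return out.join("");
}

console.log(JSON.stringify(decodeEucKr([0xf3, 0x41]))); // "\ufffdA": U+FFFD U+0041
```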
-
- Mar 02, 2015
Jungshik Shin (jungshik at google) authored
Fix the following errors found by jochen@ in https://codereview.chromium.org/960263002/ 1. brkitr: en_US_POSIX is not supported. Remove it from brklocal.mk: we don't use en_US_POSIX, and the remaining dependency on it in some unittests was already removed (we may need it back later, though, for breaking an FQDN into components). 2. coll: Explicitly add id.txt, required as the alias/parent of "in" and "id_ID". This should not affect collation in the Indonesian locale because falling back to the root locale should be fine. 3. lang: Add 'ro_MD.txt', required as the alias of 'mo.txt'. Also update make_mac_assembly.sh to read the ICU major version automatically. In addition, update README.chromium to refer to ICU 54 as done by the aforementioned CL. Rebuild the data files and assembly sources (the latter still required by stand-alone v8 builds) for all the platforms. icudtl.dll for Windows will be built and checked in in another CL. BUG=428145 TEST=Usual ICU update tests before rolling DEPS. See https://codereview.chromium.org/878723002 TBR=jochen@chromium.org Review URL: https://codereview.chromium.org/962643003
-
- Feb 19, 2015
Jungshik Shin (jungshik at google) authored
data/lang/en_GB.txt has an empty "Languages" block leading getDisplay{Name,Language} to fail in en-GB. Update trim_data.sh to remove an empty "Languages" block and run the script to fix data/lang/en_GB.txt and other locales if any. (only en_GB.txt is affected). Rebuild the icu data with the above changes for both Android and non-Android platforms. BUG=428145 TEST=linux_chromeos bots: browser_tests --gtest_filter=*GetUILang* TBR=mark@chromium.org Review URL: https://codereview.chromium.org/930203004
-
- Jan 31, 2015
Jungshik Shin (jungshik at google) authored
1. Fix a Windows build failure due to: a. 'signed vs unsigned' comparison b. 'possible data loss' in conversion: Apply pkasting's patch at http://bugs.icu-project.org/trac/ticket/11104 2. Drop a few currencies to cut down the data size by 50kB for non-Android platforms. 3. Build the ICU data for Android and check it in. - Drop all display names for languages/scripts/regions except for zh-Han{s,t} as before (~1.2MB reduction). - Drop cjdict by applying android/brkitr.patch (~2MB reduction). - Include the display names for only 60+ currencies (~400kB reduction from the non-Android data). - Minimize the locale data for the 9 locales that Chrome on Android is not localized to, and drop currency names for those 9 locales (~150kB reduction). Size change: 1. Non-Android: 10,255,584 to 10,200,880 2. Android: - Final: 6,270,880, with 60+ currency names added (for bug 370849) and the data for the 9 unnecessary locales dropped. It's 232,240 bytes larger than ICU 52.1 (6,038,640). - Without any currency names but with the data for the 9 unnecessary locales: 6,026,816 - With 60+ currency names and the data for the 9 unnecessary locales: 6,426,368 BUG=370849,428145 TEST=Build on Windows. Blink layout tests, webkit unittests. R=mark@chromium.org, wangxianzhu@chromium.org Review URL: https://codereview.chromium.org/877193003
-
- Jan 23, 2015
Jungshik Shin (jungshik at google) authored
1. Add {coll,curr,lang,locales,rbnf,region,sprep,translit,unit,zone}/*local.mk to exclude locale data for languages/locales that Chromium does not need. 2. Run scripts/trim_data.sh to cut down the data size further by excluding unused entries in each locale file. - Keep the display names for languages/scripts/locales in Chrome's Accept-Language list and remove the display names outside the set. - Minimize the locale data in data/{locales,lang} for non-UI languages in the A-L list. For them, we just need the "native" display name and exemplar character set. - Exclude historic, obscure and otherwise unnecessary currency display names. - Drop unnecessary Chinese collation rules: Big5/GB2312/UniHan. - Keep only the minimal unit data for duration and compound units. 3. Add css3transform.txt to data/translit for Greek upper/lowercasing support. 4. Add the minimal locale data for ckb and ku. 5. The tz db was updated previously to 2014j (the latest), so no change is made except for the README.chromium update. 6. Add the minimal locale data for ckb and ku. 7. Check in the pre-built data (icudtl.dat) shared by all non-Android platforms and assembly files for Linux/Mac. The final data size is 10,255,584 bytes, which is about 200kB smaller than that for ICU 52.1. The pristine upstream ICU has data of 25,343,024 bytes. The remaining steps are to build a smaller data file for Android and to build icudtl.dll for Windows (non-default build option). BUG=428145 TEST=net_unittests --gtest_filter="*ilenameUtil*" TEST=net_unittests --gtest_filter="*IDN*" TEST=base_unittests --gtest_filter="*Conv*" TEST=browser_tests --gtest_filter="*ncoding*" TEST=Blink: layout tests R=mark@chromium.org Review URL: https://codereview.chromium.org/872903002
-
- Jan 21, 2015
Jungshik Shin (jungshik at google) authored
A. Converter update per the HTML encoding spec, along with changes in the encoding name alias table. B. Remove all the code for converters Blink and Chromium do not need (SCSU, Lotus, ISO-2022-xx other than JP, BOCU, UTF-7, etc). This is reapplying the following CLs (that we used for ICU 52.1) to ICU 54.1: https://codereview.chromium.org/598383002 https://codereview.chromium.org/654153002 We have two upstream bugs filed for A and B above: http://www.icu-project.org/trac/ticket/11296 http://www.icu-project.org/trac/ticket/10303 In addition to A and B, we unified Big5 and Big5-HKSCS per the encoding spec (bug 277868). That also includes properly supporting the four 2-character sequences (see http://crbug.com/277868#c3). big5_gen.sh deviates from the current spec to work around a bug in the spec (see https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878). Moreover, ucmlocal.mk is added to list only the encodings we want to support. Also, tighten the state table for windows-949-2000.ucm that we use for EUC-KR for now. And drop the 'base' map for windows-{936,949}-2000.ucm. Finally, add euc-kr-html.ucm along with scripts/euckr_gen.sh, but it is not yet used pending the resolution of bug 450312. Data size checkpoint: 20,566,864 bytes (the original ICU 54 = 25,343,024) BUG=277868, 428145, 450312 TEST=net_unittests --gtest_filter="*ilenameUtil*" TEST=base_unittests --gtest_filter="*Conv*" TEST=browser_tests --gtest_filter="*ncoding*" TEST=Blink: fast/encoding/* R=jsbell@chromium.org, mark@chromium.org Review URL: https://codereview.chromium.org/839713003
-
- Oct 13, 2014
jshin@chromium.org authored
1. Replace the current encoding alias list (heavily patched) with our own HTML5-specific alias list. It's mostly generated from encoding.json, which is in turn derived from the WHATWG Encoding living standard. The most notable difference is that UTF-32 entries are kept until bug 417850 is resolved. Two other differences are: a. Two aliases for iso-8859-8-i (logical and csiso88598i) are not listed. They're dealt with in Blink. b. Chinese (gb*, big5*) aliases are not yet aligned to the encoding spec pending our decision on the unification of Big5 / Big5-HKSCS and GBK / GB18030. 2. Replace all the single-byte mapping tables with tables automatically generated by scripts/single-byte-gen.sh, which uses index-* files downloaded from the WHATWG spec site. This will fix the decoding (ToUnicode) of windows-874 and windows-1253 while removing a lot of fallback/spurious mapping entries in the encoding direction ('FromUnicode') in a number of encodings. 3. Regenerate the ICU binary data files for Linux/Mac/Android/Windows/CrOS. 4. Remove the now-obsolete noop-*ucm files used to make the ISO-2022-CN* decoders return an empty string. They're not necessary any more because ISO-2022-CN* were made 'replacement' encodings in Blink and our version of ICU does not have any code for ISO-2022-CN* any more. This cuts down the data size by 15kB. On Android, there's virtually no change in the data size because the previous data file on Android accidentally had smaller locale data for nb and ms.
BUG=412053 TEST=browser_tests --gtest_filter="*ncoding*" TEST=net_unittests --gtest_filter="*ilenameUtil*" TEST=base_unittests --gtest_filter="*Conv*" TEST=Blink: fast/encoding/* TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-indexes TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-1253_test TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-874_test R=jsbell@chromium.org Review URL: https://codereview.chromium.org/598383002 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292447 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
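Item 2 generates the single-byte tables from the WHATWG index files. A hypothetical sketch of such a generator, not the actual scripts/single-byte-gen.sh, is below. It assumes the index file layout published on the WHATWG site (comment lines starting with `#`, data lines of the form `pointer<TAB>0xCODEPOINT<TAB>...`), maps pointer p to byte 0x80 + p as the Encoding Standard's single-byte decoder does, keeps bytes 0x00 to 0x7F as ASCII, and writes mapping lines in ICU's .ucm `<Uxxxx> \xHH |0` round-trip notation. The header and CHARMAP framing of a real .ucm file are omitted.

```javascript
// Hypothetical sketch: turn a WHATWG single-byte index into .ucm mapping lines.
function indexToUcmLines(indexText) {
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");
  const lines = [];
  // Bytes 0x00-0x7F decode to the ASCII code points with the same value.
  for (let b = 0; b < 0x80; b++) {
    lines.push(`<U${hex(b, 4)}> \\x${hex(b, 2)} |0`);
  }
  for (const raw of indexText.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const [pointerField, codePointField] = line.split(/\s+/);
    const pointer = Number(pointerField);
    const codePoint = Number(codePointField); // "0x0402" parses as hexadecimal
    // Pointer p corresponds to byte 0x80 + p; "|0" marks a round-trip mapping.
    lines.push(`<U${hex(codePoint, 4)}> \\x${hex(0x80 + pointer, 2)} |0`);
  }
  return lines;
}

const sampleIndexText = "# sample\n     0\t0x0402\t Ђ (CYRILLIC CAPITAL LETTER DJE)\n     1\t0x0403\t Ѓ\n";
console.log(indexToUcmLines(sampleIndexText).slice(-2).join("\n"));
```

Generating the tables mechanically from the spec's index files, rather than editing them by hand, is what removes stray fallback entries that the spec does not define.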
-
- Sep 25, 2014
jshin@chromium.org authored
UCONFIG_NO_NON_HTML5_CONVERTER was added earlier to our copy of ICU, but it was never set to 1. It's my oversight. 1. Turn UCON..CONVERTER on in icu.gyp to drop all the encodings not required by the Encoding spec. Dropped encodings include UTF-7, BOCU, SCSU, CESU, ISCII, ISO-2022-{KR, CN*}, HZ-GB, and ISO-2022-JP variants other than the original. 2. Many more sections of the ICU converter code are excluded when it's set to 1, including the code for LMB (Lotus Multibyte) encodings and the X11 compound text encoding (icu common). 3. Character encoding detection for the excluded encodings is also disabled. (icu i18n) 4. ISO-2022-{KR, CN*} and HZ-GB can be dropped now because Blink treats them as replacement encodings. The corresponding alias entries in convrtrs.txt are also removed. 5. ibm-874 was removed. We used to need it before Blink started, but not any more. We only need windows-874. 6. A mistake in convrtrs.txt was corrected: Big5-HKSCS was pointing to an old mapping table. 7. Per ICU upstream's suggestion, use the '-html' suffix instead of '-html5' for the encoding tables derived from the WHATWG's encoding spec (ibm866, shift_jis and euc-jp). The static 64-bit release build of Chrome on Linux went down from 141,596,616 to 141,491,968 bytes (~100 kB reduction). In addition, the icu data size got smaller by ~19 kB (10,490,576 to 10,471,008 bytes). See http://bugs.icu-project.org/trac/ticket/11296 for an upstream bug I've filed on the issue. BUG=76328 TEST=browser_tests --gtest_filter="*ncoding*" TEST=net_unittests --gtest_filter="*ilenameUtil*" TEST=base_unittests --gtest_filter="*Conv*" TEST=Blink: fast/encoding/* TEST=With shared library build, the following has no match. nm libicuuc.so | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii)' nm libicui18n.so | egrep -i '(2022kr|2022cn|ibm42)' TEST=With static library build, the following has no match.
nm chrome | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)' R=jsbell@chromium.org, mark@chromium.org Review URL: https://codereview.chromium.org/587833004 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292131 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
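The static-build TEST above can also be scripted. This is a minimal Python sketch, not part of the CL: the helper name `leftover_symbols` is hypothetical, and it assumes `nm` prints the symbol name as the last field of each line. It applies the same case-insensitive pattern as the `egrep -i` check.

```python
import re

# Same alternation as the static-build `egrep -i` TEST line.
DROPPED_CONVERTERS = re.compile(
    r"(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)", re.IGNORECASE
)


def leftover_symbols(nm_output: str) -> list:
    """Return symbol names in `nm` output that still look like dropped converters."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[-1]  # nm prints "[address] type name"; the name comes last
        if DROPPED_CONVERTERS.search(name):
            hits.append(name)
    return hits
```

Feed it the captured output of `nm chrome`; an empty list corresponds to the "no match" result the TEST expects.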
-
- Sep 02, 2014
-
-
jshin@chromium.org authored
1. Update the timezone data files (4 of them) in source/data/misc to 2014f (the latest) to prepare for an upcoming Russian timezone change. 2. Add a Shift_JIS converter compliant with the WHATWG encoding spec. 3. Update convrtrs.txt and ucmlocal.mk accordingly. 4. Update the pre-built data files for Linux/Mac/Android/Windows. (icudt.dll is not updated in this CL. It's not used in the default configuration. It'll be updated in a separate CL.) 5. Fix a typo in ibm866_gen.sh. The actual table used does not need a change. BUG=277062,404445 TEST=After rolling icu to this revision, the following tests should pass. TEST=Blink: fast/encoding/* all pass except for fast/encoding/api/ascii-supersets.html, which should "fail" by *passing* the test for Shift_JIS that is expected to fail. The Blink layout tests need to be updated. TEST=browser_tests --gtest_filter="*ncoding*" TEST=In JS console, run the following to check if Europe/Moscow is 3 hrs ahead of UTC after Oct 26 and 4 hrs ahead before that and if Asia/Kamchatka remains 12 hrs ahead of UTC. nov1_2014_1500=new Date("11/01/2014 15:00Z") nov1_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"}) nov1_2014_1500.toLocaleString("en", {timeZone: "UTC"}) nov1_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"}) oct24_2014_1500=new Date("10/24/2014 15:00Z") oct24_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"}) oct24_2014_1500.toLocaleString("en", {timeZone: "UTC"}) oct24_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"}) TEST=net_unittest --gtest_filter="*ilenameUtil*" TEST=base_unittests --gtest_filter="*Conv*" R=jsbell@chromium.org Review URL: https://codereview.chromium.org/497543003 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@291774 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
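Item 2 adds a Shift_JIS converter that follows the WHATWG encoding spec. As a rough illustration of what that spec's decoder computes (a Python sketch, not the ICU .ucm table; the function names are hypothetical), single bytes map directly and two-byte sequences are turned into an index pointer:

```python
# Byte ranges and offsets follow the WHATWG Encoding Standard's Shift_JIS decoder.
def sjis_single_byte(byte: int):
    """Code point for a single-byte Shift_JIS sequence, or None if `byte` is not one."""
    if byte <= 0x80:
        return byte  # ASCII, plus 0x80, which the spec maps to U+0080
    if 0xA1 <= byte <= 0xDF:
        return 0xFF61 - 0xA1 + byte  # half-width katakana
    return None


def sjis_pointer(lead: int, trail: int):
    """Index pointer for a two-byte sequence, or None if `trail` is out of range."""
    if not (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC):
        raise ValueError(f"not a lead byte: {lead:#04x}")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
        return None
    offset = 0x40 if trail < 0x7F else 0x41
    lead_offset = 0x81 if lead < 0xA0 else 0xC1
    return (lead - lead_offset) * 188 + trail - offset


def eudc_code_point(pointer: int):
    """Pointers 8836..10715 go to the Private Use Area instead of index-jis0208."""
    if 8836 <= pointer <= 10715:
        return 0xE000 - 8836 + pointer
    return None
```

The index-jis0208 lookup itself is left out. For example, 0x82 0xA0 yields pointer 283, which index-jis0208 maps to U+3042 (あ).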
-
- May 24, 2014
-
-
jshin@chromium.org authored
When the upstream took our patches for CJ segmentation, over 10k words were dropped. Some of them are pretty common and not having them led to a Blink layout test failure. Add several of them back to cjdict.txt. In addition, remove a patch that breaks line breaking around single/double quotation marks. Rebuild the data for Linux/Mac/Windows/Android. BUG=132145 TEST=Once rolled, layouttest:fast/text/international/cjk-segmentation.html and fast/hyphen-min-preferred-width.html pass. TBR=mark Review URL: https://codereview.chromium.org/292123005 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@272650 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
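ICU's CJ segmentation is dictionary-driven, which is why dropping words from cjdict.txt changes the results. The sketch below uses plain greedy longest-match in Python, a deliberately simpler stand-in for ICU's cost-based dictionary break engine, to show how removing one word makes it fall apart into single characters. The dictionary contents are made up for illustration.

```python
def segment(text: str, dictionary: set, max_len: int = 8) -> list:
    """Greedy longest-match segmentation; unknown characters become 1-char words."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words
```

With {"東京", "大学"} the string 東京大学 splits into 東京 / 大学; without 大学 in the dictionary, the second half degrades to 大 / 学.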
-
- May 05, 2014
-
-
jshin@chromium.org authored
I was too aggressive in trimming the data and dropped the display names for languages that Chromium needs (for non-UI languages that are in the A-L list). That was not my intention (the comment in trim_data.sh said one thing, but the code did another). In addition, add the Norwegian (nb) and Malay (ms) locale data that were left out by mistake. Also update the trim_data.sh script NOT to drop 'ALIAS' lines, which indicate that a given locale is an alias to another locale. That also required adding ro_MD.txt (the null locale that mo.txt is aliased to). The above three changes add about 110kB to the ICU data (from 10.3MB to 10.4MB). Also update the pre-built ICU data files for Linux, Mac and Windows. The Android data will be updated in a follow-up patch. BUG=132145 TEST=When ICU is rolled, unit_tests:ExtensionL10* pass. TBR=mark Review URL: https://codereview.chromium.org/264973016 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@268285 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
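Keeping ALIAS lines is only half the fix: an alias locale is useless if its target is missing, which is why ro_MD.txt had to be added. A minimal Python sketch of that consistency check, assuming ICU's `"%%ALIAS"{"target"}` resource syntax in the locale .txt files; `alias_target` and `dangling_aliases` are hypothetical helpers, not part of trim_data.sh:

```python
import re

# Assumed ICU locale-file syntax for an alias: "%%ALIAS"{"target_locale"}
ALIAS_RE = re.compile(r'"%%ALIAS"\s*\{\s*"([^"]+)"\s*\}')


def alias_target(locale_txt: str):
    """Return the locale that a .txt file aliases to, or None if it is not an alias."""
    m = ALIAS_RE.search(locale_txt)
    return m.group(1) if m else None


def dangling_aliases(files: dict) -> dict:
    """Map alias locale -> missing target, for aliases whose target file is absent."""
    missing = {}
    for name, text in files.items():
        target = alias_target(text)
        if target is not None and target not in files:
            missing[name] = target
    return missing
```

Keys of `files` are locale IDs (e.g. "mo", "ro_MD") and values are file contents.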
-
- Apr 29, 2014
-
-
jshin@chromium.org authored
- Add missing half-width kana entries (omitted by mistake) - Drop the 'extra' decoding-only mapping. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=25266 - Regenerate the ICU data files (*dat and assembly source files) for Linux, Mac, Windows and Android. (They're not shown at codereview.chromium.org because they're too large.) BUG=132145,78847 TEST=When ICU is rolled in, base_unittests --gtest_filter=*ICU* and layout tests R=jsbell@chromium.org Review URL: https://codereview.chromium.org/251203003 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@266919 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
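Assuming the half-width kana entries belong to the WHATWG-derived EUC-JP table (the commit does not name the table), the WHATWG decoder maps 0x8E followed by a byte in 0xA1..0xDF to U+FF61 - 0xA1 + byte. A Python sketch that generates the corresponding 63 mapping lines in ICU's .ucm `<Uxxxx> \xNN |0` form; the helper name is hypothetical:

```python
def halfwidth_kana_ucm_lines() -> list:
    """EUC-JP half-width katakana (0x8E 0xA1..0xDF) as round-trip .ucm mapping lines."""
    lines = []
    for trail in range(0xA1, 0xE0):
        cp = 0xFF61 - 0xA1 + trail  # U+FF61..U+FF9F
        lines.append(f"<U{cp:04X}> \\x8E\\x{trail:02X} |0")
    return lines
```

The `|0` precision marker denotes a round-trip mapping, so these bytes decode and encode symmetrically.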
-
- Apr 28, 2014
-
-
jshin@chromium.org authored
1. Generate and add windows/icudt.dll with the procedure outlined in README.chromium. It uses an out-of-tree copy of the upstream ICU along with our custom-built icudtl.dat and a locally modified version of makedata.mak. We used to have a separate build/ directory for VS solution/project files to build icudtl.dll. Maintaining them is rather cumbersome now that we want to update our ICU (major version changes) more frequently. Note that icudt.dll is not used by default (icu_use_data_file_flag=1). The GN build still uses it by default, and we should not break that build. 2. Add scripts/make_mac_assembly.sh to simplify the generation of the ICU data assembly source file for Mac. 3. Update README.chromium accordingly. This CL was uploaded and reviewed at https://codereview.chromium.org/255943004/ Due to a malfunction at codereview.chromium.org, I'm landing this CL manually in two parts. This check-in is the 2nd part of the CL, dealing with #2 and #3 above. BUG=132145 TEST=None until icu is rolled to this version. git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@266602 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
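Item 2 adds scripts/make_mac_assembly.sh, whose contents are not shown here. The general idea of embedding a data file in an assembly source is sketched below in Python. Everything in it is an assumption: the directives, the example symbol name `icudt52_dat`, and the `.byte` layout are illustrative and not necessarily what the script or ICU's own tools emit.

```python
def data_to_asm(data: bytes, symbol: str, mac: bool = False, per_line: int = 16) -> str:
    """Emit GNU-as-style assembly that defines `symbol` as a read-only copy of `data`."""
    # Mach-O (Mac) C symbols carry a leading underscore; ELF (Linux) ones do not.
    sym = ("_" if mac else "") + symbol
    lines = [
        ".const" if mac else ".section .rodata",  # read-only data section
        f".globl {sym}",
        ".p2align 4",  # 2**4 = 16-byte alignment
        f"{sym}:",
    ]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append(".byte " + ",".join(f"0x{b:02x}" for b in chunk))
    return "\n".join(lines) + "\n"
```

A real generator would read icudtl.dat from disk and write the result to a .S file that the build assembles and links.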
-
- Apr 22, 2014
-
-
jshin@chromium.org authored
Add a 'filter_locale_data' function to trim_data.sh. Chromium/Blink do not use most of the unit* sections in the locale data, so keep only the duration and compound sub-sections. Update the icudtl.dat and two assembly source files for Mac/Linux. It saves ~200kB uncompressed; the 7z-compressed size reduction is 34kB. With all these changes (up to this CL) applied, the net increase of the ICU data from ICU 46 to 52 is 49kB 7z-compressed (3,070,246 vs 3,021,457) and ~390kB uncompressed (10,370,656 vs 9,980,368). BUG=132145 TEST=None. TBR=mark Review URL: https://codereview.chromium.org/247663002 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@265354 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
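A minimal Python sketch of the filtering idea behind 'filter_locale_data', operating on an already-parsed locale tree rather than on the .txt files the real shell script edits. The table names `units`, `unitsShort` and `unitsNarrow` are assumptions about the ICU 52 locale layout:

```python
KEPT_UNIT_SECTIONS = {"duration", "compound"}
UNIT_TABLES = ("units", "unitsShort", "unitsNarrow")  # assumed table names


def filter_units(locale: dict) -> dict:
    """Return a copy of `locale` whose unit tables keep only the kept sub-sections."""
    trimmed = dict(locale)  # shallow copy; the input tree is not modified
    for table in UNIT_TABLES:
        if table in trimmed:
            trimmed[table] = {
                key: value
                for key, value in trimmed[table].items()
                if key in KEPT_UNIT_SECTIONS
            }
    return trimmed
```

The size figures above are consistent: 3,070,246 - 3,021,457 = 48,789 bytes (about 49kB) and 10,370,656 - 9,980,368 = 390,288 bytes (about 390kB).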
-
- Apr 18, 2014
-
-
jshin@chromium.org authored
1. The {big5,gb2312}han collations are not used by anybody because they're useless as a sorting order. Add a function to trim_data.sh to remove them from zh.txt. 2. Remove remove_unihan.sh and add the unihan rules back to coll/{zh,ja,ko}.txt. In ICU 52, tools/genrb does NOT include unihan collation by default, so we don't have to bother removing it from the rule files. 3. Remove obsolete patch files (locale[23].patch). 4. Add a LICENSE file (converted from license.html). 5. Update README.chromium accordingly. 6. Check in the updated data file/assembly files. The net saving in icudtl.dat is ~220kB. BUG=132145 TEST=icudtl.dat is 10576480 TBR=mark Review URL: https://codereview.chromium.org/243763002 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264857 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
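Item 1 removes the big5han and gb2312han collations from zh.txt. A Python sketch of that kind of edit, assuming ICU's brace-delimited resource syntax; `remove_block` is a hypothetical helper (the real change is a shell function in trim_data.sh), and it ignores braces inside quoted strings:

```python
import re


def remove_block(text: str, name: str) -> str:
    """Remove the first `name{ ... }` block that starts a line, plus its newline."""
    m = re.search(r"(?m)^[ \t]*" + re.escape(name) + r"\s*\{", text)
    if m is None:
        return text
    i, depth, in_string = m.end(), 1, False
    while i < len(text) and depth:
        c = text[i]
        if in_string:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        i += 1
    if depth:
        raise ValueError(f"unbalanced braces after {name!r}")
    if text[i:i + 1] == "\n":
        i += 1
    return text[:m.start()] + text[i:]
```

Calling it once per collation name ("big5han", then "gb2312han") leaves the other zh collations untouched.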
-
jshin@chromium.org authored
Add a shell script, trim_data.sh, along with locale list files to trim the ICU data further. The script does the following: 1. Remove the display names of languages NOT listed in Chrome's Accept-Language list. (800kB) 2. Minimize the locale data for locales in the A-L list that are not Chrome UI locales. For those locales, the exemplar characters, the display name in the native language and the layout direction are included. (640kB) 3. Filter the region data to drop numeric region display names other than 419 (Latin America). (50kB) 4. Filter out the currency data (display names and plurals) for historic currencies. (200kB) This CL also checks in icudtl.dat (source/data/in) and icudt_dat.S (mac and linux). Note that I dropped '52' (the version number) from the assembly source file name, and icu.gyp was adjusted accordingly. With all these changes, icudtl.dat is ~800kB larger than that in ICU 4.6. The 7z compression (as used by the installer) brings the size difference down to ~130kB. BUG=132145 TEST=The icudtl.dat (uncompressed) is about 10.7MB instead of 12.4MB without this CL. R=mark@chromium.org Review URL: https://codereview.chromium.org/239543018 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264811 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
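Step 3 keeps exactly one numeric region code. A Python sketch of that rule over a parsed region-name table, assumed here to be a dict keyed by region code (the shell script works on the .txt files instead):

```python
def filter_regions(names: dict) -> dict:
    """Drop numeric (UN M.49) region codes except 419, Latin America."""
    return {
        code: name
        for code, name in names.items()
        if not code.isdigit() or code == "419"
    }
```

As a sanity check on the step estimates, 800 + 640 + 50 + 200 kB is about 1.69MB, in line with the drop from 12.4MB to 10.7MB quoted in the TEST line.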
-
- Apr 07, 2014
-
-
jshin@chromium.org authored
1. Remove all the obsolete patches. There are lots of them because most of the local patches to ICU 4.6.1 have either been accepted upstream or become obsolete. The largest local patch removed is our CJ word breaker patch, because it was upstreamed. Android didn't apply the CJK word breaker patch to ICU 4.6 to reduce the data size; a follow-up CL will add an Android-specific change for this issue. In addition, we don't include patches for files we add locally, because patches for new files are redundant. Instead, those files are listed in README.chromium. 2. We don't need platform-specific headers any more (pmac, plinux, pwin, etc). They're combined into a single file, and all the platforms we care about are well supported except for one issue on Android/QNX, which putil.patch takes care of. 3. Breakiterator patches for a few remaining issues. We also use a much smaller Khmer dictionary (upstream fix pending). 4. Converter - Two WHATWG-encoding-standard-compliant mapping tables (derived directly from the spec with a script) are added for EUC-JP and CP866. - Disabled various non-HTML5 encodings such as SCSU, BOCU, UTF-7 and CESU-8, saving ~30kB in code size. Even though we link statically, they're still pulled in as a part of uconv. - Disabled ISO-2022-JP-[1-4] in ucnv2022.c - Removed a number of encoding alias entries from the alias table, leading to a ~40kB data size reduction. 5. Locale data: not yet updated. We need to trim it substantially. 6. Unihan collation removal is now done with a script (scripts/remove_unihan.sh). 7. Updated the timezone data to the latest (2014b) as of today. 8. Customized transliterator for Greek uppercasing. 9. Updated data build related patches. The Windows data build patch has yet to be updated. 10. The updated ICU data file/assembly source files are not included in this CL. They'll be updated in a separate CL. With all the size reduction changes applied, the data size went down from > 23MB to 12.4MB. 
However, it's still 2.5MB larger than the ICU 4.6.1 data. The locale data trimming will bring it down further. 11. Update README.chromium accordingly. The only exceptions are item #5 and the Android entry in item #3 (breakiterator; see #1 above). BUG=259715,76328 TEST=Following the procedure outlined in README.chromium, one can build the icu data file. R=jsbell@chromium.org, mark@chromium.org Review URL: https://codereview.chromium.org/224943002 git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@262192 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
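Item 8 mentions a customized transliterator for Greek uppercasing; its rules are not in this log. Assuming its purpose is the usual modern Greek convention (uppercase text drops accents and breathings but keeps the diaeresis), the Python sketch below is a simplified stand-in that uses `unicodedata` instead of ICU transliteration. The set of stripped marks is an assumption, and the sketch ignores the rule that adds a dialytika when removing an accent would merge two vowels into a diphthong.

```python
import unicodedata

# Assumed set of combining marks to strip: varia, oxia/tonos, psili, dasia, perispomeni.
# U+0308 (diaeresis/dialytika) is deliberately kept.
STRIPPED_MARKS = {"\u0300", "\u0301", "\u0313", "\u0314", "\u0342"}


def greek_upper(text: str) -> str:
    """Uppercase Greek text, dropping accents and breathings but keeping diaeresis."""
    decomposed = unicodedata.normalize("NFD", text)  # split base letters from marks
    stripped = "".join(c for c in decomposed if c not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", stripped.upper())  # recompose, e.g. Ι + U+0308 -> Ϊ
```

Plain `str.upper()` would keep the tonos (ά becomes Ά), which is the behavior this kind of customization avoids.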
-