  1. Apr 11, 2017
    • Jungshik Shin's avatar
      Update trim_data to deal with locale fallback failure for units · d5c238dc
      Jungshik Shin authored
      Delete empty units,units{Narrow,Short} blocks after trimming units data.
      Empty units* blocks in en_GB and a few other locales after trimming
      cause ICU to fail to fall back when getting the duration data for those
      locales.
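The fix can be sketched in Python under an assumed model in which each locale's resource bundle is a nested dict. The real trim_data.sh edits ICU's .txt source files, and `drop_empty_unit_tables` is a hypothetical name. Per the message above, an empty units* table kept ICU from falling back to the parent locale's duration data, so the sketch removes such tables after trimming.

```python
from typing import Any

# Table keys named in the commit message. Under the assumed dict model, an
# empty table with one of these keys is removed so that a lookup falls back
# to the parent locale instead of stopping at the empty table.
UNIT_TABLES = ("units", "unitsNarrow", "unitsShort")


def drop_empty_unit_tables(locale: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `locale` without empty units* tables (hypothetical helper)."""
    return {
        key: value
        for key, value in locale.items()
        if not (key in UNIT_TABLES and isinstance(value, dict) and not value)
    }


if __name__ == "__main__":
    en_gb = {"units": {}, "unitsShort": {"duration": {"day": "{0} d"}}}
    print(drop_empty_unit_tables(en_gb))
```

Non-empty units* tables and all other keys pass through unchanged, and the input dict is not modified.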
      
      In addition, fix source/data/translit/root_subset.txt. The Rule*Ids block
      has to be present even when it's empty. When the Hans-Hant transform
      rules were dropped, root_subset.txt was left completely empty, which broke
      "components_unittests --gtest_filter=AutofillProfileComparato*".
      
      With these changes, regenerate ICU data files. The size is slightly smaller.
      
      android/icudtl.dat  6573872 => 6573792
      common/icudt*dat    10130560 => 10130480
      
      BUG=707515,677043,684609
      TEST=components_unittests --gtest_filter=AutofillProfileComparato*
      TEST=ui_base_unittests --gtest_filter=L10nUtilTest.TimeDurationForm*
      R=derat@chromium.org
      
      Review-Url: https://codereview.chromium.org/2812943003 .
  2. Oct 28, 2016
    • Jungshik Shin's avatar
      ICU update to 58 part 2 · e0d9b90c
      Jungshik Shin authored
      Follow-up to https://chromium.googlesource.com/chromium/deps/icu/+/5feb9ad5
      (due to a rietveld issue, part 1 was manually pushed).
      
      Update ICU from 56.1 to the 58.1 release, part 2.
      
      Listed below is a tiny subset of what's new in 58.1:
      
        1. Unicode 9.0 from Unicode 8.0
          - Updated character properties including Emoji data up to 4.0beta.
          - Updated grapheme/word/line breaking rules for Emoji sequences and others.
      
        2. CLDR 30.0.2 from CLDR 28
          - Numerous locale data updates/improvements
      
        3. Spoofing API changes
        4. Greek uppercasing support as a part of regular case-mapping API.
        5. Line breaking rule file format optimization. This change enables me
           to add the CJ loose line breaking rules back (previously, they were
           dropped to save space) so that Blink can use them for CJ.
      
      See http://site.icu-project.org/download/58 for more details on ICU 58.1
      and http://site.icu-project.org/download/57 for more details on ICU 57.1.
      
      For CLDR 30, see http://cldr.unicode.org/index/downloads/cldr-30 .
      
      The size impact:
         Non-Android: 10,127,200 => 10,128,624 (delta = 1,424 / 0.014%)
         Android: 6,563,152 => 6,571,936 (delta = 8,784 / 0.13%)
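The percentages above are the byte delta divided by the old size. A small hypothetical Python helper reproduces them to rounding:

```python
def size_delta(old: int, new: int) -> tuple[int, float]:
    """Return the change in bytes and the change as a percentage of `old`."""
    delta = new - old
    return delta, 100.0 * delta / old


if __name__ == "__main__":
    for label, old, new in [
        ("Non-Android", 10_127_200, 10_128_624),
        ("Android", 6_563_152, 6_571_936),
    ]:
        delta, pct = size_delta(old, new)
        print(f"{label}: {old:,} => {new:,} (delta = {delta:,} / {pct:.3f}%)")
```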
      
      Below is the list of changes made on top of the upstream ICU 58.1
      in reverse order. Most of these changes were made in 58staging branch
      to run trybots and cherry-picked back for this CL. See
         https://chromium.googlesource.com/chromium/deps/icu/+/log/chromium/58staging
         https://codereview.chromium.org/2447513002/ : cr+blink update cl with
             58staging branch head.
      
      * Fix the build on Windows without std::string (v8)
      * Add ms932 alias to Shift_JIS
      
      * Apply Google-specific locale data patches
      
      * Fix a bug in scriptset
      
      * Update windows-1255 mapping
      
      * Disable C4333 warning by MSVC (harmless)
      
      * Apply and update utf32.patch and README.chromium
      
      * Update and apply vscomp.patch
        stringpiece patch removed. VS2015 seems to be fine with a redefinition.
      
      * Update pre-built ICU data files
         Update *local.mk with a new copyright line
      
      * Apply more patches
        The following patches were applied and updated: data_symb, vscomp, wpo
      
        The unnecessary part was dropped from vscomp
      
      * Update BUILD.gn and icu.gyp* files
      
      * Update android/brkitr.patch
      
      * Update and apply more patches
      
      * Update and apply cjdict.patch
         Apply data.build.patch
      
      * Delete obsolete patches: cmemory,regex
      
      * Update README.chromium and apply brkitr patches
      
        - Update README.chromium
        - Remove obsolete patches
        - Update linebrk.patch and apply it: add back line_loose_cj
      
      * Update wordbrk.patch and apply it
      
      * Update and apply khmer-dictbe.patch
      
      * Update data trimming
      
        - android/patch_locale.sh
        - scripts/trim_data.sh
           ExemplarCh* removed
           charac*Label removed
           relative/relativeTime removed for daysOfWeek and quarter
      
      * Update the following patches
      
        android/brkitr.patch
        patches/linebrk.patch
        patches/data.build.patch
      
      * Update cjdict.patch and linebrk.patch
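Assuming the first two trim_data.sh bullets in "Update data trimming" name resource keys, given as glob patterns, that the updated script strips (the message does not say which files are affected), the key-level effect can be sketched in Python with `fnmatch`. The nested-dict model and `drop_matching_keys` are assumptions, not the script's actual implementation.

```python
from fnmatch import fnmatchcase
from typing import Any

# Glob patterns copied from the commit message; treating them as key
# patterns is an interpretation, not taken from trim_data.sh itself.
DROP_PATTERNS = ("ExemplarCh*", "charac*Label")


def drop_matching_keys(table: dict[str, Any],
                       patterns: tuple[str, ...] = DROP_PATTERNS) -> dict[str, Any]:
    """Recursively copy `table`, skipping keys that match any glob pattern."""
    result: dict[str, Any] = {}
    for key, value in table.items():
        if any(fnmatchcase(key, pattern) for pattern in patterns):
            continue
        result[key] = (drop_matching_keys(value, patterns)
                       if isinstance(value, dict) else value)
    return result
```

`fnmatchcase` is used so matching stays case-sensitive on every platform, as ICU resource keys are.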
      
      BUG=637001
      TEST=Layout tests, all unittests, browser tests, ui tests.
      R=jsbell@chromium.org, mark@chromium.org
      
      Review URL: https://codereview.chromium.org/2442923002 .
  3. Jan 29, 2016
    • Jungshik Shin's avatar
      ICU 56 step 2 · 27b09232
      Jungshik Shin authored
      Make the tree ready for the application of Google's and Chrome's data
      patches and post-56 code patches.
      
      1. Fix trim_data.sh to run from anywhere.
      2. Update patch_locale.sh for Android and add en_IN to the locale list
      3. Apply data.build.patch
      4. Exclude non-UI locale data for unit locale category
      5. Add some regional variant locales to locale, unit, zone and coll.
      6. Update locale lists for locale, unit, zone, and coll
      
      BUG=575007
      TEST=None
      R=mark@chromium.org
      
      Review URL: https://codereview.chromium.org/1624643003 .
  4. Feb 19, 2015
    • Jungshik Shin (jungshik at google)'s avatar
      Fix en_GB's language name failure · 8d46830a
      Jungshik Shin (jungshik at google) authored
      data/lang/en_GB.txt has an empty "Languages" block, which causes
      getDisplay{Name,Language} to fail in en-GB.
      
      Update trim_data.sh to remove an empty "Languages" block and run the
      script to fix data/lang/en_GB.txt and any other affected locales (only
      en_GB.txt is affected).
      
      Rebuild the icu data with the above changes for both Android and non-Android
      platforms.
      
      BUG=428145
      TEST=linux_chromeos bots: browser_tests --gtest_filter=*GetUILang*
      TBR=mark@chromium.org
      
      Review URL: https://codereview.chromium.org/930203004
  5. Jan 23, 2015
    • Jungshik Shin (jungshik at google)'s avatar
      ICU update to 54 - step 6 · b9090ea5
      Jungshik Shin (jungshik at google) authored
      1. Add {coll,curr,lang,locales,rbnf,region,sprep,translit,unit,zone}/*local.mk
      to exclude locale data for languages/locales that Chromium does not need.
      
      2. Run scripts/trim_data.sh to cut down the data size further by excluding
      unused entries in each locale file.
         - Keep the display names for languages/scripts/locales in Chrome's
           Accept-Language list and remove the display names outside the set.
         - Minimize the locale data in data/{locales,lang} for non-UI languages
           in the A-L list. For them, we just need the "native" display name
           and the exemplar character set.
         - Exclude historic, obscure and otherwise unnecessary currency display
           names.
         - Drop unnecessary Chinese collation rules: Big5/GB2312/UniHan.
         - Keep only the minimal unit data for duration and compound units.
      
      3. Add css3transform.txt to data/translit for Greek upper/lowercasing support.
      
      4. Add the minimal locale data for ckb and ku.
      
      5. The tz database was previously updated to 2014j (the latest), so no
         change is made except for a README.chromium update.
      
      6. Check in the pre-built data (icudtl.dat) shared by all non-Android
         platforms and the assembly files for Linux/Mac.
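The minimization of non-UI locales in item 2 can be sketched in Python under an assumed nested-dict model of data/lang/<code>.txt and data/locales/<code>.txt. The key names are modelled on ICU resource-bundle keys but are assumptions here; "layout" (text direction) follows the April 2014 description of the same step, and `minimize_non_ui_locale` is a hypothetical name.

```python
from typing import Any

# Keys kept for a non-UI locale under this sketch's assumed dict model.
KEEP_FOR_NON_UI = frozenset({"ExemplarCharacters", "layout"})


def minimize_non_ui_locale(code: str,
                           lang_bundle: dict[str, Any],
                           locale_bundle: dict[str, Any],
                           ) -> tuple[dict[str, Any], dict[str, Any]]:
    """Reduce a non-UI locale to its native display name and a few keys.

    lang_bundle models data/lang/<code>.txt and locale_bundle models
    data/locales/<code>.txt (both hypothetical dict representations).
    """
    native = lang_bundle.get("Languages", {}).get(code)
    # Emit no "Languages" table at all rather than an empty one.
    slim_lang = {"Languages": {code: native}} if native is not None else {}
    slim_locale = {k: v for k, v in locale_bundle.items() if k in KEEP_FOR_NON_UI}
    return slim_lang, slim_locale
```

Returning no table instead of an empty "Languages" block avoids the kind of empty block that later broke en-GB display names.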
      
      The final data size is 10,255,584 bytes, which is about 200kB smaller than
      that for ICU 52.1. The pristine upstream ICU data is 25,343,024 bytes.
      
      The remaining steps are to build a smaller data file for Android and
      to build icudtl.dll for Windows (non-default build option).
      
      BUG=428145
      TEST=net_unittests --gtest_filter="*ilenameUtil*"
      TEST=net_unittests --gtest_filter="*IDN*"
      TEST=base_unittests --gtest_filter="*Conv*"
      TEST=browser_tests --gtest_filter="*ncoding*"
      TEST=Blink: layout tests
      R=mark@chromium.org
      
      Review URL: https://codereview.chromium.org/872903002
  6. May 05, 2014
    • jshin@chromium.org's avatar
      Add back display names for non-UI languages in A-L list · 4266d6d1
      jshin@chromium.org authored
      I was too aggressive in trimming the data and dropped the display
      names for languages that Chromium needs (for non-UI languages
      that are in the A-L list). That was not my intention (the comment in
      trim_data.sh said one thing, but the code did another).
      
      In addition, add the Norwegian (nb) and Malay (ms) locale data that had
      been omitted by mistake.
      
      Also update the trim_data.sh script NOT to drop 'ALIAS' lines, which are
      used to indicate that a given locale is an alias of another locale.
      That also required adding ro_MD.txt (the null locale to which mo.txt is
      aliased).
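A minimal Python sketch of the alias rule, assuming the alias entry sits on its own line in the form `"%%ALIAS"{"ro_MD"}` (as in ICU alias bundles such as mo.txt). `filter_lines` is a hypothetical stand-in for the script's line trimming, and `alias_target` shows why ro_MD.txt has to exist once mo.txt's alias line is kept.

```python
import re
from typing import Callable, Optional

# Matches an alias entry such as  "%%ALIAS"{"ro_MD"}  and captures the target.
ALIAS_RE = re.compile(r'"%%ALIAS"\{"([^"]+)"\}')


def filter_lines(lines: list[str], should_drop: Callable[[str], bool]) -> list[str]:
    """Apply a trimming predicate, but always keep %%ALIAS lines."""
    return [line for line in lines if ALIAS_RE.search(line) or not should_drop(line)]


def alias_target(text: str) -> Optional[str]:
    """Return the locale a bundle is aliased to, or None if it is not an alias."""
    match = ALIAS_RE.search(text)
    return match.group(1) if match else None
```

Even a predicate that would drop every line leaves the alias entry in place; a build step could then check that each `alias_target` has its own bundle.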
      
      The above three changes add about 110kB to the ICU data (from 10.3MB to 10.4MB).
      
      Also update the pre-built icu data files for Linux, Mac and Windows.
      The Android data will be updated in a follow-up patch.
      
      BUG=132145
      TEST=When ICU is rolled, unit_tests:ExtensionL10* pass.
      TBR=mark
      
      Review URL: https://codereview.chromium.org/264973016
      
      git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@268285 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
  7. Apr 22, 2014
    • jshin@chromium.org's avatar
      Trim unit* sections in data/locales/* · 4a39040d
      jshin@chromium.org authored
      Add 'filter_locale_data' function to trim_data.sh
      
      Chromium/Blink do not use most of the unit* sections in locale data. Keep
      only the duration and compound sub-sections.
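The commit's filter_locale_data is a shell function. This Python sketch shows only its unit-related effect, under an assumed nested-dict model of a locale bundle and with a hypothetical function name.

```python
from typing import Any

# Sub-sections kept in every units* table, per the commit message.
KEEP_UNIT_SUBSECTIONS = frozenset({"duration", "compound"})


def filter_unit_sections(locale: dict[str, Any]) -> dict[str, Any]:
    """Keep only duration/compound entries inside units* tables.

    Tables that would end up empty are dropped instead of being kept empty.
    """
    result: dict[str, Any] = {}
    for key, value in locale.items():
        if key.startswith("units") and isinstance(value, dict):
            kept = {k: v for k, v in value.items() if k in KEEP_UNIT_SUBSECTIONS}
            if kept:
                result[key] = kept
        else:
            result[key] = value
    return result
```

Dropping tables that become empty is a design choice of the sketch; the April 2017 commit above describes how empty units* blocks broke locale fallback.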
      
      Update the icudtl.dat and two assembly source files for Mac/Linux.
      
      It saves ~200kB (uncompressed); the 7z-compressed size reduction is 34kB.
      
      With all the changes up to this CL applied, the net increase in the ICU data
      from ICU 46 to ICU 52 is ~49kB 7z-compressed (3,070,246 vs 3,021,457) and
      ~390kB uncompressed (10,370,656 vs 9,980,368).
      
      BUG=132145
      TEST=None.
      TBR=mark
      
      Review URL: https://codereview.chromium.org/247663002
      
      git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@265354 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
  8. Apr 18, 2014
    • jshin@chromium.org's avatar
      Remove {big5,gb2312}han collation data · 991d1f1e
      jshin@chromium.org authored
      1. The {big5,gb2312}han collation data is not used by anybody because
      it is useless as a sort order.
      
        Add a function to trim_data.sh to remove them from zh.txt
      
      2. Remove remove_unihan.sh and add back the unihan rules to coll/{zh,ja,ko}.txt.
      In ICU 52, tools/genrb does NOT include the unihan collation by default, so
      we don't have to bother removing it from the rule files.
      
      3. Remove obsolete patch files (locale[23].patch)
      
      4. Add LICENSE file (converted from license.html)
      
      5. Update README.chromium accordingly.
      
      6. Check in the updated data file/assembly files.
      
      The net saving in icudtl.dat is ~ 220kB.
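The zh.txt trimming in item 1 can be sketched in Python, assuming the zh collation bundle is modelled as a dict whose "collations" table maps collation type names (such as big5han, gb2312han, pinyin) to their rules. The model and `drop_unused_collations` are hypothetical; the real change is a function added to trim_data.sh.

```python
from typing import Any

# Collation types named in the commit message as unused.
UNUSED_ZH_COLLATIONS = frozenset({"big5han", "gb2312han"})


def drop_unused_collations(zh: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the zh collation bundle without big5han/gb2312han."""
    result = dict(zh)
    collations = zh.get("collations")
    if isinstance(collations, dict):
        result["collations"] = {name: rules for name, rules in collations.items()
                                if name not in UNUSED_ZH_COLLATIONS}
    return result
```

A bundle without a "collations" table is returned unchanged rather than gaining an empty one.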
      
      
      BUG=132145
      TEST=icudtl.dat is 10576480
      TBR=mark
      
      Review URL: https://codereview.chromium.org/243763002
      
      git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264857 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
    • jshin@chromium.org's avatar
      Trim ICU data to reduce the download size/memory usage · 4e493261
      jshin@chromium.org authored
      Add a shell script, trim_data.sh, along with locale list files to trim
      the ICU data further. The script does the following:
      
      1. Remove the display names of languages NOT listed in Chrome's Accept-Language
         list. (800kB)
      2. Minimize the locale data for locales listed in the A-L list that are
         not a UI locale in Chrome. For those locales, exemplar characters,
         the display name in the native language and layout direction are included.
         (640kB)
      3. Filter the region data to drop numeric region display names other than 419
         (Latin-America). (50kB)
      4. Filter out the currency data (display names and plurals) for historic currencies.
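Steps 1 and 3 amount to allowlist filters. A Python sketch under an assumed model: a locale's language and region display-name tables as flat code-to-name dicts, with the Accept-Language set passed in as a hypothetical stand-in for Chrome's list.

```python
def trim_language_names(languages: dict[str, str],
                        accept_languages: set[str]) -> dict[str, str]:
    """Step 1: keep display names only for codes in the Accept-Language list."""
    return {code: name for code, name in languages.items() if code in accept_languages}


def trim_region_names(countries: dict[str, str]) -> dict[str, str]:
    """Step 3: drop numeric (UN M.49) region codes except 419 (Latin America)."""
    return {code: name for code, name in countries.items()
            if not code.isdigit() or code == "419"}
```

Alphabetic region codes such as "US" are untouched; only all-digit codes other than 419 are removed.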
         (200kB)
      
      This CL also checks in icudtl.dat (source/data/in) and
      icudt_dat.S (mac and linux). Note that I dropped '52' (the version number)
      in the assembly source file name and icu.gyp was adjusted accordingly.
      
      With all these changes, icudtl.dat is ~ 800kB larger than that in ICU 4.6.
      The 7z compression (as used by the installer) makes the size difference
      go down to ~ 130kB.
      
      BUG=132145
      TEST=The icudtl.dat (uncompressed) is about 10.7MB instead of 12.4MB without this CL.
      R=mark@chromium.org
      
      Review URL: https://codereview.chromium.org/239543018
      
      git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264811 4ff67af0-8c30-449e-8e8b-ad334ec8d88c