Name: icu
URL: http://site.icu-project.org/
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 54.1 for C/C++.


1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 icu

  The following directories we don't use are removed:

   - as_is
   - packaging
   - source/layout
   - source/layoutex
   - source/data/xml

  patches/configure.patch is applied so that runConfigureICU works in the
  icudata generation step without the layout and layoutex directories. It
  removes the corresponding Makefiles from the ac_config variable.
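
  The directory removal above can be sketched as a small shell helper. This
  is only an illustration: the function name remove_unused_icu_dirs is
  hypothetical, and applying patches/configure.patch is a separate step
  that is not shown here.

```shell
#!/bin/sh
# Sketch only: prune the directories this README lists as unused from a
# freshly exported ICU tree. remove_unused_icu_dirs is a hypothetical name.
remove_unused_icu_dirs() {
  icu_root="$1"   # top of the exported tree, e.g. "icu"
  for dir in as_is packaging source/layout source/layoutex source/data/xml; do
    # ${icu_root:?} aborts if the argument is empty, so 'rm -rf' is never
    # run against a path rooted at "/".
    rm -rf "${icu_root:?}/${dir}"
  done
}

# Example, run from the directory that contains the exported tree:
#   remove_unused_icu_dirs icu
```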

2. Apply the following patch for platform.h for NaCl.

 - patches/platform_nacl.patch to add U_PF_NATIVE_CLIENT
 - upstream bug (fixed in the upstream 55 RC)
   http://bugs.icu-project.org/trac/ticket/11033


3. Breakiterator patches

   - Apply patches/brkitr.patch
     * word.txt
       a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
          FQDN labels can be split at '.'
       b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
          See http://unicode.org/cldr/trac/ticket/6555
     * line.txt
       a. Use Japanese rules for all locales because Japanese tailoring only
          affects Japanese specific characters.
          See http://unicode.org/cldr/trac/ticket/3974
       b. Minor changes in CL, OP and IS definitions to handle 'comma-variants'
          more consistently.
          See http://unicode.org/cldr/trac/ticket/6557
       c. Fix line breaking for Chinese characters and quotation marks
          See http://unicode.org/cldr/trac/ticket/4200 and
              http://crbug.com/39779

   - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
     and word_POSIX.txt dropped from the build list.

   - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
     (source/data/brkitr/khmerdict.txt) obtained from
     http://bugs.icu-project.org/trac/ticket/9451
   - Add several common Chinese words that were dropped previously to
     source/data/cjdict/brkitr/cjdict.txt
     patch: patches/cjdict.patch
     upstream bug: http://bugs.icu-project.org/trac/ticket/10888


   - android/brkitr.patch (to be applied for Android build only) :
       Do not use the C+J dictionary for Chinese/Japanese segmentation
       to reduce the data size. Adjust word.txt and a few other files.

   - source/data/brkitr/word_ja.txt (used only on Android)
       Added for Japanese-specific word-breaking without the C+J dictionary.
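
   One way the brklocal.mk step above (copy brkfiles.mk, drop line_ja.txt
   and word_POSIX.txt) might be done is sketched below. It assumes the rule
   files appear in brkfiles.mk as whitespace-separated tokens; the helper
   name make_brklocal is hypothetical and the result should be reviewed by
   hand.

```shell
#!/bin/sh
# Sketch only: derive brklocal.mk from brkfiles.mk by dropping two entries.
# Assumes file names are whitespace-separated tokens in the build list.
# make_brklocal is a hypothetical helper name.
make_brklocal() {
  src="$1"   # path to brkfiles.mk
  dst="$2"   # path to write brklocal.mk
  # Remove each unwanted token together with the whitespace in front of it.
  sed -e 's/[[:space:]]*line_ja\.txt//g' \
      -e 's/[[:space:]]*word_POSIX\.txt//g' "$src" > "$dst"
}

# Example:
#   make_brklocal source/data/brkitr/brkfiles.mk source/data/brkitr/brklocal.mk
```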

4. Converter changes:
  - convrtrs.txt : Replaced the original by our own that only lists encodings
    and aliases required by the WHATWG Encoding spec plus a few extra (see
    the file as to why).

  - Add source/data/mappings/ucmlocal.txt : to list only converters we need.

  - Add new tables per the WHATWG Encoding Standard for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
    They're generated with the following scripts:
      scripts/{eucjp,sjis,big5,single_byte}_gen.sh

  - uconv.patch
    a. ISO-2022-JP-[1-4] is dropped.
    b. SCSU, BOCU, ISCII, UTF-7, LMB, ibm42*, ISO-2022-{KR,CN*} and HZ-GB:
       converters and detectors are dropped, leading to a ~100kB reduction
       in code size.

  - Upstream bugs
    http://www.icu-project.org/trac/ticket/11296 (uconv.patch)
    http://www.icu-project.org/trac/ticket/10303 (html5 encoding tables)

    uconv.patch landed upstream and is in 55 RC, with the build
    config changed to UCONFIG_ONLY_HTML_CONVERSION.

5. Locale changes

  - patches/locale_google.patch : Google's internal ICU locale changes

  - patches/locale1.patch :
      a. Exemplar character set changes for zh*, ja + 9 Indian locales
      b. Minor fixes for Korean and Turkish

  - Locale build configuration files: To include the full locale data
    for Chrome's UI languages and the minimum locale data for other locales,
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locales,region,translit,zone,rbnf,sprep}.

    This, along with #8 (data.build.patch), #3 (brkitr) and #4 (converter),
    cuts down the data size by ~11MB.

  - Run scripts/trim_data.sh : About 2.1MB data size reduction.
      a. Trim the locale data for Chrome's UI languages :
         locales, lang, region, currency
      b. Trim the locale data for non-UI languages to the bare minimum :
        ExemplarCharacters, LocaleScript, layout, and the name of the
        language for a locale in its native language.
      c. Remove the legacy Chinese character set-based collations
         (big5han/gb2312han), which don't make any sense and nobody uses.

  - Add tg.txt, ckb.txt, and ku.txt to source/data/{locales,lang}
    with the minimal locale data necessary for the spellchecker and
    language menus. Also change the English display name
    for ckb to 'Kurdish (Arabic)'.

  - android/patch_locale.sh (to be run for Android build only):
      a. Make changes to source/data/{region,lang} to exclude these data
         except the language and script names of zh_Hans and zh_Hant.
      b. Remove exemplar cities in timezone data (data/zone).
      c. Keep only the minimal calendar data in data/locales.
      d. Include currency display names for a smaller subset of currencies.
      e. Minimize the locale data for 9 locales to which Chrome on Android
         is not localized.
      f. Also apply android/brkitr.patch

6. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc using scripts/update_tz.sh

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of April 1 2015, the latest version is 2015b and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015b/44/
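
   As a rough illustration of where the four files come from, the sketch
   below prints one download URL per file for a given tzdata version. It is
   not the contents of scripts/update_tz.sh. The helper name tz_file_urls is
   hypothetical, and the trailing "44" path component is copied from the
   2015b URL above on the assumption that it stays the same for other
   versions.

```shell
#!/bin/sh
# Sketch only: print the download URL for each of the four timezone data
# files. tz_file_urls is a hypothetical helper, not part of update_tz.sh.
# The "44" path component is taken from the 2015b URL in this README and is
# assumed (not verified) to apply to other versions.
tz_file_urls() {
  version="$1"   # e.g. 2015b
  base="http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/${version}/44"
  for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
    printf '%s/%s\n' "$base" "$f"
  done
}

# Example: list the URLs for 2015b, one per line.
#   tz_file_urls 2015b
```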

7. Transliterator customization
   - Add css3transform.txt to source/data/translit.
   - Put the following line in trnslocal.mk

     TRANSLIT_SOURCE=css3transform.txt

8. Build-related changes

  - patches/wpo.patch
    upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
                    http://bugs.icu-project.org/trac/ticket/5701
  - patches/vscomp.patch for building with Visual Studio on Windows.
     a. do not use WINDOWS_LOCALE_API in locmap.c
     b. do not redefine stringpiece::npos
     c. Fix 'signed vs unsigned comparison' warning in
        collationfastlatin.cpp. The upstream ToT does not have these lines
        any more.
     d. Add static_cast to avoid a possible data truncation warning
        upstream bug (fixed in the upstream 55 RC)
        http://bugs.icu-project.org/trac/ticket/11104
  - patches/data.build.patch :
      Remove unnecessary resources : unames, collator rule source
  - patches/pkg_gen.patch :
    upstream bug (fixed in the upstream 55 RC)
      http://bugs.icu-project.org/trac/ticket/10572
  - patches/data.build.win.patch :
      Windows-only data build patch.
  - patches/data_symb.patch :
      Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
      the icu data file or icudt.dll

9. Pre-built data files are checked in with the following steps on Linux:
    a. Make an ICU data build directory outside the Chromium source tree
       and cd to that directory, $ICUBUILDIR.
    b. Run

      ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout

    c. Run 'make'
    d. 'make' will fail when pkgdata looks for css3transform.res.
       See http://bugs.icu-project.org/trac/ticket/10570
    e. Run
         ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh

       This will make and copy icudtl.dat and icudtl_dat.S for Linux and
       Mac as listed below. Renaming the data/assembly files to drop
       the ICU major version number as well as running make_mac_assembly.sh
       is done by this script.

       This script can be run again whenever you update the data.
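
       The renaming that drops the ICU major version number can be
       illustrated with the sketch below. It is not taken from
       make_n_copy_data.sh; the helper name drop_icu_version is hypothetical
       and the pattern assumes file names of the form icudt<digits>...

```shell
#!/bin/sh
# Sketch only: strip the ICU major version from a data file name, e.g.
# icudt54l.dat -> icudtl.dat and icudt54l_dat.S -> icudtl_dat.S.
# drop_icu_version is a hypothetical helper, not a function from
# make_n_copy_data.sh.
drop_icu_version() {
  # Remove the run of digits that directly follows the "icudt" prefix.
  printf '%s\n' "$1" | sed 's/^icudt[0-9][0-9]*/icudt/'
}

# Example:
#   cp data/out/tmp/icudt54l.dat "source/data/in/$(drop_icu_version icudt54l.dat)"
```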

    - source/data/in/icudtl.dat : Built on Linux with all the patches
      above applied. icudt54l.dat is generated in
      {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
      version number (54) dropped.
    - {mac,linux}/icudtl_dat.S : Built on Linux with all the
      patches above (except android/brkitr.patch) applied and checked in.
      This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
      icudt54l_dat.S, but '54' is dropped while copying.
      mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
      the header portion. With "linux/icudtl_dat.S" in its place,
    - android/icudtl_dat.S : Built on Linux with all the patches above and
      android/patch_locale.sh executed.
      '54' is dropped from the name generated in the build tree.
    - android/icudtl.dat : Generated as icudt54l.dat in
      {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and
      copied to the above location with '54' dropped in its name.
    - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
      and don't use this file.)
      a. check out a clean copy of icu54 from the upstream on Windows
         outside the Chrome tree.

        $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54

      b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
         ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat
      c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
         ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
      d. In Visual Studio, open the source/allinone/allinone.sln solution
         in ${SEPARATE_ICU_ROOT}
      e. Build the 'makedata' target
      f. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
      g. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
         and check that in.

10. Apply the following patches for regex
   - patches/regex.patch (a combined patch of 3 revisions below)
   - upstream bugs (fixed in the upstream 55 RC)
     http://bugs.icu-project.org/trac/ticket/11370 (r36723:36724)
     http://bugs.icu-project.org/trac/ticket/11369 (r36726:36727)
     http://bugs.icu-project.org/trac/ticket/11371 (r36800:36801)

11. Fix bugs in locid (getBaseName / thread safety).
   - upstream bugs (fixed in the upstream 55 RC)
     http://bugs.icu-project.org/trac/ticket/11421
     http://bugs.icu-project.org/trac/ticket/11547
   - upstream bugs (fixed in the upstream 55 RC)
     http://bugs.icu-project.org/trac/ticket/11177
     http://bugs.icu-project.org/trac/ticket/11451

13. Fix a data race in cmemory
   - patches/cmemory.patch
   - upstream bug (fixed in the upstream 55 RC)
     http://www.icu-project.org/trac/ticket/11538

14. Fix a bug found by 'stack' (static analysis tool)
   - patches/uloc.patch
   - upstream bug
     http://www.icu-project.org/trac/ticket/11602

15. Add a timezone detection API
   - patches/tzdetect.patch (applied in the upstream 55)
   - patches/tzdetect2.patch
   - upstream bugs
     http://bugs.icu-project.org/trac/ticket/11358
     http://bugs.icu-project.org/trac/ticket/11623

16. Properly handle a converter name starting with 'x-'.
   - patches/ucnv_name.patch
   - upstream bug
     http://bugs.icu-project.org/trac/ticket/11696