Name: icu
URL: http://site.icu-project.org/
License: MIT
Security Critical: yes
***NOTE***
ICU is in the middle of being updated to 56.1 and does not work yet.
If you have an urgent fix to apply, contact jshin@chromium.org to
create a branch for 54.1 to apply a fix on top of.
This directory contains the source code of ICU 56.1 for C/C++.
A. How to update ICU
1. Run "scripts/update.sh <version>" (e.g. 56-1).
This will download ICU from the upstream svn repository.
It preserves Chrome-specific build files (*local.mk) and
converter files (see section C).
2. Update the source file lists for i18n and common
in icu.gypi and BUILD.gn. See the comments in the files.
3. Review and apply patches/changes in "D. Local Modifications" if
necessary/applicable. Update patch files in patches/.
4. Follow the instructions in section B on building ICU data files.
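The version is spelled several ways in this file: 56.1 in prose, 56-1 as the update.sh argument, release-56-1 as the upstream tag (section B.3), and 56 in data file names such as icudt56l.dat. The following is a minimal sketch of deriving the other spellings from the dotted one. The function name icu_version_forms is hypothetical and is not part of scripts/.

```shell
#!/bin/sh
# Hypothetical helper (not part of scripts/): derive the version spellings
# used in this file from a dotted version such as "56.1".
icu_version_forms() (
  dotted="$1"
  dashed=$(printf '%s' "$dotted" | tr '.' '-')  # 56.1 -> 56-1 (update.sh argument)
  major="${dotted%%.*}"                         # 56.1 -> 56   (data file names)
  echo "update_arg=${dashed}"
  echo "upstream_tag=release-${dashed}"
  echo "data_file=icudt${major}l.dat"
)

icu_version_forms 56.1
```

The function only prints strings; it does not call update.sh or contact the upstream repository.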
B. How to build ICU data files
Pre-built data files are generated and checked in with the following steps.
1. icu data files for Chrome OS, Linux, Mac and Windows
a. Make an ICU data build directory outside the Chromium source tree
and cd to that directory (say, $ICUBUILDIR).
b. Run
${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
c. Run make
'make' will fail when pkgdata looks for css3transform.res. This
is expected. See http://bugs.icu-project.org/trac/ticket/10570
d. Run
${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh
The full locale data for Chrome's UI languages and their select variants
and the bare minimum locale data for other locales will be kept.
e. Run
${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
This will make icudt${version}l.dat and icudt${version}l_dat.S
f. Run
${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
This will erase the result of step d (trim_data.sh).
g. Run
${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh
This will revert the changes made in source/data by trim_data.sh.
It will also copy the ICU data file for non-Android platform
and the corresponding assembly source files for Linux and Mac to
the following places. Check them in.
source/data/in/icudtl.dat
source/{linux,mac}/icudtl_dat.S
h. Whenever data is updated (e.g. a timezone update), repeat steps d ~ g, as long
as the ICU build directory used in a ~ c is kept.
2. icu data files for Android
a. Follow a ~ d for non-Android platforms
b. Run
${CHROME_ICU_TREE_TOP}/android/patch_locale.sh
This further cuts the data entries for Android on top of trim_data.sh.
c. Run
${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
d. Run
${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh
and check in the following files.
android/icudtl.dat
android/icudtl_dat.S
e. Run
${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
This will erase the result of trim_data.sh and patch_locale.sh.
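The script order in B.1 (steps d ~ g) and B.2 (from trim_data.sh through step e) can be sketched as below. This is a minimal sketch, not a script that exists in this directory: the function name icu_data_steps and the platform labels "desktop" and "android" are hypothetical. It only prints the script paths in the order given above and runs nothing. It assumes the ICU build directory from steps a ~ c already exists.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the scripts listed in B.1 d-g
# ("desktop") or in B.2 starting from trim_data.sh ("android").
# It prints paths only and does not execute them.
icu_data_steps() (
  top="$1"
  platform="$2"
  case "$platform" in
    desktop)
      steps="scripts/trim_data.sh scripts/make_data.sh
             scripts/clean_up_data_source.sh scripts/copy_data.sh" ;;
    android)
      steps="scripts/trim_data.sh android/patch_locale.sh scripts/make_data.sh
             scripts/copy_data_android.sh scripts/clean_up_data_source.sh" ;;
    *)
      echo "unknown platform: $platform" >&2
      exit 1 ;;
  esac
  # $steps is intentionally unquoted so that it splits on whitespace.
  for step in $steps; do
    echo "${top}/${step}"
  done
)

icu_data_steps "${CHROME_ICU_TREE_TOP:-.}" android
```

Printing rather than executing keeps the sketch safe to try; the listed scripts modify source/data, so the output is meant to be reviewed and run by hand.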
3. icu data dll for Windows (non-default build option)
Follow these steps to build windows/icudt.dll. By default, we set
icu_use_icu_data_flag to 1 and don't use this file.
a. check out a clean copy of icu56 from the upstream on Windows
outside the Chrome tree.
$ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56
b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat
c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
${SEPARATE_ICU_ROOT}/source/data/makedata.mak
d. In Visual Studio, open the source/allinone/allinone.sln solution
in ${SEPARATE_ICU_ROOT}
e. Build the 'makedata' target
f. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
g. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
and check that in.
4. Note on the locale data customization
- scripts/trim_data.sh
a. Trim the locale data for Chrome's UI languages:
locales, lang, region, currency, zone
b. Trim the locale data for non-UI languages to the bare minimum:
ExemplarCharacters, LocaleScript, layout, and the name of the
language for a locale in its native language.
c. Remove the legacy Chinese character set-based collations
(big5han/gb2312han) that don't make any sense and that nobody uses.
- android/patch_locale.sh
a. Make changes to source/data/{region,lang} to exclude these data
except the language and script names of zh_Hans and zh_Hant.
b. Remove exemplar cities in timezone data (data/zone).
c. Keep only the minimal calendar data in data/locales.
d. Include currency display names for a smaller subset of currencies.
e. Minimize the locale data for 9 locales to which Chrome on Android
is not localized.
f. Also apply android/brkitr.patch
- android/brkitr.patch
Do not use the C+J dictionary for Chinese/Japanese segmentation
to reduce the data size. Adjust word.txt and a few other files.
C. Chromium-specific data build files and converters
They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.
1. source/data/mappings
- convrtrs.txt : Lists encodings and aliases required by the WHATWG
Encoding spec plus a few extra (see the file as to why).
- ucmlocal.txt : Lists only the converters we need.
- *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.
- gb18030.ucm and windows-936.ucm
gb_table.patch was applied for the following changes.
a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
the encoding spec (one-way mapping in toUnicode direction).
b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
from U+1E3F to \xA8\xBC (windows-936/GBK).
See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
2. source/data/*/*local.mk
- List locales of interest to Chromium
a. Chrome's UI languages
b. Variants of UI languages
c. Other locales in Accept-Language list : will only have bare minimum
locale data
- brklocal.mk drops all *loose.brk to save space (~370kB) for now.
3. source/data/brkitr
- khmerdict.txt: Abridged Khmer dictionary. See
http://bugs.icu-project.org/trac/ticket/9451
- word_ja.txt (used only on Android)
Added for Japanese-specific word-breaking without the C+J dictionary.
4. source/data/trnslit/css3transform.txt
- Handle Greek case conversion with a transliterator
5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
with the minimal locale data necessary for the spellchecker
and language menus. Also change the English display name
for ckb to 'Kurdish (Arabic)'.
D. Local Modifications
1. Applied locale data patches from Google obtained by diff'ing
the upstream copy and Google's internal copy for source/data
- patches/locale_google.patch:
* Google's internal ICU locale changes
* Simpler region names for Hong Kong and Macau in all locales
* Currency signs in ru, uk and tr locales
* AM/PM, midnight, noon formatting for a few Indian locales
* Timezone name changes in Korean and Chinese locales
- patches/locale1.patch: Minor fixes for Korean
2. Applied post-56 fixes from the upstream for measure/date format bugs
- patches/measure_format.patch: combined patch of 12 CLs taken
from bugs below.
- upstream bugs
http://bugs.icu-project.org/trac/ticket/11986
http://bugs.icu-project.org/trac/ticket/12031
http://bugs.icu-project.org/trac/ticket/12030
http://bugs.icu-project.org/trac/ticket/12041
- patches/relative_date.patch from Android
https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
3. Breakiterator patches
- patches/linebrk.patch
a. Drop *_loose.txt for all locales and use the corresponding normal.txt
b. Drop local patches we used to have for the following issues. They'll
be dealt with in the upstream (Unicode/CLDR).
http://unicode.org/cldr/trac/ticket/6557
http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
- patches/wordbrk.patch for word.txt
a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
FQDN labels can be split at '.'
b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
See http://unicode.org/cldr/trac/ticket/6555
- patches/khmer-dictbe.patch
Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
http://bugs.icu-project.org/trac/ticket/9451
- Add several common Chinese words that were dropped previously to
source/data/cjdict/brkitr/cjdict.txt
patch: patches/cjdict.patch
upstream bug: http://bugs.icu-project.org/trac/ticket/10888
4. Timezone data update
Run scripts/update_tz.sh to grab the latest version of the
following timezone data files and put them in source/data/misc
metaZones.txt
timezoneTypes.txt
windowsZones.txt
zoneinfo64.txt
As of Jan 20 2016, the latest version is 2015g and the above files
are available at
http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
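For illustration, the per-file URLs implied by the directory above can be listed as follows. This is a sketch under stated assumptions and does not show what scripts/update_tz.sh does internally: the function name tz_file_urls is hypothetical, and the "44" path component is copied from the URL above and may differ for other tzdata versions.

```shell
#!/bin/sh
# Hypothetical helper: print the URL of each timezone data file for a given
# tzdata version, following the directory layout quoted above.
# Assumption: the "44" component stays the same for the requested version.
tz_file_urls() (
  tzver="$1"
  base="http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/${tzver}/44"
  for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
    echo "${base}/${f}"
  done
)

tz_file_urls 2015g
```

The function only builds strings and performs no download.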
5. Build-related changes
- patches/wpo.patch
upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
http://bugs.icu-project.org/trac/ticket/5701
- patches/vscomp.patch for building with Visual Studio on Windows.
a. do not use WINDOWS_LOCALE_API in locmap.c
b. do not redefine stringpiece::npos
c. fix http://bugs.icu-project.org/trac/ticket/12129 (C4138 warning)
- patches/data.build.patch :
Remove unnecessary resources : unames, collator rule source
- patches/data.build.win.patch :
Windows-only data build patch.
- patches/data_symb.patch :
Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
the icu data file or icudt.dll
6. Apply a timezone detection API fix
- patches/tzdetect.patch
- upstream bugs
http://bugs.icu-project.org/trac/ticket/11623
7. Fix 'bad cast' found in Transliterator with a cfi build
- patches/xlit_badcast.patch
- upstream bug (yet to be resolved)
http://bugs.icu-project.org/trac/ticket/11937
8. TODO: If removing UTF-32 from Blink is more involved than expected,
add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is
defined. See
http://www.icu-project.org/trac/ticket/11296