  1. Jun 04, 2015
  2. Mar 19, 2015
    • Update CJK converters and their generating scripts · dafa8443
      Jungshik Shin (jungshik at google) authored
      1. Update ucmlocal.mk and convrtrs.txt to refer to euc-kr-html.ucm
      instead of windows-949.ucm
      
      2. Tighten up the valid code range for the following converters:
      
         EUC-KR, Shift_JIS, Big5
      
      This puts an ASCII-range byte back into the stream, per the
      encoding spec, when it is either illegal as a 'trail byte' or
      there's no assigned code point for the "lead + trail" sequence.
      For instance, with this change, '0xF3 0x41' in EUC-KR is converted to
      'U+FFFD U+0041' instead of 'U+FFFD'.
      
      This change requires adding 2 to 8 new states to the conversion
      table of each converter mentioned above, leading to a 6.5kB net increase
      in the final data size.
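
      A minimal Python sketch of the rule in item 2, not the ICU state-table
      implementation: when a lead byte is followed by an ASCII byte that does
      not complete a mapped sequence, the decoder emits U+FFFD and leaves the
      ASCII byte in the stream. The name decode_two_byte, the lead range, and
      the one-entry TOY_TABLE are assumptions made for illustration only.

```python
REPLACEMENT = "\ufffd"

# Hypothetical toy table: a single (lead, trail) -> character entry.
# 0xB0 0xA1 is the EUC-KR encoding of U+AC00.
TOY_TABLE = {(0xB0, 0xA1): "\uac00"}


def decode_two_byte(data, table, lead_range=(0x81, 0xFE)):
    """Decode bytes with a toy lead+trail table.

    An unmapped pair whose trail byte is ASCII yields U+FFFD and the
    trail byte is decoded again as ASCII; an unmapped pair whose trail
    byte is non-ASCII yields a single U+FFFD.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0x80:
            out.append(chr(b))  # plain ASCII byte
            continue
        if not (lead_range[0] <= b <= lead_range[1]):
            out.append(REPLACEMENT)  # not a valid lead byte
            continue
        if i >= len(data):
            out.append(REPLACEMENT)  # truncated sequence
            break
        trail = data[i]
        ch = table.get((b, trail))
        if ch is not None:
            out.append(ch)
            i += 1  # consume the trail byte
        else:
            out.append(REPLACEMENT)
            if trail >= 0x80:
                i += 1  # non-ASCII trail is consumed with the lead
            # an ASCII trail stays in the stream for the next iteration
    return "".join(out)


result = decode_two_byte(bytes([0xF3, 0x41]), TOY_TABLE)
```

      With this sketch, the input 0xF3 0x41 from the example above decodes to
      U+FFFD followed by U+0041, because the ASCII byte is not swallowed.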
      
      3. Tighten the trail byte range for 2-byte sequences starting with 0x8E
      from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating
      script.
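
      A short Python sketch of the tightened range in item 3. It assumes the
      mapping given in the WHATWG Encoding Standard, where 0x8E followed by a
      byte in [A1,DF] maps to the half-width katakana block starting at U+FF61.
      The helper name decode_8e_pair is hypothetical and is not part of ICU or
      of the generating script.

```python
def decode_8e_pair(trail):
    """Return the character for the 2-byte sequence 0x8E <trail>.

    Returns None when the trail byte is outside [0xA1, 0xDF], i.e. when
    the sequence is not a valid half-width katakana sequence.
    """
    if 0xA1 <= trail <= 0xDF:
        return chr(0xFF61 + (trail - 0xA1))
    return None


first = decode_8e_pair(0xA1)
last = decode_8e_pair(0xDF)
rejected = decode_8e_pair(0xE2)  # was inside the old [A1,E2] range
```

      The bytes E0 through E2, accepted under the old range, have no
      half-width katakana assigned and are rejected here.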
      
      4. Change the substitution characters for EUC-JP and Shift_JIS to
      match other converters, i.e. make them produce U+FFFD when encountering
      invalid input. Before this change, they emitted U+001A.
      
      5. Enable 'U_CHARSET_IS_UTF8' configuration flag.
      Chromium/Blink does not rely on ICU for the code conversion between
      the 'system native encoding' (if it's one of legacy encodings)
      and Unicode. With this configuration, we can cut down the code size
      a bit.
      
      6. Update the icudtl.dat (all platforms) and assembly files (mac,linux)
         and the icudata dll (windows)
      
      See https://codereview.chromium.org/1026453002 for a new blink test
      added ( fast/encoding/char-decoding-invalid-trail.html )
      
      BUG=450312,430823
      TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html
      TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding*
      R=jsbell@chromium.org, mark@chromium.org
      
      Review URL: https://codereview.chromium.org/984233002
  3. Jan 21, 2015
  4. Apr 29, 2014
  5. Apr 07, 2014
    • ICU 52 local changes part1 · 4dfa619c
      jshin@chromium.org authored
      1. Remove all the obsolete patches. There are lots of them because most of
      the local patches to ICU 4.6.1 have either been accepted or become obsolete.
      The largest local patch removed is the one for the CJ word breaker, because
      it was upstreamed.
      
      Android didn't apply the CJK word breaker patch to ICU 4.6 to reduce the
      data size. In a follow-up CL, we'll have an Android-specific change for this issue.
      
      Besides, we don't include patches for files we locally add because the
      patches for new files are redundant. Instead, they're mentioned in
      README.chromium.
      
      2. We don't need platform-specific headers any more (pmac, plinux, pwin, etc).
      They're combined into a single file and all platforms we care about are
      well-supported except for one issue on Android/QNX. putil.patch takes care
      of it.
      
      
      3. Breakiterator patches for a few remaining issues. We also use
      a much smaller Khmer dictionary (upstream fix pending).
      
      4. Converter
        - Added two WHATWG-encoding-standard-compliant mapping tables
          (derived directly from the spec with a script) for EUC-JP
          and CP866
        - Disabled various non-HTML5 encodings such as SCSU, BOCU, UTF-7, CESU-8
          saving ~30kB in the code size. Even though we link statically, they're
          still pulled in as a part of uconv.
        - Disabled ISO-2022-JP-[1-4] in ucnv2022.c
        - Removed a number of encoding alias entries in the alias table
          leading to ~40kB data size reduction.
      
      5. Locale data: not yet updated. We need to trim it substantially.
      
      6. Unihan collation removal is now done with a script (scripts/remove_unihan.sh)
      
      7. Updated timezone data to the latest (2014b) as of today.
      
      8. Customized transliterator for Greek uppercasing
      
      9. Updated data build related patches. The windows data build patch has yet
         to be updated.
      
      10. The updated ICU data file/assembly source files are not included in this
          CL. They'll be updated in a separate CL.
          With all the size reduction changes applied, the data size went down
          from > 23MB to 12.4MB. However, it's still 2.5MB larger than ICU 4.6.1
          data. The locale data trimming will bring it down further.
      
      11. Update README.chromium accordingly. The only exceptions are
      item #5 and the Android entry in item #3 (breakiterator; see #1 above).
      
      
      
      BUG=259715,76328
      TEST=Following the procedure outlined in README.chromium, one can build
      the icu data file.
      
      R=jsbell@chromium.org, mark@chromium.org
      
      Review URL: https://codereview.chromium.org/224943002
      
      git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@262192 4ff67af0-8c30-449e-8e8b-ad334ec8d88c