    Update CJK converters and their generating scripts · dafa8443
    Jungshik Shin (jungshik at google) authored
    1. Update ucmlocal.mk and convrtrs.txt to refer to euc-kr-html.ucm
    instead of windows-949.ucm.
    
    2. Tighten up the valid code range for the following converters:
    
       EUC-KR, Shift_JIS, Big5
    
    This makes the decoder put an ASCII-range byte back into the stream,
    per the encoding spec, when that byte is either illegal as a trail byte
    or has no assigned code point when combined with the preceding lead byte.
    For instance, with this change, '0xF3 0x41' in EUC-KR is converted to
    'U+FFFD U+0041' instead of 'U+FFFD'.
    
    This change requires adding 2 to 8 new states to the conversion
    table of each converter listed above, resulting in a net increase
    of 6.5 kB in the final data size.
    
    3. Tighten the trail byte range for 2-byte sequences starting with 0x8E
    from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating
    script.
    
    4. Change the substitution characters for EUC-JP and Shift_JIS to
    match the other converters, i.e., make them produce U+FFFD when they
    encounter invalid input. Before this change, they emitted U+001A.
    
    5. Enable the 'U_CHARSET_IS_UTF8' configuration flag.
    Chromium/Blink does not rely on ICU for conversion between the
    'system native encoding' (when it is a legacy encoding) and Unicode,
    so this configuration lets us reduce the code size slightly.
    
    6. Update icudtl.dat (all platforms), the assembly files (Mac, Linux)
       and the icudata DLL (Windows).
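The EUC-KR part of item 2 can be illustrated with a minimal Python sketch of the EUC-KR decoder steps in the WHATWG Encoding Standard (the "encoding spec" referred to above). This is not ICU's table-driven converter: `decode_euc_kr` is a hypothetical helper name, and `TOY_INDEX` is a one-entry stand-in for the real index-euc-kr table (0xB0 0xA1 -> U+AC00). The key step is that an ASCII trail byte that cannot complete a pair is not consumed; it is decoded again on its own after the U+FFFD.

```python
from typing import Dict, Optional

REPLACEMENT = "\ufffd"

# Toy stand-in for index-euc-kr (pointer -> code point); the real table is
# much larger. Pointer for 0xB0 0xA1: (0xB0 - 0x81) * 190 + (0xA1 - 0x41) = 9026.
TOY_INDEX: Dict[int, int] = {9026: 0xAC00}


def decode_euc_kr(data: bytes, index: Dict[int, int] = TOY_INDEX) -> str:
    """Hypothetical helper following the WHATWG EUC-KR decoder steps."""
    out = []
    lead = 0
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if lead:
            pointer: Optional[int] = None
            if 0x41 <= byte <= 0xFE:
                pointer = (lead - 0x81) * 190 + (byte - 0x41)
            lead = 0
            code_point = index.get(pointer) if pointer is not None else None
            if code_point is not None:
                out.append(chr(code_point))
                continue
            if byte < 0x80:
                # Invalid pair with an ASCII trail byte: push the byte back
                # so the next iteration decodes it as plain ASCII.
                i -= 1
            out.append(REPLACEMENT)
        elif byte < 0x80:
            out.append(chr(byte))
        elif 0x81 <= byte <= 0xFE:
            lead = byte  # remember the lead byte and wait for the trail byte
        else:
            out.append(REPLACEMENT)  # 0x80 or 0xFF can never start a pair
    if lead:
        out.append(REPLACEMENT)  # input ended in the middle of a pair
    return "".join(out)


# Matches the example in item 2: the 0x41 ('A') survives.
assert decode_euc_kr(b"\xf3\x41") == "\ufffdA"
```

Dropping the `i -= 1` step reproduces the old behavior described in item 2, where '0xF3 0x41' yields only 'U+FFFD' and the 'A' is lost.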
    
    See https://codereview.chromium.org/1026453002 for the newly added
    Blink test (fast/encoding/char-decoding-invalid-trail.html).
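Invalid trail bytes also matter for item 3, which narrows the trail range after the EUC-JP lead byte 0x8E. In the WHATWG Encoding Standard, a trail byte in 0xA1..0xDF maps linearly onto U+FF61..U+FF9F (half-width CJK punctuation and katakana); any other trail byte makes the pair invalid. The minimal Python sketch below shows only that range check; `decode_8e_pair` is a hypothetical name, and the real converter is table-driven rather than written this way.

```python
from typing import Optional


def decode_8e_pair(trail: int) -> Optional[str]:
    """Decode the EUC-JP pair 0x8E <trail> as a half-width character.

    Returns None when the pair is invalid, in which case a decoder emits
    U+FFFD (and re-reads the trail byte if it is ASCII).
    """
    if 0xA1 <= trail <= 0xDF:
        # Linear mapping: 0xA1 -> U+FF61, ..., 0xDF -> U+FF9F.
        return chr(0xFF61 - 0xA1 + trail)
    return None


assert decode_8e_pair(0xA1) == "\uff61"
assert decode_8e_pair(0xDF) == "\uff9f"
# 0xE0..0xE2 were inside the old [A1,E2] range; they are now invalid.
assert decode_8e_pair(0xE2) is None
```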
    
    BUG=450312,430823
    TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html
    TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding*
    R=jsbell@chromium.org, mark@chromium.org
    
    Review URL: https://codereview.chromium.org/984233002