| * Copyright (C) 2004-2008, International Business Machines |
| * Corporation and others. All Rights Reserved. |
| * |
| * file name: changes.txt |
| * encoding: US-ASCII |
| * tab size: 8 (not used) |
| * indentation:4 |
| * |
| * created on: 2004may06 |
| * created by: Markus W. Scherer |
| * |
| * change log for Unicode updates |
| |
| ---------------------------------------------------------------------------- *** |
| |
| Unicode 5.1 update |
| |
| *** related ICU Trac tickets |
| |
| 5696 Update to Unicode 5.1 |
| |
| *** Unicode version numbers |
| - makedata.mak |
| - uchar.h |
| - configure.in & configure |
| - update ucdVersion in gennames.c if an algorithmic range changes |
| |
| *** data files & enums & parser code |
| |
| * file preparation |
| - ucdstrip: |
| DerivedCoreProperties.txt |
| DerivedNormalizationProps.txt |
| NormalizationTest.txt |
| PropList.txt |
| Scripts.txt |
| GraphemeBreakProperty.txt |
| SentenceBreakProperty.txt |
| WordBreakProperty.txt |
| - ucdstrip and ucdmerge: |
| EastAsianWidth.txt |
| LineBreak.txt |
| |
| * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
| copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\ |
| copy 5.1.0\ucd\Blocks.txt ..\unidata\ |
| copy 5.1.0\ucd\CaseFolding.txt ..\unidata\ |
| copy 5.1.0\ucd\DerivedAge.txt ..\unidata\ |
| copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
| copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
| copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
| copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
| copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\ |
| copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\ |
| copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\ |
| copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\ |
| copy 5.1.0\ucd\UnicodeData.txt ..\unidata\ |
| |
| ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
| ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
| ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
| ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt |
| ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
| ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
| ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
| ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
| ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
| ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
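The ucdstrip and ucdmerge steps in the batch file above can be pictured with a rough sketch. It assumes, without consulting the tools' source, that ucdstrip drops comments and blank lines and that ucdmerge joins adjacent code point ranges that carry the same value. The function names and the sample data are hypothetical; this is not the ICU implementation.

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines (assumed ucdstrip-like behavior)."""
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if text:
            yield text


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.partition("..")
    return int(start, 16), int(end or start, 16)


def merge_adjacent(lines):
    """Join consecutive ranges with the same value (assumed ucdmerge-like behavior)."""
    merged = []
    for line in lines:
        range_field, value = (part.strip() for part in line.split(";", 1))
        start, end = parse_range(range_field)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return [
        (f"{s:04X}" if s == e else f"{s:04X}..{e:04X}") + ";" + v
        for s, e, v in merged
    ]


# Made-up sample in the style of LineBreak.txt, for illustration only.
sample = [
    "# comment line",
    "0030;NU # DIGIT ZERO",
    "0031..0039;NU # DIGIT ONE..DIGIT NINE",
    "003A;IS",
    "",
    "0041..005A;AL",
]

result = merge_adjacent(strip_comments(sample))
```

Here the two NU lines collapse into a single range, which is the kind of size reduction the merge step is presumably there for.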
| |
| * genpname |
| - run preparse.pl |
| + cd \svn\icuproj\icu\uni51\source\tools\genpname |
| + make sure that data.h is writable |
| + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt |
| + preparse.pl complains with errors like the following: |
| Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30. |
| This is because ICU 3.8 already had script codes from ISO 15924 that are now |
| added to Unicode 5.1, so preparse.pl reports a conflict between |
| SyntheticPropertyValueAliases.txt and PropertyValueAliases.txt. |
| -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
| Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii |
| + PropertyValueAliases.txt now explicitly contains values for boolean properties: |
| N/Y, No/Yes, F/T, False/True |
| -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases. |
| It will use further values from the file if present. |
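As an illustration of the boolean value aliases listed above, here is a minimal sketch of a lookup that accepts all eight spellings. It assumes loose matching that ignores case, spaces, hyphens and underscores; the table and function names are hypothetical and do not mirror preparse.pl or any ICU API.

```python
# Hypothetical alias table built from the values named in PropertyValueAliases.txt.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def normalize_alias(name):
    """Loose matching (assumed): ignore case, spaces, hyphens, underscores."""
    return "".join(ch for ch in name.lower() if ch not in " -_")


def parse_binary_value(alias):
    """Map a boolean property value alias to True/False, or raise ValueError."""
    key = normalize_alias(alias)
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError(f"not a binary property value alias: {alias!r}")
    return BINARY_VALUE_ALIASES[key]
```

A test along the lines of TestBinaryValues() would loop over all eight names and check that each one resolves to the expected value.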
| |
| * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| - new block & script values |
| + 17 new blocks |
| + 11 new script values already added in ICU 3.8 for ISO 15924 coverage |
| (removed from SyntheticPropertyValueAliases.txt) |
| + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1) |
| (added to SyntheticPropertyValueAliases.txt) |
| - uprops.icu (uprops.h) only provided 7 bits for script codes. |
| ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes. |
| No script code above 127 is used yet by any assigned Unicode character, |
| so the ICU 4.0 uprops.icu does not store any per-character |
| script code values greater than 127. |
| However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129 |
| in a parallel bit field, and that now overflows 7 bits. |
| Also, future values >=128 would be incompatible anyway. |
| uprops.h is modified to rearrange several of the bit fields |
| in the properties vector words, and now uses 8 bits for the script code. |
| Two other bit fields are also widened to accommodate future growth: |
| Block (current count: 172) grows from 8 to 9 bits, |
| and Word_Break grows from 4 to 5 bits. |
| - renamed short alias of property Simple_Case_Folding (sfc->scf) |
| + nothing to be done: handled as normal alias |
| - new property JSN Jamo_Short_Name |
| + no new API: only contributes to the Name property |
| - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark |
| - new Joining Group (JG) value: Burushashki_Yeh_Barree |
| - new Sentence_Break (SB) values: |
| SB ; CR ; CR |
| SB ; EX ; Extend |
| SB ; LF ; LF |
| SB ; SC ; SContinue |
| - new Word_Break (WB) values: |
| WB ; CR ; CR |
| WB ; Extend ; Extend |
| WB ; LF ; LF |
| WB ; MB ; MidNumLet |
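The bit-field note above (script code growing from 7 to 8 bits) comes down to simple arithmetic, sketched here. The field layout in FIELDS is invented for illustration and is not the real uprops.h layout; only the widths (8, 9 and 5 bits) and the value 129 come from the notes.

```python
def bits_needed(max_value):
    """Number of bits required to store values 0..max_value."""
    return max(1, max_value.bit_length())


def fits(value, width):
    """True if value can be stored in an unsigned field of the given width."""
    return 0 <= value < (1 << width)


USCRIPT_CODE_LIMIT = 130
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1  # 129, the value that overflowed 7 bits

# Hypothetical layout: name -> (shift, width). Not the actual uprops.h layout.
FIELDS = {"script": (0, 8), "block": (8, 9), "word_break": (17, 5)}


def pack(**values):
    """Pack named field values into one integer word, checking for overflow."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if not fits(value, width):
            raise OverflowError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack(word, name):
    """Extract one named field from a packed word."""
    shift, width = FIELDS[name]
    return (word >> shift) & ((1 << width) - 1)


word = pack(script=MAX_SCRIPT, block=171, word_break=13)
```

With 7 bits, fits(MAX_SCRIPT, 7) is false, which is the overflow described above; with 8 bits the value round-trips through pack() and unpack().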
| |
| * Further changes in the 2008-02-29 update: |
| - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP |
| because they should not normally be invisible. |
| - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed) |
| - new Grapheme_Cluster_Break (GCB) value: PP=Prepend |
| - new Word_Break (WB) value: NL=Newline |
| |
| * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison) |
| - Unihan range end moves from 9FBB to 9FC3 |
| search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive) |
| + do change gennames.c |
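The search for the hardcoded range end and limit can be done with any grep-like tool. As a minimal sketch, the same case-insensitive regex in Python looks like this; the function name and the sample text are made up for illustration.

```python
import re

# Matches both the old range end (9FBB) and the old limit (9FBC), in any case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hits(text):
    """Return (line number, stripped line) for every line that matches."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), 1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Made-up source snippet, not copied from gennames.c.
sample_source = """\
static const uint32_t ranges[] = {
    0x4e00, 0x9fbb,
    0xac00, 0xd7a3,
};
if (c < 0x9FBC) { /* limit */ }
"""

hits = find_hits(sample_source)
```

Each hit still needs a manual decision, since some occurrences must stay unchanged (see the Unicode 4.1 notes on BOCU and GB 18030).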
| |
| * build Unicode data source code for hardcoding core data |
| C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data |
| |
| ICU data make path is \svn\icuproj\icu\uni51\source\data\ |
| ICU root path is \svn\icuproj\icu\uni51 |
| Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
| Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
| Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
| Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
| Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
| Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
| Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
| Creating data file for Unicode Character Properties |
| Creating data file for Unicode Case Mapping Properties |
| Creating data file for Unicode BiDi/Shaping Properties |
| Creating data file for Unicode Normalization |
| Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l" |
| Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp" |
| |
| - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common |
| and rebuild the common library |
| |
| *** Break iterators |
| |
| * Update break iterator rules to new UAX versions and new property values |
| |
| *** UCA |
| |
| * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| |
| *** Test suites |
| - Test that APIs using Unicode property value aliases (like UnicodeSet) |
| support all of the boolean values N/Y, No/Yes, F/T, False/True |
| -> TestBinaryValues() tests in both cintltst and intltest |
| |
| *** LayoutEngine script information |
| * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, |
| ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
| ScriptRunData.cpp, which is no longer needed.) |
| |
| The generated files have a current copyright date and "@draft" statement. |
| |
| * copy the above files into <icu>/source/layout, replacing the old files. |
| |
| Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
| and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
| |
| * rebuild the layout and layoutex libraries. |
| |
| *** Documentation |
| - Update User Guide |
| + Jamo_Short_Name, sfc->scf, binary property value aliases |
| |
| ---------------------------------------------------------------------------- *** |
| |
| Unicode 5.0 update |
| |
| *** related Jitterbugs |
| |
| 5084 RFE: Update to Unicode 5.0 |
| |
| *** data files & enums & parser code |
| |
| * file preparation |
| - ucdstrip: |
| DerivedCoreProperties.txt |
| DerivedNormalizationProps.txt |
| NormalizationTest.txt |
| PropList.txt |
| Scripts.txt |
| GraphemeBreakProperty.txt |
| SentenceBreakProperty.txt |
| WordBreakProperty.txt |
| - ucdstrip and ucdmerge: |
| EastAsianWidth.txt |
| LineBreak.txt |
| |
| * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
| copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ |
| copy 5.0.0\ucd\Blocks.txt ..\unidata\ |
| copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ |
| copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ |
| copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
| copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
| copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
| copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
| copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ |
| copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ |
| copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ |
| copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ |
| copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ |
| |
| ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
| ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
| ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
| ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt |
| ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
| ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
| ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
| ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
| ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
| ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
| |
| * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| |
| * genpname |
| - run preparse.pl |
| + make sure that data.h is writable |
| + perl preparse.pl \cvs\oss\icu > out.txt |
| |
| * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| - new block & script values |
| + script values already added in ICU 3.6 because all of ISO 15924 is now covered |
| |
| * build Unicode data source code for hardcoding core data |
| C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data |
| |
| ICU data make path is \cvs\oss\icu\source\data\ |
| ICU root path is \cvs\oss\icu |
| Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
| [etc.] |
| Creating data file for Unicode Character Properties |
| Creating data file for Unicode Case Mapping Properties |
| Creating data file for Unicode BiDi/Shaping Properties |
| Creating data file for Unicode Normalization |
| Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" |
| Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" |
| |
| - copy the .c source files to C:\cvs\oss\icu\source\common |
| and rebuild the common library |
| |
| *** Unicode version numbers |
| - makedata.mak |
| - uchar.h |
| - configure.in |
| |
| *** LayoutEngine script information |
| * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, |
| ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
| ScriptRunData.cpp, which is no longer needed.) |
| |
| The generated files have a current copyright date and "@draft" statement. |
| |
| * copy the above files into <icu>/source/layout, replacing the old files. |
| |
| Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
| and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
| |
| * rebuild the layout and layoutex libraries. |
| |
| ---------------------------------------------------------------------------- *** |
| |
| Unicode 4.1 update |
| |
| *** related Jitterbugs |
| |
| 4332 RFE: Update to Unicode 4.1 |
| 4157 RBBI, TR29 4.1 updates |
| |
| *** data files & enums & parser code |
| |
| * file preparation |
| - ucdstrip: |
| DerivedCoreProperties.txt |
| DerivedNormalizationProps.txt |
| NormalizationTest.txt |
| GraphemeBreakProperty.txt |
| SentenceBreakProperty.txt |
| WordBreakProperty.txt |
| - ucdstrip and ucdmerge: |
| EastAsianWidth.txt |
| LineBreak.txt |
| |
| * add new files to the repository |
| GraphemeBreakProperty.txt |
| SentenceBreakProperty.txt |
| WordBreakProperty.txt |
| |
| * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| |
| * genpname |
| - handle new enumerated properties in sub read_uchar |
| - run preparse.pl |
| |
| * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| - new binary properties |
| + Pattern_Syntax |
| + Pattern_White_Space |
| - new enumerated properties |
| + Grapheme_Cluster_Break |
| + Sentence_Break |
| + Word_Break |
| - new block & script & line break values |
| |
| * gencase |
| - case-ignorable changes |
| see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
| now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk |
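The case-ignorable condition quoted above can be written as a small predicate. This is a sketch only: Python's unicodedata module does not expose Word_Break, so MID_LETTER_SAMPLE is a small hand-picked subset assumed to have Word_Break=MidLetter in Unicode 4.1, and the general categories come from whatever Unicode version the Python build ships, not from 4.1.

```python
import unicodedata

# Assumed, incomplete subset of Word_Break=MidLetter characters (illustration only).
MID_LETTER_SAMPLE = {"\u0027", "\u00B7", "\u2019", "\u2027"}

# General categories named in the D47a note.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """Word_Break=MidLetter (sample set) or gc in Mn, Me, Cf, Lm, Sk."""
    return (
        ch in MID_LETTER_SAMPLE
        or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
    )
```

For example, a combining acute accent (U+0301, Mn) and an apostrophe count as case-ignorable under this sketch, while a plain letter such as "a" does not.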
| |
| *** Unicode version numbers |
| - makedata.mak |
| - uchar.h |
| - configure.in |
| |
| *** tests |
| - verify that u_charMirror() round-trips |
| - test all new properties and some new values of old properties |
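A round-trip check of the kind mentioned for u_charMirror() can be sketched against data in the BidiMirroring.txt line format (code point; mirrored code point # comment). The helper names are hypothetical, and char_mirror() here is a stand-in for the ICU function, not a binding to it.

```python
# A few lines in BidiMirroring.txt format, used as sample input.
SAMPLE_MIRRORING = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""


def parse_mirroring(text):
    """Parse 'XXXX; YYYY # comment' lines into a code point mapping."""
    mapping = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        source, target = (int(field.strip(), 16) for field in data.split(";"))
        mapping[source] = target
    return mapping


def char_mirror(mapping, c):
    """Stand-in for u_charMirror(): unmapped code points mirror to themselves."""
    return mapping.get(c, c)


def round_trip_failures(mapping):
    """Code points c for which mirror(mirror(c)) != c."""
    return [
        c for c in mapping
        if char_mirror(mapping, char_mirror(mapping, c)) != c
    ]


failures = round_trip_failures(parse_mirroring(SAMPLE_MIRRORING))
```

An empty failures list means every listed pair maps back to its starting code point; a one-directional entry would show up in the list.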
| |
| *** other code |
| |
| * hardcoded Unihan range end/limit |
| - Unihan range end moves from 9FA5 to 9FBB |
| search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive) |
| + do not modify BOCU/BOCSU code because that would change the encoding |
| and break binary compatibility! |
| + similarly, do not change the GB 18030 range data (ucnvmbcs.c), |
| NamePrepProfile.txt |
| + ignore trietest.c: test data is arbitrary |
| + ignore tstnorm.cpp: test optimization, not important |
| + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF |
| + do change line_th.txt and word_th.txt |
| by replacing hardcoded ranges with the new property values |
| + do change gennames.c |
| |
| source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
| source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
| source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, |
| |
| * case mappings |
| - compare new special casing context conditions with previous ones |
| see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
| |
| * genpname |
| - consider storing only the short name if it is the same as the long name |
| |
| *** other reviews |
| - UAX #29 changes (grapheme/word/sentence breaks) |
| - UAX #14 changes (line breaks) |
| - Pattern_Syntax & Pattern_White_Space |
| |
| ---------------------------------------------------------------------------- *** |
| |
| Unicode 4.0.1 update |
| |
| *** related Jitterbugs |
| |
| 3170 RFE: Update to Unicode 4.0.1 |
| 3171 Add new Unicode 4.0.1 properties |
| 3520 use Unicode 4.0.1 updates for break iteration |
| |
| *** data files & enums & parser code |
| |
| * file preparation |
| - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt |
| - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt |
| |
| * file fixes |
| - fix UnicodeData.txt general categories of Ethiopic digits Nd->No |
| according to PRI #26 |
| http://www.unicode.org/review/resolved-pri.html#pri26 |
| - undone again because no corrigendum in sight; |
| instead modified tests to not check consistency on this for Unicode 4.0.1 |
| |
| * ucdterms.txt |
| - update from http://www.unicode.org/copyright.html |
| formatted for plain text |
| |
| * uchar.h & uprops.h & uprops.c & genprops |
| - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed |
| - add U_LB_INSEPARABLE due to a spelling fix |
| + put short name comment only on line with new constant |
| for genpname perl script parser |
| - new binary properties |
| + STerm |
| + Variation_Selector |
| |
| * genpname |
| - fix genpname perl script so that it doesn't choke on more than 2 names per property value |
| - perl script: correctly calculate the maximum number of fields per row |
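To illustrate the two genpname fixes above, here is a minimal sketch that parses alias rows with any number of names and computes the widest row. The sample rows are written from memory in the PropertyValueAliases.txt style and should be treated as illustrative; the function names are hypothetical and this is not the perl script's logic.

```python
def parse_value_aliases(lines):
    """Return (property, [names...]) per data line; any number of names is allowed."""
    rows = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        rows.append((fields[0], fields[1:]))
    return rows


def max_fields_per_row(rows):
    """Largest field count (property plus all of its value names) over all rows."""
    return max((1 + len(names) for _, names in rows), default=0)


# Illustrative rows; the third name on the first row models a spelling fix.
sample_rows = parse_value_aliases([
    "lb ; IN ; Inseparable ; Inseperable",
    "lb ; AL ; Alphabetic",
    "# comment",
    "sc ; Hrkt ; Katakana_Or_Hiragana",
])

widest = max_fields_per_row(sample_rows)
```

Sizing tables from the computed maximum, rather than assuming two names per value, avoids the failure described in the first bullet.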
| |
| * uscript.h |
| - new script code Hrkt=Katakana_Or_Hiragana |
| |
| * gennorm.c track changes in DerivedNormalizationProps.txt |
| - "FNC" -> "FC_NFKC" |
| - single field "NFD_NO" -> two fields "NFD_QC; N" etc. |
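The quick-check format change noted above (one combined field becoming a property field plus a value field) can be handled by a parser that accepts both shapes. This is a sketch under the assumption that the old tokens look like NFD_NO or NFC_MAYBE; the function name is hypothetical and the code does not reproduce gennorm.c.

```python
OLD_VALUE_TO_NEW = {"NO": "N", "MAYBE": "M"}


def parse_quick_check(line):
    """Return (range, property, value) for a quick-check line, else None.

    Accepts the old single-field form ('NFD_NO') and the new
    two-field form ('NFD_QC; N').
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) == 2:
        # Old form: property and value fused into one token.
        prop, _, value = fields[1].rpartition("_")
        if prop and value in OLD_VALUE_TO_NEW:
            return fields[0], prop + "_QC", OLD_VALUE_TO_NEW[value]
        return None
    if len(fields) == 3 and fields[1].endswith("_QC"):
        # New form: property and value in separate fields.
        return fields[0], fields[1], fields[2]
    return None
```

Both "00C0..00C5 ; NFD_NO" and "00C0..00C5 ; NFD_QC; N" yield the same tuple, so downstream code only has to deal with one representation.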
| |
| * genprops/props2.c track changes in DerivedNumericValues.txt |
| - changed from 3 columns to 2, dropping the numeric type |
| + assume that the type is always numeric for Han characters, |
| and that only those are added in addition to what UnicodeData.txt lists |
| |
| *** Unicode version numbers |
| - makedata.mak |
| - uchar.h |
| - configure.in |
| |
| *** tests |
| - update test of default bidi classes according to PRI #28 |
| /tsutil/cucdtst/TestUnicodeData |
| http://www.unicode.org/review/resolved-pri.html#pri28 |
| - bidi tests: change exemplar character for ES depending on Unicode version |
| - change hardcoded expected property values where they change |
| |
| *** other code |
| |
| * name matching |
| - read UCD.html |
| |
| * scripts |
| - use new Hrkt=Katakana_Or_Hiragana |
| |
| * ZWJ & ZWNJ |
| - are now part of combining character sequences |
| - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |