source/data/unidata/changes.txt - external/github.com/unicode-org/icu - Git at Google

 * Copyright (C) 2004-2008, International Business Machines
 * Corporation and others.  All Rights Reserved.
 *
 *   file name:  changes.txt
 *   encoding:   US-ASCII
 *   tab size:   8 (not used)
 *   indentation:4
 *
 *   created on: 2004may06
 *   created by: Markus W. Scherer
 *
 * change log for Unicode updates

 ---------------------------------------------------------------------------- ***

 Unicode 5.1 update

 *** related ICU Trac tickets

 5696 Update to Unicode 5.1

 *** Unicode version numbers
 - makedata.mak
 - uchar.h
 - configure.in & configure
 - update ucdVersion in gennames.c if an algorithmic range changes

 *** data files & enums & parser code

 * file preparation
 - ucdstrip:
     DerivedCoreProperties.txt
     DerivedNormalizationProps.txt
     NormalizationTest.txt
     PropList.txt
     Scripts.txt
     GraphemeBreakProperty.txt
     SentenceBreakProperty.txt
     WordBreakProperty.txt
 - ucdstrip and ucdmerge:
     EastAsianWidth.txt
     LineBreak.txt

 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
 copy 5.1.0\ucd\Blocks.txt ..\unidata\
 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

 * genpname
 - run preparse.pl
   + cd \svn\icuproj\icu\uni51\source\tools\genpname
   + make sure that data.h is writable
   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
   + preparse.pl complains with errors like the following:
       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
     This is because ICU 3.8 had scripts from ISO 15924 which are now
     added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
     and PropertyValueAliases.txt.
     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
       N/Y, No/Yes, F/T, False/True
     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
        It will use further values from the file if present.

 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 - new block & script values
   + 17 new blocks
   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
     (removed from SyntheticPropertyValueAliases.txt)
   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
     (added to SyntheticPropertyValueAliases.txt)
 - uprops.icu (uprops.h) only provides 7 bits for script codes.
   In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
   There is none above 127 yet which is the script code for an
   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
   script code values greater than 127.
   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
   in a parallel bit field, and that overflows now.
   Also, future values >=128 would be incompatible anyway.
   uprops.h is modified to move around several of the bit fields
   in the properties vector words, and now uses 8 bits for the script code.
   Two other bit fields also grow to accommodate future growth:
   Block (current count: 172) grows from 8 to 9 bits,
   and Word_Break grows from 4 to 5 bits.
 - renamed property Simple_Case_Folding (sfc->scf)
   + nothing to be done: handled as normal alias
 - new property JSN Jamo_Short_Name
   + no new API: only contributes to the Name property
 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
 - new Joining Group (JG) value: Burushashki_Yeh_Barree
 - new Sentence_Break (SB) values:
     SB ; CR        ; CR
     SB ; EX        ; Extend
     SB ; LF        ; LF
     SB ; SC        ; SContinue
 - new Word_Break (WB) values:
     WB ; CR        ; CR
     WB ; Extend    ; Extend
     WB ; LF        ; LF
     WB ; MB        ; MidNumLet

 * Further changes in the 2008-02-29 update:
 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
   because they should not normally be invisible.
 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
 - new Word_Break (WB) value: NL=Newline

 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
 - Unihan range end moves from 9FBB to 9FC3
   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
   + do change gennames.c

 * build Unicode data source code for hardcoding core data
 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

 ICU data make path is \svn\icuproj\icu\uni51\source\data\
 ICU root path is \svn\icuproj\icu\uni51
 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
 Creating data file for Unicode Character Properties
 Creating data file for Unicode Case Mapping Properties
 Creating data file for Unicode BiDi/Shaping Properties
 Creating data file for Unicode Normalization
 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
   and rebuild the common library

 *** Break iterators

 * Update break iterator rules to new UAX versions and new property values

 *** UCA

 * update FractionalUCA.txt and UCARules.txt with new canonical closure

 *** Test suites
 - Test that APIs using Unicode property value aliases (like UnicodeSet)
   support all of the boolean values N/Y, No/Yes, F/T, False/True
   -> TestBinaryValues() tests in both cintltst and intltest

 *** LayoutEngine script information
 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
 ScriptRunData.cpp, which is no longer needed.)

 The generated files have a current copyright date and "@draft" statement.

 * copy the above files into <icu>/source/layout, replacing the old files.

 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

 * rebuild the layout and layoutex libraries.

 *** Documentation
 - Update User Guide
   + Jamo_Short_Name, sfc->scf, binary property value aliases

 ---------------------------------------------------------------------------- ***

 Unicode 5.0 update

 *** related Jitterbugs

 5084 RFE: Update to Unicode 5.0

 *** data files & enums & parser code

 * file preparation
 - ucdstrip:
     DerivedCoreProperties.txt
     DerivedNormalizationProps.txt
     NormalizationTest.txt
     PropList.txt
     Scripts.txt
     GraphemeBreakProperty.txt
     SentenceBreakProperty.txt
     WordBreakProperty.txt
 - ucdstrip and ucdmerge:
     EastAsianWidth.txt
     LineBreak.txt

 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
 copy 5.0.0\ucd\Blocks.txt ..\unidata\
 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

 * update FractionalUCA.txt and UCARules.txt with new canonical closure

 * genpname
 - run preparse.pl
   + make sure that data.h is writable
   + perl preparse.pl \cvs\oss\icu > out.txt

 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 - new block & script values
   + script values already added in ICU 3.6 because all of ISO 15924 is now covered

 * build Unicode data source code for hardcoding core data
 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

 ICU data make path is \cvs\oss\icu\source\data\
 ICU root path is \cvs\oss\icu
 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
 [etc.]
 Creating data file for Unicode Character Properties
 Creating data file for Unicode Case Mapping Properties
 Creating data file for Unicode BiDi/Shaping Properties
 Creating data file for Unicode Normalization
 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

 - copy the .c source files to C:\cvs\oss\icu\source\common
   and rebuild the common library

 *** Unicode version numbers
 - makedata.mak
 - uchar.h
 - configure.in

 *** LayoutEngine script information
 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
 ScriptRunData.cpp, which is no longer needed.)

 The generated files have a current copyright date and "@draft" statement.

 * copy the above files into <icu>/source/layout, replacing the old files.

 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

 * rebuild the layout and layoutex libraries.

 ---------------------------------------------------------------------------- ***

 Unicode 4.1 update

 *** related Jitterbugs

 4332 RFE: Update to Unicode 4.1
 4157 RBBI, TR29 4.1 updates

 *** data files & enums & parser code

 * file preparation
 - ucdstrip:
     DerivedCoreProperties.txt
     DerivedNormalizationProps.txt
     NormalizationTest.txt
     GraphemeBreakProperty.txt
     SentenceBreakProperty.txt
     WordBreakProperty.txt
 - ucdstrip and ucdmerge:
     EastAsianWidth.txt
     LineBreak.txt

 * add new files to the repository
     GraphemeBreakProperty.txt
     SentenceBreakProperty.txt
     WordBreakProperty.txt

 * update FractionalUCA.txt and UCARules.txt with new canonical closure

 * genpname
 - handle new enumerated properties in sub read_uchar
 - run preparse.pl

 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 - new binary properties
   + Pattern_Syntax
   + Pattern_White_Space
 - new enumerated properties
   + Grapheme_Cluster_Break
   + Sentence_Break
   + Word_Break
 - new block & script & line break values

 * gencase
 - case-ignorable changes
   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

 *** Unicode version numbers
 - makedata.mak
 - uchar.h
 - configure.in

 *** tests
 - verify that u_charMirror() round-trips
 - test all new properties and some new values of old properties

 *** other code

 * hardcoded Unihan range end/limit
 - Unihan range end moves from 9FA5 to 9FBB
   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
   + do not modify BOCU/BOCSU code because that would change the encoding
     and break binary compatibility!
   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
     NamePrepProfile.txt
   + ignore trietest.c: test data is arbitrary
   + ignore tstnorm.cpp: test optimization, not important
   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
   + do change line_th.txt and word_th.txt
     by replacing hardcoded ranges with the new property values
   + do change gennames.c

 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,

 * case mappings
 - compare new special casing context conditions with previous ones
   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

 * genpname
 - consider storing only the short name if it is the same as the long name

 *** other reviews
 - UAX #29 changes (grapheme/word/sentence breaks)
 - UAX #14 changes (line breaks)
 - Pattern_Syntax & Pattern_White_Space

 ---------------------------------------------------------------------------- ***

 Unicode 4.0.1 update

 *** related Jitterbugs

 3170 RFE: Update to Unicode 4.0.1
 3171 Add new Unicode 4.0.1 properties
 3520 use Unicode 4.0.1 updates for break iteration

 *** data files & enums & parser code

 * file preparation
 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

 * file fixes
 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
   according to PRI #26
   http://www.unicode.org/review/resolved-pri.html#pri26
 - undone again because no corrigendum in sight;
   instead modified tests to not check consistency on this for Unicode 4.0.1

 * ucdterms.txt
 - update from http://www.unicode.org/copyright.html
   formatted for plain text

 * uchar.h & uprops.h & uprops.c & genprops
 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
 - add U_LB_INSEPARABLE due to a spelling fix
   + put short name comment only on line with new constant
     for genpname perl script parser
 - new binary properties
   + STerm
   + Variation_Selector

 * genpname
 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
 - perl script: correctly calculate the maximum number of fields per row

 * uscript.h
 - new script code Hrkt=Katakana_Or_Hiragana

 * gennorm.c track changes in DerivedNormalizationProps.txt
 - "FNC" -> "FC_NFKC"
 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.

 * genprops/props2.c track changes in DerivedNumericValues.txt
 - changed from 3 columns to 2, dropping the numeric type
   + assume that the type is always numeric for Han characters,
     and that only those are added in addition to what UnicodeData.txt lists

 *** Unicode version numbers
 - makedata.mak
 - uchar.h
 - configure.in

 *** tests
 - update test of default bidi classes according to PRI #28
   /tsutil/cucdtst/TestUnicodeData
   http://www.unicode.org/review/resolved-pri.html#pri28
 - bidi tests: change exemplar character for ES depending on Unicode version
 - change hardcoded expected property values where they change

 *** other code

 * name matching
 - read UCD.html

 * scripts
 - use new Hrkt=Katakana_Or_Hiragana

 * ZWJ & ZWNJ
 - are now part of combining character sequences
 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
	* Copyright (C) 2004-2008, International Business Machines
	* Corporation and others. All Rights Reserved.
	*
	* file name: changes.txt
	* encoding: US-ASCII
	* tab size: 8 (not used)
	* indentation:4
	*
	* created on: 2004may06
	* created by: Markus W. Scherer
	*
	* change log for Unicode updates

	---------------------------------------------------------------------------- ***

	Unicode 5.1 update

	*** related ICU Trac tickets

	5696 Update to Unicode 5.1

	*** Unicode version numbers
	- makedata.mak
	- uchar.h
	- configure.in & configure
	- update ucdVersion in gennames.c if an algorithmic range changes

	*** data files & enums & parser code

	* file preparation
	- ucdstrip:
	DerivedCoreProperties.txt
	DerivedNormalizationProps.txt
	NormalizationTest.txt
	PropList.txt
	Scripts.txt
	GraphemeBreakProperty.txt
	SentenceBreakProperty.txt
	WordBreakProperty.txt
	- ucdstrip and ucdmerge:
	EastAsianWidth.txt
	LineBreak.txt

	* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
	copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
	copy 5.1.0\ucd\Blocks.txt ..\unidata\
	copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
	copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
	copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
	copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
	copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
	copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
	copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
	copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
	copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
	copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
	copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

	ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
	ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
	ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
	ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
	ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
	ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
	ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
	ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
	ucdstrip < 5.1.0\ucd\EastAsianWidth.txt \| ucdmerge > ..\unidata\EastAsianWidth.txt
	ucdstrip < 5.1.0\ucd\LineBreak.txt \| ucdmerge > ..\unidata\LineBreak.txt

	* genpname
	- run preparse.pl
	+ cd \svn\icuproj\icu\uni51\source\tools\genpname
	+ make sure that data.h is writable
	+ perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
	+ preparse.pl complains with errors like the following:
	Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
	This is because ICU 3.8 had scripts from ISO 15924 which are now
	added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
	and PropertyValueAliases.txt.
	-> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
	Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
	+ PropertyValueAliases.txt now explicitly contains values for boolean properties:
	N/Y, No/Yes, F/T, False/True
	-> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
	It will use further values from the file if present.

	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
	- new block & script values
	+ 17 new blocks
	+ 11 new script values already added in ICU 3.8 for ISO 15924 coverage
	(removed from SyntheticPropertyValueAliases.txt)
	+ 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
	(added to SyntheticPropertyValueAliases.txt)
	- uprops.icu (uprops.h) only provides 7 bits for script codes.
	In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
	There is none above 127 yet which is the script code for an
	assigned Unicode character, so ICU 4.0 uprops.icu does not store any
	script code values greater than 127.
	However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
	in a parallel bit field, and that overflows now.
	Also, future values >=128 would be incompatible anyway.
	uprops.h is modified to move around several of the bit fields
	in the properties vector words, and now uses 8 bits for the script code.
	Two other bit fields also grow to accommodate future growth:
	Block (current count: 172) grows from 8 to 9 bits,
	and Word_Break grows from 4 to 5 bits.
	- renamed property Simple_Case_Folding (sfc->scf)
	+ nothing to be done: handled as normal alias
	- new property JSN Jamo_Short_Name
	+ no new API: only contributes to the Name property
	- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
	- new Joining Group (JG) value: Burushashki_Yeh_Barree
	- new Sentence_Break (SB) values:
	SB ; CR ; CR
	SB ; EX ; Extend
	SB ; LF ; LF
	SB ; SC ; SContinue
	- new Word_Break (WB) values:
	WB ; CR ; CR
	WB ; Extend ; Extend
	WB ; LF ; LF
	WB ; MB ; MidNumLet

	* Further changes in the 2008-02-29 update:
	- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
	because they should not normally be invisible.
	- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
	- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
	- new Word_Break (WB) value: NL=Newline

	* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
	- Unihan range end moves from 9FBB to 9FC3
	search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
	+ do change gennames.c

	* build Unicode data source code for hardcoding core data
	C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

	ICU data make path is \svn\icuproj\icu\uni51\source\data\
	ICU root path is \svn\icuproj\icu\uni51
	Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
	Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
	Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
	Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
	Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
	Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
	Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
	Creating data file for Unicode Character Properties
	Creating data file for Unicode Case Mapping Properties
	Creating data file for Unicode BiDi/Shaping Properties
	Creating data file for Unicode Normalization
	Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
	Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

	- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
	and rebuild the common library

	*** Break iterators

	* Update break iterator rules to new UAX versions and new property values

	*** UCA

	* update FractionalUCA.txt and UCARules.txt with new canonical closure

	*** Test suites
	- Test that APIs using Unicode property value aliases (like UnicodeSet)
	support all of the boolean values N/Y, No/Yes, F/T, False/True
	-> TestBinaryValues() tests in both cintltst and intltest

	*** LayoutEngine script information
	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
	ScriptRunData.cpp, which is no longer needed.)

	The generated files have a current copyright date and "@draft" statement.

	* copy the above files into <icu>/source/layout, replacing the old files.

	Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
	and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

	* rebuild the layout and layoutex libraries.

	*** Documentation
	- Update User Guide
	+ Jamo_Short_Name, sfc->scf, binary property value aliases

	---------------------------------------------------------------------------- ***

	Unicode 5.0 update

	*** related Jitterbugs

	5084 RFE: Update to Unicode 5.0

	*** data files & enums & parser code

	* file preparation
	- ucdstrip:
	DerivedCoreProperties.txt
	DerivedNormalizationProps.txt
	NormalizationTest.txt
	PropList.txt
	Scripts.txt
	GraphemeBreakProperty.txt
	SentenceBreakProperty.txt
	WordBreakProperty.txt
	- ucdstrip and ucdmerge:
	EastAsianWidth.txt
	LineBreak.txt

	* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
	copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
	copy 5.0.0\ucd\Blocks.txt ..\unidata\
	copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
	copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
	copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
	copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
	copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
	copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
	copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
	copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
	copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
	copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
	copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

	ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
	ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
	ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
	ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
	ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
	ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
	ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
	ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
	ucdstrip < 5.0.0\ucd\EastAsianWidth.txt \| ucdmerge > ..\unidata\EastAsianWidth.txt
	ucdstrip < 5.0.0\ucd\LineBreak.txt \| ucdmerge > ..\unidata\LineBreak.txt

	* update FractionalUCA.txt and UCARules.txt with new canonical closure

	* genpname
	- run preparse.pl
	+ make sure that data.h is writable
	+ perl preparse.pl \cvs\oss\icu > out.txt

	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
	- new block & script values
	+ script values already added in ICU 3.6 because all of ISO 15924 is now covered

	* build Unicode data source code for hardcoding core data
	C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

	ICU data make path is \cvs\oss\icu\source\data\
	ICU root path is \cvs\oss\icu
	Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
	[etc.]
	Creating data file for Unicode Character Properties
	Creating data file for Unicode Case Mapping Properties
	Creating data file for Unicode BiDi/Shaping Properties
	Creating data file for Unicode Normalization
	Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
	Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

	- copy the .c source files to C:\cvs\oss\icu\source\common
	and rebuild the common library

	*** Unicode version numbers
	- makedata.mak
	- uchar.h
	- configure.in

	*** LayoutEngine script information
	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
	ScriptRunData.cpp, which is no longer needed.)

	The generated files have a current copyright date and "@draft" statement.

	* copy the above files into <icu>/source/layout, replacing the old files.

	Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
	and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

	* rebuild the layout and layoutex libraries.

	---------------------------------------------------------------------------- ***

	Unicode 4.1 update

	*** related Jitterbugs

	4332 RFE: Update to Unicode 4.1
	4157 RBBI, TR29 4.1 updates

	*** data files & enums & parser code

	* file preparation
	- ucdstrip:
	DerivedCoreProperties.txt
	DerivedNormalizationProps.txt
	NormalizationTest.txt
	GraphemeBreakProperty.txt
	SentenceBreakProperty.txt
	WordBreakProperty.txt
	- ucdstrip and ucdmerge:
	EastAsianWidth.txt
	LineBreak.txt

	* add new files to the repository
	GraphemeBreakProperty.txt
	SentenceBreakProperty.txt
	WordBreakProperty.txt

	* update FractionalUCA.txt and UCARules.txt with new canonical closure

	* genpname
	- handle new enumerated properties in sub read_uchar
	- run preparse.pl

	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
	- new binary properties
	+ Pattern_Syntax
	+ Pattern_White_Space
	- new enumerated properties
	+ Grapheme_Cluster_Break
	+ Sentence_Break
	+ Word_Break
	- new block & script & line break values

	* gencase
	- case-ignorable changes
	see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
	now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

	*** Unicode version numbers
	- makedata.mak
	- uchar.h
	- configure.in

	*** tests
	- verify that u_charMirror() round-trips
	- test all new properties and some new values of old properties

	*** other code

	* hardcoded Unihan range end/limit
	- Unihan range end moves from 9FA5 to 9FBB
	search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
	+ do not modify BOCU/BOCSU code because that would change the encoding
	and break binary compatibility!
	+ similarly, do not change the GB 18030 range data (ucnvmbcs.c),
	NamePrepProfile.txt
	+ ignore trietest.c: test data is arbitrary
	+ ignore tstnorm.cpp: test optimization, not important
	+ ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
	+ do change line_th.txt and word_th.txt
	by replacing hardcoded ranges with the new property values
	+ do change gennames.c

	source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
	source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
	source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

	* case mappings
	- compare new special casing context conditions with previous ones
	see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

	* genpname
	- consider storing only the short name if it is the same as the long name

	*** other reviews
	- UAX #29 changes (grapheme/word/sentence breaks)
	- UAX #14 changes (line breaks)
	- Pattern_Syntax & Pattern_White_Space

	---------------------------------------------------------------------------- ***

	Unicode 4.0.1 update

	*** related Jitterbugs

	3170 RFE: Update to Unicode 4.0.1
	3171 Add new Unicode 4.0.1 properties
	3520 use Unicode 4.0.1 updates for break iteration

	*** data files & enums & parser code

	* file preparation
	- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
	- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

	* file fixes
	- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
	according to PRI #26
	http://www.unicode.org/review/resolved-pri.html#pri26
	- undone again because no corrigendum in sight;
	instead modified tests to not check consistency on this for Unicode 4.0.1

	* ucdterms.txt
	- update from http://www.unicode.org/copyright.html
	formatted for plain text

	* uchar.h & uprops.h & uprops.c & genprops
	- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
	- add U_LB_INSEPARABLE due to a spelling fix
	+ put short name comment only on line with new constant
	for genpname perl script parser
	- new binary properties
	+ STerm
	+ Variation_Selector

	* genpname
	- fix genpname perl script so that it doesn't choke on more than 2 names per property value
	- perl script: correctly calculate the maximum number of fields per row

	* uscript.h
	- new script code Hrkt=Katakana_Or_Hiragana

	* gennorm.c track changes in DerivedNormalizationProps.txt
	- "FNC" -> "FC_NFKC"
	- single field "NFD_NO" -> two fields "NFD_QC; N" etc.

	* genprops/props2.c track changes in DerivedNumericValues.txt
	- changed from 3 columns to 2, dropping the numeric type
	+ assume that the type is always numeric for Han characters,
	and that only those are added in addition to what UnicodeData.txt lists

	*** Unicode version numbers
	- makedata.mak
	- uchar.h
	- configure.in

	*** tests
	- update test of default bidi classes according to PRI #28
	/tsutil/cucdtst/TestUnicodeData
	http://www.unicode.org/review/resolved-pri.html#pri28
	- bidi tests: change exemplar character for ES depending on Unicode version
	- change hardcoded expected property values where they change

	*** other code

	* name matching
	- read UCD.html

	* scripts
	- use new Hrkt=Katakana_Or_Hiragana

	* ZWJ & ZWNJ
	- are now part of combining character sequences
	- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ