ICU-20088 fix pages as part of userguide migration to markdown format.
diff --git a/docs/userguide/icudata.md b/docs/userguide/icudata.md
index 473ae77..ded599a 100644
--- a/docs/userguide/icudata.md
+++ b/docs/userguide/icudata.md
@@ -1,3 +1,8 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# ICU Data
## Overview
@@ -22,7 +27,7 @@
Tool](https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md),
to give you more control over what goes into your ICU locale data file.
-***Note**** that ICU for C by default comes with pre-built data. The source data
+> :point_right: **Note**: ICU for C by default comes with pre-built data. The source data
files are included as an "icu\*data.zip" file starting in ICU4C 49. Previously,
they were not included unless ICU is downloaded from the [source
repository](http://site.icu-project.org/repository).
@@ -53,21 +58,21 @@
The ICU data directory is determined as follows:
-1. If the application has called the function u_setDataDirectory(), use the
+1. If the application has called the function `u_setDataDirectory()`, use the
directory specified there, otherwise:
-2. If the environment variable ICU_DATA is set, use that, otherwise:
+2. If the environment variable `ICU_DATA` is set, use that, otherwise:
-3. If the C preprocessor variable ICU_DATA_DIR was set at the time ICU was
+3. If the C preprocessor variable `ICU_DATA_DIR` was set at the time ICU was
built, use its compiled-in value.
4. Otherwise, the ICU data directory is an empty string. This is the default
behavior for ICU using a shared library for its data and provides the
highest data loading performance.
-*u_setDataDirectory() is not thread-safe. Call it before calling ICU APIs from
-multiple threads. If you use both u_setDataDirectory() and u_init(), then use
-u_setDataDirectory() first.*
+> :point_right: **Note**: `u_setDataDirectory()` is not thread-safe. Call it
+before calling ICU APIs from multiple threads. If you use both
+`u_setDataDirectory()` and `u_init()`, then use `u_setDataDirectory()` first.
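The lookup order for the data directory can be sketched as a small pure function. This is only an illustration of the four documented steps, not ICU's own code: the helper name `resolve_data_dir` and its parameters are hypothetical, and the caller is assumed to pass `NULL` for any source that was not provided.

```c
#include <stddef.h>

/*
 * Illustrative sketch of the documented lookup order (not ICU source code).
 *   api_dir:      directory given to u_setDataDirectory(), or NULL if never called
 *   env_dir:      value of the ICU_DATA environment variable, or NULL if unset
 *   compiled_dir: value of ICU_DATA_DIR at ICU build time, or NULL if undefined
 * Returns the first source that is present, or an empty string.
 */
static const char *resolve_data_dir(const char *api_dir,
                                    const char *env_dir,
                                    const char *compiled_dir)
{
    if (api_dir != NULL) {
        return api_dir;       /* step 1: explicit API call wins */
    }
    if (env_dir != NULL) {
        return env_dir;       /* step 2: environment variable */
    }
    if (compiled_dir != NULL) {
        return compiled_dir;  /* step 3: compiled-in default */
    }
    return "";                /* step 4: empty string, no file system lookup */
}
```

For example, `resolve_data_dir(NULL, getenv("ICU_DATA"), NULL)` models a process that never calls `u_setDataDirectory()` and runs against an ICU built without `ICU_DATA_DIR`.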
*Earlier versions of ICU supported two additional schemes: setting a data
directory relative to the location of the ICU shared libraries, and on Windows,
taking a location from the registry. These have both been removed to make the
@@ -100,7 +105,7 @@
Here are the steps followed by ICU to locate its default data. This procedure
happens only once per process, at the time an ICU data item is first requested.
-1. If the application has called the function udata_setCommonData(), use the
+1. If the application has called the function `udata_setCommonData()`, use the
data that was provided. The application specifies the address in memory of
an image of an ICU common format data file (either in shared-library format
or .dat package file format).
@@ -112,40 +117,43 @@
actual ICU common data is to be provided from another source).
3. Dynamically load (memory map, typically) a common format (.dat) file
- containing the default ICU data. Loading is described in the section How
- Data Loading Works (§). The path to the data is of the form
- "icudt<version><flag>", where <version> is the two-digit ICU version number,
- and <flag> is a letter indicating the internal format of the file (see the
- Sharing ICU Data Between Platforms section (§)).
+ containing the default ICU data. Loading is described in the section
+ [How Data Loading Works](icudata.md#how-data-loading-works). The path to
+ the data is of the form "icudt\<version\>\<flag\>", where \<version\> is
+ the two-digit ICU version number, and \<flag\> is a letter indicating the
+ internal format of the file (see the
+ [Sharing ICU Data Between Platforms](icudata.md#sharing-icu-data-between-platforms)
+ section).
Once the default ICU data has been located, loading of individual data items
-proceeds as described in the section How Data Loading Works (§).
+proceeds as described in the section
+[How Data Loading Works](icudata.md#how-data-loading-works).
## Building and Linking against ICU data
When using ICU's configure or runConfigureICU tool to build, several different
methods of packaging are available.
-Note that in all cases, you **must** link all ICU tools and applications against
-a "data library": either a data library containing the ICU data, or against the
-"stubdata" library located in icu/source/stubdata. For example, even if ICU is
-built in "files" mode, you must still link against the "stubdata" library or an
-undefined symbol error occurs.
+> :point_right: Note that in all cases, you **must** link all ICU tools and
+applications against a "data library": either a data library containing the ICU
+data, or against the "stubdata" library located in icu/source/stubdata. For
+example, even if ICU is built in "files" mode, you must still link against the
+"stubdata" library or an undefined symbol error occurs.
-* `--with-data-packaging=``library`
+* `--with-data-packaging=library`
This mode builds a shared library (DLL or .so). This is the simplest mode to
use, and is the default.
- To use: link your application against the common and data librariews.
+ To use: link your application against the common and data libraries.
This is the only directly supported behavior on Windows builds.
-* `--with-data-packaging=``static`
+* `--with-data-packaging=static`
This option builds ICU data as a single (large) static library. This mode is
more complex to use. If you encounter errors, you may need to build ICU
multiple times.
-* `--with-data-packaging=``files`
+* `--with-data-packaging=files`
With this option, ICU outputs separate individual files (.res, .cnv, etc)
which will be loaded at runtime. Read the rest of this document, especially
the sections that discuss the ICU directory path.
-* `--with-data-packaging=``archive`
+* `--with-data-packaging=archive`
With this option, ICU outputs a single "icudt__.dat" file containing ICU
data. Read the rest of this document, especially the sections that discuss
the ICU directory path.
@@ -164,8 +172,8 @@
custom conversion tables, etc. Each data item file must have a package name as a
prefix, and this package name must match the basename of a .dat package file, if
one is used. The package name must be used in ICU APIs, for example in
-udata_setAppData() (instead of udata_setCommonData() which is only used for
-ICU's own data) and in the pathname argument of ures_open().
+`udata_setAppData()` (instead of `udata_setCommonData()` which is only used for
+ICU's own data) and in the pathname argument of `ures_open()`.
The only real difference to ICU's own data is that application data cannot be
simply loaded by specifying a NULL value for the path arguments of ICU APIs, and
@@ -174,18 +182,20 @@
The most important APIs that allow application data to be used are for Resource
Bundles, which are most often used for localized strings and other data. There
-are also functions like ucnv_openPackage() that allow to specify application
-data, and the udata.h API can be used to load any data with minimum requirements
-on the binary format, and without ICU interpreting the contents of the data.
+are also functions like `ucnv_openPackage()` that allow you to specify application
+data, and the `udata.h` API can be used to load any data with minimum
+requirements on the binary format, and without ICU interpreting the contents of
+the data.
-The pkgdata tool, which is used to package the data into various formats (e.g.
-shared library), has an option (--without-assembly or -w) to not use assembly
-code when building and packaging the application specific data into a shared
-library. Building the data with assembly code, which is enabled by default, is
-faster and more efficient; however, there are some platform specific issues that
-may arise. The --without-assembly option may be necessary on certain platforms
-(e.g. Linux) which have trouble properly loading application data when it was
-built with assembly code and is packaged as a shared library.
+The `pkgdata` tool, which is used to package the data into various formats (e.g.
+shared library), has an option (`--without-assembly` or `-w`) to not use
+assembly code when building and packaging the application specific data into a
+shared library. Building the data with assembly code, which is enabled by
+default, is faster and more efficient; however, there are some platform
+specific issues that may arise. The `--without-assembly` option may be
+necessary on certain platforms (e.g. Linux) which have trouble properly loading
+application data when it was built with assembly code and is packaged as a
+shared library.
## Alignment
@@ -200,13 +210,13 @@
Some of the ICU code explicitly checks for proper alignment.
-The icupkg tool places data items into the .dat file at start offsets that are
+The `icupkg` tool places data items into the .dat file at start offsets that are
multiples of 16 bytes.
-When using genccode to directly write a .o/.obj file, or to write assembler
-code, it specifies at least 16-alignment. When using genccode to write C code,
+When using `genccode` to directly write a .o/.obj file, or to write assembler
+code, it specifies at least 16-alignment. When using `genccode` to write C code,
it prepends the data with a double value which should yield at least 8-alignment
-on most platforms (usually sizeof(double)=8).
+on most platforms (usually `sizeof(double)=8`).
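As a rough illustration of the 16-byte rule above, the following sketch rounds an offset up to the next multiple of 16 and checks whether a pointer meets a given alignment. The names `align_up_16` and `is_aligned` are hypothetical helpers written for this example, not part of ICU or its tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment that the text above describes for item start offsets in a .dat file. */
#define DATA_ITEM_ALIGNMENT ((size_t)16)

/* Round an offset up to the next multiple of 16 (offsets already aligned are unchanged). */
static size_t align_up_16(size_t offset)
{
    return (offset + (DATA_ITEM_ALIGNMENT - 1)) & ~(DATA_ITEM_ALIGNMENT - 1);
}

/* Return nonzero if the address is a multiple of `alignment` (alignment must be nonzero). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

An application that loads a .dat image into its own buffer could use a check like `is_aligned(buffer, 16)` before handing the buffer to ICU.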
## Flexibility vs. Installation vs. Performance
@@ -224,34 +234,34 @@
Packaging data into .dat files (`--with-data-packaging=archive`) allows them to
be shared across platforms, but they must either be loaded by the application
-and set with udata_setCommonData() or udata_setAppData(), or they must be in a
-known location that is included in the ICU data directory string. This requires
-the application installer, or the application itself at runtime, to locate the
-ICU and/or application data by setting the ICU data directory (see the ICU Data
-Directory (§) section above) or by loading the data and providing it to one of
-the udata_setXYZData() functions.
+and set with `udata_setCommonData()` or `udata_setAppData()`, or they must be
+in a known location that is included in the ICU data directory string. This
+requires the application installer, or the application itself at runtime, to
+locate the ICU and/or application data by setting the ICU data directory (see
+the [ICU Data Directory](icudata.md#icu-data-directory) section above) or by
+loading the data and providing it to one of the `udata_setXYZData()` functions.
Unlike shared libraries, .dat package files can be taken apart into separate
data item files with the decmn ICU tool. This allows post-installation
-modification of a package file. The gencmn and pkgdata ICU tools can then be
+modification of a package file. The `gencmn` and `pkgdata` ICU tools can then be
used to reassemble the .dat package file.
-For more information about .dat package files see the section Sharing ICU Data
-Between Platforms (§) below.
+For more information about .dat package files see the section [Sharing ICU Data
+Between Platforms](icudata.md#sharing-icu-data-between-platforms) below.
### Data Overriding vs. Loading Performance
If the ICU data directory string is empty, then ICU will not attempt to load
data from the file system. It is then only possible to load data from the
-linked-in shared library or via udata_setCommonData() and udata_setAppData().
-This is inflexible but provides the highest performance.
+linked-in shared library or via `udata_setCommonData()` and
+`udata_setAppData()`. This is inflexible but provides the highest performance.
If the ICU data directory string is not empty, then data items are searched in
all directories and matching .dat files mentioned before checking in
already-loaded package files. This allows overriding of packaged data items with
single files after installation but costs some time for filesystem accesses.
-This is usually done only once per data item; see [User Data
-Caching](http://icu-project.org/userguide/icudata.html#data_caching) below.
+This is usually done only once per data item; see
+[User Data Caching](icudata.md#user-data-caching) below.
### Single Data Files vs. Packages
@@ -265,11 +275,13 @@
ICU data items are referenced by three names - a path, a name and a type. The
following are some examples:
-path name type
-cnvalias icu
-cp1252 cnv
-en res
-uprops icu c:\\some\\path\\dataLibName test dat
+path | name | type
+-----------------------------|----------|-----
+ | cnvalias | icu
+ | cp1252 | cnv
+ | en | res
+ | uprops | icu
+ c:\\some\\path\\dataLibName | test | dat
Items with no path specified are loaded from the default ICU data.
@@ -282,16 +294,18 @@
argument can contain at most one directory. (Path separators like semicolon (;)
are not handled here.)
-*The ICU data directory string itself may contain multiple directories and
-path/filenames to .dat package files. See the ICU Data Directory (§) section.*
+> :point_right: *The ICU data directory string itself may contain multiple
+directories and path/filenames to .dat package files. See the
+[ICU Data Directory](icudata.md#icu-data-directory) section.*
It is recommended to not include the directory in the path argument but to make
sure via setting the application data or the ICU data directory string that the
data can be located. This simplifies program maintenance and improves
robustness.
-See the API descriptions for the functions udata_open() and udata_openChoice()
-for additional information on opening ICU data from within an application.
+See the API descriptions for the functions `udata_open()` and
+`udata_openChoice()` for additional information on opening ICU data from within
+an application.
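Since the path argument can carry at most one directory in front of the package name, separating the two is a matter of finding the last directory separator. The sketch below shows one way to do that; `PathParts` and `split_data_path` are hypothetical names for this example, and ICU's internal parsing may differ.

```c
#include <stddef.h>
#include <string.h>

/* Result of splitting a path argument into its directory and package-name parts. */
typedef struct {
    size_t dir_len;       /* length of the directory part, including the trailing separator */
    const char *package;  /* points at the package name inside the original string */
} PathParts;

/* Split at the last occurrence of `sep`; a path without a separator is all package name. */
static PathParts split_data_path(const char *path, char sep)
{
    PathParts parts;
    const char *last = strrchr(path, sep);

    if (last == NULL) {
        parts.dir_len = 0;
        parts.package = path;
    } else {
        parts.dir_len = (size_t)(last - path) + 1;
        parts.package = last + 1;
    }
    return parts;
}
```

With the C string `"c:\\some\\path\\dataLibName"` and `'\\'` as the separator, the package part is `dataLibName` and the directory part is the first 13 characters.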
Data items can exist as individual files, or a number of them can be packaged
together in a single file for greater efficiency in loading and convenience of
@@ -301,9 +315,11 @@
when opening data. To make things more concrete in the following descriptions,
the following values of path, name and type are used:
+```
path = "c:\\some\\path\\dataLibName"
name = "test"
type = "res"
+```
In this case, "dataLibName" is the "package name" part of the path argument, and
"c:\\some\\path\\" is the directory part of it.
@@ -334,10 +350,10 @@
package has been loaded before, then the cached package is used immediately
instead of searching through the filesystem.
-*ICU versions before 2.2 always searched data packages before looking for
-individual files, which made it impossible to override packaged data items. See
-the ICU 2.2 download page and the readme for more information about the
-changes.*
+> :point_right: *ICU versions before 2.2 always searched data packages before
+looking for individual files, which made it impossible to override packaged
+data items. See the ICU 2.2 download page and the readme for more information
+about the changes.*
## User Data Caching
@@ -353,8 +369,8 @@
matter what directory path is specified, will resolve to the cached data.
Data can be explicitly added to the cache of common format data by means of the
-udata_setAppData() function. This function takes as input the path (name) and a
-pointer to a memory image of a .dat file. The data is added to the cache,
+`udata_setAppData()` function. This function takes as input the path (name) and
+a pointer to a memory image of a .dat file. The data is added to the cache,
causing any subsequent requests for data items from that file name to be routed
to the cache.
@@ -365,7 +381,7 @@
However, most ICU services (Resource Bundles, conversion, etc.) themselves cache
loaded data, so that data is usually loaded only once until the end of the
-process (or until u_cleanup() or ucnv_flushCache() or similar are called.)
+process (or until `u_cleanup()` or `ucnv_flushCache()` or similar are called.)
There is no mechanism for removing or updating cached data files.
@@ -375,9 +391,9 @@
use the form that is native to the platform. The ICU header "putil.h" defines
U_FILE_SEP_CHAR appropriately for the platform.
-*On Windows, the directory separator must be '\\' for any paths passed to ICU
-APIs. This is different from native Windows APIs, which generally allow either
-'/' or '\\'.*
+> :point_right: *On Windows, the directory separator must be '\\' for any paths
+passed to ICU APIs. This is different from native Windows APIs, which generally
+allow either '/' or '\\'.*
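An application that builds paths with forward slashes could normalize them before passing them to ICU on Windows. The helper below is a minimal sketch with a hypothetical name, `to_backslashes`; it rewrites the string in place and is only meant for the Windows case described in the note.

```c
#include <stddef.h>

/* Replace every '/' in a writable, NUL-terminated path with '\\', in place. */
static void to_backslashes(char *path)
{
    if (path == NULL) {
        return;
    }
    for (; *path != '\0'; ++path) {
        if (*path == '/') {
            *path = '\\';
        }
    }
}
```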
## Sharing ICU Data Between Platforms
@@ -407,9 +423,11 @@
The single letter following the version number in the file name of the default
ICU data file encodes the properties of the file as follows:
+```
icudt19l.dat Little Endian, ASCII
icudt19b.dat Big Endian, ASCII
icudt19e.dat Big Endian, EBCDIC
+```
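The naming scheme above can be expressed in a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, the endianness and charset probes are common idioms rather than ICU's own detection code, and only the three combinations listed above are treated as valid.

```c
#include <stddef.h>
#include <stdio.h>

/* Map platform properties to the file name letter; '\0' means no such data format. */
static char icu_data_flag(int big_endian, int ebcdic)
{
    if (ebcdic) {
        return big_endian ? 'e' : '\0';  /* little-endian EBCDIC is not a listed format */
    }
    return big_endian ? 'b' : 'l';
}

/* Write a name such as "icudt19l.dat" into out; returns 0 on success, -1 on failure. */
static int icu_data_file_name(char *out, size_t out_size,
                              int version, int big_endian, int ebcdic)
{
    char flag = icu_data_flag(big_endian, ebcdic);
    int n;

    if (out == NULL || flag == '\0') {
        return -1;
    }
    n = snprintf(out, out_size, "icudt%d%c.dat", version, flag);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Probe the byte order of the machine running this code. */
static int host_is_big_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* Probe the execution character set: 'A' is 0x41 in ASCII-based charsets. */
static int host_is_ebcdic(void)
{
    return 'A' != 0x41;
}
```

Calling `icu_data_file_name(buf, sizeof buf, 19, host_is_big_endian(), host_is_ebcdic())` produces the name that matches the machine the code runs on.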
(There are no little endian EBCDIC systems. All non-EBCDIC encodings include an
invariant subset of ASCII that is sufficient to enable these files to
@@ -420,11 +438,11 @@
Nothing is required at run time; ICU finds and uses whatever form of the data is
available.
-*When the ICU data is built in the form of shared libraries, the library names
-have platform-specific prefixes and suffixes. On Unix-style platforms, all the
-libraries have the "lib" prefix and one of the usual (".dll", ".so", ".sl",
-etc.) suffixes. Other than these prefixes and suffixes, the library names are
-the same as the above .dat files.*
+> :point_right: *When the ICU data is built in the form of shared libraries, the
+library names have platform-specific prefixes and suffixes. On Unix-style
+platforms, all the libraries have the "lib" prefix and one of the usual
+(".dll", ".so", ".sl", etc.) suffixes. Other than these prefixes and suffixes,
+the library names are the same as the above .dat files.*
## Customizing ICU's Data Library
@@ -433,69 +451,17 @@
normally placed into a single shared library.
Update: as of ICU 64, the standard data library is over 20 MB in size. We have
-introduced a new tool, the [ICU Data Build
-Tool](https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md),
+introduced a new tool, the [ICU Data Build Tool](icu_data/buildtool.md),
to replace the makefiles explained below and give you more control over what
goes into your ICU locale data file.
-~~ The ICU data library can be easily customized, either by adding additional converters or locales, or by removing some of the standard ones for the purpose of saving space.~~
-
-~~ ***Note**** that ICU for C by default comes with pre-built data. The source data files are included as an "icu\*data.zip" file starting in ICU4C 49. Previously, they were not included unless ICU is downloaded from the [source repository](http://site.icu-project.org/repository). Alternatively, the [Data Customizer](http://apps.icu-project.org/datacustom/) may be used to customize the pre-built data.*~~
-
-~~ ICU can load data from individual data files as well as from its default library, so building a customized library when adding additional data is not strictly necessary. Adding to ICU's library can simplify application installation by eliminating the need to include separate files with an application distribution, and the need to tell ICU where they are installed.~~
-
-~~ Reducing the size of ICU's data by eliminating unneeded resources can make sense on small systems with limited or no disk, but for desktop or server systems there is no real advantage to trimming. ICU's data is memory mapped into an application's address space, and only those portions of the data actually being used are ever paged in, so there are no significant RAM savings. As for disk space, with the large size of today's hard drives, saving a few MB is not worth the bother.~~
-
-~~ By default, ICU builds with a large set of converters and with all available locales. This means that any extra items added must be provided by the application developer. There is no extra ICU-supplied data that could be specified.~~
-
-### Details
-
-~~ The converters and resources that ICU builds are in the following configuration files. They are only available when building from ICU's source code repository. Normally, the standard ICU distribution do not include these files.~~
-
-~~ source/data/locales/resfiles.mk The standard set of locale data resource bundles source/data/locales/reslocal.mk User-provided file with additional resource bundles source/data/coll/colfiles.mk The standard set of collation data resource bundles source/data/coll/collocal.mk User-provided file with additional collation resource bundles source/data/brkitr/brkfiles.mk The standard set of break iterator data resource bundles source/data/brkitr/brklocal.mk User-provided file with additional break iterator resource bundles source/data/translit/trnsfiles.mk The standard set of transliterator resource files source/data/translit/trnslocal.mk User-provided file with a set of additional transliterator resource files source/data/mappings/ucmcore.mk Core set of conversion tables for MIME/Unix/Windows source/data/mappings/ucmfiles.mk Additional, large set of conversion tables for a wide range of uses source/data/mappings/ucmebcdic.mk Large set of EBCDIC conversion tables source/data/mappings/ucmlocal.mk User-provided file with additional conversion tables source/data/misc/miscfiles.mk Miscellaneous data, like timezone information ~~
-
-~~These files function identically for both Windows and UNIX builds of ICU. ICU will automatically update the list of installed locales returned by uloc_getAvailable() whenever resfiles.mk or reslocal.mk are updated and the ICU data library is rebuilt. These files are only needed while building ICU. If any of these files are removed or renamed, the size of the ICU data library will be reduced.~~
-
-~~ The optional files reslocal.mk and ucmlocal.mk are not included as part of a standard ICU distribution. Thus these customization files do not need to be merged or updated when updating versions of ICU.~~
-
-~~ Both reslocal.mk and ucmlocal.mk are makefile includes. So the usual rules for makefiles apply. Lines may be continued by preceding the end of the line to be continued with a back slash. Lines beginning with a # are comments. See ucmfiles.mk and resfiles.mk for additional information.~~
-
-### Reducing the Size of ICU's Data: Conversion Tables
-
-~~ The size of the ICU data file in the standard build configuration is about 8 MB. The majority of this is used for conversion tables. ICU comes with so many conversion tables because many ICU users need to support many encodings from many platforms. There are conversion tables for EBCDIC and DOS codepages, for ISO 2022 variants, and for small variations of popular encodings.~~
-
-~~ **Important:** ICU provides full internationalization functionality without **any** conversion table data. The common library contains code to handle several important encodings algorithmically: US-ASCII, ISO-8859-1, UTF-7/8/16/32, SCSU, BOCU-1, CESU-8, and IMAP-mailbox-name (i.e., US-ASCII, ISO-8859-1, and all Unicode charsets; see source/data/mappings/convrtrs.txt for the current list).~~
-
-~~ Therefore, the easiest way to reduce the size of ICU's data by a lot (without limitation of I18N support) is to reduce the number of conversion tables that are built into the data file.~~
-
-~~ The conversion tables are listed for the build process in several makefiles source/data/mappings/ucm\*.mk, roughly grouped by how commonly they are used. If you remove or rename any of these files, then the ICU build will exclude the conversion tables that are listed in that file. Beginning with ICU 2.0, all of these makefiles including the main one are optional. If you remove all of them, then ICU will include only very few conversion tables for "fallback" encodings (see note below).~~
-
-~~ If you remove or rename all ucm\*.mk files, then ICU's data is reduced to about 3.6 MB. If you remove all these files except for ucmcore.mk, then ICU's data is reduced to about 4.7 MB, while keeping support for a core set of common MIME/Unix/Windows encodings.~~
-
-~~ *If you remove the conversion table for an encoding that could be a default encoding on one of your platforms, then ICU will not be able to instantiate a default converter. In this case, ICU 2.0 and up will automatically fall back to a "lowest common denominator" and load a converter for US-ASCII (or, on EBCDIC platforms, for codepages 37 or 1047). This will be good enough for converting strings that contain only "ASCII" characters (see the comment about "invariant characters" in utypes.h).* ~~
-~~ *When ICU is built with a reduced set of conversion tables, then some tests will fail that test the behavior of the converters based on known features of some encodings. Also, building the testdata will fail if you remove some conversion tables that are necessary for that (to test non-ASCII/Unicode resource bundle source files, for example). You can ignore these failures. Build with the standard set of conversion tables, if you want to run the tests.* ~~
-
-### Reducing the Size of ICU's Data: Locale Data
-
-~~ If you need to reduce the size of ICU's data even further, then you need to remove other files or parts of files from the build as well.~~
-
-~~ There are a number of different subdirectories of 'data' containing locale data split out by section. Each subdirectory has its own **.mk** file listing the locales which will be built. Subdirectories include **lang** for language names and **curr** for currency names.~~
-
-~~ You can remove data for entire locales by removing their files from source/data/locales/resfiles.mk or the appropriate other .mk file. ICU will then use the data of the parent locale instead, which is root.txt. If you remove all resource bundles for a given language and its country/region/variant sublocales, **do not remove root.txt!** Also, do not remove a parent locale if child locales exist. For example, do not remove "en" while retaining "en_US".~~
-
-### Reducing the Size of ICU's Data: Collation Data
-
-~~ Collation data (for sorting, searching and alphabetic indexes) is also large, especially the collation data for East Asian languages because they define multiple orderings of tens of thousands of Han characters. You can remove the collation data for those languages by removing references to those locales from source/data/coll/colfiles.mk files. When you do that, the collation for those languages will fall back to the root collator, that is, you lose language-specific behavior.~~
-
-~~ A much less radical approach is to keep the collation data tables but remove the tailoring rule strings from which they were built. Those rule strings are rarely used at runtime. For documentation about their use and how to remove them see the section "Building on Existing Locales" in the [Collation Customization chapter](collation/customization/index.md).~~
-
### Adding Converters to ICU
The first step is to obtain or create a .ucm (source) mapping data file for the
desired converter. A large archive of converter data is maintained by the ICU
team at <http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/>
-We will use solaris-eucJP-2.7.ucm, available from the repository mentioned
+We will use `solaris-eucJP-2.7.ucm`, available from the repository mentioned
above, as an example.
#### Build the Converter
@@ -503,49 +469,56 @@
Converter source files are compiled into binary converter files (.cnv files) by
using the icu tool makeconv. For the example, you can use this command
+```
makeconv -v solaris-eucJP-2.7.ucm
+```
Some of the .ucm files from the repository will need additional header
information before they can be built. Use the error messages from the makeconv
tool, .ucm files for similar converters, and the ICU user guide documentation of
-.ucm files as a guide when making changes. For the solaris-eucJP-2.7.ucm
+.ucm files as a guide when making changes. For the `solaris-eucJP-2.7.ucm`
example, we will borrow the missing header fields from
-source/data/mappings/ibm-33722_P12A-2000.ucm, which is the standard ICU eucJP
+`source/data/mappings/ibm-33722_P12A-2000.ucm`, which is the standard ICU eucJP
converter data.
-The ucm file format is described in the ["Conversion Data"
-chapter](conversion/data.md) of this user guide.
+The ucm file format is described in the
+["Conversion Data" chapter](conversion/data.md) of this user guide.
-After adjustment, the header of the solaris-eucJP-2.7.ucm file contains these
+After adjustment, the header of the `solaris-eucJP-2.7.ucm` file contains these
items:
-<code_set_name> "solaris-eucJP-2.7"
-<subchar> \\x3F
-<uconv_class> "MBCS"
-<mb_cur_max> 3
-<mb_cur_min> 1
-<icu:state> 0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1
-<icu:state> a1-fe
-<icu:state> a1-e4
-<icu:state> a1-fe:1, a1:4, a3-af:4, b6:4, d6:4, da-db:4, ed-f2:4
-<icu:state> a1-fe
+```
+<code_set_name> "solaris-eucJP-2.7"
+<subchar> \x3F
+<uconv_class> "MBCS"
-The binary converter file produced by the makeconv tool is solaris-eucJP-2.7.cnv
+<mb_cur_max> 3
+<mb_cur_min> 1
+
+<icu:state> 0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1
+<icu:state> a1-fe
+<icu:state> a1-e4
+<icu:state> a1-fe:1, a1:4, a3-af:4, b6:4, d6:4, da-db:4, ed-f2:4
+<icu:state> a1-fe
+```
+
+The binary converter file produced by the `makeconv` tool is
+`solaris-eucJP-2.7.cnv`.
#### Installation
Copy the new .cnv file to the desired location for use. Set the environment
-variable ICU_DATA to the directory containing the data, or, alternatively, from
-within an application, tell ICU the location of the new data with the function
-u_setDataDirectory() before using the new converter.
+variable `ICU_DATA` to the directory containing the data, or, alternatively,
+from within an application, tell ICU the location of the new data with the
+function `u_setDataDirectory()` before using the new converter.
If ICU is already obtaining data from files rather than a shared library,
install the new file in the same location as the existing ICU data file(s), and
don't change/set the environment variable or data directory.
If you do not want to add a converter to ICU's base data, you can also generate
-a conversion table with makeconv, use pkgdata to generate your own package and
-use the ucnv_openPackage() to open up a converter with that conversion table
+a conversion table with `makeconv`, use `pkgdata` to generate your own package,
+and use `ucnv_openPackage()` to open a converter with that conversion table
from the generated package.
#### Building the new converter into ICU
@@ -555,29 +528,33 @@
the procedure for doing so:
1. Move the .ucm file(s) for the converter(s) to be added (
- solaris-eucJP-2.7.ucm for our example) into the directory
- source/data/mappings/
+ `solaris-eucJP-2.7.ucm` for our example) into the directory
+   `source/data/mappings/`.
2. Create, or edit, if it already exists, the file
- source/data/mappings/ucmlocal.mk Add this line:
+ `source/data/mappings/ucmlocal.mk`. Add this line:
+
+ ```
UCM_SOURCE_LOCAL = solaris-eucJP-2.7.ucm
+ ```
+
Any number of converters can be listed. Extend the list to new lines with a
- back slash at the end of the line. The ucmlocal.mk file is described in more
- detail in source/data/mappings/ucmfiles.mk (Even though they use very
- different build systems, ucmlocal.mk is used for both the Windows and UNIX
+   backslash at the end of the line. The `ucmlocal.mk` file is described in
+   more detail in `source/data/mappings/ucmfiles.mk`. (Even though they use very
+ different build systems, `ucmlocal.mk` is used for both the Windows and UNIX
builds.)
-3. Add the converter name and aliases to source/data/mappings/convrtrs.txt.
+3. Add the converter name and aliases to `source/data/mappings/convrtrs.txt`.
This will allow your converter to be shown in the list of available
- converters when you call the ucnv_getAvailableName() function. The file
+   converters when you call the `ucnv_getAvailableName()` function. The file
syntax is described within the file.
4. Rebuild the ICU data.
For Windows, from MSVC choose the makedata project from the GUI, then build
the project.
- For UNIX, "cd icu/source/data; gmake"
+ For UNIX, `cd icu/source/data; gmake`
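
As a rough illustration of step 2, the sketch below renders the contents of a `ucmlocal.mk` that lists several converters, using a backslash to continue the list onto new lines. The helper name `render_ucm_local` and the second file name `my-other-table.ucm` are hypothetical and used only for illustration; they are not part of ICU.

```python
def render_ucm_local(ucm_files):
    """Return the text of a ucmlocal.mk that lists the given .ucm files.

    Hypothetical helper, not part of ICU: it only formats the
    UCM_SOURCE_LOCAL assignment described in step 2.
    """
    if not ucm_files:
        raise ValueError("at least one .ucm file is required")
    head = "UCM_SOURCE_LOCAL = "
    indent = " " * len(head)
    lines = []
    for i, name in enumerate(ucm_files):
        prefix = head if i == 0 else indent
        # Every line except the last ends with a backslash continuation.
        suffix = " \\" if i < len(ucm_files) - 1 else ""
        lines.append(prefix + name + suffix)
    return "\n".join(lines) + "\n"


# "my-other-table.ucm" is a made-up file name for the example.
text = render_ucm_local(["solaris-eucJP-2.7.ucm", "my-other-table.ucm"])
print(text, end="")
```

The printed text could be saved as `source/data/mappings/ucmlocal.mk` before rebuilding the ICU data as described in step 4.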
-When opening an ICU converter (ucnv_open()), the converter name can not be
+When opening an ICU converter (`ucnv_open()`), the converter name cannot be
qualified with a path that indicates the directory or common data file
containing the corresponding converter data. The required data must be present
either in the main ICU data library or as a separate .cnv file located in the
@@ -594,19 +571,174 @@
on the web. Most of this locale data is derived from the CLDR ([Common Locale
Data Repository](http://www.unicode.org/cldr/) ) project.
-As of ICU 64, dropping the txt file into the correct place in the source tree is
-sufficient to add it to your ICU build. You will need to re-configure in order
-to pick it up.
+**As of ICU 64**, dropping the txt file into the correct place in the source
+tree is sufficient to add it to your ICU build. You will need to re-configure in
+order to pick it up.
-~~ You need to write a resource bundle file for it with a structure like the existing locale resource bundles (e.g. source/data/locales/ja.txt, ru_RU.txt, kok_IN.txt) and add it by writing a file source/data/locales/reslocal.mk just like above. In this file, define the list of additional resource bundles as ~~
-~~ GENRB_SOURCE_LOCAL=myLocale.txt other.txt ... ~~
+**For ICU 63 or earlier**, you need to write a resource bundle file for it
+with a structure like the existing locale resource bundles (e.g.
+`source/data/locales/ja.txt`, `ru_RU.txt`, `kok_IN.txt`) and add it by writing a
+file `source/data/locales/reslocal.mk` just like above. In this file, define
+the list of additional resource bundles as:
-~~ Starting in ICU 2.2, these added locales are automatically listed by uloc_getAvailable().~~
+```
+GENRB_SOURCE_LOCAL=myLocale.txt other.txt ...
+```
+
+Starting in ICU 2.2, these added locales are automatically listed by
+`uloc_getAvailable()`.
+
+## Customizing ICU's Data Library for ICU 63 or earlier
+The ICU data library can be easily customized, either by adding additional converters or locales, or by removing some of the standard ones for the purpose of saving space.
+
+> :point_right: **Note**: ICU for C by default comes with pre-built data. The
+source data files are included as an "icu\*data.zip" file starting in
+ICU4C 49. Previously, they were not included unless ICU was downloaded from the
+[source repository](https://github.com/unicode-org/icu). Alternatively, the
+[Data Customizer](http://apps.icu-project.org/datacustom/) may be used to
+customize the pre-built data.
+
+ICU can load data from individual data files as well as from its default
+library, so building a customized library when adding additional data is not
+strictly necessary. Adding to ICU's library can simplify application
+installation by eliminating the need to include separate files with an
+application distribution, and the need to tell ICU where they are installed.
+
+Reducing the size of ICU's data by eliminating unneeded resources can make
+sense on small systems with limited or no disk, but for desktop or server
+systems there is no real advantage to trimming. ICU's data is memory mapped
+into an application's address space, and only those portions of the data
+actually being used are ever paged in, so there are no significant RAM savings.
+As for disk space, with the large size of today's hard drives, saving a few MB
+is not worth the bother.
+
+By default, ICU builds with a large set of converters and with all available
+locales. This means that any extra items added must be provided by the
+application developer. There is no extra ICU-supplied data that could be
+specified.
+
+### Details
+
+The converters and resources that ICU builds are in the following configuration
+files. They are only available when building from ICU's source code repository.
+Normally, the standard ICU distribution does not include these files.
+
+File | Description
+----------------------------------|--------------
+source/data/locales/resfiles.mk | The standard set of locale data resource bundles
+source/data/locales/reslocal.mk | User-provided file with additional resource bundles
+source/data/coll/colfiles.mk | The standard set of collation data resource bundles
+source/data/coll/collocal.mk | User-provided file with additional collation resource bundles
+source/data/brkitr/brkfiles.mk | The standard set of break iterator data resource bundles
+source/data/brkitr/brklocal.mk | User-provided file with additional break iterator resource bundles
+source/data/translit/trnsfiles.mk | The standard set of transliterator resource files
+source/data/translit/trnslocal.mk | User-provided file with a set of additional transliterator resource files
+source/data/mappings/ucmcore.mk | Core set of conversion tables for MIME/Unix/Windows
+source/data/mappings/ucmfiles.mk | Additional, large set of conversion tables for a wide range of uses
+source/data/mappings/ucmebcdic.mk | Large set of EBCDIC conversion tables
+source/data/mappings/ucmlocal.mk | User-provided file with additional conversion tables
+source/data/misc/miscfiles.mk | Miscellaneous data, like timezone information
+
+These files function identically for both Windows and UNIX builds of ICU. ICU
+will automatically update the list of installed locales returned by
+`uloc_getAvailable()` whenever `resfiles.mk` or `reslocal.mk` are updated and
+the ICU data library is rebuilt. These files are only needed while building ICU.
+If any of these files are removed or renamed, the size of the ICU data library
+will be reduced.
+
+The optional files `reslocal.mk` and `ucmlocal.mk` are not included as part of
+a standard ICU distribution. Thus these customization files do not need to be
+merged or updated when updating versions of ICU.
+
+Both `reslocal.mk` and `ucmlocal.mk` are makefile includes. So the usual rules
+for makefiles apply. To continue a line, end it with a backslash. Lines
+beginning with a `#` are comments. See
+`ucmfiles.mk` and `resfiles.mk` for additional information.
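
The two makefile rules mentioned above (backslash continuation and `#` comment lines) can be illustrated with a small reader. This is a simplified sketch, not how ICU's build processes these files: it assumes plain `NAME = value ...` assignments only, and the function name `parse_mk_variables` and the file name `my-other-table.ucm` are hypothetical.

```python
def parse_mk_variables(text):
    """Parse simple `NAME = a b c` assignments from makefile-include text.

    Simplified on purpose: handles only backslash continuations and
    lines that start with '#'. Other make syntax is not supported.
    """
    variables = {}
    logical = ""  # accumulates one logical line across continuations
    for raw in text.splitlines():
        line = raw.rstrip()
        if not logical and line.lstrip().startswith("#"):
            continue  # comment line
        if line.endswith("\\"):
            logical += line[:-1] + " "  # drop the backslash, keep reading
            continue
        logical += line
        if "=" in logical:
            name, _, value = logical.partition("=")
            variables[name.strip()] = value.split()
        logical = ""
    return variables


# Sample content; "my-other-table.ucm" is a made-up file name.
sample = """\
# Local conversion tables
UCM_SOURCE_LOCAL = solaris-eucJP-2.7.ucm \\
                   my-other-table.ucm
GENRB_SOURCE_LOCAL = myLocale.txt other.txt
"""

parsed = parse_mk_variables(sample)
for name, values in parsed.items():
    print(name, values)
```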
+
+### Reducing the Size of ICU's Data: Conversion Tables
+
+The size of the ICU data file in the standard build configuration is about 8 MB.
+The majority of this is used for conversion tables. ICU comes with so many
+conversion tables because many ICU users need to support many encodings from
+many platforms. There are conversion tables for EBCDIC and DOS codepages, for
+ISO 2022 variants, and for small variations of popular encodings.
+
+> :point_right: **Important:** ICU provides full internationalization
+functionality without **any** conversion table data. The common library
+contains code to handle several important encodings algorithmically: US-ASCII,
+ISO-8859-1, UTF-7/8/16/32, SCSU, BOCU-1, CESU-8, and IMAP-mailbox-name (i.e.,
+US-ASCII, ISO-8859-1, and all Unicode charsets; see
+source/data/mappings/convrtrs.txt for the current list).
+
+Therefore, the easiest way to reduce the size of ICU's data by a lot (without
+limitation of I18N support) is to reduce the number of conversion tables that
+are built into the data file.
+
+The conversion tables are listed for the build process in several makefiles
+`source/data/mappings/ucm*.mk`, roughly grouped by how commonly they are used.
+If you remove or rename any of these files, then the ICU build will exclude the
+conversion tables that are listed in that file. Beginning with ICU 2.0, all of
+these makefiles including the main one are optional. If you remove all of them,
+then ICU will include only very few conversion tables for "fallback" encodings
+(see note below).
+
+If you remove or rename all `ucm*.mk` files, then ICU's data is reduced to
+about 3.6 MB. If you remove all these files except for `ucmcore.mk`, then ICU's
+data is reduced to about 4.7 MB, while keeping support for a core set of common
+MIME/Unix/Windows encodings.
+
+> :point_right: *If you remove the conversion table for an encoding that could be
+a default encoding on one of your platforms, then ICU will not be able to
+instantiate a default converter. In this case, ICU 2.0 and up will automatically
+fall back to a "lowest common denominator" and load a converter for US-ASCII
+(or, on EBCDIC platforms, for codepages 37 or 1047). This will be good enough
+for converting strings that contain only "ASCII" characters (see the comment
+about "invariant characters" in `utypes.h`).*
+*When ICU is built with a reduced set of conversion tables, then some tests will
+fail that test the behavior of the converters based on known features of some
+encodings. Also, building the testdata will fail if you remove some conversion
+tables that are necessary for that (to test non-ASCII/Unicode resource bundle
+source files, for example). You can ignore these failures. Build with the
+standard set of conversion tables, if you want to run the tests.*
+
+### Reducing the Size of ICU's Data: Locale Data
+
+If you need to reduce the size of ICU's data even further, then you need to
+remove other files or parts of files from the build as well.
+
+There are a number of different subdirectories of 'data' containing locale data
+split out by section. Each subdirectory has its own **.mk** file listing the
+locales which will be built. Subdirectories include **lang** for language names
+and **curr** for currency names.
+
+You can remove data for entire locales by removing their files from
+`source/data/locales/resfiles.mk` or the appropriate other .mk file. ICU will
+then use the data of the parent locale instead, which is root.txt. If you
+remove all resource bundles for a given language and its country/region/variant
+sublocales, **do not remove root.txt!** Also, do not remove a parent locale if
+child locales exist. For example, do not remove "en" while retaining "en_US".
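
Before trimming a locale list, it can help to check that no retained locale has lost its parent. The sketch below assumes that a parent is found simply by dropping the last underscore-separated subtag, with `root` at the top; ICU's actual fallback has additional rules that this sketch ignores. The function names are hypothetical.

```python
def parent_locale(locale_id):
    """Return the parent by truncating the last subtag; root has no parent.

    Assumption for illustration: plain truncation, e.g. en_US -> en -> root.
    """
    if locale_id == "root":
        return None
    head, sep, _ = locale_id.rpartition("_")
    return head if sep else "root"


def orphaned_locales(kept):
    """List (locale, missing_ancestor) pairs for a set of retained locales."""
    kept = set(kept)
    orphans = []
    for loc in sorted(kept):
        parent = parent_locale(loc)
        while parent is not None:
            if parent not in kept:
                orphans.append((loc, parent))
                break
            parent = parent_locale(parent)
    return orphans


# Keeping en_US without en is flagged; keeping both is fine.
print(orphaned_locales(["root", "en_US"]))
print(orphaned_locales(["root", "en", "en_US"]))
```

An empty result means every retained locale still has its full chain of parents up to `root`.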
+
+### Reducing the Size of ICU's Data: Collation Data
+
+Collation data (for sorting, searching and alphabetic indexes) is also large,
+especially the collation data for East Asian languages because they define
+multiple orderings of tens of thousands of Han characters. You can remove the
+collation data for those languages by removing references to those locales from
+`source/data/coll/colfiles.mk` files. When you do that, the collation for those
+languages will fall back to the root collator, that is, you lose
+language-specific behavior.
+
+A much less radical approach is to keep the collation data tables but remove the
+tailoring rule strings from which they were built. Those rule strings are
+rarely used at runtime. For documentation about their use and how to remove
+them see the section "Building on Existing Locales" in the
+[Collation Customization chapter](collation/customization/index.md).
+
## ICU Data File Formats
ICU uses several kinds of data files with specific source (plain text) and
-binary data formats. The following table provides links to descriptions of those
+binary data formats. The following lists provide links to descriptions of those
formats.
Each ICU data object begins with a header before the actual, specific data. The
@@ -618,188 +750,192 @@
(This is not the case for the trie structures, which are not stand-alone,
loadable data objects.)
-Files Source format Binary format Generator tool **Public Data Files** ICU .dat
-package files (list of files provided as input to the icupkg tool, or on the
-gencmn tool command line) .dat:
-[source/tools/toolutil/pkg_gencmn.c](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/toolutil/pkg_gencmn.c)
-[icupkg](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/icupkg/)
-or
-[gencmn](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gencmn/)
-Resource bundles .txt:
-[icuhtml/design/bnf_rb.txt](http://source.icu-project.org/repos/icu/icuhtml/trunk/design/bnf_rb.txt)
-.res:
-[source/common/uresdata.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/uresdata.h)
-[genrb](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genrb/)
-Unicode conversion mapping tables .ucm: [Conversion Data
-chapter](conversion/data.md) .cnv:
-[source/common/ucnvmbcs.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/ucnvmbcs.h)
-[makeconv](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/makeconv/)
-Conversion (charset) aliases
-[source/data/mappings/convrtrs.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt)
-: contains format description
-The command "uconv -l --canon" will also generate the alias table from the
-currently used copy of ICU. cnvalias.icu:
-[source/common/ucnv_io.c](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/ucnv_io.c)
-[gencnval](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gencnval/)
-Unicode Character Data
-(Properties; for Java only: hardcoded in C common library)
-[source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/ppucd.txt)
-: [Preparsed UCD](http://site.icu-project.org/design/props/ppucd) uprops.icu:
-[tools/unicode/c/genprops/corepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops/corepropsbuilder.cpp)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Unicode Character Data
-(Case mappings; for Java only: hardcoded in C common library)
-[source/data/unidata/\*.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/)
-: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
-ucase.icu:
-[tools/unicode/c/genprops/casepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops/casepropsbuilder.cpp)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Unicode Character Data
-(BiDi, and Arabic shaping; for Java only: hardcoded in C common library)
-[source/data/unidata/\*.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/)
-: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
-ubidi.icu:
-[tools/unicode/c/genprops/bidipropsbuilder.cpp](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops/bidipropsbuilder.cpp)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Unicode Character Data
-(Normalization since ICU 4.4) & custom normalization data
-[source/data/unidata/norm2/\*.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/norm2/)
-: Files derived from the [Unicode Character
-Database](http://www.unicode.org/onlinedat/online.html), or custom data .nrm:
-[source/common/normalizer2impl.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/normalizer2impl.h)
-[gennorm2](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gennorm2/)
-Unicode Character Data
-(Character names)
-[source/data/unidata/UnicodeData.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/UnicodeData.txt)
-: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
-unames.icu:
-[tools/unicode/c/genprops/namespropsbuilder.cpp](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops/namespropsbuilder.cpp)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Unicode Character Data
-(Property \[value\] aliases since ICU 4.8; for Java only: hardcoded in C common
-library since ICU 4.8) [UCD
-Property\*Aliases.txt](http://www.unicode.org/Public/UNIDATA/) : [Unicode
-Character Database](http://www.unicode.org/onlinedat/online.html) pnames.icu:
-[source/common/propname.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/propname.h)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Unicode Character Data
-(Text layout properties since ICU 64)
-[source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/ppucd.txt)
-: [Preparsed UCD](http://site.icu-project.org/design/props/ppucd) ulayout.icu:
-[tools/unicode/c/genprops/layoutpropsbuilder.cpp](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops/layoutpropsbuilder.cpp)
-[genprops](https://github.com/unicode-org/icu/blob/master/tools/unicode/c/genprops)
-Collation data
-(root collation & tailorings; ICU 53 & later) Original data from
-allkeys_CLDR.txt in [CLDR Root Collation Data
-Files](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Data_Files)
-processed into
-[source/data/unidata/FractionalUCA.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/FractionalUCA.txt)
-by [tool at unicode.org maintained by Mark
-Davis](https://sites.google.com/site/unicodetools/#TOC-UCA) (call the Main class
-with option writeFractionalUCA);
-source tailorings (text rules) in
-[source/data/coll/\*.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/coll/)
-resource bundles: [Collation Customization
-chapter](collation/customization/index.md) ucadata.icu & binary tailorings in
-resource bundles:
-[source/i18n/collationdatareader.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/i18n/collationdatareader.h)
-[genuca](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genuca/),
-[genrb](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genrb/)
-Rule-based break iterator data .txt: [Boundary Analysis
-chapter](http://icu-project.org/userguide/boundaryAnalysis.html) .brk:
-[source/common/rbbidata.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/rbbidata.h)
-[genbrk](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genbrk/)
-Dictionary-based break iterator data (ICU 50 & later) .txt: [gendict.cpp
-comments](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gendict/gendict.cpp)
-.dict: see
-[source/common/dictionarydata.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/dictionarydata.h)
-[gendict](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gendict/)
-Rule-based transform (transliterator) data .txt (in resource bundles):
-[Transform Rule Tutorial chapter](transforms/general/rules.md) Uses genrb to
-make binary format Does not apply Time zone data (ICU 4.4 & later)
-[source/data/misc/zoneinfo64.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/misc/zoneinfo64.txt)
-: [ftp://elsie.nci.nih.gov/pub/](ftp://elsie.nci.nih.gov/pub/)
-tzdata<year><rev>.tar.gz zoneinfo64.res (generated by genrb and [tzcode
-tools](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/tzcode/readme.txt))
-Does not apply
-StringPrep profile data
-[source/data/misc/NamePrepProfile.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/misc/NamePrepProfile.txt)
-.spp:
-[source/tools/gensprep/store.c](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gensprep/store.c)
-[gensprep](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gensprep/)
-Confusables data
-[source/data/unidata/confusables.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/confusables.txt),
-[source/data/unidata/confusablesWholeScript.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/confusablesWholeScript.txt)
-confusables.cfu:
-[source/i18n/uspoof_impl.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/i18n/uspoof_impl.h)
-[gencfu](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gencfu/)
-**Public Data Files (old versions)** Unicode Character Data
-(Normalization before ICU 4.4; for Java only: was hardcoded in C common library)
-[source/data/unidata/\*.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unidata/)
-: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
-unorm.icu:
-[source/common/unormimp.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unormimp.h)
-gennorm Unicode Character Data
-(Property \[value\] aliases before ICU 4.8)
-source/data/unidata/Property\*Aliases.txt : [Unicode Character
-Database](http://www.unicode.org/onlinedat/online.html) pnames.icu:
-[source/common/propname.h (ICU
-4.6)](http://bugs.icu-project.org/trac/browser/icu/tags/release-4-6/source/common/propname.h)
-genpname Collation data
-(UCA, code points to weights; ICU 52 & earlier) Same as in ICU 53 ucadata.icu &
-binary tailorings in resource bundles: [source/i18n/ucol_imp.h (ICU
-52)](http://bugs.icu-project.org/trac/browser/icu/tags/release-52-1/source/i18n/ucol_imp.h)
-[genuca](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genuca/),
-[genrb](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genrb/)
-Collation data
-(Inverse UCA, weights->code points; ICU 52 & earlier) Processed from
-FractionalUCA.txt like ICU 52 ucadata.icu invuca.icu: [source/i18n/ucol_imp.h
-(ICU
-52)](http://bugs.icu-project.org/trac/browser/icu/tags/release-52-1/source/i18n/ucol_imp.h)
-[genuca](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genuca/)
-Dictionary-based break iterator data (ICU 49 & earlier) .txt: [genctd.cpp
-comments](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/genctd/genctd.cpp?rev=26466)
-.ctd: see CompactTrieHeader in
-[source/common/triedict.cpp](http://bugs.icu-project.org/trac/log/trunk/icu4c/source/common/triedict.cpp?rev=32183)
-[genctd](http://bugs.icu-project.org/trac/log/trunk/icu4c/source/tools/genctd?rev=32183)
-Time zone data (Before ICU 4.4)
-[source/data/misc/zoneinfo.txt](http://bugs.icu-project.org/trac/browser/icu/branches/maint/maint-4-2/source/data/misc/zoneinfo.txt)
-: [ftp://elsie.nci.nih.gov/pub/](ftp://elsie.nci.nih.gov/pub/)
-tzdata<year><rev>.tar.gz
-zoneinfo.res (generated by genrb and [tzcode
-tools](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/tzcode/readme.txt))
-Does not apply **Non-File API Binary Data** Converter selector data
-none
-[source/common/ucnvsel.cpp](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/ucnvsel.cpp)
-[ucnvsel_open()](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/ucnvsel.h)
-**Test-Only Data Files** test.icu (for udata API testing)
-none (fixed output from gentest when not using -r or -j options)
-test.icu: see createData() in
-[source/tools/gentest/gentest.c](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gentest/gentest.c)
-[gentest](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/gentest/)
-**Other Data Structures** UCPTrie (C)/CodePointTrie (Java) (maps code points to
-integers)
-(public builder API)
-[ICU Code Point Tries design
-doc](http://site.icu-project.org/design/struct/utrie),
-[icu4c/source/common/ucptrie_impl.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/ucptrie_impl.h)
-(builder class) UTrie2 (C)/Trie2 (Java) (maps code points to integers)
-(internal builder API)
-[ICU Code Point Tries design
-doc](http://site.icu-project.org/design/struct/utrie),
-[icu4c/source/common/utrie2_impl.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/utrie2_impl.h)
-(builder class) BytesTrie (maps byte sequences to 32-bit integers)
-(public builder API)
-[BytesTrie design
-doc](http://site.icu-project.org/design/struct/tries/bytestrie),
-[icu4c/source/common/bytestrie.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/bytestrie.h)
-(builder class) UCharsTrie (C++)/CharsTrie (Java) (maps 16-bit-Unicode strings
-to 32-bit integers)
-(public builder API)
-[UCharsTrie design
-doc](http://site.icu-project.org/design/struct/tries/ucharstrie),
-[icu4c/source/tools/toolutil/ucharstrie.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/tools/toolutil/ucharstrie.h)
-(builder class)
+### Public Data Files
+
+#### ICU.dat package files
+* Source format: (list of files provided as input to the icupkg tool, or
+ on the gencmn tool command line)
+* Binary format: .dat: [source/tools/toolutil/pkg_gencmn.cpp](../../icu4c/source/tools/toolutil/pkg_gencmn.cpp)
+* Generator tool: [icupkg](../../icu4c/source/tools/icupkg) or
+ [gencmn](../../icu4c/source/tools/gencmn)
+
+#### Resource bundles
+* Source format: .txt: [icuhtml/design/bnf_rb.txt](https://github.com/unicode-org/icu-docs/blob/master/design/bnf_rb.txt)
+* Binary format: .res: [source/common/uresdata.h](../../icu4c/source/common/uresdata.h)
+* Generator tool: [genrb](../../icu4c/source/tools/genrb)
+
+#### Unicode conversion mapping tables
+* Source format: .ucm: [Conversion Data chapter](conversion/data.md)
+* Binary format: .cnv: [source/common/ucnvmbcs.h](../../icu4c/source/common/ucnvmbcs.h)
+* Generator tool: [makeconv](../../icu4c/source/tools/makeconv)
+
+#### Conversion (charset) aliases
+* Source format: [source/data/mappings/convrtrs.txt](../../icu4c/source/data/mappings/convrtrs.txt):
+ contains format description. The command "uconv -l --canon"
+ will also generate the alias table from the currently used
+ copy of ICU.
+* Binary format: cnvalias.icu: [source/common/ucnv_io.cpp](../../icu4c/source/common/ucnv_io.cpp)
+* Generator tool: [gencnval](../../icu4c/source/tools/gencnval)
+
+#### Unicode Character Data (Properties; for Java only: hardcoded in C common library)
+* Source format: [source/data/unidata/ppucd.txt](../../icu4c/source/data/unidata/ppucd.txt):
+ [Preparsed UCD](http://site.icu-project.org/design/props/ppucd)
+* Binary format: uprops.icu: [tools/unicode/c/genprops/corepropsbuilder.cpp](../../tools/unicode/c/genprops/corepropsbuilder.cpp)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Unicode Character Data (Case mappings; for Java only: hardcoded in C common library)
+* Source format: [source/data/unidata/*.txt](../../icu4c/source/data/unidata):
+ [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: ucase.icu: [tools/unicode/c/genprops/casepropsbuilder.cpp](../../tools/unicode/c/genprops/casepropsbuilder.cpp)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Unicode Character Data (BiDi, and Arabic shaping; for Java only: hardcoded in C common library)
+* Source format: [source/data/unidata/*.txt](../../icu4c/source/data/unidata):
+ [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: ubidi.icu: [tools/unicode/c/genprops/bidipropsbuilder.cpp](../../tools/unicode/c/genprops/bidipropsbuilder.cpp)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Unicode Character Data (Normalization since ICU 4.4) & custom normalization data
+* Source format: [source/data/unidata/norm2/*.txt](../../icu4c/source/data/unidata/norm2):
+ Files derived from the [Unicode Character Database](http://www.unicode.org/onlinedat/online.html),
+ or custom data.
+* Binary format: .nrm: [source/common/normalizer2impl.h](../../icu4c/source/common/normalizer2impl.h)
+* Generator tool: [gennorm2](../../icu4c/source/tools/gennorm2)
+
+#### Unicode Character Data (Character names)
+* Source format: [source/data/unidata/UnicodeData.txt](../../icu4c/source/data/unidata/UnicodeData.txt):
+ [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: unames.icu: [tools/unicode/c/genprops/namespropsbuilder.cpp](../../tools/unicode/c/genprops/namespropsbuilder.cpp)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Unicode Character Data (Property [value] aliases since ICU 4.8; for Java only: hardcoded in C common library since ICU 4.8)
+* Source format: [UCD Property*Aliases.txt](http://www.unicode.org/Public/UNIDATA/):
+ [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: pnames.icu: [source/common/propname.h](../../icu4c/source/common/propname.h)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Unicode Character Data (Text layout properties since ICU 64)
+* Source format: [source/data/unidata/ppucd.txt](../../icu4c/source/data/unidata/ppucd.txt):
+ [Preparsed UCD](http://site.icu-project.org/design/props/ppucd)
+* Binary format: ulayout.icu: [tools/unicode/c/genprops/layoutpropsbuilder.cpp](../../tools/unicode/c/genprops/layoutpropsbuilder.cpp)
+* Generator tool: [genprops](../../tools/unicode/c/genprops)
+
+#### Collation data (root collation & tailorings; ICU 53 & later)
+* Source format: Original data from allkeys_CLDR.txt in [CLDR Root Collation Data Files](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Data_Files)
+ processed into [source/data/unidata/FractionalUCA.txt](../../icu4c/source/data/unidata/FractionalUCA.txt) by
+ [tool at unicode.org maintained by Mark Davis](https://sites.google.com/site/unicodetools/#TOC-UCA)
+ (call the Main class with option writeFractionalUCA);
+ source tailorings (text rules) in [source/data/coll/*.txt](../../icu4c/source/data/coll) resource bundles:
+ [Collation Customization chapter](collation/customization/index.md).
+* Binary format: ucadata.icu & binary tailorings in resource bundles:
+ [source/i18n/collationdatareader.h](../../icu4c/source/i18n/collationdatareader.h)
+* Generator tool: [genuca](../../tools/unicode/c/genuca), [genrb](../../icu4c/source/tools/genrb)
+
+#### Rule-based break iterator data
+* Source format: .txt: [Boundary Analysis chapter](boundaryanalysis/index.md)
+* Binary format: .brk: [source/common/rbbidata.h](../../icu4c/source/common/rbbidata.h)
+* Generator tool: [genbrk](../../icu4c/source/tools/genbrk)
+
+#### Dictionary-based break iterator data (ICU 50 & later)
+* Source format: .txt: [gendict.cpp comments](../../icu4c/source/tools/gendict/gendict.cpp)
+* Binary format: .dict: see [source/common/dictionarydata.h](../../icu4c/source/common/dictionarydata.h)
+* Generator tool: [gendict](../../icu4c/source/tools/gendict)
+
+#### Rule-based transform (transliterator) data
+* Source format: .txt (in resource bundles): [Transform Rule Tutorial chapter](transforms/general/rules.md)
+* Binary format: Uses genrb to make binary format
+* Generator tool: Does not apply
+
+#### Time zone data (ICU 4.4 & later)
+* Source format: [source/data/misc/zoneinfo64.txt](../../icu4c/source/data/misc/zoneinfo64.txt):
+ ftp://elsie.nci.nih.gov/pub/ `tzdata<year><rev>.tar.gz`
+* Binary format: zoneinfo64.res (generated by genrb and [tzcode tools](../../icu4c/source/tools/tzcode/readme.txt)).
+* Generator tool: Does not apply
+
+#### StringPrep profile data
+* Source format: [source/data/sprep/rfc3491.txt](../../icu4c/source/data/sprep/rfc3491.txt)
+* Binary format: .spp: [source/tools/gensprep/store.c](../../icu4c/source/tools/gensprep/store.c)
+* Generator tool: [gensprep](../../icu4c/source/tools/gensprep)
+
+#### Confusables data
+* Source format: [source/data/unidata/confusables.txt](../../icu4c/source/data/unidata/confusables.txt),
+ [source/data/unidata/confusablesWholeScript.txt](../../icu4c/source/data/unidata/confusablesWholeScript.txt)
+* Binary format: confusables.cfu: [source/i18n/uspoof_impl.h](../../icu4c/source/i18n/uspoof_impl.h)
+* Generator tool: [gencfu](../../icu4c/source/tools/gencfu)
+
+### Public Data Files (old versions)
+
+#### Unicode Character Data (Normalization before ICU 4.4; for Java only: was hardcoded in C common library)
+* Source format: [source/data/unidata/*.txt](../../icu4c/source/data/unidata):
+ [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: unorm.icu: [source/common/unormimp.h](../../icu4c/source/common/unormimp.h)
+* Generator tool: gennorm
+
+#### Unicode Character Data (Property [value] aliases before ICU 4.8)
+* Source format: source/data/unidata/Property*Aliases.txt: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
+* Binary format: pnames.icu: source/common/propname.h (ICU 4.6)
+* Generator tool: genpname
+
+#### Collation data (UCA, code points to weights; ICU 52 & earlier)
+* Source format: Same as in ICU 53
+* Binary format: ucadata.icu & binary tailorings in resource bundles: source/i18n/ucol_imp.h (ICU 52)
+* Generator tool: [genuca](../../tools/unicode/c/genuca), [genrb](../../icu4c/source/tools/genrb)
+
+#### Collation data (Inverse UCA, weights->code points; ICU 52 & earlier)
+* Source format: Processed from FractionalUCA.txt like ICU 52 ucadata.icu
+* Binary format: invuca.icu: source/i18n/ucol_imp.h (ICU 52)
+* Generator tool: [genuca](../../tools/unicode/c/genuca)
+
+#### Dictionary-based break iterator data (ICU 49 & earlier)
+* Source format: .txt: genctd.cpp comments
+* Binary format: ctd: see CompactTrieHeader in source/common/triedict.cpp
+* Generator tool: genctd
+
+#### Time zone data (Before ICU 4.4)
+* Source format: source/data/misc/zoneinfo.txt (ICU 4.2): ftp://elsie.nci.nih.gov/pub/ `tzdata<year><rev>.tar.gz`
+* Binary format: zoneinfo.res (generated by genrb and [tzcode tools](../../icu4c/source/tools/tzcode/readme.txt)).
+* Generator tool: Does not apply
+
+### Non-File API Binary Data
+
+#### Converter selector data
+* Source format: none
+* Binary format: [source/common/ucnvsel.cpp](../../icu4c/source/common/ucnvsel.cpp)
+* Generator tool: [ucnvsel_open()](../../icu4c/source/common/ucnvsel.cpp)
+
+### Test-Only Data Files
+
+#### test.icu (for udata API testing)
+* Source format: none (fixed output from gentest when not using -r or -j options)
+* Binary format: test.icu: see `createData()`
+ in [source/tools/gentest/gentest.c](../../icu4c/source/tools/gentest/gentest.c)
+* Generator tool: [gentest](../../icu4c/source/tools/gentest/gentest.c)
+
+### Other Data Structures
+
+#### UCPTrie (C)/CodePointTrie (Java) (maps code points to integers)
+* Source format: (public builder API)
+* Binary format: [ICU Code Point Tries design doc](http://site.icu-project.org/design/struct/utrie),
+ [icu4c/source/common/ucptrie_impl.h](../../icu4c/source/common/ucptrie_impl.h)
+* Generator tool: (builder class)
+
+#### UTrie2 (C)/Trie2 (Java) (maps code points to integers)
+* Source format: (internal builder API)
+* Binary format: [ICU Code Point Tries design doc](http://site.icu-project.org/design/struct/utrie),
+ [icu4c/source/common/utrie2_impl.h](../../icu4c/source/common/utrie2_impl.h)
+* Generator tool: (builder class)
+
+#### BytesTrie (maps byte sequences to 32-bit integers)
+* Source format: (public builder API)
+* Binary format: [BytesTrie design doc](http://site.icu-project.org/design/struct/tries/bytestrie),
+ [icu4c/source/common/unicode/bytestrie.h](../../icu4c/source/common/unicode/bytestrie.h)
+* Generator tool: (builder class)
+
+#### UCharsTrie (C++)/CharsTrie (Java) (maps 16-bit-Unicode strings to 32-bit integers)
+* Source format: (public builder API)
+* Binary format: [UCharsTrie design doc](http://site.icu-project.org/design/struct/tries/ucharstrie),
+ [icu4c/source/common/unicode/ucharstrie.h](../../icu4c/source/common/unicode/ucharstrie.h)
+* Generator tool: (builder class)
## ICU4J Resource Information
@@ -810,25 +946,26 @@
same binary files that ICU4C uses.
By default the ICU4J distribution includes all of the standard resource
-information. It is located under the directory com/ibm/icu/impl/data. Depending
-on the service, the data is in different locations and in different formats.
-Note: This will continue to change from release to release, so clients should
-not depend on the exact organization of the data in ICU4J.
+information. It is located under the directory `com/ibm/icu/impl/data`.
+Depending on the service, the data is in different locations and in different
+formats. Note: This will continue to change from release to release, so clients
+should not depend on the exact organization of the data in ICU4J.
1. The primary **locale data** is under the directory icudt38b, as a set of
".res" files whose names are the locale identifiers. Locale naming is
- documented the com.ibm.icu.util.ULocale class, and the use of these names in
- searching for resources is documented in com.ibm.icu.util.UResourceBundle.
+ documented in the `com.ibm.icu.util.ULocale` class, and the use of these
+ names in searching for resources is documented in
+ `com.ibm.icu.util.UResourceBundle`.
-2. The **collation data** is under the directory icudt38b/coll, as a set of
+2. The **collation data** is under the directory `icudt38b/coll`, as a set of
".res" files.
3. The **rule-based transliterator data** is under the directory
- icudt38b/translit as a set of ".res" files. (**Note:** the Han
+ `icudt38b/translit` as a set of ".res" files. (**Note:** the Han
transliterator test data is no longer included in the core icu4j.jar file by
default.)
-4. The **rule-based number format data** is under the directory icudt38b/rbnf
+4. The **rule-based number format data** is under the directory `icudt38b/rbnf`
as a set of ".res" files.
5. The **break iterator data** is directly under the data directory, as a set
@@ -842,11 +979,12 @@
and default **unicode collation algorithm (UCA) data** is found under the
data directory as a set of ".icu" files.
-8. The **character set converter data** is under the directory icudt38b, as a
- set of ".cnv" files. These files are currently included only in
+8. The **character set converter data** is under the directory `icudt38b/`, as
+ a set of ".cnv" files. These files are currently included only in
icu-charset.jar.
-9. The **time zone data** is named zoneinfo.res under the directory icudt38b.
+9. The **time zone data** is named `zoneinfo.res` under the directory
+ `icudt38b`.
Some of the data files alias or otherwise reference data from other data files.
One reason for this is because some locale names have changed. For example,
@@ -855,18 +993,19 @@
a file may alias a portion of another file's data in order to save space.
Currently ICU4J provides no tool for revealing these dependencies.
-*Java's Locale class silently converts the language code "he" to "iw" when you
-construct the Locale (for versions of Java through Java 5). Thus Java cannot be
-used to locate resources that use the "he" language code. ICU, on the other
-hand, does not perform this conversion in ULocale, and instead uses aliasing in
-the locale data to represent the same set of data under different locale ids.*
+> :point_right: *Java's Locale class silently converts the language code "he" to
+"iw" when you construct the Locale (for versions of Java through Java 5). Thus
+Java cannot be used to locate resources that use the "he" language code. ICU,
+on the other hand, does not perform this conversion in ULocale, and instead uses
+aliasing in the locale data to represent the same set of data under different
+locale ids.*
Resource files that use locale ids form a hierarchy, with up to four levels: a
root, language, region (country), and variant. Searches for locale data attempt
to match as far down the hierarchy as possible, for example, "he_IL" will match
he_IL, but "he_US" will match he (since there is no US variant for he, and
"xx_YY will match root (the default fallback locale) since there is no xx
-language code in the locale hierarchy. Again, see java.util.ResourceBundle for
+language code in the locale hierarchy. Again, see `java.util.ResourceBundle` for
more information.
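
The fallback described above can be sketched in a few lines of plain Java. This is only an illustration of the truncation order, not ICU4J's actual lookup code: the class and method names are hypothetical, and the real lookup also follows the aliases mentioned earlier, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ICU4J: models the locale hierarchy by
// dropping one trailing component of the locale id at a time.
final class LocaleFallbackSketch {

    // Lists the resource names a lookup would try, most specific first.
    static List<String> candidates(String localeId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('_');
            // Drop the last component (variant, then region, then language).
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        chain.add("root"); // the root bundle is always the last resort
        return chain;
    }

    // Returns the first candidate present in the set of available bundles.
    static String resolve(String localeId, Set<String> available) {
        for (String name : candidates(localeId)) {
            if (available.contains(name)) {
                return name;
            }
        }
        return "root";
    }
}
```

With bundles named `root`, `he`, and `he_IL` available, `resolve("he_US", ...)` lands on `he` and `resolve("xx_YY", ...)` lands on `root`, matching the examples in the paragraph above.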
Currently ICU4J provides no tool for revealing these dependencies between data
@@ -879,14 +1018,14 @@
Unfortunately, the jar tool in the JDK provides no way to remove items from a
jar file. Thus you have to extract the resources, remove the ones you don't
-want, and then create a new jar file with the remining resources. See the jar
+want, and then create a new jar file with the remaining resources. See the jar
tool information for how to do this. Before 'rejaring' the files, be sure to
thoroughly test your application with the remaining resources, making sure each
required resource is present.
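
As an alternative to extracting and re-creating the archive by hand, the same extract, remove, and re-jar cycle can be sketched with the JDK's `java.util.zip` classes. The class name and the predicate are hypothetical, and the sketch assumes an unsigned jar without duplicate entries; it copies the manifest like any other entry.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes a copy of a jar that omits unwanted resources.
final class JarTrimSketch {

    // Copies every entry of source into target except those that drop matches.
    // Returns the number of entries kept.
    static int copyWithout(Path source, Path target, Predicate<String> drop)
            throws IOException {
        int kept = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                if (drop.test(entry.getName())) {
                    continue; // getNextEntry() skips the unread data for us
                }
                // A fresh ZipEntry lets the output stream recompute sizes and CRC.
                out.putNextEntry(new ZipEntry(entry.getName()));
                for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
                kept++;
            }
        }
        return kept;
    }
}
```

A call such as `copyWithout(original, trimmed, name -> name.endsWith("/fr.res"))` would leave every resource except the matching one in the new file. The testing advice above still applies to the trimmed jar.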
#### Using additional resource files with ICU4J
-*Resource file formats can change across releases of ICU4J!*
+> :point_right: *Resource file formats can change across releases of ICU4J!*
*The format of ICU4J resources is not part of the API. Clients who develop their
own resources for use with ICU4J should be prepared to regenerate them when they
move to new releases of ICU4J.*
diff --git a/docs/userguide/icufaq/icu4j-faq.md b/docs/userguide/icufaq/icu4j-faq.md
index 75f6d99..9c54885 100644
--- a/docs/userguide/icufaq/icu4j-faq.md
+++ b/docs/userguide/icufaq/icu4j-faq.md
@@ -1,18 +1,23 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# ICU4J FAQ
This page contains frequently asked questions about the content provided with
the International Components for Unicode for Java as well as basics on
internationalization. It is organized into the following sections:
-### ****Common Questions****
+### Common Questions
-#### **What version of Java is required for ICU4J?**
+#### What version of Java is required for ICU4J?
ICU4J 4.4 or later versions utilize Java 5 language features and only run on JRE
5 or later. The ICU4J Locale SPI module depends on JDK 6 Locale Service Provider
framework, therefore, it requires JRE 6 or later.
-#### **Comparison between ICU and JDK: What's the difference?**
+#### Comparison between ICU and JDK: What's the difference?
This is one of our most popular question. Please refer to [our comparison
chart](http://icu-project.org/charts/comparison/).
@@ -20,77 +25,71 @@
#### How can I get the version information of ICU4J library on my system?
You can get the ICU4J version information by public API class
-[com.ibm.icu.util.VersionInfo](http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html).
+[com.ibm.icu.util.VersionInfo](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/util/VersionInfo.html).
The static field
-[VersionInfo.ICU_VERSION](http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html#ICU_VERSION)
+[VersionInfo.ICU_VERSION](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/util/VersionInfo.html#ICU_VERSION)
contains the current ICU4J library version information.
Since ICU4J 4.6, ICU4J jar file includes Main-Class that prints out the ICU
-version information like below -
+version information like below:
+
+```
$ java -jar icu4j.jar
International Component for Unicode for Java 4.8
+
Implementation Version: 4.8
Unicode Data Version: 6.0
CLDR Data Version: 2.0
Time Zone Data Version: 2011g
+```
-#### I'm using ICU4J 3.X, but planning to upgrade ICU4J version to 4.X soon. What should I do for the migration?
+#### I'm using ICU4J X, but planning to upgrade ICU4J version to X+1 soon. What should I do for the migration?
-The first two digits of ICU version number represent a reference release
-version. Also all reference releases have even number at the second digit, such
-as 3.8, 4.4 and 4.8 (odd number at the second digit is reserved for development
-milestones, such as 4.7). The first digit alone has no special meanings,
-therefore, upgrade from 3.8 to 4.0 is nothing much different from upgrading from
-4.0 to 4.2. (See the user guide section [Version Numbers in
-ICU](http://userguide.icu-project.org/design#TOC-Version-Numbers-in-ICU) for the
-details)
+See the user guide section
+[Version Numbers in ICU](../design.md#version-numbers-in-icu)
+for the details about the meaning of the version number parts and how the ICU
+version number changes.
+
In general, two different reference releases are not binary compatible (i.e.
drop-in jar file replacement would not work). To use a new reference version of
ICU4J, you should rebuild your application with the new ICU4J library. ICU
-project has the [API compatibility
-policy](http://userguide.icu-project.org/design#TOC-ICU-API-compatibility). As
+project has the
+[API compatibility policy](../design.md#icu-api-compatibility). As
long as you're using ICU APIs marked as @stable in the API reference
documentation, your application should successfully compile with the new
reference version of ICU4J library without any source code modifications. (Note:
ICU project team may retract APIs previously marked as @stable by well-defined
-process. But this is very rare case.) However, you might still need to review
+process. But this is a very rare case.) However, you might still need to review
the usage of ICU4J APIs especially when your application set a certain
assumption on the behavior of APIs driven by Unicode or locale data. For
-example, a date format pattern used for locale X might not be exactly same with
-the pattern in a new version.
+example, a date format pattern used for locale X might not be exactly the same
+as the pattern in a new version.
#### How can I see all API changes between two different ICU versions?
-For every ICU4J release, we publish APIChangeReport.html which captures all API
-changes since previous reference release. For example, ICU4J 4.8 release
-includes the [API change
-report](http://source.icu-project.org/repos/icu/icu4j/tags/release-4-8/APIChangeReport.html)
-against ICU4J 4.6. However, someone may want to see the changes between the
+For every ICU4J release, we publish
+[APIChangeReport.html](https://github.com/unicode-org/icu/blob/master/icu4j/APIChangeReport.html)
+which captures all API changes since the previous reference release.
+However, someone may want to see the changes between the
current release and much older ICU4J version. For example, you're currently
-using ICU4J 4.4 and considering to upgrade to ICU4J 4.8. In this case, you can
+using ICU4J 60 and considering upgrading to ICU4J 64. In this case, you can
generate a change report page by following steps.
-1. Download [ICU4J 4.8 source package
- archive](http://download.icu-project.org/files/icu4j/4.8/icu4j-4_8.tgz) from
- ICU 4.8 download page and extract files to your local system.
- (Alternatively, you can [export ICU4J 4.8 from the ICU SVN
- repository](http://source.icu-project.org/repos/icu/icu4j/tags/release-4-8/).
- For ICU SVN repository access, please refer [this
- page](http://site.icu-project.org/repository))
+1. Download [ICU4J 64 source package
+ archive](http://site.icu-project.org/download/64#TOC-ICU4J-Download)
+ from the ICU 64 download page and extract files to your local system.
2. Set up ICU4J build environment as explained in
- [readme.html](http://source.icu-project.org/repos/icu/icu4j/tags/release-4-8/readme.html#HowToInstallJavac)
- included in the root directory of the source package archive.
+ [readme.html](https://github.com/unicode-org/icu/blob/release-64-2/icu4j/readme.html)
+ included in the root directory of the ICU4J source package archive.
3. Edit
- [build.properties](http://source.icu-project.org/repos/icu/icu4j/tags/release-4-8/build.properties)
+ [build.properties](https://github.com/unicode-org/icu/blob/release-64-2/icu4j/build.properties)
in the root directory and change the property value api.report.prev.version
- from 46 to 44.
-4. Invoke ant target "apireport" (Note: If you compare ICU4J 4.4 or newer
- version against ICU4J 4.2 or older version, use target "apireportOld"
- instead)
-5. The output is generated at out/icu4j_compare_44_48.html.
+ from 63 to 60.
+4. Invoke ant target "apireport".
+5. The output is generated at out/icu4j_compare_60_64.html.
-### **International Calendars**
+### International Calendars
-#### **Why do I need these classes?**
+#### Why do I need these classes?
If your application displays or manipulates dates and times, and if you want
your application to run in countries outside of North America and western
@@ -99,7 +98,7 @@
conforming to the standard Java Calendar API, allowing you to code your
application once and have it work with any international calendar.
-#### **Which Japanese calendar do you support?**
+#### Which Japanese calendar do you support?
Currently, our JapaneseCalendar is almost identical to the Gregorian calendar,
except that it follows the traditional conventions for year and era names. In
@@ -117,7 +116,7 @@
calendar, please let us know by posting a note on the [mailing
list](http://icu-project.org/contacts.html).
-#### **Do you *really* support the true lunar Islamic calendar?**
+#### Do you *really* support the true lunar Islamic calendar?
The Islamic calendar is strictly lunar, and a month begins at the moment when
the crescent of the new moon is visible above the horizon at sunset. It is
@@ -142,21 +141,21 @@
#### Does ICU4J have its own time zone rule data?
Yes. ICU4J library contains time zone rule data generated from the [tz
-database](http://www.twinsun.com/tz/tz-link.htm).
+database](https://www.iana.org/time-zones).
#### Why does ICU4J carry the time zone rule data while my JRE also has the data?
There are several reasons. Bundling our own time zone data allow us to provide
quick updates to users. ICU project team usually release the latest time zone
rule data patch as soon as the new tz database release is published (usually
-same day or next day). Having own rule data also allow ICU4J library to provide
-some advanced TimeZone features (see [com.ibm.icu.util.BasicTimeZone API
+within 1 to 3 days). Having our own rule data also allows the ICU4J library to
+provide some advanced TimeZone features (see [com.ibm.icu.util.BasicTimeZone API
documentation](http://icu-project.org/apiref/icu4j/com/ibm/icu/util/BasicTimeZone.html)).
#### How can I get the latest time zone rule data patch?
You can use [ICU4J Time Zone Update
-Utility](http://icu-project.org/download/icutzu.html) to update the time zone
+Utility](http://site.icu-project.org/download/icutzu) to update the time zone
rule data to the latest.
#### I do not want to maintain yet another time zone rule data. Are there any way to configure ICU4J to use the JRE's time zone data?
@@ -168,29 +167,26 @@
documentation](http://icu-project.org/apiref/icu4j/com/ibm/icu/util/TimeZone.html)
for the details.
-### **StringSearch**
+### StringSearch
-#### **Do I have to know anything about Collators to use StringSearch?**
+#### Do I have to know anything about Collators to use StringSearch?
Since StringSearch uses a RuleBasedCollator to handle the language-sensitive
aspects of searching, understanding how collation works certainly helps. But the
only parts of the Collator API that you really need to know about are the
-collation strength values, PRIMARY, SECONDARY, and TERTIARY, that determine
-whether case and accents are ignored during a search.
+collation strength values, `PRIMARY`, `SECONDARY`, and `TERTIARY`, that
+determine whether case and accents are ignored during a search.
-#### **What algorithm are you using to perform the search?**
+#### What algorithm are you using to perform the search?
StringSearch uses a version of the Boyer-Moore search algorithm that has been
modified for use with Unicode. Rather than using raw Unicode character values in
its comparisons and shift tables, the algorithm uses collation elements that
have been "hashed" down to a smaller range to make the tables a reasonable size.
-An article explaining this algorithm in a fair amount of detail is schedule for
-publication in the February, 1999 issue of [Java
-Report.](http://www.javareport.com/)
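
To make the idea of a hashed shift table concrete, here is a minimal sketch of the Horspool simplification of Boyer-Moore. It is not the StringSearch implementation. As an assumption for illustration, a lower-cased UTF-16 code unit stands in for a collation element, and all names are hypothetical. Keys are hashed into a fixed 256-slot table, and each slot keeps the smallest shift of any key that lands in it, so a collision can only make the search slower, never skip a match.

```java
import java.util.Arrays;

// Hypothetical sketch: Horspool search with a hashed shift table.
final class HashedHorspoolSketch {
    private static final int TABLE_SIZE = 256;

    // Stand-in for a collation element: the case-folded code unit.
    private static int key(char c) {
        return Character.toLowerCase(c);
    }

    // "Hashes" a key down to the small range covered by the shift table.
    private static int hash(int key) {
        return key % TABLE_SIZE;
    }

    // Returns the index of the first match of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int m = pattern.length();
        int n = text.length();
        if (m == 0) {
            return 0;
        }
        int[] shift = new int[TABLE_SIZE];
        Arrays.fill(shift, m); // keys absent from the pattern allow a full jump
        for (int i = 0; i < m - 1; i++) {
            int slot = hash(key(pattern.charAt(i)));
            shift[slot] = Math.min(shift[slot], m - 1 - i);
        }
        int pos = 0;
        while (pos + m <= n) {
            int j = m - 1;
            // Compare right to left, as Boyer-Moore variants do.
            while (j >= 0 && key(text.charAt(pos + j)) == key(pattern.charAt(j))) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Shift according to the text element aligned with the pattern's end.
            pos += shift[hash(key(text.charAt(pos + m - 1)))];
        }
        return -1;
    }
}
```

Comparing keys rather than raw characters is what makes the search insensitive to the differences the key function ignores. In this sketch that is only case, whereas a collator-backed key could also ignore accents at `PRIMARY` strength.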
-### **RuleBasedBreakIterator**
+### RuleBasedBreakIterator
-#### **Why did you bother to rewrite BreakIterator? Wasn't the old version working?**
+#### Why did you bother to rewrite BreakIterator? Wasn't the old version working?
It was working, but we were too constrained by the design. The break-data tables
were hard-coded, and there was only one set of them. This meant you couldn't
@@ -199,12 +195,12 @@
different word-breaking rules.) The hard-coded tables were also very
complicated, difficult to maintain, and easy to mess up, leading to mysterious
bugs. And in the original version, there was no way to subclass BreakIterator
-and get any implementation at all-- if you wanted different behavior, you had to
+and get any implementation at all -- if you wanted different behavior, you had to
rewrite the whole thing from scratch. We undertook this project to fix all these
problems and give us a better platform for future development. In addition, we
managed to get some significant performance improvements out of the new version.
-#### **What do you mean, performance improvements? It seems WAY slower to me!**
+#### What do you mean, performance improvements? It seems WAY slower to me!
The one thing that's significantly slower is construction. This is because it
actually builds the tables at runtime by parsing a textual description. In the
@@ -216,9 +212,9 @@
creation out of the inner loop and create a new BreakIterator only once per
word-wrapping operation, or once per document, you'll find that your performance
improves dramatically. If you still have problems after doing this, let us
-know-- there may be bugs we need to fix.
+know -- there may be bugs we need to fix.
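
The advice above, to build the iterator once and reuse it, can be sketched as follows. To keep the example runnable with only the JDK, it uses `java.text.BreakIterator` as a stand-in. `com.ibm.icu.text.BreakIterator` offers methods with the same names (`getWordInstance`, `setText`, `first`, `next`), so the same shape should carry over, but treat that as an assumption to check against the API reference. The class name is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch: one BreakIterator is constructed, then reused per line.
final class BreakIteratorReuseSketch {

    // Counts segments that start with a letter or digit, reusing the iterator.
    static int countWords(BreakIterator words, String text) {
        words.setText(text); // cheap compared with constructing a new iterator
        int count = 0;
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }

    static int countAll(String[] lines) {
        // Construction happens once, outside the loop over the lines.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        int total = 0;
        for (String line : lines) {
            total += countWords(words, line);
        }
        return total;
    }
}
```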
-#### **This still has all the same bugs that the old BreakIterator did! Why would I want to use this one instead?**
+#### This still has all the same bugs that the old BreakIterator did! Why would I want to use this one instead?
Because now you can fix it. The resource data in this package was designed to
mimic as closely as possible the behavior of the original BreakIterator class
@@ -230,12 +226,12 @@
the wrong places. In the meantime, you can customize the resource data to modify
things to work the way you want them to.
-#### **Why is there no demo?**
+#### Why is there no demo?
We haven't had time to write a good demo for this new functionality yet. We'll
add one later.
-#### **What's this DictionaryBasedBreakIterator thing?**
+#### What's this DictionaryBasedBreakIterator thing?
This is a new feature that isn't in the JDK. DictionaryBasedBreakIterator is
intended for use with languages that don't put spaces between words (such as
@@ -246,14 +242,14 @@
boundaries should go. The algorithm we use for this is fast, accurate, and
error-tolerant.
-#### **Why do you have a Thai dictionary, but no resource data that actually lets me use it?**
+#### Why do you have a Thai dictionary, but no resource data that actually lets me use it?
We're not quite done doing the necessary research. We don't currently have good
test cases we can use to verify it's working correctly with Thai, nor are we
completely confident in our dictionary. If you can help us with this, we'd like
to hear from you!
-#### **What's this BreakIteratorRules_en_US_TEST thing?**
+#### What's this BreakIteratorRules_en_US_TEST thing?
This is a resource file that, in conjunction with the "english.dict" dictionary,
we used to test the dictionary-based break iterator. It allows you to locate
@@ -262,7 +258,7 @@
industrial-strength, however: we included enough words to make for a reasonable
test, but it's by no means complete or anywhere near it.
-#### **How can I create my own dictionary file?**
+#### How can I create my own dictionary file?
Right now, you can't. We didn't include the tool we used to create dictionary
files because it's very rough and extremely slow. There's also a strong
diff --git a/docs/userguide/icufaq/index.md b/docs/userguide/icufaq/index.md
index 8250951..7a4bec0 100644
--- a/docs/userguide/icufaq/index.md
+++ b/docs/userguide/icufaq/index.md
@@ -1,3 +1,8 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# ICU FAQs
## Introduction to ICU
@@ -28,7 +33,7 @@
* **Windows**:
* The DLLs you may need for your application are located in
- **bin\\icu*XX##.*dll**, where "XX" are two letters (such as "uc" for the
+ **bin\\icuXX##.dll**, where "XX" are two letters (such as "uc" for the
"common" library, "in" for the "i18n" library, etc.) and ## is the major
and the minor version number (such as **42** for **4.2** / **4.2**.0.1
or **4.2**.4 ).
@@ -47,9 +52,9 @@
Makefile include fragment **/usr/local/lib/icu/current/Makefile.inc**
can be used in building applications.
-#### Can you help me build ICU4C for...
+#### Can you help me build ICU4C for ...
-We can try.. make sure you read the latest "readme" and also the [ICU
+We can try ... make sure you read the latest "readme" and also the [ICU
Data](../icudata.md) section. You might also [searching the icu-support
archives](http://site.icu-project.org/contacts), and then posting a question
there. Additionally, sites such as
@@ -67,16 +72,17 @@
#### What is the ICU binary compatibility policy?
-Please see the section on binary compatibility (§) in the [design
-chapter](../design.md) .
+Please see the section on
+[binary compatibility](../design.md#icu-binary-compatibility-using-icu-as-an-operating-system-level-library)
+in the [design chapter](../design.md).
-#### How is the ICU licensed?
+#### How is ICU licensed?
The ICU license is intended to allow ICU to be included both in free software
projects and in proprietary or commercial products.
-Since ICU 58, ICU is covered by the [Unicode
-license](http://www.unicode.org/copyright.html#License) which is very similar to
+Since ICU 58, ICU is covered by the
+[Unicode license](http://www.unicode.org/copyright.html#License) which is very similar to
the previous ICU license.
ICU 1.8.1–ICU 57 and ICU4J 1.3.1–ICU4J 57 are covered by the [ICU
@@ -88,8 +94,8 @@
#### Can I use ICU from other languages besides C/C++ and Java?
-There are a number of wrappers available, please see the [Related
-Projects](http://site.icu-project.org/related) page.
+There are a number of wrappers available, please see the
+[Related Projects](http://site.icu-project.org/related) page.
#### How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version or a new CLDR version)?
@@ -98,9 +104,10 @@
upgrade-friendly.
* **API:** ensure that you are not using draft APIs which may have changed in
- a future release. See the section on API compatibility (§) in the [design
- chapter](../design.md) .
-* **Unicode: S**ee the release notes for particular versions of Unicode to
+ a future release. See the section on
+ [API compatibility](../design.md#icu-api-compatibility) in the
+ [design chapter](../design.md).
+* **Unicode:** See the release notes for particular versions of Unicode to
ensure that your code is not affected by property changes or other
specification changes.
* **CLDR:** If your application has test cases which depend on specific
@@ -110,13 +117,15 @@
cases. Also, a newer version may support additional translations,
currencies, types of calenders
* **Building/Deploying your Application (ICU4C):** ICU4C usually builds with
- symbol renaming ( See: binary compatibility (§) in the [design
- chapter](../design.md) ). Be sure that you build your application with the
- updated ICU header files, so that it will link against the current ICU.
- Also, don't hard-code the names of ICU libraries in your build scripts and
- projects. Where possible, link against just the 'base name' such as
- `libicuuc.so `or `icuuc.lib `rather than a name containing the version
- number such as `libicuuc.so.`**`46`** or `icuuc`**`46`**`.dll`.
+ symbol renaming (See:
+ [binary compatibility](../design.md#icu-binary-compatibility-using-icu-as-an-operating-system-level-library)
+ in the [design chapter](../design.md)). Be sure that you build your
+ application with the updated ICU header files, so that it will link against
+ the current ICU. Also, don't hard-code the names of ICU libraries in your
+ build scripts and projects. Where possible, link against just the
+ 'base name' such as `libicuuc.so` or `icuuc.lib` rather than a name
+ containing the version number such as `libicuuc.so.46` or
+ `icuuc46.dll`.
## Building and Testing ICU
@@ -128,8 +137,9 @@
From ICU version 4.2 on, the configure script will build with the default bit
width of your platform. You can request 64 or 32 bits with the
-**--with-library-bits=** option, (e.g. runConfigureICU Linux**
---with-library-bits=64** or runConfigureICU MacOSX **--with-library-bits=32**).
+**--with-library-bits=** option (e.g. `runConfigureICU Linux
+--with-library-bits=64` or `runConfigureICU MacOSX
+--with-library-bits=32`).
(For the behavior of attempting 64 bits if possible, use
**--with-library-bits=64else32**).
@@ -148,8 +158,8 @@
If you are using a compiler that hasn't been tested with ICU before, you may
have encountered an optimization bug with the compiler. On Unix platforms you
-can specify --disable-release when you are using runConfigureICU (e.g.
-runConfigureICU --disable-release LinuxRedHat). If this fixes your problem, it
+can specify **--disable-release** when you are using runConfigureICU (e.g.
+`runConfigureICU --disable-release LinuxRedHat`). If this fixes your problem, it
is recommended that you report the optimization bug to the compiler
manufacturer.
@@ -158,9 +168,10 @@
#### How can I reduce the size of the ICU data library?
-Use the [Data Customizer](http://apps.icu-project.org/datacustom) or see
-"Customizing ICU's Data Library" (§) in the [ICU Data Management](../icudata.md)
-chapter of this User's Guide.
+Use the [Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
+or see
+[Customizing ICU's Data Library](../icudata.md#customizing-icus-data-library)
+in the [ICU Data Management](../icudata.md) chapter of this User's Guide.
#### Why am I seeing a small ( only a few K ) instead of a large ( several megabytes ) data shared library (icudt)?
#### Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure.
@@ -177,18 +188,19 @@
#### Can I add or remove a converter from ICU?
-Yes. Please see "Customizing ICU's Data Library" (§) in the [ICU Data
-Management](../icudata.md) chapter of this User's Guide. You can also get extra
-converters from <http://www.icu-project.org/charts/charset/> or use the [ICU
-Data Customizer](http://apps.icu-project.org/datacustom/) tool.
+Yes. Please see [Customizing ICU's Data Library](../icudata.md#customizing-icus-data-library)
+in the [ICU Data Management](../icudata.md) chapter of this User's Guide. You can also
+get extra converters from <http://www.icu-project.org/charts/charset/> or use
+the [ICU Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
+tool.
#### Why don't the makefiles work?
You need GNU's make program version 3.8 or later, and you need to run the
-runConfigureICU script, which is located in the icu/source directory. You may be
-using a platform that ICU does not support. If the first two answers do not
-apply to you, then you should send an e-mail to the [ICU4C Support
-List](http://www.icu-project.org/contacts.html) .
+runConfigureICU script, which is located in the `icu/source` directory. You may
+be using a platform that ICU does not support. If the first two answers do not
+apply to you, then you should send an e-mail to the
+[ICU4C Support List](http://www.icu-project.org/contacts.html).
Here are some places you can find gmake:
@@ -222,18 +234,16 @@
functions and type-safe constants instead of #define, etc. However, we do not
use exceptions, and we do not use the Standard Template Library (STL), so
ICU4C's dependencies on the C++ library are minimal. See the new
-[dependencies.txt](http://bugs.icu-project.org/trac/browser/icu/trunk/source/test/depstest/dependencies.txt)
+[dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt)
and search for "group: cplusplus".
As ICU does not use exceptions, the GCC option `-fno-exceptions` will reduce or
remove the dependencies on the standard C++ library. In
[GCC](http://gcc.gnu.org) 4.5 there is an option `-static-libstdc++` which will
-remove C++ library dependencies. (Also see [this
-article](http://www.trilithium.com/johan/2005/06/static-libstdc/).) Visual
-Studio has the [/MT
-option](http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=VS.100).aspx), and
-other compilers may have similar options. See the [How To Use
-ICU](../howtouseicu.md) page for related information on this topic.
+remove C++ library dependencies. Visual Studio has the
+[/MT option](http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=VS.100).aspx),
+and other compilers may have similar options. See the
+[How To Use ICU](../howtouseicu.md) page for related information on this topic.
## Features of ICU
@@ -243,11 +253,12 @@
#### How are the APIs documented for deprecation?
-Please read the API lifecycle section in the [ICU Design](../design.md) chapter.
+Please read the [ICU API compatibility](../design.md#icu-api-compatibility)
+section in the [ICU Design](../design.md) chapter.
#### What version of Unicode standard does ICU support?
-ICU versions 4.0 supports Unicode version 5.1.
+ICU version 65 supports Unicode version 12.
The Unicode versions for older versions of ICU are listed on the ICU download
page, <http://www.icu-project.org/download/>
@@ -263,7 +274,7 @@
#### How does ICU relate to Java's java.text.\* package?
-The International Components for Unicode is available both as a C/C++ library
+The International Components for Unicode are available both as a C/C++ library
and a Java class library. ICU provides internationalization utilities for
writing global applications in C, C++ or Java programming languages. ICU was
originally developed by the Unicode group at the IBM Globalization Center of
@@ -290,16 +301,17 @@
#### How do I declare a Unicode string in ICU?
-Use the U_STRING_DECL and U_STRING_INIT macros or use the UnicodeString class
-for C++. Strings are represented as UChar \* as the base string type.
+Use the `U_STRING_DECL` and `U_STRING_INIT` macros or use the `UnicodeString`
+class for C++. The base string type is `UChar *`.
-Even though most platforms declare wide strings as wchar_t \* or L"" as the base
-string type, that declaration is not portable because the sizeof(wchar_t) can be
-1, 2 or 4, and the encoding may not even be Unicode. On the platforms where
-sizeof(wchar_t) is 2 bytes, UChar is defined as wchar_t. In that case you can
-use ICU's strings with 3rd party legacy functions; however, we do not suggest
-using Unicode strings without the U_STRING_DECL and U_STRING_INIT macros or
-UnicodeString class because they are platform independent implementations.
+Even though most platforms declare wide strings as `wchar_t *` or `L""` as the
+base string type, that declaration is not portable because `sizeof(wchar_t)`
+can be 1, 2 or 4, and the encoding may not even be Unicode. On the platforms
+where `sizeof(wchar_t)` is 2 bytes, `UChar` is defined as `wchar_t`. In that
+case you can use ICU's strings with 3rd party legacy functions; however, we do
+not suggest using Unicode strings without the `U_STRING_DECL` and
+`U_STRING_INIT` macros or the `UnicodeString` class because they are
+platform-independent implementations.
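
The portability concern above can be checked directly on a given platform. The following is a minimal standalone C++11 sketch that does not use ICU; the helper names `wcharIsUtf16Sized` and `codeUnitCount` are hypothetical. It assumes that `char16_t` is an acceptable stand-in for a 16-bit UTF-16 code unit and that `u""` literals are encoded as UTF-16 (guaranteed from C++20, and true on common compilers before that).

```c++
#include <cstddef>
#include <string>

// Hypothetical helper: true when wchar_t has the same size as a 16-bit
// code unit on this platform. The result differs between platforms,
// which is why wchar_t is not a portable base type for Unicode strings.
constexpr bool wcharIsUtf16Sized() {
    return sizeof(wchar_t) == sizeof(char16_t);
}

// Hypothetical helper: number of 16-bit code units in a UTF-16 string.
// A u"" literal is char16_t-based regardless of the size of wchar_t.
inline std::size_t codeUnitCount(const std::u16string &s) {
    return s.size();
}
```

A character outside the Basic Multilingual Plane, such as U+1F600, occupies two code units, so `codeUnitCount(u"\U0001F600")` is 2 on any platform, whatever `wcharIsUtf16Sized()` returns.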
#### How is a Unicode string represented in ICU4C?
@@ -314,20 +326,24 @@
deprecated and dead. Unicode 2.0 in 1996 changed its default encoding to UTF-16.
If you need to do a quick and easy conversion between UTF-16 and UTF-8, UTF-32
-or an encoding in wchar_t, you should take a look at unicode/ustring.h. In that
-header file you will find u_strToWCS, u_strFromWCS, u_strToUTF8, u_strFromUTF8,
-u_strToUTF32 and u_strFromUTF32 functions. These functions are provided for your
-convenience instead of using the ucnv_\* API.
+or an encoding in `wchar_t`, you should take a look at `unicode/ustring.h`. In
+that header file you will find the `u_strToWCS`, `u_strFromWCS`, `u_strToUTF8`,
+`u_strFromUTF8`, `u_strToUTF32` and `u_strFromUTF32` functions. These
+functions are provided for your convenience instead of using the `ucnv_*` API.
-You can also take a look at the UTF_\*, UTF8_\*, UTF16_\* and UTF32_\* macros,
-which are defined in unicode/utf.h, unicode/utf8.h, unicode/utf16.h and
-unicode/utf32.h. These macros are helpful for programmers that need to
-manipulate and process Unicode strings.
+You can also take a look at the `UTF_*`, `UTF8_*`, `UTF16_*` and `UTF32_*`
+macros, which are defined in
+[unicode/utf.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf.h),
+[unicode/utf8.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf8.h),
+[unicode/utf16.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf16.h)
+and [unicode/utf32.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf32.h).
+These macros are helpful for programmers who need to manipulate and process
+Unicode strings.
#### How do I index into a UTF-16 string?
Typically, indexes and offsets in strings count string units, not characters
-(although in c and java they have a char type).
+(although in C and Java they have a char type).
For example, in old-fashioned MBCS strings, you would count indexes and offsets
by bytes, not by the variable-width character count. In UTF-16, you do the same,
@@ -346,16 +362,16 @@
check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are
less important and more benign. If you want to quickly convert small strings
between the different UTF encodings or get a UChar32 value, you can use the
-macros provided in utf.h and its siblings utf8.h and utf16.h. For larger or
-partial strings, please use the conversion API.
+macros provided in `utf.h` and its siblings `utf8.h` and `utf16.h`. For larger
+or partial strings, please use the conversion API.
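
To illustrate the difference between code unit offsets and code points, here is a minimal standalone C++11 sketch of reading one code point from a UTF-16 string. It does not use the ICU macros; `nextCodePoint` and `countCodePoints` are hypothetical names, and returning an unpaired surrogate unchanged is a choice made for this sketch.

```c++
#include <cstddef>
#include <string>

// Reads the code point that starts at s[i] and advances i past it.
// Precondition: i < s.size().
// An unpaired surrogate is returned as-is and consumes one code unit.
inline char32_t nextCodePoint(const std::u16string &s, std::size_t &i) {
    char32_t c = s[i++];
    if (c >= 0xD800 && c <= 0xDBFF && i < s.size()) {
        char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;  // consume the trail surrogate as well
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

// Counts code points; s.size() counts 16-bit code units instead.
inline std::size_t countCodePoints(const std::u16string &s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        nextCodePoint(s, i);
    }
    return n;
}
```

For the string `u"a\U0001F600b"`, `s.size()` is 4 code units while `countCodePoints(s)` is 3, and the supplementary character begins at offset 1 and ends at offset 3.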
#### How do the converters work?
The converters act like a data stream. This means that the state of the last
-character is saved in the converter after each call to the ucnv_fromUnicode()
-and ucnv_toUnicode() functions. So if the source buffer ends with part of a
-surrogate Unicode character pair, the next call to ucnv_toUnicode() will write
-out the equivalent character to the destination buffer. Please see the
+character is saved in the converter after each call to the `ucnv_fromUnicode()`
+and `ucnv_toUnicode()` functions. So if the source buffer ends with part of a
+surrogate Unicode character pair, the next call to `ucnv_toUnicode()` will
+write out the equivalent character to the destination buffer. Please see the
[Conversion](../conversion/index.md) chapter of the User's Guide for details.
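
The stream-like behavior can be sketched without ICU. The class below is a hypothetical stand-in, not ICU's `UConverter`: it converts UTF-16 chunks to UTF-8 and keeps a lead surrogate from the end of one chunk as state until the next call. Replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not a statement about ICU's default callbacks.

```c++
#include <string>

// Hypothetical streaming converter: UTF-16 chunks in, UTF-8 bytes out.
class Utf16ToUtf8Stream {
public:
    // Converts one chunk. A lead surrogate at the end of the chunk is
    // saved and completed by the first code unit of the next chunk.
    std::string convert(const std::u16string &chunk) {
        std::string out;
        for (char16_t unit : chunk) {
            if (pendingLead_ != 0) {
                char32_t lead = pendingLead_;
                pendingLead_ = 0;
                if (unit >= 0xDC00 && unit <= 0xDFFF) {
                    append(out, 0x10000 + ((lead - 0xD800) << 10) +
                                    (unit - 0xDC00));
                    continue;
                }
                append(out, 0xFFFD);  // unpaired lead surrogate
            }
            if (unit >= 0xD800 && unit <= 0xDBFF) {
                pendingLead_ = unit;  // wait for the trail surrogate
            } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
                append(out, 0xFFFD);  // unpaired trail surrogate
            } else {
                append(out, unit);
            }
        }
        return out;
    }

private:
    // Appends the UTF-8 encoding of code point c to out.
    static void append(std::string &out, char32_t c) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }

    char16_t pendingLead_ = 0;  // saved state between calls
};
```

If the first call receives only the lead surrogate 0xD83D, it returns an empty string; the following call with 0xDE00 then returns the four UTF-8 bytes of U+1F600.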
#### What does a locale look like in ICU?
@@ -385,29 +401,34 @@
There is no relationship. ICU is not dependent on the operating system for the
locale data.
-This also means that uloc_setDefault() does not affect the operating system. The
-function uloc_setDefault() only sets ICU's default locale. Normally the default
-locale for ICU is whatever the operating system says is the default locale.
+This also means that `uloc_setDefault()` does not affect the operating system.
+The function `uloc_setDefault()` only sets ICU's default locale. Normally the
+default locale for ICU is whatever the operating system says is the default
+locale.
#### How are errors handled in ICU?
Since not all compilers can handle exceptions, we return an error from functions
-with a UErrorCode parameter. The UErrorCode parameter of a function will return
-any errors that occurred while it was executing. It's usually a good idea to
-check for errors after calling a function by using the U_SUCCESS and U_FAILURE
-macros. U_SUCCESS returns true when the function did run properly, and U_FAILURE
-returns true when the function did NOT run properly. You may handle specific
-errors from a function by checking the exact value of error. The possible values
-of UErrorCode are located in utypes.h of the common project. Before any function
-is called with a UErrorCode, it must be initialized to U_ZERO_ERROR.
+with a `UErrorCode` parameter. The `UErrorCode` parameter of a function will
+return any errors that occurred while it was executing. It's usually a good idea
+to check for errors after calling a function by using the `U_SUCCESS` and
+`U_FAILURE` macros. `U_SUCCESS` returns true when the function did run properly,
+and `U_FAILURE` returns true when the function did NOT run properly. You may
+handle specific errors from a function by checking the exact value of error. The
+possible values of `UErrorCode` are located in
+[utypes.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utypes.h)
+of the common project. Before any function is called with a `UErrorCode`, it
+must be initialized to `U_ZERO_ERROR`.
-Here is an example of UErrorCode being used.
+Here is an example of `UErrorCode` being used.
+```c++
UErrorCode err = U_ZERO_ERROR;
callMyFunction(&err);
if (U_FAILURE(err)) {
puts("callMyFunction() Failed!");
}
+```
Please see the [ICU Design](../design.md) chapter for details.
diff --git a/docs/userguide/index.md b/docs/userguide/index.md
index 44890f3..a37be87 100644
--- a/docs/userguide/index.md
+++ b/docs/userguide/index.md
@@ -1,3 +1,8 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# Introduction to ICU
As companies integrate e-commerce on a global scale into their fundamental
@@ -8,8 +13,6 @@
## Meeting the Challenge of Globalization
-Internationalized software results in an increase in:
-
In today's business climate of globalization, companies must compete in a new
Internet-enabled business climate of constant change and compressed time frames.
Their customers expect reliable service and support.
@@ -84,10 +87,10 @@
Download ICU in one of the following ways:
-1. From the down load page, <http://www.icu-project.org/download/> , for
+1. From the download page, <http://www.icu-project.org/download/>, for
packaged stable releases of ICU.
-2. From the source code repository, <http://www.icu-project.org/repository/> ,
+2. From the source code repository, <http://www.icu-project.org/repository/>,
for the latest development versions.
After downloading, see the included README file for information on what is
@@ -95,6 +98,6 @@
## ICU License
-Current license: <http://source.icu-project.org/repos/icu/trunk/icu4c/LICENSE>
+Current license: <https://github.com/unicode-org/icu/blob/master/icu4c/LICENSE>
See also <https://github.com/unicode-org/icu/blob/userguide-migration/docs/userguide/icufaq/index.md#how-is-the-icu-licensed>
diff --git a/docs/userguide/io/ustdio.md b/docs/userguide/io/ustdio.md
index 912e89e..f5be6c8 100644
--- a/docs/userguide/io/ustdio.md
+++ b/docs/userguide/io/ustdio.md
@@ -1,7 +1,12 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# C: ustdio
This API provides a `<stdio.h>`-like API wrapper around ICU's other [formatting
-and parsing](../index.md) APIs. It is meant to ease the transition of adding
+and parsing](../formatparse/index.md) APIs. It is meant to ease the transition of adding
Unicode support to preexisting applications using stdio. The following is a
small list of noticeable differences between stdio and ICU I/O's ustdio
implementation.
diff --git a/docs/userguide/io/ustream.md b/docs/userguide/io/ustream.md
index ac440e0..0730578 100644
--- a/docs/userguide/io/ustream.md
+++ b/docs/userguide/io/ustream.md
@@ -1,3 +1,8 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
# C++: ustream
The ustream interface provides a Unicode iostream-like API.