 ---
 layout: default
 title: ICU Design
 nav_order: 5
 parent: ICU
 ---
 <!--
 © 2020 and later: Unicode, Inc. and others.
 License & terms of use: http://www.unicode.org/copyright.html
 -->

 # ICU Architectural Design
 {: .no_toc }

 ## Contents
 {: .no_toc .text-delta }

 1. TOC
 {:toc}

 ---

 # Overview

 This chapter discusses the ICU design structure, the ICU versioning support, and
 the introduction of namespace in C++.

 ## Java and ICU Basic Design Structure

 The JDK internationalization components and ICU components both share the same
 common basic architectures with regard to the following:

 1. [Locales](#locales)
 2. [Data-driven services](#data-driven-services)
 3. [ICU threading models and the open and close model](#icu-threading-model-and-open-and-close-model)
 4. [Cloning customization](#cloning-customization)
 5. [Error handling](#error-handling)
 6. [Extensibility](#extensibility)
 7. [Resource bundle inheritance model](#resource-bundle-inheritance-model)

 There are design features in ICU4C that are not in the Java Development Kit
 (JDK) due
 to programming language restrictions. These features include the following:

 ### Locales

 Locale IDs are composed of language, country, and variant information. The
 language is an ISO language code
 ([ISO-639](http://lcweb.loc.gov/standards/iso639-2/englangn.html)) and the
 country is an ISO Country Code
 ([ISO-3166](http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html)).
 For example, Italian, Italy, and Euro are designated as `it_IT_EURO`.
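
 As a minimal sketch of how such an ID decomposes, the following C++ splits a
 simple `language_country_variant` ID on underscores. `LocaleParts` and
 `splitLocaleId()` are hypothetical names used only for illustration; they are
 not ICU APIs, and the sketch handles nothing beyond the three fields described
 above. Real code would normally call ICU's locale functions (for example
 `uloc_getLanguage()`, `uloc_getCountry()`, and `uloc_getVariant()`) instead.

 ```cpp
 #include <cassert>
 #include <cstddef>
 #include <string>

 // Hypothetical type and helper for illustration; these are not ICU APIs.
 struct LocaleParts {
     std::string language;
     std::string country;
     std::string variant;
 };

 // Splits a simple ID of the form language[_country[_variant]] on underscores.
 LocaleParts splitLocaleId(const std::string& id) {
     LocaleParts parts;
     std::string* fields[] = {&parts.language, &parts.country, &parts.variant};
     std::size_t start = 0;
     for (int i = 0; i < 3; ++i) {
         // The last field (the variant) takes the rest of the string.
         std::size_t end = (i < 2) ? id.find('_', start) : std::string::npos;
         if (end == std::string::npos) {
             *fields[i] = id.substr(start);
             break;
         }
         *fields[i] = id.substr(start, end - start);
         start = end + 1;
     }
     return parts;
 }

 // Usage: the example ID from the text above.
 void demoSplitLocaleId() {
     LocaleParts p = splitLocaleId("it_IT_EURO");
     assert(p.language == "it");
     assert(p.country == "IT");
     assert(p.variant == "EURO");
     assert(splitLocaleId("fr").country.empty());
 }
 ```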

 ### Data-driven Services

 Data-driven services often use resource bundles for locale data. These services
 map a key to data. The resources are designed not only to manage system locale
 information but also to manage application-specific or general services data.
 ICU supports string, numeric, and binary data types, which can be structured into
 nested arrays and tables.

 This results in the following:

 1. Data used by the services can be built at compile time or run time.
 2. For efficient loading, system data is pre-compiled to .dll files or files
    that can be mapped into memory.
 3. Data for services can be added and modified without source code changes.

 ### ICU Threading Model and Open and Close Model

 The "open and close" model supports multi-threading. It enables ICU users to use
 the same kind of service for different locales, either in the same thread or in
 different threads.

 For example, a thread can open many collators for different languages, and
 different threads can use different collators for the same locale
 simultaneously. Constant data can be shared so that only the current state is
 allocated for each collator.

 The ICU threading model is designed to avoid contention for resources, and
 enable you to use the services for multiple locales simultaneously within the
 same thread. The ICU threading model, like the rest of the ICU architecture, is
 the same model used for the international services in Java™.

 When you use a service such as collation, the client opens the service using an
 ID, typically a locale. This service allocates a small chunk of memory used for
 the state of the service, with pointers to shared, read-only data in support of
 that service. (In Java, you call `getInstance()` to create an object; in C++,
 `createInstance()`. ICU uses the open and close metaphor in C because it is more
 familiar to C programmers.)

 If no locale is supplied when a service is opened, ICU uses the default locale.
 Once a service is open, changing the default locale has no effect on it. Thus,
 no thread synchronization is needed between the default locale and open
 services.

 When you open a second service for the same locale, another small chunk of
 memory is used for the state of the service, with pointers to the same shared,
 read-only data. Thus, the majority of the memory usage is shared. When any
 service is closed, then the chunk of memory is deallocated. Other connections
 that point to the same shared data stay valid.

 Any number of services, for the same locale or different locales, can be open
 within the same thread or in different threads.

 #### Thread-safe const APIs

 In recent ICU releases, we have worked to make any service object *thread-safe*
 (usable concurrently) *as long as all of the threads are using only const APIs*:
 APIs that are declared const in C++, take a const this-like service pointer in
 C, or are "logically const" in Java. This is an enhancement over the original
 Java/ICU threading model. (Originally, concurrent use of even only const APIs
 was not thread-safe.)

 However, you cannot use a reference to an open service object in two threads at
 the same time *if either of them calls any non-const API*. An individual open
 service object is not thread-safe for concurrent "writes". Rather, for non-const
 use, you must use the clone function to create a copy of the service you want
 and then pass this copy to the second thread. This procedure allows you to use
 the same service in different threads, but avoids any thread synchronization or
 deadlock problems.

 #### Freezable

 Some classes also implement the `Freezable` interface (or similar pattern in
 C++), for example `UnicodeSet` or `Collator`: An object that typically starts
 out mutable can be set up and then "frozen", which makes it immutable and thus
 usable concurrently because all non-const APIs are disabled. A frozen object can
 never be "thawed". For example, a `Collator` can be created, various attributes
 set, then frozen and then used from many threads for comparing strings and
 getting sort keys.
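
 The freeze pattern can be sketched as follows. `FreezableSet` is a
 hypothetical class written for this illustration, not ICU's `UnicodeSet`, and
 rejecting modifications by returning `false` is a choice made for the sketch
 rather than a statement about how any ICU class reports the condition.

 ```cpp
 #include <cassert>
 #include <set>

 // Hypothetical class illustrating the freeze pattern; not an ICU class.
 class FreezableSet {
 public:
     FreezableSet() : frozen_(false) {}

     bool isFrozen() const { return frozen_; }

     // One-way switch: there is deliberately no "thaw" operation.
     FreezableSet& freeze() { frozen_ = true; return *this; }

     // Non-const API: refused once the object is frozen.
     bool add(int codePoint) {
         if (frozen_) return false;
         items_.insert(codePoint);
         return true;
     }

     // Const API: safe to call concurrently on a frozen object.
     bool contains(int codePoint) const { return items_.count(codePoint) != 0; }

     // A copy is a new object and starts out mutable.
     FreezableSet mutableCopy() const {
         FreezableSet copy(*this);
         copy.frozen_ = false;
         return copy;
     }

 private:
     std::set<int> items_;
     bool frozen_;
 };

 void demoFreeze() {
     FreezableSet s;
     s.add(0x61);
     s.freeze();
     bool addedToFrozen = s.add(0x62);
     assert(!addedToFrozen);          // rejected: the set is frozen
     assert(s.contains(0x61));        // const APIs keep working

     FreezableSet copy = s.mutableCopy();
     bool addedToCopy = copy.add(0x62);
     assert(addedToCopy);
     assert(!s.contains(0x62));       // the frozen original is unchanged
 }
 ```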

 #### Clone vs. open

 Clone operations are designed to be much faster than reopening the service with
 initial parameters and copying the source's state. (With objects in C++ and
 Java, the clone function is also much safer than trying to recreate a service,
 since you get the proper subclass.) Once a service is cloned, changes will not
 affect the original source service, or vice-versa.

 Thus, the normal mode of operation is to:

 1. Open a service with a given locale.
 2. Use the service as long as needed. However, do not keep opening and closing
    a service within a tight loop.
 3. Clone a service if it needs to be used in parallel in another thread.
 4. Close any clones that you open as well as any instances of the services that
    are owned.

 > :point_right: **Note**: These service instances may be closed in any sequence.
 The preceding steps are given as an example.
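
 The open, clone, and close steps above can be modeled in a few lines of
 standard C++. This is only a sketch of the memory model described in this
 section: `SketchCollator` and `SharedRules` are hypothetical stand-ins rather
 than ICU classes, the "strength" field is an arbitrary piece of per-instance
 state, and "close" is represented by destroying the owning `std::unique_ptr`.

 ```cpp
 #include <cassert>
 #include <map>
 #include <memory>
 #include <mutex>
 #include <string>
 #include <utility>

 // Hypothetical stand-ins for illustration; these are not ICU classes.
 struct SharedRules {               // large, read-only data shared per locale
     std::string locale;
 };

 class SketchCollator {
 public:
     // "Open": small per-instance state plus a pointer to shared read-only data.
     static std::unique_ptr<SketchCollator> open(const std::string& locale) {
         static std::mutex cacheMutex;
         static std::map<std::string, std::shared_ptr<const SharedRules> > cache;
         std::lock_guard<std::mutex> lock(cacheMutex);
         std::shared_ptr<const SharedRules>& slot = cache[locale];
         if (!slot) {
             slot.reset(new SharedRules{locale});
         }
         return std::unique_ptr<SketchCollator>(new SketchCollator(slot, 3));
     }

     // "Clone": copies the small state; the shared data is not duplicated.
     std::unique_ptr<SketchCollator> clone() const {
         return std::unique_ptr<SketchCollator>(new SketchCollator(*this));
     }

     void setStrength(int s) { strength_ = s; }   // non-const: one thread at a time
     int strength() const { return strength_; }
     const SharedRules* sharedData() const { return rules_.get(); }

 private:
     SketchCollator(std::shared_ptr<const SharedRules> rules, int strength)
         : rules_(std::move(rules)), strength_(strength) {}

     std::shared_ptr<const SharedRules> rules_;
     int strength_;
 };

 void demoOpenCloneClose() {
     std::unique_ptr<SketchCollator> a = SketchCollator::open("fr");
     std::unique_ptr<SketchCollator> b = SketchCollator::open("fr");
     assert(a->sharedData() == b->sharedData());  // second open shares the data

     std::unique_ptr<SketchCollator> c = a->clone();  // e.g. for another thread
     c->setStrength(1);
     assert(a->strength() == 3);                  // the original is unaffected

     a.reset();                                   // "close" one instance
     assert(b->sharedData()->locale == "fr");     // the others stay valid
 }
 ```

 In this model a clone is cheap because only the pointer and the small state
 are copied, which is the property the section attributes to clone operations.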

 ### Cloning Customization

 Typically, the services supplied with ICU cover the vast majority of usages.
 However, there are circumstances where the service needs to be customized for a
 new locale. ICU (and Java) enable you to create customized services. For
 example, you can create a `RuleBasedCollator` by merging the rules for French and
 Arabic to get a custom French-Arabic collation sequence. By merging these rules,
 the pointer does not point to a read-only table that is shared between threads.
 Instead, the pointer refers to a table that is specific to your particular open
 service. If you clone the open service, the table is copied. When you close the
 service, the table is destroyed.

 For some services, ICU supplies registration. You can register a customized open
 service under an ID, which keeps a copy of that service even after you close the
 original. A client in that thread or in other threads can recreate a copy of the
 service by opening with that ID.

 ICU may cache service instances. Therefore, registration should be done during
 startup, before opening services by locale ID.

 These registrations are not persistent; once your program finishes, ICU flushes
 all the registrations. While you still might have multiple copies of data
 tables, it is faster to create a service from a registered ID than it is to
 create a service from rules.

 > :point_right: **Note**: To work around the lack of persistent registration,
 query the service for the parameters used to create it and then store those
 parameters in a file on a disk.

 For services whose IDs are locales, such as collation, the registered IDs must
 also be locales. For those services (like Transliteration or Timezones) that are
 cross-locale, the IDs can be any string.

 Prospective future enhancements for this model are:

 1. Having custom services share data tables, by making those tables reference
    counted. This will reduce memory consumption and speed clone operations (a
    performance enhancement chiefly useful for multiple threads using the same
    customized service).
 2. Expanding registration for all the international services.
 3. Allowing persistent registration of services.

 #### Per-client Locale ID vs Per-thread Locale ID

 Some application environments operate by setting a per-thread (or per-process)
 locale ID, and then not passing the locale ID as a parameter during processing.
 If this usage model were used with ICU in a multi-threaded server, it might
 result in ICU being requested to constantly open, use, and then close service
 objects. Instead, it is recommended that the locale ID associated with each
 client be stored with other per-client data, along with any service objects
 (such as collators or formatters) that client might use. If operations involving
 a single client are short-lived, it might be more efficient to keep a pool of
 service objects, organized according to locale. Then, if a particular locale's
 formatter is in high demand, that formatter can be used, and then returned to
 the pool.
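
 A pool organized by locale might look like the following sketch.
 `SketchFormatter` and `FormatterPool` are hypothetical names, not ICU classes;
 the formatter stands in for whatever service object the application uses, and
 the `created()` counter exists only to make the reuse visible.

 ```cpp
 #include <cassert>
 #include <map>
 #include <memory>
 #include <mutex>
 #include <string>
 #include <utility>
 #include <vector>

 // Hypothetical stand-in for an ICU service object such as a formatter.
 struct SketchFormatter {
     std::string locale;
 };

 class FormatterPool {
 public:
     FormatterPool() : created_(0) {}

     // Borrows an idle formatter for the locale, or creates one if none is idle.
     std::unique_ptr<SketchFormatter> acquire(const std::string& locale) {
         std::lock_guard<std::mutex> lock(mutex_);
         std::vector<std::unique_ptr<SketchFormatter> >& idle = idle_[locale];
         if (idle.empty()) {
             ++created_;  // stands in for an expensive "open"
             return std::unique_ptr<SketchFormatter>(new SketchFormatter{locale});
         }
         std::unique_ptr<SketchFormatter> formatter = std::move(idle.back());
         idle.pop_back();
         return formatter;
     }

     // Returns a formatter to the pool so that a later request can reuse it.
     void release(std::unique_ptr<SketchFormatter> formatter) {
         std::lock_guard<std::mutex> lock(mutex_);
         const std::string locale = formatter->locale;
         idle_[locale].push_back(std::move(formatter));
     }

     // For the demo only; read without locking.
     int created() const { return created_; }

 private:
     std::mutex mutex_;
     std::map<std::string, std::vector<std::unique_ptr<SketchFormatter> > > idle_;
     int created_;
 };

 void demoPool() {
     FormatterPool pool;
     std::unique_ptr<SketchFormatter> f = pool.acquire("de_DE");
     pool.release(std::move(f));
     std::unique_ptr<SketchFormatter> g = pool.acquire("de_DE");  // reused
     assert(pool.created() == 1);
     std::unique_ptr<SketchFormatter> h = pool.acquire("fr_FR");  // new locale
     assert(pool.created() == 2);
 }
 ```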

 ### ICU Memory Usage

 ICU4C APIs are designed to allow separate heaps for its libraries vs. the
 application. This is achieved by providing functions to allocate and release
 objects owned by ICU4C using only ICU4C library functions. For more details see
 the Memory Usage section in the [Coding Guidelines](dev/codingguidelines.md#memory-usage).

 ### ICU4C Initialization and Termination

 The ICU library does not normally require any explicit initialization prior to
 use. An application begins use simply by calling any ICU API in the usual way.
 There are, however, a few functions affecting ICU's configuration, that, if used,
 must be called first, before other use of ICU in a process. These are outlined below.

 1. `u_setMemoryFunctions()`. This function replaces the standard library heap
 allocation functions used by ICU with alternate versions, provided by the
 application. If it is needed, `u_setMemoryFunctions()` must be called first, before
 any other use of ICU. This functionality is not commonly used.

 2. ICU Data Locating Functions, `u_setCommonData()`, `u_setDataDirectory()`, and
 `u_setAppData()`. One or more of these functions will be required when ICU is
 configured to load its data directly from files rather than taking it from the
 default data DLL, and the files are not in the default location. Again, this is
 not common. See [ICU Data](icudata#icu-data-directory).

 3. Sanity check that ICU is functioning and able to access data. This is
 important because configuration or installation problems that leave ICU unable
 to load its data do occur, and the resulting failures can be confusing.
 Since not all ICU APIs have UErrorCode parameters, in the absence of data they
 may sometimes silently return incorrect results.

    The function `ulocdata_getCLDRVersion()` is suitable; it is small and
 lightweight, requires data, and reports an error in the absence of data.


 When an application is terminating it should call the function `u_cleanup()`,
 which frees all heap storage and other system resources that are held internally
 by the ICU library. While the use of `u_cleanup()` is not strictly required,
 failure to call it will cause memory leak checking tools to report problems for
 resources being held by ICU library.

 Before calling `u_cleanup()`, all ICU objects that were created by the
 application must be deleted, and all ICU services (plain C APIs) must be closed.

 For some platforms the configure option `--enable-auto-cleanup`, or defining
 the option `UCLN_NO_AUTO_CLEANUP` to 0, will add code which automatically cleans
 up ICU when its shared library is unloaded. See the comments in `ucln_imp.h`.

 #### C++ Static Initialization and Destruction

 The ICU library itself does not rely on C++ static initializers, meaning that
 applications will not encounter order-of-initialization problems from the use of
 ICU.

 There are, however, some significant limitations for applications that make use
 of ICU at C++ static initialization time:

 1.  `u_setMemoryFunctions()` and the data locating functions, if needed, must
 still be called before any other use of ICU, which includes any use during the
 construction of static objects.

 2.  `u_cleanup()` can only be called after all other ICU-using objects have been
 deleted. Finding a suitable time and place for the call to `u_cleanup()` may be
 difficult, however. Refer to the C++ literature on the order of static
 initialization and destruction.

 3.  Destruction of static objects that are scoped to a code block. These, by the
 conventions of C++, are lazily initialized when the code block is first entered,
 so there are no issues during static initialization. But object destruction
 happens when the program terminates, leaving the problem of where to call
 `u_cleanup()`, as discussed above.

 #### Dynamically Loading and Unloading ICU

 Applications may arrange to dynamically load the ICU library when it is needed,
 and unload it when through, repeating the process as required. The specific
 details for loading and unloading, and accessing such libraries, are operating
 system dependent.

 For ICU to be used in this way, before unloading, all ICU objects and services
 must be closed or deleted, and `u_cleanup()` must be called.

 On Windows, the loading and unloading of ICU should never be done inside
 [DllMain](https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain). Loading
 one of the ICU libraries can cause other libraries or files to be loaded,
 leading to a potential deadlock.

 #### Initializing ICU in Multithreaded Environments

 There is one specialized case where extra care is needed to safely initialize
 ICU. This situation will arise only when ALL of the following conditions occur:

 1. The application main program is written in plain C, not C++.
 2. The application is multithreaded, with the first use of ICU within the
    process possibly occurring simultaneously in more than one thread.
 3. The application will be run on a platform that does not handle C++ static
    constructors from libraries when the main program is not in C++. Platforms
    known to exhibit this behavior are Mac OS X and HP/UX. Platforms that handle
    C++ libraries correctly include Windows, Linux and Solaris.

 To safely initialize the ICU library when all of the above conditions apply, the
 application must explicitly arrange for a first-use of ICU from a single thread
 before the multi-threaded use of ICU begins. A convenient ICU operation for this
 purpose is `uloc_getDefault()`, declared in the header file `unicode/uloc.h`.

 > :point_right: **Note**: The status of this situation needs further
 investigation. See issue
 [ICU-21380](https://unicode-org.atlassian.net/browse/ICU-21380)


 ### Error Handling

 In order for ICU to maximize portability, this version uses only the subset
 of the C++ language that compiles correctly on older C++ compilers, and it provides a
 usable C interface. Thus, there is no use of the C++ exception mechanism in the
 code or Application Programming Interface (API).

 To communicate errors reliably and support multi-threading, this version uses an
 error code parameter mechanism. Every function that can fail takes an error-code
 parameter by reference. This parameter is always the last parameter listed for
 the function.

 The `UErrorCode` parameter is defined as an enumerated type. Zero represents no
 error, positive values represent errors, and negative values represent non-error
 status codes. Macros (`U_SUCCESS` and `U_FAILURE`) are provided to check the
 error code.

 The `UErrorCode` parameter is an input-output parameter. Every function tests the
 error code before performing any other task and immediately exits if it already
 holds a failure code. If the function fails later on, it sets the error code
 appropriately and exits without performing any other work, except for any
 cleanup it needs to do. If the function encounters a non-error condition that it
 wants to signal, such as "encountered an unmapped character" in conversion, the
 function sets the error code appropriately and continues. Otherwise, the
 function leaves the error code unchanged.

 Generally, only the functions that do not take a `UErrorCode` parameter, but
 call functions that do, must declare a variable. Almost all functions that take
 a `UErrorCode` parameter, and also call other functions that do, merely have to
 propagate the error code that they were passed to the functions they call.
 Functions that declare a new `UErrorCode` variable must initialize it to
 `U_ZERO_ERROR` before calling any other functions.

 ICU enables you to call several functions (that take error codes) successively
 without having to check the error code after each function. Each function
 usually must check the error code before doing any other processing, since it is
 supposed to stop immediately after receiving an error code. Propagating the
 error-code parameter down the call chain saves the programmer from having to
 declare the parameter in every instance and also mimics the C++ exception
 protocol more closely.
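
 The convention can be sketched as follows. So that the example builds without
 ICU headers, it defines small stand-ins for `UErrorCode`, `U_SUCCESS`, and
 `U_FAILURE` that follow the sign convention described above; in real code
 these come from `unicode/utypes.h`. The functions `parseDigit()`,
 `sumDigits()`, and `digitsAreValid()` are hypothetical and exist only to show
 the three roles a function can play.

 ```cpp
 #include <cassert>

 // Stand-ins for the ICU definitions, so that this sketch builds without ICU.
 enum UErrorCode {
     U_USING_FALLBACK_WARNING = -128,  // negative: non-error status
     U_ZERO_ERROR = 0,                 // zero: no error
     U_ILLEGAL_ARGUMENT_ERROR = 1      // positive: error
 };
 inline bool U_SUCCESS(UErrorCode code) { return code <= U_ZERO_ERROR; }
 inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

 // Leaf function: checks the incoming code first, sets it on failure.
 int parseDigit(int value, UErrorCode& status) {
     if (U_FAILURE(status)) return 0;          // earlier failure: do nothing
     if (value < 0 || value > 9) {
         status = U_ILLEGAL_ARGUMENT_ERROR;
         return 0;
     }
     return value;                             // success: status left unchanged
 }

 // Intermediate function: propagates the code it was given.
 int sumDigits(int a, int b, UErrorCode& status) {
     // No check is needed between the calls; after a failure the second
     // call returns immediately.
     int x = parseDigit(a, status);
     int y = parseDigit(b, status);
     return U_FAILURE(status) ? 0 : x + y;
 }

 // Top-level function without an error-code parameter: declares the
 // variable and initializes it to U_ZERO_ERROR.
 bool digitsAreValid(int a, int b) {
     UErrorCode status = U_ZERO_ERROR;
     sumDigits(a, b, status);
     return U_SUCCESS(status);
 }

 void demoErrorCodes() {
     UErrorCode status = U_ZERO_ERROR;
     assert(sumDigits(2, 3, status) == 5);
     assert(U_SUCCESS(status));

     sumDigits(42, 3, status);                 // first argument is invalid
     assert(status == U_ILLEGAL_ARGUMENT_ERROR);

     // A failure code that is passed in is preserved, and no work is done.
     assert(parseDigit(7, status) == 0);
     assert(U_FAILURE(status));

     assert(U_SUCCESS(U_USING_FALLBACK_WARNING));  // warnings are not failures
 }
 ```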

 ### Extensibility

 There are 3 major extensibility elements in ICU:

 1. **Data Extensibility**:
    The user installs new locales or conversion data to enhance the existing ICU
    support. For more information, refer to the package tool chapter
    (:construction: **TODO**: need link).
 2. **Code Extensibility**:
    The classes, data, and design are fully extensible. Examples of this
    extensibility include the `BreakIterator`, `RuleBasedBreakIterator` and
    `DictionaryBasedBreakIterator` classes.
 3. **Error Handling Extensibility**:
    There are mechanisms available to enhance the built-in error handling when
    it is necessary. For example, you can design and create your own conversion
    callback functions when an error occurs. Refer to the
    [Conversion](conversion/index.md) chapter callback section for more
    information.

 ### Resource Bundle Inheritance Model

 A resource bundle is a set of \<key,value> pairs that provide a mapping from key
 to value. A given program can have different sets of resource bundles; one set
 for error messages, one for menus, and so on. However, the program may be
 organized to combine all of its resource bundles into a single related set.

 The set is organized into a tree with "root" at the top, the language at the
 first level, the country at the second level, and additional variants below
 these levels. The set must contain a root that has all keys that can be used by
 the program accessing the resource bundles.

 Except for the root, each resource bundle has an immediate parent. For example,
 if there is a resource bundle `X_Y_Z`, then there must also be the resource bundles
 `X_Y` and `X`. Each child resource bundle can omit any \<key,value> pair that is
 identical to its parent's pair. (Such omission is strongly encouraged as it
 reduces data size and maintenance effort). It must override any \<key,value> pair
 that is different from its parent's pair. If you have a resource bundle for the
 locale ID `language_country_variant`, you must also have
 a bundle for the ID `language_country` and one for the ID `language`.

 If a program doesn't find a key in a child resource bundle, it can assume
 that the key has the same value as in the parent. The default locale has no effect on
 this. The particular language used for the root is commonly English, but it
 depends on the developer's preference. Ideally, the language should contain
 values that minimize the need for its children to override it.

 The default locale is used only when there is not a resource bundle for a given
 language. For example, there may not be an Italian resource bundle. (This is
 very different than the case where there is an Italian resource bundle that is
 missing a particular key.) When a resource bundle is missing, ICU uses the
 parent unless that parent is the root. The root is an exception because the root
 language may be completely different than its children. In this case, ICU uses a
 modified lookup and the default locale. The following are different lookup
 methods available:

 **Lookup chain** : Searching for a resource bundle.

     en_US_<some-variant>
     en_US
     en
     <defaultLang>_<defaultCountry>
     <defaultLang>
     root

 **Lookup chain** : Searching for a \<key, value> pair after
 `en_US_<some-variant>` has been loaded. ICU does not use the default locale in
 this case.

     en_US_<some-variant>
     en_US
     en
     root
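
 The two chains can be modeled as follows. This is a simplified sketch, not
 ICU's implementation: it assumes that a bundle's parent is always found by
 dropping the last `_field` of the ID, and it represents each bundle as a plain
 map. `Bundle`, `BundleSet`, `parentOf()`, `bundleSearchChain()`, and
 `findValue()` are hypothetical names.

 ```cpp
 #include <cassert>
 #include <cstddef>
 #include <map>
 #include <string>
 #include <vector>

 typedef std::map<std::string, std::string> Bundle;   // key -> value
 typedef std::map<std::string, Bundle> BundleSet;     // locale ID -> bundle

 // Assumption: the parent is found by dropping the last "_field".
 std::string parentOf(const std::string& id) {
     std::size_t pos = id.rfind('_');
     return pos == std::string::npos ? std::string("root") : id.substr(0, pos);
 }

 // Appends id and its ancestors, excluding root.
 void appendAncestors(const std::string& id, std::vector<std::string>& chain) {
     for (std::string cur = id; cur != "root"; cur = parentOf(cur)) {
         chain.push_back(cur);
     }
 }

 // Chain used when searching for a bundle: the default locale is consulted.
 std::vector<std::string> bundleSearchChain(const std::string& id,
                                            const std::string& defaultId) {
     std::vector<std::string> chain;
     appendAncestors(id, chain);
     appendAncestors(defaultId, chain);
     chain.push_back("root");
     return chain;
 }

 // Key lookup after a bundle is loaded: the default locale is not consulted.
 // Returns nullptr if no bundle on the chain defines the key.
 const std::string* findValue(const BundleSet& bundles,
                              const std::string& loadedId,
                              const std::string& key) {
     std::vector<std::string> chain;
     appendAncestors(loadedId, chain);
     chain.push_back("root");
     for (std::size_t i = 0; i < chain.size(); ++i) {
         BundleSet::const_iterator b = bundles.find(chain[i]);
         if (b == bundles.end()) continue;
         Bundle::const_iterator v = b->second.find(key);
         if (v != b->second.end()) return &v->second;
     }
     return nullptr;
 }

 void demoLookup() {
     std::vector<std::string> chain = bundleSearchChain("en_US", "de_DE");
     assert(chain.size() == 5);
     assert(chain[2] == "de_DE");
     assert(chain[4] == "root");

     BundleSet bundles;
     bundles["root"]["greeting"] = "Hello";
     bundles["en"]["color"] = "color";
     bundles["en_GB"]["color"] = "colour";
     assert(*findValue(bundles, "en_GB", "color") == "colour");    // overridden
     assert(*findValue(bundles, "en_GB", "greeting") == "Hello");  // inherited
     assert(findValue(bundles, "en_GB", "missing") == nullptr);
 }
 ```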

 ## Other ICU Design Principles

 ICU uses version numbers to track code and data changes, and it places its C++
 APIs in a namespace.

 ### Version Numbers in ICU

 Version changes show clients when parts of ICU change. ICU; its components (such
 as `Collator`); each resource bundle, including all the locale data resource
 bundles; and individual tagged items within a resource bundle, have their own
 version numbers. Version numbers numerically and lexically increase as changes
 are made.

 All version numbers are used in Application Programming Interfaces (APIs) with a
 `UVersionInfo` structure. The `UVersionInfo` structure is an array of four
 unsigned bytes. These bytes are:

 1. Major version number
 2. Minor version number
 3. Milli version number
 4. Micro version number

 Two `UVersionInfo` structures may be compared using binary comparison (`memcmp`)
 to see which is larger or newer. Version numbers may be different for different
 services. For instance, do not compare the ICU library version number to the ICU
 collator version number.

 `UVersionInfo` structures can be converted to and from string representations as
 dotted integers (such as "1.4.5.0") using the `u_versionToString()` and
 `u_versionFromString()` functions. String representations may omit trailing zeros.
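
 The binary comparison can be sketched as follows. To keep the example free of
 ICU headers, it declares a local `UVersionInfo` typedef that matches the
 description above (an array of four unsigned bytes); in real code the type
 comes from the ICU headers. `compareVersions()` and `formatVersion()` are
 hypothetical helpers; the latter always prints all four fields and is not a
 replacement for `u_versionToString()`.

 ```cpp
 #include <cassert>
 #include <cstdint>
 #include <cstdio>
 #include <cstring>
 #include <string>

 // Stand-in matching the description above: four unsigned bytes.
 typedef std::uint8_t UVersionInfo[4];

 // Negative, zero, or positive, like memcmp: a is older than, the same as,
 // or newer than b. Only meaningful for versions of the same kind.
 int compareVersions(const UVersionInfo a, const UVersionInfo b) {
     return std::memcmp(a, b, sizeof(UVersionInfo));
 }

 // Hypothetical formatter that prints all four fields as dotted integers.
 std::string formatVersion(const UVersionInfo v) {
     char buffer[16];  // "255.255.255.255" plus the terminator
     std::snprintf(buffer, sizeof(buffer), "%u.%u.%u.%u",
                   static_cast<unsigned>(v[0]), static_cast<unsigned>(v[1]),
                   static_cast<unsigned>(v[2]), static_cast<unsigned>(v[3]));
     return std::string(buffer);
 }

 void demoVersions() {
     UVersionInfo older = {1, 4, 5, 0};
     UVersionInfo newer = {1, 6, 0, 0};
     assert(compareVersions(older, newer) < 0);
     assert(compareVersions(newer, older) > 0);
     assert(compareVersions(older, older) == 0);
     assert(formatVersion(older) == "1.4.5.0");
 }
 ```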

 The interpretation of version numbers depends on what is being described.

 #### ICU Release Version Number (ICU 49 and later)

 The first version number field contains the ICU release version number, for
 example 49. Each new version might contain new features, new locale data, and
 modified behavior. (See below for more information on
 [ICU Binary Compatibility](#icu-binary-compatibility)).

 The second field is 1 for the initial release (e.g., 49.1). The second and
 sometimes third fields are incremented for binary compatible maintenance
 releases.

 * For maintenance releases for only either C or J, the third field is
   incremented (e.g., ICU4C 49.1.1).
 * For shared updates for C & J, the second field is incremented to 2 and
   higher (e.g., ICU4C & ICU4J 49.2).

 (The second field is 0 during development, with milestone numbers in the third
 field during that time. For example, 49.0.1 for 49 milestone 1.)

 #### ICU Release Version Number (ICU 1.4 to ICU 4.8)

 In earlier releases, the first two version fields together indicated the ICU
 release, for example 4.8. The third field was 0 for the initial release, and 1
 and higher for binary compatible (bug fixes only) maintenance releases (e.g.,
 4.8.1). The fourth field was used for updates specific to only one of Java, C++,
 or ICU-in-Eclipse.

 The second version field was *even* for formal releases ("reference releases")
 (e.g., 1.6 or 4.8) and *odd* during their development (unreleased unstable
 snapshot versions; e.g., 4.7). During development, the third field contained the
 milestone number (e.g., 4.7.1 for 4.8 milestone 1). For very old ICU code, we
 published semi-formal “enhancement” releases with odd second-field numbers
 (e.g., 1.7).

 Library filenames and some other internal uses already used a concatenation of
 the first two fields ("48" for 4.8).

 #### Resource Bundles and Elements

 The data stored in resource bundles is tagged with version numbers. A resource
 bundle can contain a tagged string named "Version" that declares the version
 number in dotted-integer format. For example,

 ```text
 en {
     Version { "1.0.3.5" }
     ...
 }
 ```

 A resource bundle may omit the "version" element and thus, will inherit a
 version along the usual chain. For example, if the resource bundle **en_US**
 contained no "version" element, it would inherit "1.0.3.5" from the parent en
 element. If inheritance passes all the way to the root resource bundle and it
 contains no "version" resource, then the resource bundle receives the default
 version number 0.

 Elements within a resource bundle may also contain version numbers. For example:

 ```text
 be {
     CollationElements {
         Version { "1.0.0.0" }
         ...
     }
 }
 ```

 In this example, the CollationElements data is version 1.0.0.0. This element
 version is not related to the version of the bundle.

 #### Internal version numbers

 Internally, data files carry format and other version numbers. These version
 numbers ensure that ICU can use the data file. The interpretation depends
 entirely on the data file type. Often, the major number in the format version
 stays the same for backwards-compatible changes to a data file format. The minor
 format version number is incremented for additions that do not violate the
 backwards compatibility of the data file.

 #### Component Version Numbers

 ICU component version numbers may be found using:

 1. `u_getVersion()` returns the version number of ICU as a whole. It is a C
    function and can be called from both C and C++.
 2. `ures_getVersion()` and `ResourceBundle::getVersion()` return the version
    number of a ResourceBundle. This is a data version number for the bundle as a
    whole and subject to inheritance.
 3. `u_getUnicodeVersion()` and `Unicode::getUnicodeVersion()` return the version
    number of the Unicode character data that underlies ICU. This version
    reflects the numbering of the Unicode releases. See
    <http://www.unicode.org/> for more information.
 4. `Collator::getVersion()` in C++ and `ucol_getVersion()` in C return the version
    number of the Collator. This is a code version number for the collation code
    and algorithm. It is a combination of version numbers for the collation
    implementation, the Unicode Collation Algorithm data (which is the data that
    is used for characters that are not mentioned in a locale's specific
    collation elements), and the collation elements.

 #### Configuration and Management

 A major new feature in ICU 2.0 is the ability to link to different versions of
 ICU with the same program. Using this new feature, a program can keep using ICU
 1.8 collation, for example, while using ICU 2.0 for other services. ICU now can
 also be unloaded if needed, to free up resources, and then reloaded when it is
 needed.

 ### Namespace in C++

 ICU 2.0 introduced the use of a C++ namespace to avoid naming collision between
 ICU exported symbols and other libraries. All the public ICU C++ classes are
 defined in the "icu_VersionNumber::" namespace, which is also aliased as
 namespace "icu". Starting with ICU 2.0, including any public ICU C++ header by
 default includes a "using namespace icu_VersionNumber" statement. This is for
 backward compatibility, and should be turned off in favor of explicitly using
 `icu::UnicodeString` etc. (see [How To Use ICU](howtouseicu.md)). (If entry point
 renaming is turned off, then only the unversioned "icu" namespace is used.)

 Starting with ICU 49, ICU4C requires namespace support.

 ### Library Dependencies (C++)

 It is sometimes useful to see a dependency chart between the public ICU APIs and
 ICU libraries. This chart can be useful to people that are new to ICU or to
 people that want only certain ICU libraries.

 > :construction: **TODO**: The dependency chart is currently not available.

 Here are some things to realize about the chart.

 1. It gives a general overview of the ICU library dependencies.
 2. Internal dependencies, like the mutex API, are left out for clarity.
 3. Similar APIs were lumped together for clarity (e.g. Formatting). Some of
    these dependency details can be viewed from the ICU API reference.
 4. The descriptions of each API can be found in our [ICU API
    reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/)

 ### Code Dependencies (C++)

 Starting with ICU 49, the dependencies of code files (.o files compiled from
 .c/.cpp) are documented in
 [source/test/depstest/dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt).
 Adjacent Python code is used to parse this file and to
 [verify](http://site.icu-project.org/processes/release/tasks/healthy-code#TOC-Check-library-dependencies)
 that it matches the actual dependencies of the code files.

 The dependency list can be used to build subset libraries. In addition, by
 reducing intra-library dependencies, the code size of statically linked ICU code
 has been reduced.

 ### ICU API categories

 ICU APIs, as defined in header and class files, are either "external" or
 "internal". External APIs are meant to be used by applications, while internal
 APIs should be used only within ICU. APIs are marked to indicate whether they
 are external or internal, as follows. Every external API has a lifecycle label,
 see below.

 #### External ICU4C APIs

 External ICU4C APIs are

 1. declared in header files in unicode folders and exported at build/install
    time to an `include/unicode` folder
 2. when C++ class members, are `public` or `protected`
 3. do not have an `@internal` label

 Exception: Layout engine header files are not in a unicode folder, although the
 public ones are still copied to the `include/unicode` folder at build/install
 time. External layout engine APIs are the ones that have lifecycle labels and
 not an `@internal` label.

 #### External ICU4J APIs

 External ICU4J APIs are

 1. declared in one of the ICU4J core packages (`com.ibm.icu.lang`,
    `com.ibm.icu.math`, `com.ibm.icu.text`, or `com.ibm.icu.util`).
 2. `public` or `protected` class members
 3. `public` or `protected` contained classes
 4. do not have an `@internal` label

 #### "System" APIs

 "System" APIs are external APIs that are intended only for special uses for
 system-level code, for example `u_cleanup()`. Normal users should not use them,
 although they are public and supported. System APIs have a `@system` label
 in addition to the lifecycle label that all external APIs have (see below).

 #### Internal APIs

 All APIs that do not fit any of the descriptions above are internal, which means
 that they are for ICU internal use only and may change at any time without
 notice. Some of them are member functions of public C++ or Java classes, and are
 "technically public but logistically internal" for implementation reasons;
 typically because programming languages don't provide sufficient access
 control (without clumsy mechanisms). In this case, such APIs have an
 `@internal` label.

 ### ICU API compatibility

 As ICU develops, it adds external APIs - functions, classes, constants, and so
 on. Occasionally it is also necessary to remove or change external APIs. In
 order to make this work, we use the following process:

 For all API changes (and for significant/controversial/difficult implementation
 changes), we use proposals to announce and discuss them. A proposal is simply an
 email to the icu-design mailing list that details what is proposed to be
 changed, with an expiration date of typically a week. This gives all mailing
 list members a chance to review upcoming changes, and to discuss them. A
 proposal often changes significantly as a result of discussion. Most proposals
 will eventually find consensus among list members; otherwise, the ICU-TC decides
 what to do. If the addition or change of APIs would affect you, please subscribe
 to the main [icu-design mailing list](http://icu-project.org/contacts.html).

 When a **new API** is added to ICU, it **is marked as draft with a
 `@draft ICU x.y` label** in the API documentation, **where x.y is the ICU
 version when the API *signature* was introduced or last changed**. A draft API
 is not guaranteed to be stable! Although we will not make gratuitous changes,
 sometimes a draft API turns out to be unsatisfactory in actual practice and may
 need to be changed or even removed. Changes to "draft" APIs are subject to the
 proposal process described above.

 **When a `@draft ICU x.y` API is changed, it must remain `@draft` and its version
 number must be updated.**

 In ICU4J 3.4.2 and earlier, `@draft` APIs were also marked with Java's `@deprecated`
 tag, so that uses of draft APIs in client code would be flagged by the compiler.
 These uses of the `@deprecated` tag were indicated with the comment “This is a
 draft API and might change in a future release of ICU.” Many clients found this
 confusing and/or undesirable, so ICU4J 3.4.3 no longer marks draft APIs with
 the `@deprecated` tag by default. For clients who prefer the earlier behavior,
 ICU4J provides an ant build target, `restoreDeprecated`, which will update the
 source files to use the `@deprecated` tag. Then clients can just rebuild the ICU4J
 jar as usual.

 When an API is judged to be stable and has not been changed for at least one ICU
 release, it is relabeled as stable with a `@stable ICU x.y` label in the API
 documentation. A stable API is expected to be available in this form for a long
 time. The ICU version **x.y** indicates the last time the API *signature* was
 introduced or changed. **The promotion from `@draft ICU x.y` to `@stable ICU x.y`
 must not change the x.y version number.**

 We occasionally make an exception and allow adding new APIs marked as
 `@stable ICU x.y` APIs in the x.y release itself if we believe that they have to
 be stable. We might do this for enum constants that reflect 1:1 Unicode property
 aliases and property value aliases, for a Unicode upgrade in the x.y release.

 We sometimes **"broaden" a `@stable`** API function by changing its signature
 in a compatible way. For example, in Java, we might change an input parameter
 from a `String` to a `CharSequence`. In this case we keep the `@stable` but
 update the ICU version number indicating the function signature change.

 Even a stable API may eventually need to become deprecated or obsolete. Such
 APIs are strongly discouraged from use. Typically, an improved API is introduced
 at the time of deprecation/obsolescence of the old one.

 1. Use of deprecated APIs is strongly discouraged, but they are retained for
    backward compatibility. These are marked with labels like
    `@deprecated ICU x.y Use u_abc() instead.`. **The ICU version x.y shows the
    ICU release in which the API was first declared "deprecated".**
 2. In ICU4J, starting with release 57, a custom Javadoc tag `@discouraged`
    was added. While similar to `@deprecated` it is used when either ICU wants
    to discourage a particular API from use but the JDK hasn't deprecated it or
    ICU needs to keep it for compatibility reasons. These are marked with labels
    like `@discouraged ICU x.y. Use u_abc() instead.`.
 3. Obsolete APIs are those whose continued retention will cause severe
    conflicts or user error, or whose continued support would be a very
    significant maintenance burden. We make every effort to keep these to a
    minimum. Obsolete APIs are marked with labels like `@obsolete ICU x.y. Use
    u_abc() instead since this API will be removed in that release.`.
    **The x.y indicates that we plan to remove it in ICU version x.y.**

 Stable C or Java APIs will not be obsoleted because doing so would break
 forward binary compatibility of the ICU library. Stable APIs may be
 deprecated, but they will be retained in the library.

 An "obsolete" API will remain unchanged until it is removed in the indicated
 ICU release, which will usually be one year after the API was declared
 obsolete. Sometimes we still keep it available for some time via a
 compile-time switch but stop maintaining it. On rare occasions, an API must
 be replaced right away because of naming conflicts or severe defects; in
 such cases we provide compile-time switches (`#ifdef` or other mechanisms) to
 select the old API.
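
 As a minimal sketch of such a compile-time switch (not ICU code), the fragment
 below keeps an old function available behind a build flag. The names
 `DEMO_USE_OLD_ROUNDING`, `demo_round`, `demo_round_old`, and `demo_round_new`
 are hypothetical and exist only for this illustration; truncation stands in
 for the "severe defect" of the old function.

 ```c
 #include <assert.h>

 /* Hypothetical compile-time switch: build with -DDEMO_USE_OLD_ROUNDING=1
  * to keep the old behavior. It defaults to the replacement. */
 #ifndef DEMO_USE_OLD_ROUNDING
 #define DEMO_USE_OLD_ROUNDING 0
 #endif

 /* Replacement: rounds half away from zero. */
 int demo_round_new(double x) {
     return (int)(x < 0 ? x - 0.5 : x + 0.5);
 }

 /* Old function, reachable by its plain name only when the switch is on.
  * It truncates toward zero, which stands in for the defect here. */
 int demo_round_old(double x) {
     return (int)x;
 }

 #if DEMO_USE_OLD_ROUNDING
 #define demo_round demo_round_old
 #else
 #define demo_round demo_round_new
 #endif

 /* Callers use the plain name; the switch decides what it means. */
 void demo_round_example(void) {
     assert(demo_round(2.5) == 3 || DEMO_USE_OLD_ROUNDING);
 }
 ```

 Code that cannot migrate immediately would be rebuilt with the switch turned
 on; all other code picks up the replacement without source changes.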

 For example, here is how an API might be tagged in various versions:

 * **In ICU 0.2**: The API is newly introduced as a draft in this release.

   ```text
   @draft ICU 0.2
   f(x)
   ```

 * **In ICU 0.4**: The draft version number is updated, because the signature
   changed.

   ```text
   @draft ICU 0.4
   f(x, y)
   ```

 * **In ICU 0.6**: The API is promoted from draft to stable, but the version
   number does not change, as the signature is the same.

   ```text
   @stable ICU 0.4
   f(x, y)
   ```

 * **In ICU 1.0**: The API is "broadened" in a compatible way. For example,
   an input parameter might change from `char` to `int`, or from some class to
   a base class. The signature is changed (so we update the ICU version number),
   but old calling code continues to work unchanged (so we retain `@stable` if
   that's what it was).

   ```text
   @stable ICU 1.0
   f(xbase, y)
   ```

 * **In ICU 1.2**: The API is demoted to deprecated (or obsolete) status.

   ```text
   @deprecated ICU 1.2 Use g(x,y,z) instead.
   f(xbase, y)
   ```

   or, when this API is planned to be removed in ICU 1.4:

   ```text
   @obsolete ICU 1.4. Use g(x,y,z) instead.
   f(xbase, y)
   ```
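
 The labeling rules behind this timeline can also be modeled in a few lines of
 C. This is only an illustrative sketch of the rules stated in this section,
 not a tool that ICU ships; every type and function name in it (`DemoLabel`,
 `demo_introduce`, and so on) is hypothetical.

 ```c
 #include <assert.h>
 #include <stdio.h>
 #include <string.h>

 typedef enum { DEMO_DRAFT, DEMO_STABLE, DEMO_DEPRECATED } DemoStatus;

 typedef struct {
     DemoStatus status;
     int major;  /* x in "ICU x.y" */
     int minor;  /* y in "ICU x.y" */
 } DemoLabel;

 /* A new API starts as draft, labeled with the release that introduces it. */
 DemoLabel demo_introduce(int major, int minor) {
     DemoLabel label = { DEMO_DRAFT, major, minor };
     return label;
 }

 /* Signature change: the status is kept and the version is updated. For a
  * stable API this models only a compatible "broadening". */
 DemoLabel demo_change_signature(DemoLabel label, int major, int minor) {
     label.major = major;
     label.minor = minor;
     return label;
 }

 /* Promotion from draft to stable must not change the version number. */
 DemoLabel demo_promote(DemoLabel label) {
     if (label.status == DEMO_DRAFT) {
         label.status = DEMO_STABLE;
     }
     return label;
 }

 /* Deprecation records the release in which the API was first deprecated. */
 DemoLabel demo_deprecate(DemoLabel label, int major, int minor) {
     label.status = DEMO_DEPRECATED;
     label.major = major;
     label.minor = minor;
     return label;
 }

 /* Writes a label such as "@stable ICU 0.4" into buf. */
 void demo_format(DemoLabel label, char *buf, size_t size) {
     static const char *const names[] = { "@draft", "@stable", "@deprecated" };
     snprintf(buf, size, "%s ICU %d.%d", names[label.status],
              label.major, label.minor);
 }

 /* Replays the first three steps of the timeline above. */
 void demo_timeline(void) {
     char buf[32];
     DemoLabel f = demo_introduce(0, 2);   /* ICU 0.2: new draft API   */
     f = demo_change_signature(f, 0, 4);   /* ICU 0.4: signature changed */
     f = demo_promote(f);                  /* ICU 0.6: promoted          */
     demo_format(f, buf, sizeof buf);
     assert(strcmp(buf, "@stable ICU 0.4") == 0);
 }
 ```

 Note that `demo_promote` takes no version argument at all, which mirrors the
 rule that promotion never changes x.y.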

 ### ICU Binary Compatibility

 *Using ICU as an Operating System Level Library*

 ICU4C may be configured for use as a system library in an environment where
 applications that are built with one version of ICU must continue to run without
 change with later versions of the ICU shared library.

 Here are the requirements for enabling binary compatibility for ICU4C:

 1. Applications must use only APIs that are marked as stable.
 2. Applications must use only plain C APIs, never C++.
 3. ICU must be built with function renaming disabled.
 4. Applications must be built using an ICU that was configured for binary
    compatibility.
 5. Use ICU version 3.0 or later.
 6. Provide both “common” and “i18n” libraries, or build a combined library.

 **Stable APIs Only.** APIs in the ICU library that are tagged as being stable
 will be maintained in future versions of the library. Stable functions will
 continue to exist with the same signature and the same meaning, allowing
 applications to continue to work without change.

 Stable APIs do not guarantee that the results from every function will always be
 completely identical between ICU versions (see the
 [Version Numbers in ICU](#version-numbers-in-icu) section above). Bugs may be
 fixed. The Unicode character data may change with new versions of the Unicode
 standard. Locale data may be updated or changed, yielding different results for
 operations like formatting or collation. Applications that require exact
 bit-for-bit, bug-for-bug compatibility of ICU results should not rely on ICU
 release-to-release binary compatibility, but should instead link against a
 specific version of ICU.

 To verify that an application uses only stable APIs, build it with the C
 preprocessor symbols `U_HIDE_DRAFT_API` and `U_HIDE_DEPRECATED_API` defined. This
 will produce build errors if any draft, deprecated or obsolete APIs are used. An
 operating system level installation of ICU may set this option permanently.
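
 The following sketch shows how a header could honor these two symbols by
 wrapping non-stable declarations in preprocessor guards. The guard symbols are
 the ones named above; the three `demo_*` functions are hypothetical and are not
 part of ICU. In a real build the symbols would normally come from compiler
 flags such as `-DU_HIDE_DRAFT_API`; they are defined inline here only so that
 the fragment stands alone.

 ```c
 #include <assert.h>

 /* Normally passed on the compiler command line. */
 #define U_HIDE_DRAFT_API 1
 #define U_HIDE_DEPRECATED_API 1

 /* @stable: always declared (hypothetical function). */
 int demo_stable_length(const char *s);

 #ifndef U_HIDE_DRAFT_API
 /* @draft: hidden when U_HIDE_DRAFT_API is defined (hypothetical function). */
 int demo_draft_length(const char *s, int limit);
 #endif  /* U_HIDE_DRAFT_API */

 #ifndef U_HIDE_DEPRECATED_API
 /* @deprecated: hidden when U_HIDE_DEPRECATED_API is defined (hypothetical). */
 int demo_old_length(const char *s);
 #endif  /* U_HIDE_DEPRECATED_API */

 /* Counts the bytes before the terminating NUL. */
 int demo_stable_length(const char *s) {
     int n = 0;
     while (s[n] != '\0') {
         ++n;
     }
     return n;
 }

 /* Only the stable declaration is visible here, so only it can be called. */
 void demo_hide_example(void) {
     assert(demo_stable_length("ICU") == 3);
 }
 ```

 With both symbols defined, a call to `demo_draft_length()` or
 `demo_old_length()` has no visible declaration, so the compiler rejects it or,
 depending on the compiler and language mode, at least warns about an implicit
 declaration.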

 **C APIs only.** Only plain C APIs remain compatible across ICU releases. C++
 binary compatibility is not supported, primarily because the design of the C++
 language and its runtime environments presents extreme technical difficulties
 to doing so. Stable C++ APIs are *source* compatible, but applications using
 them must be recompiled when moving between ICU releases.

 **Function renaming disabled.** Function renaming is an ICU feature that allows
 an application to explicitly link against a specific version of the ICU library,
 and to continue to use that version even when other ICU versions exist in the
 runtime environment. This is the exact opposite of release-to-release binary
 compatibility – instead of being able to transparently change ICU versions, an
 application is explicitly tied to one specific version.

 Function renaming is enabled by default, and must be disabled at ICU build time
 to enable release-to-release binary compatibility. To disable renaming, use the
 configure option

 ```shell
 configure --disable-renaming [other configure options]
 ```

 (Configure options may also be passed to the runConfigureICU script.)

 To enable release-to-release binary compatibility, ICU must be built with
 `--disable-renaming`, *and* applications must be built using the headers and
 libraries that resulted from the `--disable-renaming` ICU build.

 **ICU Version 3.0 or Later.** Binary compatibility of ICU releases is supported
 beginning with ICU version 3.0. Older versions of ICU (2.8 and earlier) do not
 provide for binary compatibility between versions.

 **Provide both “common” and “i18n” libraries, or build a combined library.**
 It is rare but possible that services/APIs move from one library to another.
 For example, many years ago we moved the BreakIterator APIs from i18n to common,
 so that word titlecasing functions no longer needed separate code to find
 titlecasing or word break opportunities.

 More recently, the ListFormatter moved from the common library to i18n
 when its features grew beyond primitive patterns to also support
 FieldPosition and FormattedValue features.

 There is also a third, “io” library.
 It is possible that some of its functionality may be moved to the i18n or common
 libraries.
 (A likely candidate might be `operator<<(std::ostream& stream, const UnicodeString& s)`,
 although there are no actual plans to do so at the time of this writing.)

 One can build a combined library which provides the exports from
 both the “common” and “i18n” libraries,
 in order to provide a single library for linking against.

 This may be needed for some platforms where there is a strong relationship
 between an API and the library that implements it.
 For example, on Windows platforms, attempting to find an API that has been moved
 with a `LoadLibrary`/`GetProcAddress` approach will fail,
 unless you are using a combined library.

 #### Linking against multiple versions of ICU4C

 This section is intended to aid software developers who are implementing or
 integrating solutions based on ICU and who may need to consider having multiple
 versions of ICU running within the same executable (address space) at once.
 Typically, users of ICU are encouraged to update to the latest stable version.
 Under certain circumstances, however, behavior from earlier versions is desired,
 or an application links together code that was already built against
 a different version of ICU.

 The major and minor numbers are the first and second numbers in a version
 number, separated by a period. For example, in the version numbers 3.4.2.1,
 3.4.2, or 3.4, "3" is the major, and "4" is the minor. Normally, ICU employs
 "symbol renaming", such that the C function names and C++ object names are
 `#defined` to contain the major and minor numbers. So, for example, if your
 application calls the function `ucnv_open()`, it will link against
 `ucnv_open_3_4` if compiled against ICU 3.4, 3.4.2, or even 3.4.2.1. However, if
 compiled against ICU 3.8, the same code will link against `ucnv_open_3_8`.
 Similarly, `UnicodeString` is renamed to `UnicodeString_3_4`, etc. This is normally
 transparent to the user; however, if you inspect the symbols of the library or
 your code, you will see the modified symbols.
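
 The mechanism can be sketched with plain preprocessor macros. The fragment
 below is a self-contained illustration and does not use ICU's own headers:
 `demo_open`, `DEMO_SUFFIX`, `DEMO_RENAME`, and `DEMO_DISABLE_RENAMING` are
 hypothetical names chosen for this example. Source code is written against the
 plain name, while the compiled symbol carries the major and minor version.

 ```c
 #include <assert.h>
 #include <string.h>

 /* Hypothetical suffix built from the major and minor version numbers. */
 #define DEMO_SUFFIX _3_4

 /* Two macro levels, so DEMO_SUFFIX is expanded before tokens are pasted. */
 #define DEMO_PASTE2(name, suffix) name##suffix
 #define DEMO_PASTE(name, suffix) DEMO_PASTE2(name, suffix)
 #define DEMO_RENAME(name) DEMO_PASTE(name, DEMO_SUFFIX)

 /* Defining DEMO_DISABLE_RENAMING (hypothetical) leaves the plain name alone. */
 #ifndef DEMO_DISABLE_RENAMING
 #define demo_open DEMO_RENAME(demo_open)
 #endif

 /* Written with the plain name; compiled as demo_open_3_4 when renaming is on. */
 const char *demo_open(void) {
     return __func__;  /* the function's name after preprocessing */
 }

 void demo_renaming_example(void) {
     /* Application code calls the plain name... */
     const char *linked_name = demo_open();
 #ifndef DEMO_DISABLE_RENAMING
     /* ...but the symbol it reaches carries the version suffix. */
     assert(strcmp(linked_name, "demo_open_3_4") == 0);
 #else
     assert(strcmp(linked_name, "demo_open") == 0);
 #endif
 }
 ```

 Because the suffix is fixed when the caller is compiled, two copies of such a
 library with different suffixes export different symbol names and can coexist
 in one address space.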

 If there are multiple versions of ICU being linked against in one application,
 it will need to link against all relevant libraries for each version, for
 example, common, i18n, and data. ICU uses standard library renaming, where, for
 example, `libicuuc.so` on one platform will actually be a symbolic link to
 `libicuuc.so.3.4`. When multiple ICU versions are used, the application may need
 to explicitly link against the exact versions of ICU being used.

 To disable renaming, build ICU with `--disable-renaming` passed to configure.
 Or, set the equivalent `#define U_DISABLE_RENAMING 1`. Renaming must be disabled
 both in the ICU build, and in the calling application.

 ### ICU Data Compatibility

 Starting in ICU 3.8, the data library that comes with ICU is binary
 compatible and structurally compatible with versions of ICU with the same major
 and minor version, or a maintenance release. This allows multiple maintenance
 releases of ICU to share the same data, but generally the latest maintenance
 release of the data should be used.

 The binary compatibility of the data refers to the resource bundle binary format
 that contains the locale data, charset conversion tables and other file
 formats supported by ICU. These binary formats are readable by many versions of
 ICU. For example, resource bundles written with ICU 3.6 are readable by ICU 3.8.

 The structural compatibility of the data refers to the structural contents of
 the ICU data. The structure of the locale data may change between reference
 releases, but the keys to reference specific types of data will be the same
 between maintenance releases. This means that resource keys to access data
 within resource bundles will work between maintenance releases of a specific
 reference release. For example, an ICU 3.8 calendar will be able to use ICU
 3.8.1 data, and vice versa; however ICU 3.6 may not be able to read ICU 3.8
 locale data. Generally, these keys are not accessible by ICU users because only
 the ICU implementation uses these resource keys.
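
 As a rough sketch of the rule above, a deployment script or loader could
 compare only the major and minor numbers of the library and data versions.
 The helper names below are hypothetical and not ICU APIs, and the sketch
 assumes version strings of the form `major.minor[.maintenance...]` as
 described in this section.

 ```c
 #include <assert.h>
 #include <stdio.h>

 /* Extracts major and minor from "major.minor[.rest]".
  * Returns 1 on success, 0 if the string does not start with two numbers. */
 int demo_parse_major_minor(const char *version, int *major, int *minor) {
     return sscanf(version, "%d.%d", major, minor) == 2;
 }

 /* Returns 1 when both versions belong to the same reference release, i.e.
  * they differ at most in their maintenance numbers. A result of 0 means
  * only that compatibility is not promised, not that loading must fail. */
 int demo_same_reference_release(const char *lib_version,
                                 const char *data_version) {
     int lib_major, lib_minor, data_major, data_minor;
     if (!demo_parse_major_minor(lib_version, &lib_major, &lib_minor) ||
         !demo_parse_major_minor(data_version, &data_major, &data_minor)) {
         return 0;
     }
     return lib_major == data_major && lib_minor == data_minor;
 }

 void demo_data_example(void) {
     /* ICU 3.8 code with ICU 3.8.1 data, and vice versa. */
     assert(demo_same_reference_release("3.8", "3.8.1"));
     assert(demo_same_reference_release("3.8.1", "3.8"));
     /* ICU 3.6 code with ICU 3.8 data: no structural guarantee. */
     assert(!demo_same_reference_release("3.6", "3.8"));
 }
 ```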

 The contents of the data library may change between ICU maintenance releases and
 give you different results due to important updates and bug fixes. An example of
 an important update would be a timezone rule update for when a country changes
 when daylight saving time occurs. So the results may be different between
 maintenance releases.

 ### ICU4J Serialization Compatibility

 Starting in ICU4J 3.6, ICU4J stable API classes (marked as `@stable`) implementing
 `java.io.Serializable` support deserialization of their serialized objects by
 ICU4J 3.6 or newer versions of ICU4J. Some classes perform only shallow
 serialization; therefore, it is not guaranteed that a deserialized object behaves
 exactly the same as the original object across ICU4J versions. Also, when it is
 difficult to maintain serialization compatibility in a certain class across
 different ICU4J versions for technical or other reasons, the ICU project
 committee may approve the breakage. In such an event, a note explaining the
 compatibility issue will be posted to the ICU public mailing lists and also
 documented in the release notes of the new ICU4J version introducing the
 incompatibility.