
 ---
 layout: default
 title: Coding Guidelines
 nav_order: 1
 parent: Misc
 ---
 <!--
 © 2020 and later: Unicode, Inc. and others.
 License & terms of use: http://www.unicode.org/copyright.html
 -->

 # Coding Guidelines
 {: .no_toc }

 ## Contents
 {: .no_toc .text-delta }

 1. TOC
 {:toc}

 ---

 ## Overview

 This section provides the guidelines for developing C and C++ code, based on the
 coding conventions used by ICU programmers in the creation of the ICU library.

 ## Details about ICU Error Codes

 When calling an ICU API function, a `UErrorCode` variable is often passed in
 through an error code pointer (C) or reference (C++). This variable is allocated
 by the caller and must pass the test `U_SUCCESS()` before the function call.
 Otherwise, the function will not do its work. Normally, an error code variable
 is initialized to `U_ZERO_ERROR`.

 `UErrorCode` is passed around and used this way, instead of using C++ exceptions
 for the following reasons:

 * It is useful in the same form for C also
 * Some C++ compilers do not support exceptions

 > :point_right: **Note**: *This error code mechanism, in fact, works similarly to
 > exceptions. If users call several ICU functions in a sequence, as soon as one
 > sets a failure code, the functions that follow it will not do any work. This
 > procedure prevents the API function from processing data that is not valid in
 > the sequence of function calls and relieves the caller from checking the error
 > code after each call. It is somewhat similar to how an exception terminates a
 > function block or try block early.*
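
A minimal sketch of this calling pattern follows. The `UErrorCode` enum and the `U_SUCCESS`/`U_FAILURE` helpers are stand-ins defined here so that the sketch compiles without ICU headers (real code gets them from `unicode/utypes.h`), and `stepOne()`, `stepTwo()` and `runBoth()` are hypothetical functions, not ICU APIs.

```c++
#include <cstdint>

// Stand-ins for the ICU definitions, so that this sketch is self-contained.
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }
inline bool U_SUCCESS(UErrorCode code) { return code <= U_ZERO_ERROR; }

// Hypothetical API function: rejects negative input.
int32_t stepOne(int32_t input, UErrorCode *pErrorCode) {
    if (U_FAILURE(*pErrorCode)) {
        return 0;  // An earlier call failed: do nothing.
    }
    if (input < 0) {
        *pErrorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    return input + 1;
}

// Hypothetical API function: doubles its input.
int32_t stepTwo(int32_t input, UErrorCode *pErrorCode) {
    if (U_FAILURE(*pErrorCode)) {
        return 0;
    }
    return input * 2;
}

// The caller does not check the error code between the two calls.
int32_t runBoth(int32_t input, UErrorCode *pErrorCode) {
    int32_t intermediate = stepOne(input, pErrorCode);
    return stepTwo(intermediate, pErrorCode);  // No-op if stepOne() failed.
}
```

A caller would set a `UErrorCode` variable to `U_ZERO_ERROR`, call `runBoth()`, and test the variable once with `U_FAILURE()` afterwards.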

 The following code shows the inside of an ICU function implementation:

 ```c++
 U_CAPI const UBiDiLevel * U_EXPORT2
 ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
     int32_t start, length;

     if(U_FAILURE(*pErrorCode)) {
         return NULL;
     } else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
         *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
         return NULL;
     }

     ...
     return result;
 }
 ```

 Note: We have decided that we do not want to test for `pErrorCode==NULL`. Some
 existing code does this, but new code should not.

 Note: *Callers* (as opposed to implementers) of ICU APIs can simplify their code
 by defining and using a subclass of `icu::ErrorCode`. ICU implementers can use the
 `IcuTestErrorCode` class in intltest code.

 It is not necessary to check for `U_FAILURE()` immediately before calling a
 function that takes a `UErrorCode` parameter, because that function is supposed to
 check for failure. Exception: If the failure comes from object allocation or
 creation, then you probably have a `NULL` object pointer and must not call any
 method on that object, not even one with a `UErrorCode` parameter.

 ### Sample Function with Error Checking

 ```c++
     U_CAPI int32_t U_EXPORT2
     uplrules_select(const UPluralRules *uplrules,   // Do not check
                                                     // "this"/uplrules vs. NULL.
                     double number,
                     UChar *keyword, int32_t capacity,
                     UErrorCode *status)             // Do not check status!=NULL.
     {
         if (U_FAILURE(*status)) {                   // Do check for U_FAILURE()
                                                     // before setting *status
             return 0;                               // or calling UErrorCode-less
                                                     // select(number).
         }
         if (keyword == NULL ? capacity != 0 : capacity < 0) {
                                                     // Standard destination buffer
                                                     // checks.
             *status = U_ILLEGAL_ARGUMENT_ERROR;
             return 0;
         }
         UnicodeString result = ((PluralRules*)uplrules)->select(number);
         return result.extract(keyword, capacity, *status);
     }
 ```

 ### New API Functions

 If the API function is non-const, then it should have a `UErrorCode` parameter.
 (Not the other way around: Some const functions may need a `UErrorCode` as well.)

 Default C++ assignment operators and copy constructors should not be used (they
 should be declared private and not implemented). Instead, define an `assign(Class
 &other, UErrorCode &errorCode)` function. Normal constructors are fine, and
 should have a `UErrorCode` parameter.
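
As an illustration only, the sketch below shows one way a class could follow this rule. `Counter` is a hypothetical class, the `UErrorCode` definitions are stand-ins for those in `unicode/utypes.h`, and the `const` qualifier and `void` return type of `assign()` are assumptions rather than a prescribed signature.

```c++
#include <cstdint>

// Stand-ins for the ICU definitions, so that this sketch is self-contained.
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical class: copying is disabled, assign() reports errors instead.
class Counter {
public:
    // A normal constructor with a UErrorCode parameter.
    Counter(int32_t start, UErrorCode &errorCode) : value(0) {
        if (U_FAILURE(errorCode)) {
            return;
        }
        if (start < 0) {
            errorCode = U_ILLEGAL_ARGUMENT_ERROR;
            return;
        }
        value = start;
    }

    // Replaces the assignment operator and can report a failure.
    void assign(const Counter &other, UErrorCode &errorCode) {
        if (U_FAILURE(errorCode)) {
            return;
        }
        value = other.value;
    }

    int32_t get() const { return value; }

private:
    Counter(const Counter &other);             // Declared, not implemented.
    Counter &operator=(const Counter &other);  // Declared, not implemented.

    int32_t value;
};
```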

 ### Warning Codes

 Some `UErrorCode` values do not indicate a failure but an additional informational
 return value. Their enum constants have the `_WARNING` suffix and they pass the
 `U_SUCCESS()` test.

 However, experience has shown that they are problematic: They can get lost
 easily because subsequent function calls may set their own "warning" codes or
 may reset a `UErrorCode` to `U_ZERO_ERROR`.

 The source of the problem is that the `UErrorCode` mechanism is designed to mimic
 C++/Java exceptions. It prevents ICU function execution after a failure code is
 set, but like exceptions it does not work well for non-failure information
 passing.

 Therefore, we recommend using warning codes very carefully:

 * Try not to rely on any warning codes.
 * Use real APIs to get the same information if possible.
   For example, when a string is completely written but cannot be
   NUL-terminated, then `U_STRING_NOT_TERMINATED_WARNING` indicates this, but so
   does the returned destination string length (which will have the same value
   as the destination capacity in this case). Checking the string length is
   safer than checking the warning code. (It is even safer to not rely on
   NUL-terminated strings but to use the length.)
 * If warning codes must be used, then the best is to set the `UErrorCode` to
   `U_ZERO_ERROR` immediately before calling the function in question, and to
   check for the expected warning code immediately after the function returns.

 Future versions of ICU will not introduce new warning codes, and will provide
 real API replacements for all existing warning codes.
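
The string-termination case can be sketched as follows. `copyChars()` and `isTerminated()` are hypothetical helpers written for this illustration, and the enum is a stand-in for the real `UErrorCode` definition; the point is only that the caller can compare the returned length with the capacity instead of inspecting the warning code.

```c++
#include <cstdint>

// Stand-ins for the ICU definitions, so that this sketch is self-contained.
enum UErrorCode {
    U_STRING_NOT_TERMINATED_WARNING = -124,
    U_ZERO_ERROR = 0,
    U_BUFFER_OVERFLOW_ERROR = 15
};
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical function that writes a string into a caller-supplied buffer
// and returns the string length.
int32_t copyChars(const char16_t *src, int32_t srcLength,
                  char16_t *dest, int32_t capacity, UErrorCode *pErrorCode) {
    if (U_FAILURE(*pErrorCode)) {
        return 0;
    }
    if (srcLength > capacity) {
        *pErrorCode = U_BUFFER_OVERFLOW_ERROR;
        return srcLength;
    }
    for (int32_t i = 0; i < srcLength; ++i) {
        dest[i] = src[i];
    }
    if (srcLength < capacity) {
        dest[srcLength] = 0;  // Room for the terminating NUL.
    } else {
        *pErrorCode = U_STRING_NOT_TERMINATED_WARNING;
    }
    return srcLength;
}

// Caller-side check that relies on the returned length, not on the warning.
bool isTerminated(int32_t length, int32_t capacity) {
    return length < capacity;
}
```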

 ### Bogus Objects

 Some objects, for example `UnicodeString` and `UnicodeSet`, can become "bogus". This
 is used when methods that create or modify the object fail (mostly due to an
 out-of-memory condition) but do not take a `UErrorCode` parameter and can
 therefore not otherwise report the failure.

 * A bogus object appears as empty.
 * A bogus object cannot be modified except with assignment-like functions.
 * The bogus state of one object does not transfer to another. For example,
   adding a bogus `UnicodeString` to a `UnicodeSet` does not make the set bogus.
   (It would be hard to make propagation consistent and test it well. Also,
   propagation among bogus states and error codes would be messy.)
 * If a bogus object is passed into a function that does have a `UErrorCode`
   parameter, then the function should set the `U_ILLEGAL_ARGUMENT_ERROR` code.
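
The first two properties in the list above can be sketched with a toy class. `TinyText` is hypothetical and does not reproduce how `UnicodeString` is implemented; it uses a fixed capacity as a stand-in for an out-of-memory condition, and only the method names `isBogus()` and `setToBogus()` are borrowed from `UnicodeString`.

```c++
#include <cstdint>

// Hypothetical fixed-capacity text object that can become bogus.
class TinyText {
public:
    TinyText() : length(0), bogus(false) {}

    bool isBogus() const { return bogus; }
    bool isEmpty() const { return length == 0; }  // A bogus object appears empty.
    int32_t size() const { return length; }

    void setToBogus() {
        length = 0;
        bogus = true;
    }

    // No UErrorCode parameter: a failure is reported via the bogus state.
    TinyText &append(char16_t c) {
        if (bogus) {
            return *this;  // A bogus object cannot be modified this way.
        }
        if (length == kCapacity) {
            setToBogus();  // Stand-in for a failed memory allocation.
            return *this;
        }
        chars[length++] = c;
        return *this;
    }

    // Assignment-like function: the only way out of the bogus state.
    TinyText &setTo(char16_t c) {
        bogus = false;
        chars[0] = c;
        length = 1;
        return *this;
    }

private:
    enum { kCapacity = 4 };
    char16_t chars[kCapacity];
    int32_t length;
    bool bogus;
};
```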

 ## API Documentation

 "API" means any public class, function, or constant.

 ### API status tag

 Aside from documenting an API's functionality, parameters, return values etc. we
 also mark every API with whether it is `@draft`, `@stable`, `@deprecated` or
 `@internal`. (Where `@internal` is used when something is not actually supported
 API but needs to be physically public anyway.) A new API is usually marked with
 "`@draft ICU 4.8`". For details of how we mark APIs see the "ICU API
 compatibility" section of the [ICU Architectural Design](../design.md) page. In
 Java, also see existing @draft APIs for complete examples.

 Functions that override a base class or interface definition take the API status
 of the base class function. For C++, use the `@copydoc base::function()` tag to
 copy both the description and the API status from the base function definition.
 For Java methods the status tags must be added by hand; use the `{@inheritDoc}`
 JavaDoc tag to pick up the rest of the base function documentation.
 Documentation should not be manually replicated in overriding functions; it is
 too hard to keep multiple copies synchronized.

 The policy for the treatment of status tags in overriding functions was
 introduced with ICU 64 for C++, and with ICU 59 for Java. Earlier code may
 deviate.

 ### Coding Example

 Coding examples help users to understand the usage of each API. Whenever
 possible, it is encouraged to embed a code snippet illustrating the usage of an
 API along with the functional specification.

 #### Embedding Coding Examples in ICU4J - JCite

 Since ICU4J 49M2, the ICU4J ant build target "doc" utilizes an external tool
 called [JCite](https://arrenbrecht.ch/jcite/). The tool allows us to cite a
 fragment of existing source code in a JavaDoc comment using a tag. For example,
 `{@.jcite com.ibm.icu.samples.util.timezone.BasicTimeZoneExample:---getNextTransitionExample}`
 will be replaced by the fragment of code marked by the comment lines
 `// ---getNextTransitionExample` in `BasicTimeZoneExample.java` in package
 `com.ibm.icu.samples.util.timezone`. When embedding a code snippet using JCite,
 we recommend following these guidelines:

 * Sample code should be placed in the `<icu4j_root>/samples/src` directory,
   although you can cite any source fragment from source files in
   `<icu4j_root>/demos/src`, `<icu4j_root>/main/core/*/src`,
   `<icu4j_root>/main/test/*/src`.
 * Sample code should use the package name
   `com.ibm.icu.samples.<subpackage>.<facility>`. `<subpackage>` corresponds
   to the target ICU API class's package, that is, one of lang/math/text/util.
   `<facility>` is the name of the facility, which is usually the base class of
   the service. For example, use package `com.ibm.icu.samples.text.dateformat`
   for samples related to ICU's date format service, and
   `com.ibm.icu.samples.util.timezone` for samples related to the time zone
   service.
 * Sample code should be as self-contained as possible (use only JDK and
   ICU public APIs if possible). This allows readers to cut & paste a code
   snippet to try it out easily.
 * The citing comment should start with three consecutive hyphens followed by
   a lower camel case token, for example "`// ---compareToExample`".
 * Keep in mind that the JCite tag `{@.jcite ...}` is not resolved without JCite.
   Avoid placing a code snippet within a sentence. Instead, place a code
   snippet that uses JCite in an independent paragraph.

 #### Embedding Coding Examples in ICU4C

 Also since ICU4C 49M2, ICU4C docs (using the [\\snippet command](http://www.doxygen.nl/manual/commands.html#cmdsnippet)
 which is new in Doxygen 1.7.5) can cite a fragment of existing sample or test code.

 Example in `ucnv.h`:

 ```c++
  /**
   * \snippet samples/ucnv/convsamp.cpp ucnv_open
   */
  ucnv_open( ... ) ...
 ```

 This cites code in `icu4c/source/samples/ucnv/convsamp.cpp` as follows:

 ```c++
   //! [ucnv_open]
   conv = ucnv_open("koi8-r", &status);
   //! [ucnv_open]
 ```

 Notice the tag "`ucnv_open`" which must be the same in all three places (in
 the header file, and twice in the cited file).

 ## C and C++ Coding Conventions Overview

 The ICU group uses the following coding guidelines to create software using the
 ICU C++ classes and methods as well as the ICU C methods.

 ### C/C++ Hiding Un-@stable APIs

 In C/C++, we enclose `@draft` and such APIs with `#ifndef U_HIDE_DRAFT_API` or
 similar as appropriate. When a draft API becomes stable, we need to remove the
 surrounding `#ifndef`.

 Note: The `@system` tag is *in addition to* the
 `@draft`/`@stable`/`@deprecated`/`@obsolete` status tag.

 Copy/paste the appropriate `#ifndef..#endif` pair from the following:

 ```c++
 #ifndef U_HIDE_DRAFT_API
 #endif  // U_HIDE_DRAFT_API

 #ifndef U_HIDE_DEPRECATED_API
 #endif  // U_HIDE_DEPRECATED_API

 #ifndef U_HIDE_OBSOLETE_API
 #endif  // U_HIDE_OBSOLETE_API

 #ifndef U_HIDE_SYSTEM_API
 #endif  // U_HIDE_SYSTEM_API

 #ifndef U_HIDE_INTERNAL_API
 #endif  // U_HIDE_INTERNAL_API
 ```

 We `#ifndef` `@draft`/`@deprecated`/... APIs as much as possible, including C
 functions, many C++ class methods (see exceptions below), enum constants (see
 exceptions below), whole enums, whole classes, etc.

 We do not `#ifndef` APIs where that would be problematic:

 * struct/class members where that would modify the object layout (non-static
   struct/class fields, virtual methods)
 * enum constants where that would modify the numeric values of following
   constants
   * actually, best to use `#ifndef` together with explicitly defining the
     numeric value of the next constant
 * C++ class boilerplate (e.g., default/copy constructors), if
   the compiler would auto-create public functions to replace `#ifndef`’ed ones
   * For example, the compiler automatically creates a default constructor if
     the class does not specify any other constructors.
 * private class members
 * definitions in internal/test/tools header files (that would be pointless;
   they should probably not have API tags in the first place)
 * forward or friend declarations
 * definitions that are needed for other definitions that would not be
   `#ifndef`'ed (e.g., for public macros or private methods)
 * platform macros (mostly in `platform.h`/`umachine.h` & similar) and
   user-configurable settings (mostly in `uconfig.h`)
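
The enum-constant case from the list above can be sketched as follows. `UDemoMode` and its constants are hypothetical names used only for illustration; the explicit value on the constant after the guarded one keeps the numbering the same whether or not `U_HIDE_DRAFT_API` is defined.

```c++
// Hypothetical C-style enum with one draft constant in the middle.
enum UDemoMode {
    /** Hypothetical stable constant. */
    UDEMO_MODE_DEFAULT = 0,
#ifndef U_HIDE_DRAFT_API
    /** Hypothetical draft constant, hidden when U_HIDE_DRAFT_API is defined. */
    UDEMO_MODE_EXPERIMENTAL = 1,
#endif  // U_HIDE_DRAFT_API
    /**
     * Hypothetical stable constant. Its value is given explicitly so that it
     * does not change when the draft constant above is hidden.
     */
    UDEMO_MODE_STRICT = 2
};
```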

 More handy copy-paste text:

 ```c++
     // Do not enclose the protected default constructor with #ifndef U_HIDE_INTERNAL_API
     // or else the compiler will create a public default constructor.

     // Do not enclose protected default/copy constructors with #ifndef U_HIDE_INTERNAL_API
     // or else the compiler will create public ones.
 ```

 ### C and C++ Type and Format Convention Guidelines

 The following C and C++ type and format conventions are used to maximize
 portability across platforms and to provide consistency in the code:

 #### Constants (#define, enum items, const)

 Use uppercase letters for constants. For example, use `UBREAKITERATOR_DONE`,
 `UBIDI_DEFAULT_LTR`, `ULESS`.

 For new enum types (as opposed to new values added to existing types), do not
 define enum types in C++ style. Instead, define C-style enums with U... type
 prefix and `U_`/`UMODULE_` constants. Define such enum types outside the ICU
 namespace and outside any C++ class. Define them in C header files if there are
 appropriate ones.

 #### Variables and Functions

 Use mixed-case letters that start with a lowercase letter for variables and
 functions. For example, use `getLength()`.

 #### Types (class, struct, enum, union)

 Use mixed-case letters that start with an uppercase letter for types. For example, use
 class `DateFormatSymbols`.

 #### Function Style

 Use the `getProperty()` and `setProperty()` style for functions where a lowercase
 letter begins the first word and the second word is capitalized without a space
 between it and the first word. For example, `UnicodeString`
 `getSymbol(ENumberFormatSymbol symbol)`,
 `void setSymbol(ENumberFormatSymbol symbol, UnicodeString value)` and
 `getLength()`, `getSomethingAt(index/offset)`.

 #### Common Parameter Names

 In order to keep function parameter names consistent, the following are
 recommendations for names or suffixes (usual "Camel case" applies):

 * "start": the index (of the first of several code units) in a string or array
 * "limit": the index (of the **first code unit after** a specified range) in a
   string or array (the number of units is (limit-start))
 * name the length (for the number of code units in a (range of a) string or
   array) either "length" or "somePrefixLength"
 * name the capacity (for the number of code units available in an output
   buffer) either "capacity" or "somePrefixCapacity"

 #### Order of Source/Destination Arguments

 Many ICU function signatures list source arguments before destination arguments,
 as is common in C++ and Java APIs. This is the preferred order for new APIs.
 (Example: `ucol_getSortKey(const UCollator *coll, const UChar *source,
 int32_t sourceLength, uint8_t *result, int32_t resultLength)`)

 Some ICU function signatures list destination arguments before source arguments,
 as is common in C standard library functions. This should be limited to
 functions that closely resemble such C standard library functions or closely
 related ICU functions. (Example: `u_strcpy(UChar *dst, const UChar *src)`)

 #### Order of Include File Includes

 Include system header files (like `<stdio.h>`) before ICU headers followed by
 application-specific ones. This assures that ICU headers can use existing
 definitions from system headers if both happen to define the same symbols. In
 ICU files, all used headers should be explicitly included, even if some of them
 already include others.

 Within a group of headers, place them in alphabetical order.

 #### Style for ICU Includes

 All ICU headers should be included using ""-style includes (like
 `"unicode/utypes.h"` or `"cmemory.h"`) in source files for the ICU library, tools,
 and tests.

 #### Pointer Conversions

 Do not cast pointers to integers or integers to pointers. Also, do not cast
 between data pointers and function pointers. This will not work on some
 compilers, especially with different sizes of such types. Exceptions are only
 possible in platform-specific code where the behavior is known.

 Please use C++-style casts, at least for pointers, for example `const_cast`.

 * For conversion between related types, for example from a base class to a
   subclass (when you *know* that the object is of that type), use
   `static_cast`. (When you are not sure if the object has the subclass type,
   then use a `dynamic_cast`; see a later section about that.)
 * Also use `static_cast`, not `reinterpret_cast`, for conversion from `void *`
   to a specific pointer type. (This is accepted and recommended because there
   is an implicit conversion available for the opposite conversion.) See
   [ICU-9434](https://unicode-org.atlassian.net/browse/ICU-9434) for details.
 * For conversion between unrelated types, for example between `char *` and
   `uint8_t *`, or between `Collator *` and `UCollator *`, use a
   `reinterpret_cast`.
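
The three cases in the list above can be sketched as follows. `Base`, `Derived` and the three functions are hypothetical names chosen for this illustration.

```c++
#include <cstdint>

struct Base {
    int32_t id;
};

struct Derived : public Base {
    int32_t extra;
};

// Related types: static_cast, because the caller guarantees that the
// object really is a Derived.
int32_t getExtra(Base *base) {
    Derived *derived = static_cast<Derived *>(base);
    return derived->extra;
}

// void * to a specific pointer type: static_cast, not reinterpret_cast.
int32_t readContext(void *context) {
    int32_t *value = static_cast<int32_t *>(context);
    return *value;
}

// Unrelated pointer types (char * and uint8_t *): reinterpret_cast.
uint8_t firstByte(const char *s) {
    const uint8_t *bytes = reinterpret_cast<const uint8_t *>(s);
    return bytes[0];
}
```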

 #### Returning a Number of Items

 To return a number of items, use `countItems()`, **not** `getItemCount()`, even if
 there is no need to actually count using that member function.

 #### Ranges of Indexes

 Specify a range of indexes by having start and limit parameters with names or
 suffix conventions that represent the index. A range should contain indexes from
 start to limit-1 such as an interval that is left-closed and right-open. Using
 mathematical notation, this is represented as: \[start..limit\[.
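
A small sketch of a function that follows the start/limit convention; `countInRange()` is a hypothetical helper, not an ICU function.

```c++
#include <cstdint>

// Counts the occurrences of c among the code units s[start] to s[limit-1].
// The range is left-closed and right-open, so it contains limit-start units.
int32_t countInRange(const char16_t *s, int32_t start, int32_t limit,
                     char16_t c) {
    int32_t count = 0;
    for (int32_t i = start; i < limit; ++i) {
        if (s[i] == c) {
            ++count;
        }
    }
    return count;
}
```

With this convention an empty range is simply one where start equals limit, and adjacent ranges can share a boundary index without overlapping.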

 #### Functions with Buffers

 For functions that take a buffer (pointer) and a length argument with a default
 value, set the default value to -1 so that the function determines the length
 of the input itself (for text, by calling `u_strlen()`). Any other negative or
 undefined value constitutes an error.
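
A sketch of the -1 convention, under stated assumptions: `countSpaces()` and `demoStrlen()` are hypothetical (the latter stands in for `u_strlen()` so that the sketch compiles without ICU headers), the `UErrorCode` definitions are stand-ins, and the caller passes -1 explicitly rather than through a C++ default argument.

```c++
#include <cstddef>
#include <cstdint>

// Stand-ins for the ICU definitions, so that this sketch is self-contained.
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Stand-in for u_strlen(): counts code units up to the terminating NUL.
int32_t demoStrlen(const char16_t *s) {
    int32_t length = 0;
    while (s[length] != 0) {
        ++length;
    }
    return length;
}

// Hypothetical function: counts U+0020 in s. A length of -1 means that
// s is NUL-terminated; any other negative length is an error.
int32_t countSpaces(const char16_t *s, int32_t length, UErrorCode *pErrorCode) {
    if (U_FAILURE(*pErrorCode)) {
        return 0;
    }
    if (s == NULL ? length != 0 : length < -1) {
        *pErrorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    if (length < 0) {
        length = demoStrlen(s);
    }
    int32_t count = 0;
    for (int32_t i = 0; i < length; ++i) {
        if (s[i] == 0x20) {
            ++count;
        }
    }
    return count;
}
```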

 #### Primitive Types

 Primitive types are defined by the `unicode/utypes.h` file or by a header file
 that it includes. The most common types are `uint8_t`, `uint16_t`,
 `uint32_t`, `int8_t`, `int16_t`, `int32_t`, `char16_t`,
 `UChar` (same as `char16_t`), `UChar32` (signed, 32-bit), and `UErrorCode`.

 The language built-in type `bool` and constants `true` and `false` may be used
 internally, for local variables and parameters of internal functions. The ICU
 type `UBool` must be used in public APIs and in the definition of any persistent
 data structures. `UBool` is guaranteed to be one byte in size and signed; `bool` is
 not.

 Traditionally, ICU4C has defined its own `FALSE`=0 / `TRUE`=1 macros for use with `UBool`.
 Starting with ICU 68 (2020q4), we no longer define these in public header files
 (unless `U_DEFINE_FALSE_AND_TRUE`=1),
 in order to avoid name collisions with code outside ICU defining enum constants and similar
 with these names.

 Instead, the versions of the C and C++ standards we require now do define type `bool`
 and values `false` & `true`, and we and our users can use these values.

 As of ICU 68, we are not changing ICU4C API from `UBool` to `bool`.
 Doing so in C API, or in structs that cross the library boundary,
 would break binary compatibility.
 Doing so only in other places in C++ could be confusingly inconsistent.
 We may revisit this.

 Note that the details of type `bool` (e.g., `sizeof`) depend on the compiler and
 may differ between C and C++.

 #### File Names (.h, .c, .cpp, data files if possible, etc.)

 Limit file names to 31 lowercase ASCII characters. (Older versions of MacOS have
 that length limit.)

 Exception: The layout engine uses mixed-case file names.

 (We have abandoned the 8.3 naming standard although we do not change the names
 of old header files.)

 #### Language Extensions and Standards

 Proprietary features, language extensions, or library functions, must not be
 used because they will not work on all C or C++ compilers.
 In Microsoft Visual C++, go to Project Settings(alt-f7)->All Configurations->
 C/C++->Customize and check Disable Language Extensions.

 Exception: some Microsoft headers will not compile without language extensions
 being enabled, which in turn requires some ICU files be built with language
 extensions.

 #### Tabs and Indentation

 Save files with spaces instead of tab characters (\\x09). The indentation size
 is 4.

 #### Documentation

 Use Javadoc-style in-file documentation created with
 [doxygen](http://www.doxygen.org/).

 #### Multiple Statements

 Place multiple statements in multiple lines. `if()` or loop heads must not be
 followed by their bodies on the same line.

 #### Placements of `{}` Curly Braces

 Place curly braces `{}` in reasonable and consistent locations. Each of us
 subscribes to different philosophies. It is recommended to use the style of a
 file, instead of mixing different styles. It is requested, however, to not have
 `if()` and loop bodies without curly braces.

 #### `if() {...}` and Loop Bodies

 Use curly braces for `if()` and else as well as loop bodies, etc., even if there
 is only one statement.

 #### Function Declarations

 Have one line that has the return type and place all the import declarations,
 extern declarations, export declarations, the function name, and function
 signature at the beginning of the next line.

 Function declarations need to be in the form `U_CAPI` return-type `U_EXPORT2` to
 satisfy all the compilers' requirements.

 For example, use the following
 convention:

 ```c++
 U_CAPI int32_t U_EXPORT2
 u_formatMessage(...);
 ```

 > :point_right: **Note**: The `U_CAPI`/`U_DEPRECATED` and `U_EXPORT2` qualifiers
 > are required for both the declaration and the definition of *exported C and
 > static C++ functions*. Use `U_CAPI` (or `U_DEPRECATED`) before and `U_EXPORT2`
 > after the return type of *exported C and static C++ functions*.
 >
 > Internal functions that are visible outside a compilation unit need a `U_CFUNC`
 > before the return type.
 >
 > *Non-static C++ class member functions* do *not* get `U_CAPI`/`U_EXPORT2`
 > because they are exported and declared together with their class exports.

 > :point_right: **Note**: Before ICU 68 (2020q4) we used to use alternate qualifiers
 > like `U_DRAFT`, `U_STABLE` etc. rather than `U_CAPI`,
 > but keeping these in sync with API doc tags `@draft` and guard switches like `U_HIDE_DRAFT_API`
 > was tedious and error-prone and added no value.
 > Since ICU 68 (ICU-9961) we only use `U_CAPI` and `U_DEPRECATED`.

 #### Use Anonymous Namespaces or Static For File Scope

 Use anonymous namespaces or `static` for variables, functions, and constants that
 are not exported explicitly by a header file. Some platforms are confused if
 non-static symbols are not explicitly declared extern. These platforms will not
 be able to build ICU nor link to it.
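
A sketch of both options in one source file. All names here are hypothetical; only `demo_getEntry()` is meant to be declared in a header, everything else stays local to the file.

```c++
#include <cstdint>

namespace {

// File-local constants: not visible to other compilation units.
const int32_t kTableSize = 3;
const int32_t kTable[kTableSize] = { 10, 20, 30 };

}  // namespace

// File-local helper, kept out of the library's symbol table with static.
static bool isValidIndex(int32_t index) {
    return 0 <= index && index < kTableSize;
}

// The only function in this file that a header would declare.
int32_t demo_getEntry(int32_t index) {
    if (!isValidIndex(index)) {
        return -1;
    }
    return kTable[index];
}
```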

 #### Using C Callbacks From C++ Code

 z/OS and Windows COM wrappers around ICU need `__cdecl` for callback functions.
 The reason is that C++ can have a different function calling convention from C.
 These callback functions also usually need to be private. So the following code

 ```c++
 UBool
 isAcceptable(void * /* context */,
              const char * /* type */, const char * /* name */,
              const UDataInfo *pInfo)
 {
     // Do something here.
 }
 ```

 should be changed to look like the following by adding `U_CDECL_BEGIN`, `static`,
 `U_CALLCONV` and `U_CDECL_END`.

 ```c++
 U_CDECL_BEGIN
 static UBool U_CALLCONV
 isAcceptable(void * /* context */,
              const char * /* type */, const char * /* name */,
              const UDataInfo *pInfo)
 {
     // Do something here.
 }
 U_CDECL_END
 ```

 #### Same Module and Functionality in C and in C++

 Determine if two headers are needed. If the same functionality is provided with
 both a C and a C++ API, then there can be two headers, one for each language,
 even if one uses the other. For example, there can be `umsg.h` for C and `msgfmt.h`
 for C++.

 Not all functionality has or needs both kinds of API. More and more
 functionality is available only via C APIs to avoid duplication of API,
 documentation, and maintenance. C APIs are perfectly usable from C++ code,
 especially with `UnicodeString` methods that alias or expose C-style string
 buffers.

 #### Platform Dependencies

 Use the platform dependencies that are within the header files that `utypes.h`
 includes. They are `platform.h` (which is generated by the configuration
 script from `platform.h.in`) and its more specific cousins like `pwin32.h` for
 Windows, which define basic types, and `putil.h`, which defines platform
 utilities.
 **Important:** Outside of these files, and a small number of implementation
 files that depend on platform differences (like `umutex.c`), **no** ICU source
 code may have **any** `#ifdef` **OperatingSystemName** instructions.

 #### Short, Unnested Mutex Blocks

 Do not make function calls within a mutual-exclusion (mutex) block.
 This can prevent deadlocks from occurring later. There should be as
 little code inside a mutex block as possible to minimize the performance
 degradation from blocked threads.
 Also, it is not guaranteed that mutex blocks are re-entrant; therefore, they
 must not be nested.
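
The sketch below illustrates the idea with `std::mutex` and `std::lock_guard` from the C++ standard library as stand-ins; ICU library code uses its own mutex facilities, which are not shown here. `getCachedValue()` and `computeExpensiveValue()` are hypothetical.

```c++
#include <cstdint>
#include <mutex>

namespace {

std::mutex gCacheMutex;  // Stand-in for an ICU-internal mutex.
int32_t gCachedValue = 0;
bool gCacheFilled = false;

// Hypothetical expensive computation.
int32_t computeExpensiveValue() {
    return 6 * 7;
}

}  // namespace

int32_t getCachedValue() {
    bool filled;
    int32_t value;
    {
        // Short mutex block: only reads shared state.
        std::lock_guard<std::mutex> lock(gCacheMutex);
        filled = gCacheFilled;
        value = gCachedValue;
    }
    if (filled) {
        return value;
    }
    // The computation runs outside of any mutex block.
    value = computeExpensiveValue();
    {
        // Second short, unnested mutex block: only stores the result.
        std::lock_guard<std::mutex> lock(gCacheMutex);
        if (!gCacheFilled) {
            gCachedValue = value;
            gCacheFilled = true;
        }
        value = gCachedValue;
    }
    return value;
}
```

The trade-off is that two threads may both compute the value; only the first result is stored and returned to everyone.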

 #### Names of Internal Functions

 Internal functions that are not declared static (regardless of inlining) must
 follow the naming conventions for exported functions because many compilers and
 linkers do not distinguish between library exports and intra-library visible
 functions.

 #### Which Language for the Implementation

 Write implementation code in C++. Use objects very carefully, as always:
 Implicit constructors, assignments etc. can make simple-looking code
 surprisingly slow.

 For every C API, make sure that there is at least one call from a pure C file in
 the cintltst test suite.

 Background: We used to prefer C or C-style C++ for implementation code because
 we used to have users ask for pure C. However, there was never a large, usable
 subset of ICU that was usable without any C++ dependencies, and C++ can(!) make
 for much shorter, simpler, less error-prone and easier-to-maintain code, for
 example via use of "smart pointers" (`unicode/localpointer.h` and `cmemory.h`).

 We still try to expose most functionality via *C APIs* because of the
 difficulties of binary compatible C++ APIs exported from DLLs/shared libraries.

 #### No Compiler Warnings

 ICU must compile without compiler warnings unless such warnings are verified to
 be harmless or bogus. Often, a warning on one compiler indicates a breaking
 error on another.

 #### Enum Values

 When casting an integer value to an enum type, the enum type *should* have a
 constant with this integer value, or at least it *must* have a constant whose
 value is at least as large as the integer value being cast, with the same
 signedness. For example, do not cast a -1 to an enum type that only has
 non-negative constants. Some compilers choose the internal representation very
 tightly for the defined enum constants, which may result in the equivalent of a
 `uint8_t` representation for an enum type with only small, non-negative constants.
 Casting a -1 to such a type may result in an actual value of 255. (This has
 happened!)

 When casting an enum value to an integer type, make sure that the enum value's
 numeric value is within range of the integer type.
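
One defensive way to follow the first rule is to check the integer against the enum's constants before casting. `UDemoLevel` and `toLevel()` are hypothetical names used only for this sketch.

```c++
#include <cstdint>

// Hypothetical enum with only small, non-negative constants.
enum UDemoLevel {
    UDEMO_LEVEL_LOW = 0,
    UDEMO_LEVEL_MEDIUM = 1,
    UDEMO_LEVEL_HIGH = 2
};

// Casts value to UDemoLevel only if the enum has a constant with that value.
// Returns false and leaves level unchanged otherwise, for example for -1.
bool toLevel(int32_t value, UDemoLevel &level) {
    if (value < UDEMO_LEVEL_LOW || value > UDEMO_LEVEL_HIGH) {
        return false;
    }
    level = static_cast<UDemoLevel>(value);
    return true;
}
```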

 #### Do not check for `this!=NULL`, do not check for `NULL` references

 In public APIs, assume `this!=0` and assume that references are not 0. In C code,
 `"this"` is the "service object" pointer, such as `set` in
 `uset_add(USet* set, UChar32 c)` — don't check for `set!=NULL`.

 We do usually check all other (non-this) pointers for `NULL`, in those cases when
 `NULL` is not valid. (Many functions allow a `NULL` string or buffer pointer if the
 length or capacity is 0.)

 Rationale: `"this"` is not really an argument, and checking it costs a little bit
 of code size and runtime. Other libraries also commonly do not check for valid
 `"this"`, and resulting failures are fairly obvious.

 ### Memory Usage

 #### Dynamically Allocated Memory

 ICU4C APIs are designed to allow separate heaps for its libraries vs. the
 application. This is achieved by providing factory methods and matching
 destructors for all allocated objects. The C++ API uses a common base class with
 overridden `new`/`delete` operators and/or forms an equivalent pair with `createXyz()`
 factory methods and the `delete` operator. The C API provides pairs of `open`/`close`
 functions for each service. See the C++ and C guideline sections below for
 details.

 Exception: Most C++ API functions that return a `StringEnumeration` (by pointer
 which the caller must delete) are named `getXyz()` rather than `createXyz()`
 because `"get"` is much more natural. (These are not factory methods in the sense
 of `NumberFormat::createScientificInstance()`.) For example,
 `static StringEnumeration *Collator::getKeywords(UErrorCode &)`. We should document
 clearly in the API comments that the caller must delete the returned
 `StringEnumeration`.

 #### Declaring Static Data

 All unmodifiable data should be declared `const`. This includes the pointers and
 the data itself. Also if you do not need a pointer to a string, declare the
 string as an array. This reduces the time to load the library and all its
 pointers. This should be done so that the same library data can be shared across
 processes automatically. Here is an example:

 ```c++
 #define MY_MACRO_DEFINED_STR "macro string"
 const char *myCString = "myCString";
 int16_t myNumbers[] = {1, 2, 3};
 ```

 This should be changed to the following:

 ```c++
 static const char MY_MACRO_DEFINED_STR[] = "macro string";
 static const char myCString[] = "myCString";
 static const int16_t myNumbers[] = {1, 2, 3};
 ```

 #### No Static Initialization

 The most common reason to have static initialization is to declare a
 `static const UnicodeString`, for example (see `utypes.h` about invariant characters):

 ```c++
 static const UnicodeString myStr("myStr", "");
 ```

 The most portable and most efficient way to declare ASCII text as a Unicode
 string is to do the following instead:

 ```c++
 static const UChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0}; /* "myStr" */
 ```

 We do not use character literals
 for Unicode characters and strings because the execution character set of C/C++
 compilers is almost never Unicode and may not be ASCII-compatible (especially on
 EBCDIC platforms). Depending on the API where the string is to be used, a
 terminating NUL (0) may or may not be required. The length of the string (number
 of `UChar`s in the array) can be determined with `sizeof(myStr)/U_SIZEOF_UCHAR`,
 (subtract 1 for the NUL if present). Always remember to put in a comment at the
 end of the declaration what the Unicode string says.
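
The length computation described above can be sketched as follows. This is a minimal illustration that uses `char16_t` as a stand-in for `UChar` so that it compiles without ICU headers (an assumption; ICU code would use `UChar` and `U_SIZEOF_UCHAR`). The names `UCharLike` and `myStrLength()` are hypothetical.

```cpp
#include <cstddef>

// Stand-in for ICU's UChar; an assumption so that this compiles without ICU.
typedef char16_t UCharLike;

static const UCharLike myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0 }; /* "myStr" */

// Number of code units in myStr, not counting the terminating NUL.
static inline std::size_t myStrLength() {
    return sizeof(myStr) / sizeof(myStr[0]) - 1;
}
```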

 Static initialization of C++ objects **must not be used** in ICU libraries
 because of the following reasons:

 1. It leads to intractable order-of-initialization dependencies.
 2. It makes it difficult or impossible to release all of the library's
    resources. See `u_cleanup()`.
 3. It takes time to initialize the library.
 4. Dependency checking is not completely done in C or C++. For instance, if an
    ICU user creates an ICU object or calls an ICU function statically that
    depends on static data, it is not guaranteed that the statically declared
    data is initialized.
 5. Certain users like to manage their own memory. They cannot manage ICU's
    memory properly because of item #2.
 6. It is easier to debug code that does not use static initialization.
 7. Memory allocated at static initialization time is not guaranteed to be
    deallocated with a C++ destructor when the library is unloaded. This is a
    problem when ICU is unloaded and reloaded into memory and when you are using
    a heap debugging tool. It would also not work with the `u_cleanup()` function.
 8. Some platforms cannot handle static initialization or static destruction
    properly. Several compilers have this random bug (even in the year 2001).

 ICU users can use the `U_STRING_DECL` and `U_STRING_INIT` macros for C strings. Note
 that on some platforms this will incur a small initialization cost (simple
 conversion). Also, ICU users need to make sure that they properly and
 consistently declare the strings with both macros. See `ustring.h` for details.

 ### C++ Coding Guidelines

 This section describes the C++ specific guidelines or conventions to use.

 #### Portable Subset of C++

 ICU uses only a portable subset of C++ for maximum portability. Also, it does
 not use features of C++ that are not implemented well in all compilers or are
 cumbersome. In particular, ICU does not use exceptions, or the Standard Template
 Library (STL).

 We have started to use templates in ICU 4.2 (e.g., `StringByteSink`) and ICU 4.4
 (`LocalPointer` and some internal uses). We try to limit templates to where they
 provide a lot of benefit (robust code, avoid duplication) without much or any
 code bloat.

 We continue to not use the Standard Template Library (STL) in ICU library code
 because its design causes a lot of code bloat. More importantly:

 * Exceptions: STL classes and algorithms throw exceptions. ICU does not throw
   exceptions, and ICU code is not exception-safe.
 * Memory management: STL uses default new/delete, or Allocator parameters
   which create different types; they throw out-of-memory exceptions. ICU
   memory allocation is customizable and must not throw exceptions.
 * Non-polymorphic: For APIs, STL classes are also problematic because
   different template specializations create different types. For example, some
   systems use custom string classes (different allocators, different
   strategies for buffer sharing vs. copying), and ICU should be able to
   interface with most of them.

 We have started to use compiler-provided Run-Time Type Information (RTTI) in ICU
 4.6. It is now required for building ICU, and encouraged for using ICU where
 RTTI is needed. For example, use `dynamic_cast<DecimalFormat*>` on a
 `NumberFormat` pointer that is usually but not always a `DecimalFormat` instance.
 Do not use `dynamic_cast<>` on a reference, because that throws a `bad_cast`
 exception on failure.
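
As a rough sketch of the pointer form of `dynamic_cast<>`, the following uses hypothetical stand-in classes (`NumberFormatLike`, `DecimalFormatLike`, `RuleBasedFormatLike`) and a made-up `getMultiplier()` member rather than the real ICU classes. The point is only that a failed pointer cast yields `NULL`, which can be checked without exceptions.

```cpp
#include <cstddef>

// Hypothetical stand-ins for NumberFormat and two of its subclasses.
class NumberFormatLike {
public:
    virtual ~NumberFormatLike();
};

class DecimalFormatLike : public NumberFormatLike {
public:
    virtual ~DecimalFormatLike();
    int getMultiplier() const { return 1; }
};

class RuleBasedFormatLike : public NumberFormatLike {
public:
    virtual ~RuleBasedFormatLike();
};

NumberFormatLike::~NumberFormatLike() {}
DecimalFormatLike::~DecimalFormatLike() {}
RuleBasedFormatLike::~RuleBasedFormatLike() {}

// Returns the multiplier if nf is a DecimalFormatLike, otherwise -1.
// The pointer form of dynamic_cast yields NULL on mismatch instead of throwing.
int multiplierOrDefault(const NumberFormatLike *nf) {
    const DecimalFormatLike *df = dynamic_cast<const DecimalFormatLike *>(nf);
    if(df == NULL) {
        return -1;
    }
    return df->getMultiplier();
}
```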

 ICU uses a limited form of multiple inheritance equivalent to Java's interface
 mechanism: All but one base classes must be interface/mixin classes, i.e., they
 must contain only pure virtual member functions. For details see the
 'boilerplate' discussion below. This restriction to at most one base class with
 non-virtual members eliminates problems with the use and implementation of
 multiple inheritance in C++. ICU does not use virtual base classes.

 > :point_right: **Note**: Every additional base class, *even an interface/mixin
 class*, adds another vtable pointer to each subclass object, that is, it
 *increases the object/instance size by 8 bytes* on most platforms.

 #### Classes and Members

 C++ classes and their members do not need a 'U' or any other prefix.

 #### Global Operators

 Global operators (operators that are not class members) can be problematic for
 library entry point versioning, may confuse users and cannot be easily ported to
 Java (ICU4J). They should be avoided if possible.

 ~~The issue with library entry point versioning is that on platforms that do not
 support namespaces, users must rename all classes and global functions via
 urename.h. This renaming process is not possible with operators.~~ Starting with
 ICU 49, we require C++ namespace support. However, a global operator can be used
 in ICU4C (when necessary) if its function signature contains an ICU C++ class
 that is versioned. This will result in a mangled linker name that does contain
 the ICU version number via the versioned name of the class parameter. For
 example, ICU4C 2.8 added an operator + for `UnicodeString`, with two `UnicodeString`
 reference parameters.

 #### Virtual Destructors

 In classes with virtual methods, destructors must be explicitly declared, and
 must be defined (implemented) outside the class definition in a .cpp file.

 More precisely:

 1. All classes with any virtual members or any bases with any virtual members
    should have an explicitly declared virtual destructor.
 2. Constructors and destructors should be declared and/or defined prior to
    *any* other methods, public or private, within the class definition.
 3. All virtual destructors should be defined out-of-line, and in a .cpp file
    rather than a header file.

 This is so that the destructors serve as "key functions" so that the compiler
 emits the vtable in only and exactly the desired files. It can help make
 binaries smaller that use statically-linked ICU libraries, because the compiler
 and linker can prove more easily that some code is not used.

 The Itanium C++ ABI (which is used on all x86 Linux) says: "The virtual table
 for a class is emitted in the same object containing the definition of its key
 function, i.e. the first non-pure virtual function that is not inline at the
 point of class definition. If there is no key function, it is emitted everywhere
 used."

 (This was first done in ICU 49; see [ticket #8454](https://unicode-org.atlassian.net/browse/ICU-8454).)
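
The three rules above might look like this in practice. The classes `Shape` and `Triangle` and the file names in the comments are hypothetical; the sketch puts "header" and "implementation" parts in one listing only to stay self-contained.

```cpp
// --- shape.h (hypothetical header) ---
class Shape {
public:
    Shape();
    virtual ~Shape();            // declared explicitly, before any other method
    virtual int sides() const = 0;
};

class Triangle : public Shape {
public:
    Triangle();
    virtual ~Triangle();         // declared here, defined out-of-line below
    virtual int sides() const;
};

// --- shape.cpp (hypothetical implementation file) ---
// The out-of-line destructors are the first non-pure, non-inline virtual
// functions of their classes, so they can serve as key functions.
Shape::Shape() {}
Shape::~Shape() {}

Triangle::Triangle() {}
Triangle::~Triangle() {}
int Triangle::sides() const { return 3; }
```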

 #### Namespaces

 Beginning with ICU version 2.0, ICU uses namespaces. The actual namespace is
 `icu_M_N` with M being the major ICU release number and N being the minor ICU
 release number. For convenience, the namespace `icu` is an alias to the current
 release-specific one. (The actual namespace name is `icu` itself if renaming is
 turned off.)
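
The aliasing described above can be illustrated with plain C++ namespaces. This is only a sketch: the version numbers in `icu_99_1` and the `Widget` class are made up, and ICU derives the real namespace name from its own version macros.

```cpp
// Hypothetical release-specific namespace; the numbers are placeholders.
namespace icu_99_1 {

class Widget {
public:
    int value() const { return 42; }
};

}  // namespace icu_99_1

// Convenience alias: user code writes icu::Widget and gets the versioned type,
// so linker symbols still carry the release-specific name.
namespace icu = icu_99_1;

int widgetValue() {
    icu::Widget w;   // same type as icu_99_1::Widget
    return w.value();
}
```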

 Starting with ICU 49, we require C++ namespace support.

 Class declarations, even forward declarations, must be scoped to the ICU
 namespace. For example:

 ```c++
 U_NAMESPACE_BEGIN

 class Locale;

 U_NAMESPACE_END

 // outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
 extern void fn(icu::UnicodeString&);

 // outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
 // automatically set by utypes.h
 // but recommended to be not set automatically
 U_NAMESPACE_USE
 Locale loc("fi");
 ```

 `U_NAMESPACE_USE` (expands to `using namespace icu_M_N;` when available) is
 automatically done when `utypes.h` is included, so that all ICU classes are
 immediately usable. However, we recommend that you turn this off via
 `CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"`.

 #### Declare Class APIs

 Class APIs need to be declared like either of the following:

 #### Inline-Implemented Member Functions

 Class member functions are usually declared but not inline-implemented in the
 class declaration. A long function implementation in the class declaration makes
 it hard to read the class declaration.

 It is ok to inline-implement *trivial* functions in the class declaration.
 Pretty much everyone agrees that inline implementations are ok if they fit on
 the same line as the function signature, even if that means bending the
 single-statement-per-line rule slightly:

 ```c++
 T *orphan() { T *p=ptr; ptr=NULL; return p; }
 ```

 Most people also agree that very short multi-line implementations are ok inline
 in the class declaration. Something like the following is probably the maximum:

 ```c++
 Value *getValue(int index) {
     if(index>=0 && index<fLimit) {
         return fArray[index];
     }
     return NULL;
 }
 ```

 If the inline implementation is longer than that, then just declare the function
 inline and put the actual inline implementations after the class declaration in
 the same file. (See `unicode/unistr.h` for many examples.)

 If it's significantly longer than that, then it's probably not a good candidate
 for inlining anyway.
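
A small sketch of the layout described above, using a hypothetical `Range` class: the trivial function is implemented inside the class declaration, while the longer ones are only declared `inline` there and defined after the class in the same file.

```cpp
#include <stdint.h>

class Range {
public:
    Range(int32_t start, int32_t limit) : fStart(start), fLimit(limit) {}

    int32_t length() const { return fLimit - fStart; }  // trivial: fits on one line

    // Longer implementations are only declared here...
    inline bool contains(int32_t index) const;
    inline int32_t clamp(int32_t index) const;

private:
    int32_t fStart;
    int32_t fLimit;
};

// ...and defined after the class declaration, in the same header.
inline bool Range::contains(int32_t index) const {
    return fStart <= index && index < fLimit;
}

inline int32_t Range::clamp(int32_t index) const {
    if(index < fStart) {
        return fStart;
    }
    if(index >= fLimit) {
        return fLimit - 1;
    }
    return index;
}
```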

 #### C++ class layout and 'boilerplate'

 There are different sets of requirements for different kinds of C++ classes. In
 general, all instantiable classes (i.e., all classes except for interface/mixin
 classes and ones with only static member functions) inherit the `UMemory` base
 class. `UMemory` provides `new`/`delete` operators, which allow ICU to keep its
 heap separate from the application heap, or to customize ICU's memory allocation
 consistently.

 > :point_right: **Note**: Public ICU APIs must return or orphan only C++ objects
 that are to be released with `delete`. They must not return allocated simple
 types (including pointers, and arrays of simple types or pointers) that would
 have to be released with a `free()` function call using the ICU library's heap.
 Simple types and pointers must be returned using fill-in parameters (instead of
 allocation), or cached and owned by the returning API.

 **Public ICU C++ classes** must inherit either the `UMemory` or the `UObject`
 base class for proper memory management, and implement the following common set
 of 'boilerplate' functions:

 * default constructor
 * copy constructor
 * assignment operator
 * operator==
 * operator!=

 > :point_right: **Note**: Each of the above either must be implemented, verified
 that the default implementation according to the C++ standard will work
 (typically not if any pointers are used), or declared private without
 implementation.
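
The boilerplate list can be sketched as follows. `UObjectLike` is a hypothetical stand-in for `UObject` (the real base class also supplies `new`/`delete` via `UMemory`), and `Interval` is an invented example class. A `clone()` is included on the assumption that public subclassing is intended.

```cpp
#include <stdint.h>

// Hypothetical stand-in for UObject.
class UObjectLike {
public:
    virtual ~UObjectLike();
};
UObjectLike::~UObjectLike() {}

class Interval : public UObjectLike {
public:
    Interval();                                   // default constructor
    Interval(int32_t start, int32_t limit);
    Interval(const Interval &other);              // copy constructor
    virtual ~Interval();
    Interval &operator=(const Interval &other);   // assignment operator
    bool operator==(const Interval &other) const;
    bool operator!=(const Interval &other) const { return !operator==(other); }
    virtual Interval *clone() const;              // caller owns the result

private:
    int32_t fStart;
    int32_t fLimit;
};

Interval::Interval() : fStart(0), fLimit(0) {}
Interval::Interval(int32_t start, int32_t limit) : fStart(start), fLimit(limit) {}
Interval::Interval(const Interval &other)
        : UObjectLike(other), fStart(other.fStart), fLimit(other.fLimit) {}
Interval::~Interval() {}

Interval &Interval::operator=(const Interval &other) {
    if(this != &other) {
        fStart = other.fStart;
        fLimit = other.fLimit;
    }
    return *this;
}

bool Interval::operator==(const Interval &other) const {
    return fStart == other.fStart && fLimit == other.fLimit;
}

Interval *Interval::clone() const {
    return new Interval(*this);
}
```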

 * If public subclassing is intended, then the public class must inherit
   `UObject` and should implement
   * `clone()`
 * **RTTI:**
   * If a class is a subclass of a parent (e.g., `Format`) with ICU's "poor
     man's RTTI" (Run-Time Type Information) mechanism (via
     `getDynamicClassID()` and `getStaticClassID()`) then add that to the new
     subclass as well (copy implementations from existing C++ APIs).
   * If a class is a new, immediate subclass of `UObject` (e.g.,
     `Normalizer2`), creating a whole new class hierarchy, then declare a
     *private* `getDynamicClassID()` and define it to return `NULL` (to
     override the pure virtual version in `UObject`); copy the relevant lines
     from `normalizer2.h` and `normalizer2.cpp`
     (`UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION(className)`). Do not add any
     "poor man's RTTI" at all to subclasses of this class.

 **Interface/mixin classes** are equivalent to Java interfaces. They are as much
 multiple inheritance as ICU uses — they do not decrease performance, and they do
 not cause problems associated with multiple base classes having data members.
 Interface/mixin classes contain only pure virtual member functions, and must
 contain an empty virtual destructor. See for example the `UnicodeMatcher` class.
 Interface/mixin classes must not inherit any non-interface/mixin class,
 especially not `UMemory` or `UObject`. Instead, implementation classes must inherit
 one of these two (or a subclass of them) in addition to the interface/mixin
 classes they implement. See for example the `UnicodeSet` class.
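
A minimal sketch of this arrangement, with invented names: `BaseObject` stands in for `UObject`, `Matcher` plays the role of an interface/mixin class, and `DigitSet` is an implementation class that inherits one regular base plus the interface.

```cpp
#include <stdint.h>

// Hypothetical stand-in for UObject.
class BaseObject {
public:
    virtual ~BaseObject();
};

// Interface/mixin: only pure virtual member functions plus an empty virtual
// destructor. It does not inherit from BaseObject.
class Matcher {
public:
    virtual ~Matcher();
    virtual bool matches(int32_t c) const = 0;
};

// Implementation class: exactly one non-interface base, plus the interface.
class DigitSet : public BaseObject, public Matcher {
public:
    virtual ~DigitSet();
    virtual bool matches(int32_t c) const;
};

BaseObject::~BaseObject() {}
Matcher::~Matcher() {}
DigitSet::~DigitSet() {}

bool DigitSet::matches(int32_t c) const {
    return 0x30 <= c && c <= 0x39;  // U+0030..U+0039
}
```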

 **Static classes** contain only static member functions and are therefore never
 instantiated. They must not inherit `UMemory` or `UObject`. Instead, they must
 declare a private default constructor (without any implementation) to prevent
 instantiation. See for example the `LESwaps` layout engine class.

 **C++ classes internal to ICU** need not (but may) implement the boilerplate
 functions as mentioned above. They must inherit at least `UMemory` if they are
 instantiable.

 #### Make Sure The Compiler Uses C++

 The `__cplusplus` macro being defined ensures that the compiler uses C++. Starting
 with ICU 49, we use this standard predefined macro.

 Up until ICU 4.8 we used to define and use `XP_CPLUSPLUS` but that was redundant
 and did not add any value because it was defined if-and-only-if `__cplusplus` was
 defined.

 #### Adoption of Objects

 Some constructors and factory functions take pointers to objects that they
 adopt. The newly created object contains a pointer to the adoptee and takes over
 ownership and lifecycle control. If an error occurs while creating the new
 object (and thus in the code that adopts an object), then the semantics used
 within ICU must be *adopt-on-call* (as opposed to, for example,
 adopt-on-success):

 * **General**: A constructor or factory function that adopts an object does so
   in all cases, even if an error occurs and a `UErrorCode` is set. This means
   that either the adoptee is deleted immediately or its pointer is stored in
   the new object. The former case is most common when the constructor or
   factory function is called and the `UErrorCode` already indicates a failure.
   In the latter case, the new object must take care of deleting the adoptee
   once it is deleted itself regardless of whether or not the constructor was
   successful.

 * **Constructors**: The code that creates the object with the new operator
   must check the resulting pointer returned by new and delete any adoptees if
   it is 0 because the constructor was not called. (Typically, a `UErrorCode`
   must be set to `U_MEMORY_ALLOCATION_ERROR`.)

   **Pitfall**: If you allocate/construct via "`ClassName *p = new ClassName(adoptee);`"
   and the memory allocation failed (`p==NULL`), then the
   constructor has not been called, the adoptee has not been adopted, and you
   are still responsible for deleting it!

 * **Factory functions (createInstance())**: The factory function must set a
   `U_MEMORY_ALLOCATION_ERROR` and delete any adoptees if it cannot allocate the
   new object. If the construction of the object fails otherwise, then the
   factory function must delete it and the factory function must delete its
   adoptees. As a result, a factory function always returns either a valid
   object and a successful `UErrorCode`, or a 0 pointer and a failure `UErrorCode`.
   A factory function returns a pointer to an object that must be deleted by
   the user/owner.

 Example: (This is a best-practice example. It does not reflect current `Calendar`
 code.)

 ```c++
 Calendar*
 Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
     LocalPointer<TimeZone> adoptedZone(zone);
     if(U_FAILURE(errorCode)) {
         // The adoptedZone destructor deletes the zone.
         return NULL;
     }
     // since the Locale isn't specified, use the default locale
     LocalPointer<Calendar> c(new GregorianCalendar(zone, Locale::getDefault(), errorCode));
     if(c.isNull()) {
         errorCode = U_MEMORY_ALLOCATION_ERROR;
         // The adoptedZone destructor deletes the zone.
         return NULL;
     }
     // The constructor was called, so c adopted the zone.
     adoptedZone.orphan();
     if(U_FAILURE(errorCode)) {
         // The c destructor deletes the Calendar, which deletes the zone.
         return NULL;
     }
     return c.orphan();
 }
 ```

 #### Memory Allocation

 All ICU C++ class objects directly or indirectly inherit `UMemory` (see
 'boilerplate' discussion above) which provides `new`/`delete` operators, which in
 turn call the internal functions in `cmemory.c`. Creating and releasing ICU C++
 objects with `new`/`delete` automatically uses the ICU allocation functions.

 > :point_right: **Note**: Remember that (in absence of explicit :: scoping) C++
 determines which `new`/`delete` operator to use from which type is allocated or
 deleted, not from the context of where the statement is. Since non-class data
 types (like `int`) cannot define their own `new`/`delete` operators, C++ always
 uses the global ones for them by default.
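
The operator lookup described in this note can be observed with a small experiment. `MemoryBase` below is a hypothetical stand-in for `UMemory` that forwards to `malloc()`/`free()` and counts its allocations; it is a sketch for illustration, not how ICU implements its operators.

```cpp
#include <cstddef>
#include <cstdlib>

// Counts calls into the class-specific allocator (for illustration only).
static int gClassAllocations = 0;

// Hypothetical stand-in for UMemory.
class MemoryBase {
public:
    static void *operator new(std::size_t size) noexcept {
        ++gClassAllocations;
        return std::malloc(size);
    }
    static void operator delete(void *p) noexcept {
        std::free(p);
    }
};

class Service : public MemoryBase {
public:
    Service() : fValue(7) {}
    int fValue;
};

// Returns how many class-specific allocations the two new-expressions caused.
int countClassAllocations() {
    int before = gClassAllocations;
    Service *s = new Service();   // uses MemoryBase::operator new
    int *n = new int(3);          // int has no class operators: global ::operator new
    delete n;
    delete s;                     // uses MemoryBase::operator delete
    return gClassAllocations - before;
}
```

Only the `Service` allocation goes through the class operators; the `int` is allocated by the global operator regardless of where the statement appears.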

 When global `new`/`delete` operators are to be used in the application (never inside
 ICU!), then they should be properly scoped as e.g. `::new`, and the application
 must ensure that matching `new`/`delete` operators are used. In some cases where
 such scoping is missing in non-ICU code, it may be simpler to compile ICU
 without its own `new`/`delete` operators. See `source/common/unicode/uobject.h` for
 details.

 In ICU library code, allocation of non-class data types — simple integer types
 **as well as pointers** — must use the functions in `cmemory.h`/`.c` (`uprv_malloc()`,
 `uprv_free()`, `uprv_realloc()`). Such memory objects must be released inside ICU,
 never by the user; this is achieved either by providing a "close" function for a
 service or by not passing ownership of these objects to the user (and
 instead filling user-provided buffers or returning constant pointers without
 passing ownership).

 The `cmemory.h`/`.c` functions can be overridden at ICU compile time for custom
 memory management. By default, `UMemory`'s `new`/`delete` operators are
 implemented by calling these common functions. Overriding the `cmemory.h`/`.c`
 functions changes the memory management for both C and C++.

 C++ objects that were either allocated with new or returned from a `createXYZ()`
 factory method must be deleted by the user/owner.

 #### Memory Allocation Failures

 All memory allocations and object creations should be checked for success. In
 the event of a failure (a `NULL` returned), a `U_MEMORY_ALLOCATION_ERROR` status
 should be returned by the ICU function in question. If the allocation failure
 leaves the ICU service in an invalid state, such that subsequent ICU operations
 could also fail, the situation should be flagged so that the subsequent
 operations will fail cleanly. Under no circumstances should a memory allocation
 failure result in a crash in ICU code, or cause incorrect results rather than a
 clean error return from an ICU function.

 Some functions, such as the C++ assignment operator, are unable to return an ICU
 error status to their caller. In the event of an allocation failure, these
 functions should mark the object as being in an invalid or bogus state so that
 subsequent attempts to use the object will fail. Deletion of an invalid object
 should always succeed.
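
One way to picture the "bogus state" approach is the sketch below. Everything in it is hypothetical: `ByteBuffer` is an invented class, and `sketchMalloc()` with its `gFailNextAllocation` flag only simulates an out-of-memory condition so that the failure path can be exercised.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Test hook standing in for a real out-of-memory condition (hypothetical).
static bool gFailNextAllocation = false;

static void *sketchMalloc(std::size_t size) {
    if(gFailNextAllocation) {
        gFailNextAllocation = false;
        return NULL;
    }
    return std::malloc(size);
}

class ByteBuffer {
public:
    ByteBuffer() : fBytes(NULL), fLength(0), fBogus(false) {}
    ByteBuffer(const char *bytes, std::size_t length)
            : fBytes(NULL), fLength(0), fBogus(false) {
        copyFrom(bytes, length);
    }
    ~ByteBuffer() { std::free(fBytes); }  // always safe, even when bogus

    ByteBuffer &operator=(const ByteBuffer &other) {
        if(this != &other) {
            if(other.fBogus) {
                setToBogus();
            } else {
                copyFrom(other.fBytes, other.fLength);
            }
        }
        return *this;
    }

    bool isBogus() const { return fBogus; }
    std::size_t length() const { return fLength; }

private:
    ByteBuffer(const ByteBuffer &other);  // declared private, not implemented

    void setToBogus() {
        std::free(fBytes);
        fBytes = NULL;
        fLength = 0;
        fBogus = true;
    }

    void copyFrom(const char *bytes, std::size_t length) {
        char *copy = static_cast<char *>(sketchMalloc(length + 1));
        if(copy == NULL) {
            setToBogus();  // operator= cannot report an error code
            return;
        }
        if(length > 0) {
            std::memcpy(copy, bytes, length);
        }
        copy[length] = 0;
        std::free(fBytes);
        fBytes = copy;
        fLength = length;
        fBogus = false;
    }

    char *fBytes;
    std::size_t fLength;
    bool fBogus;
};
```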

 #### Memory Management

 C++ memory management is error-prone, and memory leaks are hard to avoid, but
 the following helps a lot.

 First, if you can stack-allocate an object (for example, a `UnicodeString` or
 `UnicodeSet`), do so. It is the easiest way to manage object lifetime.

 Inside functions, avoid raw pointers to owned objects. Instead, use
 [LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)`<UnicodeString>`
 or `LocalUResourceBundlePointer` etc., which is ICU's "smart pointer"
 implementation. This is the "[Resource Acquisition Is Initialization (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
 idiom. The "smart pointer" auto-deletes the object when it goes out of scope,
 which means that you can just return from the function when an error occurs and
 all auto-managed objects are deleted. You do not need to remember to write an
 increasing number of "`delete xyz;`" at every function exit point.

 *In fact, you should almost never need to write "delete" in any function.*

 * Except in a destructor where you delete all of the objects which the class
   instance owns.
 * Also, in static "cleanup" functions you still need to delete cached objects.

 When you pass on ownership of an object, for example to return the pointer of a
 newly built object, or when you call a function which adopts your object, use
 `LocalPointer`'s `.orphan()`.

 * Careful: When you return an object or pass it into an adopting factory
   method, you can use `.orphan()` directly.
 * However, when you pass it into an adopting constructor, you need to pass in
   the `.getAlias()`, and only if the *allocation* of the new owner succeeded
   (you got a non-NULL pointer for that) do you `.orphan()` your `LocalPointer`.
 * See the `Calendar::createInstance()` example above.
 * See the `AlphabeticIndex` implementation for live examples. Search for other
   uses of `LocalPointer`/`LocalArray`.
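
To show the early-return pattern without depending on ICU headers, the following sketch defines `LocalPtrSketch`, a deliberately tiny stand-in that mimics only the `isNull()`, `getAlias()` and `orphan()` members of `LocalPointer`. `Item`, `createItem()` and the integer error code (0 meaning success) are likewise assumptions made for the example.

```cpp
#include <cstddef>

// Minimal stand-in for icu::LocalPointer<T>; only the members used below.
template<typename T>
class LocalPtrSketch {
public:
    explicit LocalPtrSketch(T *p = NULL) : ptr(p) {}
    ~LocalPtrSketch() { delete ptr; }
    bool isNull() const { return ptr == NULL; }
    T *getAlias() const { return ptr; }
    T *orphan() { T *p = ptr; ptr = NULL; return p; }
private:
    LocalPtrSketch(const LocalPtrSketch &);             // no copying
    LocalPtrSketch &operator=(const LocalPtrSketch &);  // no assignment
    T *ptr;
};

static int gLiveItems = 0;  // counts Item objects that have not been deleted

struct Item {
    Item() { ++gLiveItems; }
    ~Item() { --gLiveItems; }
};

// Hypothetical error code: 0 means success, anything else means failure.
typedef int ErrorCodeSketch;

// Builds an Item; every early return cleans up without an explicit delete.
Item *createItem(bool simulateFailure, ErrorCodeSketch &errorCode) {
    if(errorCode != 0) {
        return NULL;
    }
    LocalPtrSketch<Item> item(new Item());
    if(simulateFailure) {
        errorCode = 1;
        return NULL;       // the item destructor deletes the Item
    }
    return item.orphan();  // ownership passes to the caller
}
```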

 Every object must always be deletable/destructible. That is, at a minimum, all
 pointers to owned memory must always be either NULL or point to owned objects.

 Internally:

 [cmemory.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/cmemory.h)
 defines the `LocalMemory` class for chunks of memory of primitive types which
 will be `uprv_free()`'ed.

 [cmemory.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/cmemory.h)
 also defines `MaybeStackArray` and `MaybeStackHeaderAndArray` which automate
 management of arrays.

 Use `CharString`
 ([charstr.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/charstr.h))
 for `char *` strings that you build and modify.

 #### Global Inline Functions

 Global functions (non-class member functions) that are declared inline must be
 made static inline. Some compilers will export symbols that are declared inline
 but not static.
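
For illustration, a global helper following this rule could look like the sketch below; the function name `pinToByte()` is made up.

```cpp
#include <stdint.h>

// Declared static inline so that no external symbol is exported for it.
static inline int32_t pinToByte(int32_t value) {
    if(value < 0) {
        return 0;
    }
    if(value > 0xff) {
        return 0xff;
    }
    return value;
}
```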

 #### No Declarations in the for() Loop Head

 Iterations through `for()` loops must not use declarations in the first part of
 the loop. There have been two revisions for the scoping of these declarations,
 and some compilers do not comply with the latest scoping. Declarations of loop
 variables should be outside these loops.
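
A short sketch of the loop style this rule asks for, using a hypothetical `sumOfValues()` helper:

```cpp
#include <stdint.h>

int32_t sumOfValues(const int32_t *values, int32_t length) {
    int32_t i;        // declared before the loop, not in the for() head
    int32_t sum = 0;
    for(i = 0; i < length; ++i) {
        sum += values[i];
    }
    return sum;
}
```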

 #### Common or I18N

 Decide whether or not the module is part of the common or the i18n API
 collection. Use the appropriate macros. For example, use
 `U_COMMON_IMPLEMENTATION`, `U_I18N_IMPLEMENTATION`, `U_COMMON_API`, `U_I18N_API`.
 See `utypes.h`.

 #### Constructor Failure

 If there is a reasonable chance that a constructor fails (for example, if the
 constructor relies on loading data), then either it must use and set a
 `UErrorCode` or the class needs to support an `isBogus()`/`setToBogus()` mechanism
 like `UnicodeString` and `UnicodeSet`, and the constructor needs to set the object
 to bogus if it fails.

 #### `UVector`, `UVector32`, or `UVector64`

 Use `UVector` to store arrays of `void *`; use `UVector32` to store arrays of
 `int32_t`; use `UVector64` to store arrays of `int64_t`. Historically, `UVector`
 has stored either `int32_t` or `void *`, but now storing `int32_t` in a `UVector`
 is deprecated in favor of `UVector32`.

 ### C Coding Guidelines

 This section describes the C-specific guidelines or conventions to use.

 #### Declare and define C APIs with both `U_CAPI` and `U_EXPORT2`

 All C APIs need to be **both declared and defined** using the `U_CAPI` and
 `U_EXPORT2` qualifiers.

 ```c++
 U_CAPI int32_t U_EXPORT2
 u_formatMessage(...);
 ```

 > :point_right: **Note**: Use `U_CAPI` before and `U_EXPORT2` after the return
 type of exported C functions. Internal functions that are visible outside a
 compilation unit need a `U_CFUNC` before the return type.

 #### Subdivide the Name Space

 Use prefixes to avoid name collisions. Some of those prefixes contain a 3- (or
 sometimes 4-) letter module identifier. Very general names like
 `u_charDirection()` do not have a module identifier in their prefix.

 * For POSIX replacements, the (all lowercase) POSIX function names start with
   "u_": `u_strlen()`.
 * For other API functions, a 'u' is prepended with the module
   identifier (if appropriate), and an underscore '_', followed by the
   **mixed-case** function name. For example, use `u_charDirection()`,
   `ubidi_setPara()`.
 * For types (struct, enum, union), a "U" is prepended, often
   "`U<module identifier>`" directly to the typename, without an underscore. For
   example, use `UComparisonResult`.
 * For #defined constants and macros, a "U_" is prepended,
   often "`U<module identifier>_`" with an underscore to the uppercase macro
   name. For example, use `U_ZERO_ERROR`, `U_SUCCESS()`, `UNORM_NFC`.

 #### Functions for Constructors and Destructors

 Functions that roughly compare to constructors and destructors are called
 `umod_open()` and `umod_close()`. See the following example:

 ```c++
 U_CAPI UBiDi * U_EXPORT2
 ubidi_open(void);

 U_CAPI UBiDi * U_EXPORT2
 ubidi_openSized(int32_t maxLength, int32_t maxRunCount, UErrorCode *pErrorCode);

 U_CAPI void U_EXPORT2
 ubidi_close(UBiDi *pBiDi);
 ```

 Each successful call to a `umod_open()` returns a pointer to an object that must
 be released by the user/owner by calling the matching `umod_close()`.
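
The open/close pairing might be implemented along these lines. The module name `ucnt`, the `UCounter` type and the integer error code are all invented for this sketch (0 is success, positive values are failures); real ICU functions would use `UErrorCode`, `U_CAPI`/`U_EXPORT2` and `uprv_malloc()`/`uprv_free()` instead of the plain C library calls used here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdint.h>

// Hypothetical module "ucnt": a small counter service object.
struct UCounter {
    int32_t value;
};

// Stand-in error code for this sketch: 0 is success, positive is failure.
typedef int SketchErrorCode;
static const SketchErrorCode SKETCH_MEMORY_ALLOCATION_ERROR = 7;

extern "C" UCounter *ucnt_open(SketchErrorCode *pErrorCode) {
    if(pErrorCode == NULL || *pErrorCode > 0) {
        return NULL;  // do nothing if an earlier call already failed
    }
    UCounter *counter = static_cast<UCounter *>(std::malloc(sizeof(UCounter)));
    if(counter == NULL) {
        *pErrorCode = SKETCH_MEMORY_ALLOCATION_ERROR;
        return NULL;
    }
    counter->value = 0;
    return counter;
}

extern "C" int32_t ucnt_next(UCounter *counter) {
    return ++counter->value;
}

extern "C" void ucnt_close(UCounter *counter) {
    std::free(counter);  // closing NULL is a no-op
}
```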

 #### C "Service Object" Types and LocalPointer Equivalents

 For every C "service object" type (equivalent to C++ class), we want to have a
 [LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)
 equivalent, so that C++ code calling the C API can use the specific "smart
 pointer" to implement the "[Resource Acquisition Is Initialization
 (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
 idiom.

 For example, in `ubidi.h` we define the `UBiDi` "service object" type and also
 have the following "smart pointer" definition which will call `ubidi_close()` on
 destruction:

 ```c++
 // Use config switches like this only after including unicode/utypes.h
 // or another ICU header.
 #if U_SHOW_CPLUSPLUS_API

 U_NAMESPACE_BEGIN

 /**
  * class LocalUBiDiPointer
  * "Smart pointer" class, closes a UBiDi via ubidi_close().
  * For most methods see the LocalPointerBase base class.
  *
  * @see LocalPointerBase
  * @see LocalPointer
  * @stable ICU 4.4
  */
 U_DEFINE_LOCAL_OPEN_POINTER(LocalUBiDiPointer, UBiDi, ubidi_close);

 U_NAMESPACE_END

 #endif
 ```

 #### Inline Implementation Functions

 Some, but not all, C compilers allow ICU users to declare functions inline
 (which is a C++ language feature) with various keywords. This has advantages for
 implementations because inline functions are much safer and more easily debugged
 than macros.

 ICU *used to* use a portable `U_INLINE` declaration macro that could be used for
 inline functions in C. However, this was an unnecessary platform dependency.

 We have changed all code that used `U_INLINE` to C++ (.cpp) using "inline", and
 removed the `U_INLINE` definition.

 If you find yourself constrained by .c, change it to .cpp.

 All functions that are declared inline, or are small enough that an optimizing
 compiler might inline them even without the inline declaration, should be
 defined (implemented) – not just declared – before they are first used. This is
 to enable as much inlining as possible, and also to prevent compiler warnings
 for functions that are declared inline but whose definition is not available
 when they are called.

 #### C Equivalents for Classes with Multiple Constructors

 In cases like `BreakIterator` and `NumberFormat`, instead of having several
 different 'open' APIs for each kind of instance, use an enum selector.
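
As a sketch of the enum-selector idea, the hypothetical `usketch_open()` below takes a `USketchBreakType` value instead of offering one open function per kind. All names here are invented for illustration; error-code handling is left out to keep the example short.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical selector that picks the kind of instance to open.
enum USketchBreakType {
    USKETCH_BRK_CHARACTER,
    USKETCH_BRK_WORD,
    USKETCH_BRK_SENTENCE
};

struct USketchBreaker {
    USketchBreakType type;
};

// One 'open' function for all kinds, instead of one function per kind.
USketchBreaker *usketch_open(USketchBreakType type) {
    USketchBreaker *b =
        static_cast<USketchBreaker *>(std::malloc(sizeof(USketchBreaker)));
    if(b != NULL) {
        b->type = type;
    }
    return b;
}

USketchBreakType usketch_getType(const USketchBreaker *b) {
    return b->type;
}

void usketch_close(USketchBreaker *b) {
    std::free(b);
}
```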

 #### Source File Names

 Source file names for C begin with a 'u'.

 #### Memory APIs Inside ICU

 For memory allocation in C implementation files for ICU, use the functions and
 macros in `cmemory.h`. When allocated memory is returned from a C API function,
 there must be a corresponding function (like a `ucnv_close()`) that deallocates
 that memory.

 All memory allocations in ICU should be checked for success. In the event of a
 failure (a `NULL` returned from `uprv_malloc()`), a `U_MEMORY_ALLOCATION_ERROR` status
 should be returned by the ICU function in question. If the allocation failure
 leaves the ICU service in an invalid state, such that subsequent ICU operations
 could also fail, the situation should be flagged so that the subsequent
 operations will fail cleanly. Under no circumstances should a memory allocation
 failure result in a crash in ICU code, or cause incorrect results rather than a
 clean error return from an ICU function.

 #### // Comments

 C++ style // comments may be used in plain C files and in headers that will be
 included in C files.

 ## Source Code Strings with Unicode Characters

 ### `char *` strings in ICU

 | Declared type | encoding | example | Used with |
 | --- | --- | --- | --- |
 | `char *` | varies with platform | `"Hello"` | Most ICU API functions taking `char *` parameters. Unless otherwise noted, characters are restricted to the "Invariant" set, described below |
 | `char *` | UTF-8 |  `u8"¡Hola!"` | Only functions that are explicitly documented as expecting UTF-8. No restrictions on the characters used. |
 | `UChar *` | UTF-16 | `u"¡Hola!"` | All ICU functions with `UChar *` parameters |
 | `UChar32` | Code Point value | `U'😁'` | UChar32 single code point constant. |
 | `wchar_t` | unknown | `L"Hello"` | Not used with ICU. Unknown encoding, unknown size, not portable. |

 ICU source files are UTF-8 encoded, allowing any Unicode character to appear in
 Unicode string or character literals, without the need for escaping. But, for
 clarity, use escapes when plain text would be confusing, e.g. for invisible
 characters.

 For convenience, ICU4C tends to use `char *` strings in places where only
 "invariant characters" (a portable subset of the 7-bit ASCII repertoire) are
 used. This allows locale IDs, charset names, resource bundle item keys and
 similar items to be easily specified as string literals in the source code. The
 same types of strings are also stored as "invariant character" `char *` strings
 in the ICU data files.

 ICU has hard coded mapping tables in `source/common/putil.c` to convert invariant
 characters to and from Unicode without using a full ICU converter. These tables
 must match the encoding of string literals in the ICU code as well as in the ICU
 data files.

 > :point_right: **Note**: Important: ICU assumes that at least the invariant
 characters always have the same codes as is common on platforms with the same
 charset family (ASCII vs. EBCDIC). **ICU has not been tested on platforms where
 this is not the case.**

 Some usage of `char *` strings in ICU assumes the system charset instead of
 invariant characters. Such strings are only handled with the default converter
 (See the following section). The system charset is usually a superset of the
 invariant characters.

 The following are the ASCII and EBCDIC byte values for all of the invariant
 characters (see also `unicode/utypes.h`):

 | Character(s) | ASCII | EBCDIC |
 | --- | --- | --- |
 | a..i | 61..69 | 81..89 |
 | j..r | 6A..72 | 91..99 |
 | s..z | 73..7A | A2..A9 |
 | A..I | 41..49 | C1..C9 |
 | J..R | 4A..52 | D1..D9 |
 | S..Z | 53..5A | E2..E9 |
 | 0..9 | 30..39 | F0..F9 |
 | (space) | 20 | 40 |
 | " | 22 | 7F |
 | % | 25 | 6C |
 | & | 26 | 50 |
 | ' | 27 | 7D |
 | ( | 28 | 4D |
 | ) | 29 | 5D |
 | \* | 2A | 5C |
 | + | 2B | 4E |
 | , | 2C | 6B |
 | - | 2D | 60 |
 | . | 2E | 4B |
 | / | 2F | 61 |
 | : | 3A | 7A |
 | ; | 3B | 5E |
 | < | 3C | 4C |
 | = | 3D | 7E |
 | > | 3E | 6E |
 | ? | 3F | 6F |
 | _ | 5F | 6D |
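
The ASCII column of the table can be turned into a simple membership check. The sketch below works on numeric ASCII byte values rather than `char` literals, so it does not depend on the compiler's execution character set. The function names are hypothetical and this is not ICU's own implementation.

```cpp
#include <stdint.h>

// Returns true if the ASCII byte value c is one of the invariant characters
// listed in the table above.
bool isInvariantAsciiByte(uint8_t c) {
    return
        (0x61 <= c && c <= 0x7A) ||   // a..z
        (0x41 <= c && c <= 0x5A) ||   // A..Z
        (0x25 <= c && c <= 0x3F) ||   // % & ' ( ) * + , - . / 0..9 : ; < = > ?
        c == 0x20 || c == 0x22 || c == 0x5F;  // space, ", _
}

// Returns true if every byte of the ASCII-encoded string is invariant.
bool isInvariantAsciiString(const uint8_t *s, int32_t length) {
    int32_t i;
    for(i = 0; i < length; ++i) {
        if(!isInvariantAsciiByte(s[i])) {
            return false;
        }
    }
    return true;
}
```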

 ### Rules Strings with Unicode Characters

 In order to include characters in source code strings that are not part of the
 invariant subset of ASCII, one has to use character escapes. In addition, rules
 strings for collation, etc. need to follow service-specific syntax, which means
 that spaces and ASCII punctuation must be quoted using the following rules:

 * Single quotes delineate literal text: `a'>'b` => `a>b`
 * Two single quotes, either between or outside of single quoted text, indicate
   a literal single quote:
   * `a''b` => `a'b`
   * `a'>''<'b` => `a>'<b`
 * A backslash precedes a single literal character:
 * Several standard mechanisms are handled by `u_unescape()` and its variants.

 > :point_right: **Note**: All of these quoting mechanisms are supported by the
 `RuleBasedTransliterator`. The single quote mechanisms (not backslash, not
 `u_unescape()`) are supported by the format classes. In its infancy,
 `ResourceBundle` supported the `\uXXXX` mechanism and nothing else.
 This quoting method is the current policy. However, there are modules within
 the ICU services that are being updated and this quoting method might not have
 been applied to all of the modules.
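
The single-quote rules above can be illustrated with a small routine that removes the quoting from a rule fragment. This is only a sketch under simplifying assumptions: `stripRuleQuotes()` is a hypothetical name, it handles the single-quote rules only (no backslash, no `u_unescape()`), and it does not remember which characters were quoted, which a real rule parser would need to do.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not an ICU API): applies the single-quote rules.
//  - Two adjacent single quotes stand for one literal single quote,
//    both inside and outside quoted text.
//  - A lone single quote opens or closes a run of literal text and is dropped.
std::string stripRuleQuotes(std::string_view rule) {
    std::string out;
    for (std::size_t i = 0; i < rule.size(); ++i) {
        if (rule[i] != '\'') {
            out += rule[i];
        } else if (i + 1 < rule.size() && rule[i + 1] == '\'') {
            out += '\'';  // doubled quote: emit one literal quote
            ++i;          // and skip the second quote of the pair
        }
        // else: a lone quote is a delimiter and produces no output
    }
    return out;
}
```

Applied to the examples in the list, `a'>'b` becomes `a>b`, `a''b` becomes `a'b`, and `a'>''<'b` becomes `a>'<b`.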

 ## Java Coding Conventions Overview

 The ICU group uses the following coding guidelines to create software using the
 ICU Java classes and methods.

 ### Code style

 The standard order for modifier keywords on APIs is:

 * `public static final synchronized strictfp`
 * `public abstract`

 Do not use wildcard imports, such as "`import java.util.*`". The sort order of
 import statements is `java` / `javax` / `org` / `com`. Within each top-level package
 category, subpackages and classes are sorted in alphabetical order. We
 recommend that ICU developers use the Eclipse IDE feature \[Source\] - \[Organize
 Imports\] (Ctrl+Shift+O) to organize import statements.

 All if/else/for/while/do statements use braces, even if the controlled statement is a
 single line. This is for clarity and to avoid mistakes due to bad nesting of
 control statements, especially during maintenance.

 Tabs should not be present in source files.

 Indentation is 4 spaces.

 Make sure the code is formatted cleanly with regular indentation. Follow Java
 style code conventions, e.g., don't put multiple statements on a single line,
 use mixed-case identifiers for classes and methods and upper case for constants,
 and so on.

 The Java source formatting rules described above are included in the Eclipse project
 file. It is recommended to run \[Source\] - \[Format\] (Ctrl+Shift+F) in the Eclipse
 IDE to clean up source files if necessary.

 Use UTF-8 encoding (without BOM) for Java source files.

 Javadoc should be complete and correct when code is checked in, to avoid playing
 catch-up later during the throes of the release. Please javadoc all methods, not
 just external APIs, since this helps with maintenance.

 ### Code organization

 Avoid putting more than one top-level class in a single file. Either use
 separate files or nested classes.

 Always define at least one constructor in a public API class. The Java compiler
 automatically generates a no-arg constructor when a class has no explicit
 constructors. We cannot provide proper API documentation for such default
 constructors.

 Do not mix test, tool, and runtime code in the same file. If you need some
 access to private or package methods or data, provide public accessors for them
 and mark them `@internal`. Test code should be placed in `com.ibm.icu.dev.test`
 package, and tools (e.g., code that generates data, source code, or computes
 constants) in `com.ibm.icu.dev.tool` package. Occasionally for very simple cases
 you can leave a few lines of tool code in the main source and comment it out,
 but maintenance is easier if you just comment the location of the tools in the
 source and put the actual code elsewhere.

 Avoid creating new interfaces unless you know you need to mix the interface into
 two or more classes that have separate inheritance. Interfaces are impossible to
 modify later in a backwards-compatible way. Abstract classes, on the other hand,
 can add new methods with default behavior. Use interfaces only if it is required
 by the architecture, not just for expediency.

 Current releases of ICU4J (since ICU 63) are restricted to Java SE 7 APIs
 and language features.

 ### ICU Packages

 Public APIs should be placed in `com.ibm.icu.text`, `com.ibm.icu.util`, and
 `com.ibm.icu.lang`. For historical reasons and for easier migration from JDK
 classes, there are also APIs in `com.ibm.icu.math` but new APIs should not be
 added there.

 APIs used only during development, testing, or tools work should be placed in
 `com.ibm.icu.dev`.

 A class or method which is used by public APIs (listed above) but which is not
 itself public can be placed in different places:

 1. If it is only used by one class, make it private in that class.
 2. If it is only used by one class and its subclasses, make it protected in
    that class. In general, also tag it `@internal` unless you are working on a
    class that supports user-subclassing (rare).
 3. If it is used by multiple classes in one package, make it package private
    (also known as default access) and mark it `@internal`.
 4. If it is used by multiple packages, make it public and place the class in
    the `com.ibm.icu.impl` package.

 ### Error Handling and Exceptions

 Errors should be indicated by throwing exceptions, not by returning “bogus”
 values.

 If an input parameter is in error, then a new
 `IllegalArgumentException("description")` should be thrown.

 Exceptions should be caught only when something must be done, for example
 special cleanup or rethrowing a different exception. If the error “should never
 occur”, then throw a `new RuntimeException("description")` (rare). In this case,
 a comment should be added with a justification.

 Use exception chaining: When an exception is caught and a new one created and
 thrown (usually with additional information), the original exception should be
 chained to the new one.

 A catch expression should not catch Throwable. Catch expressions should specify
 the most specific subclass of Throwable that applies. If there are two concrete
 subclasses, both should be specified in separate catch statements.

 ### Binary Data Files

 ICU4J uses the same binary data files as ICU4C, in the big-endian/ASCII form.
 The `ICUBinary` class should be used to read them.

 Some data sources (for example, compressed Jar files) do not allow the use of
 several `InputStream` and related APIs:

 * Memory mapping is efficient, but not available for all data sources.
 * Do not depend on `InputStream.available()`: It does not provide reliable
   information for some data sources. Instead, the length of the data needs to
   be determined from the data itself.
 * Do not call `mark()` and `reset()` methods on `InputStream` without wrapping the
   `InputStream` object in a new `BufferedInputStream` object. These methods are
   not implemented by the `ZipInputStream` class, and their use may result in an
   `IOException`.

 ### Compiler Warnings

 There should be no compiler warnings when building ICU4J. It is recommended to
 develop using Eclipse, and to fix any problems that are shown in the Eclipse
 Problems panel (below the main window).

 When a warning is not avoidable, you should add `@SuppressWarnings` annotations
 with minimum scope.

 ### Miscellaneous

 Objects should not be cast to a class in the `sun.*` packages because this would
 cause a `SecurityException` when run under a `SecurityManager`. The exception needs
 to be caught and default action taken, instead of propagating the exception.

 ## Adding .c, .cpp and .h files to ICU

 In order to add compilable files to ICU, add them to the source code control
 system in the appropriate folder and also to the build environment.

 To add these files, use the following steps:

 1. Choose one of the ICU libraries:
    * The common library provides mostly low-level utilities and basic APIs that
      often do not make use of Locales. Examples are APIs that deal with character
      properties, the Locale APIs themselves, and ResourceBundle APIs.
    * The i18n library provides Locale-dependent and -using APIs, such as for
      collation and formatting, that are most useful for internationalized user
      input and output.
 2. Put the source code files into the folder `icu/source/library-name`, then add
    them to the build system:
    * For most platforms, add the expected .o files to
      `icu/source/library-name/Makefile.in`, to the OBJECTS variable. Add the
      **public** header files to the HEADERS variable.
    * For Microsoft Visual C++ 6.0, add all the source code files to
      `icu/source/library-name/library-name.dsp`. If you don't have Visual C++, add
      the filenames to the project file manually.
 3. Add test code to `icu/source/test/cintltest` for C APIs and to
    `icu/source/test/intltest` for C++ APIs.
 4. Make sure that the API functions are called by the test code (100% API
    coverage) and that at least 85% of the implementation code is exercised by
    the tests (>=85% code coverage).
 5. Create test code for C using the `log_err()`, `log_info()`, and `log_verbose()`
    APIs from `cintltst.h` (which uses `ctest.h`) and check it into the appropriate
    folder.
 6. In order to get your C test code called, add its top level function and a
    descriptive test module path to the test system by calling `addTest()`. The
    function that makes the call to `addTest()` ultimately must be called by
    `addAllTests()` in `calltest.c`. Groups of tests typically have a common
    `addGroup()` function that calls `addTest()` for the test functions in its
    group, according to the common part of the test module path.
 7. Add that test code to the build system also. Modify `Makefile.in` and the
    appropriate `.dsp` file (for example, the file for the library code).

 ## C Test Suite Notes

 The cintltst Test Suite contains all the tests for the International Components
 for Unicode C API. These tests may be run automatically by typing `cintltst` or
 `cintltst -all` at the command line. The suite depends on the C Test Services
 described below.

 ### C Test Services

 The purpose of the test services is to enable the writing of tests entirely in
 C. The services have been designed to make creating tests or converting old ones
 as simple as possible with a minimum of services overhead. A sample test file,
 `demo.c`, is included below, after the cintltst usage text. For more information
 regarding C test services, please see the `icu4c/source/tools/ctestfw` directory.

 ### Writing Test Functions

 The following shows the possible format of test functions:

 ```c++
 void some_test()
 {
 }
 ```

 Output from the test is accomplished with three printf-like functions:

 ```c++
 void log_err ( const char *fmt, ... );
 void log_info ( const char *fmt, ... );
 void log_verbose ( const char *fmt, ... );
 ```

 * `log_info()` writes to the console for informational messages.
 * `log_verbose()` writes to the console ONLY if the VERBOSE flag is turned
   on (or the `-v` option to the command line). This option is useful for
   debugging. By default, the VERBOSE flag is turned OFF.
 * `log_err()` can be called when a test failure is detected. The error is
   then logged and the error count is incremented by one.

 To use the tests, link them into a hierarchical structure. The root of the
 structure will be allocated by default.

 ```c++
 TestNode *root = NULL; /* empty */
 addTest( &root, &some_test, "/test");
 ```

 Provide `addTest()` with the function pointer for the function that performs the
 test as well as the absolute 'path' to the test. Paths may be up to 127 chars in
 length and may be used to group tests.

 The calls to `addTest` must be placed in a function or a hierarchy of functions
 (perhaps mirroring the paths). See the existing cintltst for more details.

 ### Running the Tests

 A subtree may be extracted from another tree of tests for the programmatic
 running of subtests.

 ```c++
 TestNode* sub;
 sub = getTest(root, "/mytests");
 ```

 And a tree of tests may be run simply by:

 ```c++
 runTests( root ); /* or 'sub' */
 ```

 Similarly, `showTests()` lists out the tests. However, it is easier to use the
 command prompt with the Usage specified below.

 ### Globals

 The command line parser resets the error count and prints a summary of the
 failed tests. But if `runTests` is called directly, for instance, this needs to be
 managed manually. `ERROR_COUNT` contains the number of times `log_err` was
 called. `runTests` resets the count to zero before running the tests.
 `VERBOSITY` must be 1 to display `log_verbose()` data. Otherwise, `VERBOSITY`
 must be set to 0 (default).

 ### Building cintltst

 To compile this test suite using Microsoft Visual C++ (MSVC), follow the
 instructions in `icu4c/source/readme.html#HowToInstall` for building the `allC`
 workspace. This builds the libraries as well as the `cintltst` executable.

 ### Executing cintltst

 To run the test suite from the command line, change to the directory
 `icu4c/source/test/cintltst/Debug` for the debug build (or
 `icu4c/source/test/cintltst/Release` for the release build) and then type `cintltst`.

 ### cintltst Usage

 Type `cintltst -h` to view its command line parameters.

 ```text
 ### Syntax:
 ### Usage: [ -l ] [ -v ] [ -verbose] [-a] [ -all] [-n]
  [-no_err_msg] [ -h] [ /path/to/test ]
 ### -l To get a list of test names
 ### -all To run all the test
 ### -a To run all the test(same as -all)
 ### -verbose To turn ON verbosity
 ### -v To turn ON verbosity(same as -verbose)
 ### -h To print this message
 ### -n To turn OFF printing error messages
 ### -no_err_msg (same as -n)
 ### -[/subtest] To run a subtest
 ### For example to run just the utility tests type: cintltest /tsutil)
 ### To run just the locale test type: cintltst /tsutil/loctst
 ###
 ```

 ```c++
 /******************** sample ctestfw test ********************
 ********* Simply link this with libctestfw or ctestfw.dll ****
 ************************* demo.c *****************************/

 #include "stdlib.h"
 #include "ctest.h"
 #include "stdio.h"
 #include "string.h"

 /**
 * Some sample dummy tests.
 * the statics simply show how often the test is called.
 */
 void mytest()
 {
     static int i = 0;
     log_info("I am a test[%d]\n", i++);
 }

 void mytest_err()
 {
     static int i = 0;
     log_err("I am a test containing an error[%d]\n", i++);
     log_err("I am a test containing an error[%d]\n", i++);
 }

 void mytest_verbose()
 {
     /* will only show if verbose is on (-v) */
     log_verbose("I am a verbose test, blabbing about nothing at all.\n");
 }

 /**
 * Add your tests from this function
 */

 void add_tests( TestNode** root )
 {
     addTest(root, &mytest, "/apple/bravo" );
     addTest(root, &mytest, "/a/b/c/d/mytest");
     addTest(root, &mytest_err, "/d/e/f/h/junk");
     addTest(root, &mytest, "/a/b/c/d/another");
     addTest(root, &mytest, "/a/b/c/etest");
     addTest(root, &mytest_err, "/a/b/c");
     addTest(root, &mytest, "/bertrand/andre/damiba");
     addTest(root, &mytest_err, "/bertrand/andre/OJSimpson");
     addTest(root, &mytest, "/bertrand/andre/juice/oj");
     addTest(root, &mytest, "/bertrand/andre/juice/prune");
     addTest(root, &mytest_verbose, "/verbose");

 }

 int main(int argc, const char *argv[])
 {
     TestNode *root = NULL;

     add_tests(&root); /* address of root ptr- will be filled in */

     /* Run the tests. An int is returned suitable for the OS status code.
     (0 for success, neg for parameter errors, positive for the # of
     failed tests) */
     return processArgs( root, argc, argv );
 }
 ```

 ## C++ IntlTest Test Suite Documentation

 The IntlTest suite contains all of the tests for the C++ API of International
 Components for Unicode. These tests may be automatically run by typing `intltest`
 at the command line. Since the verbose option prints out a considerable amount
 of information, it is recommended that the output be redirected to a file:
 `intltest -v > testOutput`.

 ### Building IntlTest

 To compile this test suite using MSVC, follow the instructions for building the
 `allCPP` (All C++ interfaces) workspace. This builds the libraries as well as the
 `intltest` executable.

 ### Executing IntlTest

 To run the test suite from the command line, change to the directory
 `icu4c/source/test/intltest/Debug`, then type: `intltest -v >testOutput`. For the
 release build, the executable will reside in the
 `icu4c/source/test/intltest/Release` directory.

 ### IntlTest Usage

 Type just `intltest -h` to see the usage:

 ```text
 ### Syntax:
 ### IntlTest [-option1 -option2 ...] [testname1 testname2 ...]
 ### where options are: verbose (v), all (a), noerrormsg (n),
 ### exhaustive (e) and leaks (l).
 ### (Specify either -all (shortcut -a) or a test name).
 ### -all will run all of the tests.
 ###
 ### To get a list of the test names type: intltest LIST
 ### To run just the utility tests type: intltest utility
 ###
 ### Test names can be nested using slashes ("testA/subtest1")
 ### For example to list the utility tests type: intltest utility/LIST
 ### To run just the Locale test type: intltest utility/LocaleTest
 ###
 ### A parameter can be specified for a test by appending '@' and the value
 ### to the testname.
 ```

 ## C: Testing with Fake Time

 The "Fake Time" capability allows ICU4C to be tested as if the hardware clock is
 set to a specific time. This section documents how to use this facility.
 Note that this facility requires the POSIX `gettimeofday()` function to be
 operable.

 This facility affects all ICU 'current time' calculations, including date,
 calendar, time zone formats, and relative formats. It doesn't affect any calls
 directly to the underlying operating system.

 1. Build ICU with the **`U_DEBUG_FAKETIME`** preprocessor macro set. This can
    be accomplished with the following line in a file
    **icu/source/icudefs.local** :

    ```shell
    CPPFLAGS+=-DU_DEBUG_FAKETIME
    ```

 2. Determine the `UDate` value (the time in milliseconds before or after midnight,
    Jan 1, 1970 GMT) which you want to use as the target. For this sample we will use
    the value `28800000`, which is midnight, Pacific Standard Time, 1/1/1970.
 3. Set the environment variable `U_FAKETIME_START=28800000`
 4. Now, the first time ICU checks the current time, it will start at midnight
    1/1/1970 (pacific time) and roll forward. So, at the end of 10 seconds of
    program runtime, the clock will appear to be at 12:00:10.
 5. You can test this by running the utility '`icuinfo -m`' which will print out
    the 'Milliseconds since Epoch'.
 6. You can also test this by running the cintltst test
    `/tsformat/ccaltst/TestCalendar` in verbose mode which will print out the
    current time:

    ```shell
    $ make check ICUINFO_OPTS=-m U_FAKETIME_START=28800000 CINTLTST_OPTS=-v
    /tsformat/ccaltst/TestCalendar
    U_DEBUG_FAKETIME was set at compile time, so the ICU clock will start at a
    preset value
    env variable U_FAKETIME_START=28800000 (28800000) for an offset of
    -1281957858861 ms from the current time 1281986658861
    PASS: The current date and time fetched is Thursday, January 1, 1970 12:00:00
    ```
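
The arithmetic behind the sample value can be checked with a few lines of code. This is only an illustrative sketch: `fakeTimeStartMillis()` and `fakeTimeOffsetMillis()` are hypothetical helper names, not ICU functions, and the offset formula is inferred from the log output above (target time minus real current time).

```cpp
#include <cstdint>

// Hypothetical helper: milliseconds since midnight, Jan 1, 1970 GMT, for
// local midnight on 1/1/1970 in a zone with the given whole-hour GMT offset
// (negative west of Greenwich, for example -8 for Pacific Standard Time).
constexpr int64_t fakeTimeStartMillis(int gmtOffsetHours) {
    constexpr int64_t kMillisPerHour = 60 * 60 * 1000;
    // Local midnight occurs -offset hours after GMT midnight.
    return -static_cast<int64_t>(gmtOffsetHours) * kMillisPerHour;
}

// Hypothetical helper: the offset reported in the log, assuming it is the
// target start time minus the real current time.
constexpr int64_t fakeTimeOffsetMillis(int64_t start, int64_t realNow) {
    return start - realNow;
}

// Midnight PST is 8 hours after midnight GMT.
static_assert(fakeTimeStartMillis(-8) == 28800000, "sample value from step 2");
// Matches the offset and current time printed in the sample output.
static_assert(fakeTimeOffsetMillis(28800000, 1281986658861) == -1281957858861,
              "offset from the sample log");
```

The resulting number is what would be passed in `U_FAKETIME_START`.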

 ## C: Threading Tests

 Threading tests for ICU4C functions should be placed under utility /
 `MultithreadTest`, in the files `intltest/tsmthred.h` and `.cpp`. See the existing
 tests in these files for examples.

 Tests from this location are automatically run under the [Thread
 Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
 (TSAN) in the ICU continuous build system. TSAN will reliably detect race
 conditions that could possibly occur, however improbable that occurrence might
 be normally.

 Data races are one of the most common and hardest-to-debug types of bugs in
 concurrent systems. A data race occurs when two threads access the same variable
 concurrently and at least one of the accesses is a write. The C++11 standard
 defines any data race as undefined behavior.
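
To make the definition concrete, here is a minimal sketch of a shared counter that is free of data races. It uses plain `std::thread` and `std::atomic` from the C++ standard library rather than the helpers used by the ICU test suite, and `countConcurrently()` is a hypothetical name chosen for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts `threads` workers that each increment a shared counter `perThread`
// times, then returns the final count.
// The counter is a std::atomic<int>, so concurrent increments are well
// defined. Replacing it with a plain int would make the concurrent writes a
// data race, which is the kind of defect TSAN is designed to report.
int countConcurrently(int threads, int perThread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& worker : pool) {
        worker.join();  // join() orders each worker's writes before the load below
    }
    return counter.load();
}
```

With the atomic counter, `countConcurrently(4, 10000)` always yields 40000. With a plain `int` the result would be unpredictable, even if it happened to look correct on most runs.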

 ## Binary Data Formats

 ICU services rely heavily on data to perform their functions. Such data is
 available in various more or less structured text file formats, which make it
 easy to update and maintain. For high runtime performance, most data items are
 pre-built into binary formats, i.e., they are parsed and processed once and then
 stored in a format that is used directly during processing.

 Most of the data items are pre-built into binary files that are then installed
 on a user's machine. Some data can also be built at runtime but is not
 persistent. In the latter case, a primary object should be built once and then
 cloned to avoid the multiple parsing, processing, and building of the same data.

 Binary data formats for ICU must be portable across platforms that share the
 same endianness and the same charset family (ASCII vs. EBCDIC). It would be
 possible to handle data from other platform types, but that would require
 load-time or even runtime conversion.

 ### Data Types

 Binary data items are memory-mapped, i.e., they are used as readonly, constant
 data. Their structures must be portable according to the criteria above and
 should be efficiently usable at runtime without building additional runtime data
 structures.

 Most native C/C++ data types cannot be used as part of binary data formats
 because their sizes are not fixed across compilers. For example, an `int` could be
 16, 32, or 64 bits wide, or even some other width. Use only types with exactly
 known widths and semantics.

 Use for example:

 * `uint8_t`, `uint16_t`, `int32_t` etc.
 * `UBool`: same as `int8_t`
 * `UChar`: for 16-bit Unicode strings
 * `UChar32`: for Unicode code points
 * `char`: for "invariant characters", see `utypes.h`

 > :point_right: **Note**: ICU assumes that `char` is an 8-bit byte but makes no
 assumption about its signedness.

 **Do not use** for example:

 * `short`, `int`, `long`, `unsigned int` etc.: undefined widths
 * `float`, `double`: undefined formats
 * `bool`: undefined width and signedness
 * `enum`: undefined width and signedness
 * `wchar_t`: undefined width, signedness and encoding/charset

 Each field in a binary/mappable data format must be aligned naturally. This
 means that a field with a primitive type of size n bytes must be at an n-aligned
 offset from the start of the data block. `UChar` must be 2-aligned, `int32_t` must
 be 4-aligned, etc.

 It is possible to use struct types, but one must make sure that each field is
 naturally aligned, without possible implicit field padding by the compiler —
 assuming a reasonable compiler.

 ```c++
 // bad because i will be preceded by compiler-dependent padding
 // for proper alignment
 struct BadExample {
     UBool flag;
     int32_t i;
 };

 // ok with explicitly added padding or generally conscious
 // sequence of types
 struct OKExample {
     UBool flag;
     uint8_t pad[3];
     int32_t i;
 };
 ```

 Within the binary data, a `struct` type field must be aligned according to its
 widest member field. The struct `OKExample` must be 4-aligned because it contains
 an `int32_t` field. Make padding explicit via additional fields, rather than
 letting the compiler choose optional padding.
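
One way to guard against accidental padding is to state the expected layout as compile-time checks. The following sketch repeats `OKExample` and assumes `UBool` is an `int8_t` typedef, as noted in the type list above; `isNaturallyAligned()` is a hypothetical helper, and the expected numbers assume a platform where `int32_t` is 4-aligned.

```cpp
#include <cstddef>
#include <cstdint>

typedef int8_t UBool;  // assumption for this sketch: same as int8_t

struct OKExample {
    UBool flag;
    uint8_t pad[3];  // explicit padding up to the next 4-aligned offset
    int32_t i;
};

// Hypothetical helper: a field of `size` bytes is naturally aligned when its
// offset from the start of the data block is a multiple of its size.
constexpr bool isNaturallyAligned(std::size_t offset, std::size_t size) {
    return offset % size == 0;
}

// The compiler must not have inserted any padding of its own.
static_assert(offsetof(OKExample, i) == 4, "i must follow the explicit padding");
static_assert(sizeof(OKExample) == 8, "no hidden trailing padding");
static_assert(isNaturallyAligned(offsetof(OKExample, i), sizeof(int32_t)),
              "int32_t must be 4-aligned");
```

If a later change to the struct shifts a field, the build fails instead of silently producing data files with a different layout.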

 Another potential problem with `struct` types, especially in C++, is that some
 compilers provide RTTI for all classes and structs, which inserts a `_vtable`
 pointer before the first declared field. When using `struct` types with
 binary/mappable data in C++, assert in some place in the code that `offsetof` the
 first field is 0. For an example see the genpname tool.
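
Such an assertion might look like the following sketch. `MappedRecord` is a hypothetical struct invented for illustration and is not taken from the genpname tool. The `std::is_standard_layout` check is an addition of this sketch; it rejects virtual functions and similar features that could add hidden members.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical mappable record; the name and fields are for illustration only.
struct MappedRecord {
    int32_t count;
    uint16_t index;
    uint16_t reserved;  // explicit padding keeps the size a multiple of 4
};

// A standard-layout type has no virtual functions or virtual base classes,
// so the compiler has no reason to place a hidden pointer in the object.
static_assert(std::is_standard_layout<MappedRecord>::value,
              "mappable structs must be standard-layout");

// The first declared field must start at the beginning of the struct.
static_assert(offsetof(MappedRecord, count) == 0,
              "nothing may precede the first field");
```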

 ### Versioning

 ICU data files have a `UDataHeader` structure preceding the actual data. Among
 other fields, it contains a `formatVersion` field with four parts (one `uint8_t`
 each). It is best to use only the first (major) or first and second
 (major/minor) fields in the runtime code to determine binary compatibility,
 i.e., reject a data item only if its `formatVersion` contains an unrecognized
 major (or major/minor) version number. The following parts of the version should
 be used to indicate variations in the format that are backward compatible, or
 carry other information.

 For example, the current `uprops.icu` file's `formatVersion` (see the genprops tool
 and `uchar.c`/`uprops.c`) is set to indicate backward-incompatible changes with the
 major version number, backward-compatible additions with the minor version
 number, and shift width constants for the `UTrie` data structure in the third and
 fourth version numbers (these could change independently of the `uprops.icu`
 format).
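
The compatibility rule can be sketched as a small predicate over the four version bytes. This is an illustration only: `isCompatibleFormat()` and its parameters are hypothetical names, and the sketch does not use ICU's data loading API.

```cpp
#include <cstdint>

// Hypothetical check (not an ICU API): accept a data item when its major
// format version is the one this code understands and its minor version is
// at least the minimum the code relies on. The third and fourth parts are
// deliberately ignored, since they carry backward-compatible variations or
// other information.
bool isCompatibleFormat(const uint8_t formatVersion[4],
                        uint8_t supportedMajor,
                        uint8_t minimumMinor) {
    return formatVersion[0] == supportedMajor &&
           formatVersion[1] >= minimumMinor;
}
```

For code that understands major version 7 and needs at least minor version 1, a file with `formatVersion` {7, 2, 5, 2} is accepted, while {8, 0, 5, 2} and {7, 0, 5, 2} are rejected.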

 ## C/C++ Debugging Hints and Tips

 ### Makefile-based platforms

 * Use `Makefile.local` files (override of `Makefile`), or `icudefs.local` (at the
   top level, override of `icudefs.mk`) to avoid the need to modify
   change-controlled source files with debugging information.
   * Example: **`CPPFLAGS+=-DUDATA_DEBUG`** in common to enable data
     debugging
   * Example: **`CINTLTST_OPTS=/tscoll`** in the cintltst directory provides
     arguments to the cintltst test upon `make check`, to run only the collation
     tests.
     * intltest: `INTLTEST_OPTS`
     * cintltst: `CINTLTST_OPTS`
     * iotest: `IOTEST_OPTS`
     * icuinfo: `ICUINFO_OPTS`
     * (letest does not have an OPTS variable as of ICU 4.6.)

 ### Windows/Microsoft Visual Studio

 The following addition to autoexp.dat will cause **`UnicodeString`**s to be
 visible as strings in the debugger without expanding sub-items:

 ```text
 ;; Copyright (C) 2010 IBM Corporation and Others. All Rights Reserved.
 ;; ICU Additions
 ;; Add to {VISUAL STUDIO} \Common7\Packages\Debugger\autoexp.dat
 ;;   in the [autoexpand] section just before the final [hresult] section.
 ;;
 ;; Need to change 'icu_##' to the current major+minor (so icu_46 for 4.6.1 etc)

 icu_46::UnicodeString {
     preview        (
               #if($e.fFlags & 2)   ; stackbuffer
                (
                   #(
                 "U= '",
                 [$e.fUnion.fStackBuffer, su],
                 "', len=",
                 [$e.fShortLength, u]
                 ;[$e.fFields.fArray, su]
                )
               )
               #else
                (
                   #(
                 "U* '",
                 [$e.fUnion.fFields.fArray, su],
                 "', len=",
                 [$e.fShortLength, u]
                 ;[$e.fFields.fArray, su]
                )
               )
             )

     stringview    (
               #if($e.fFlags & 2)   ; stackbuffer
                (
                   #(
                 "U= '",
                 [$e.fUnion.fStackBuffer, su],
                 "', len=",
                 [$e.fShortLength, u]
                 ;[$e.fFields.fArray, su]
                )
               )
               #else
                (
                   #(
                 "U* '",
                 [$e.fUnion.fFields.fArray, su],
                 "', len=",
                 [$e.fShortLength, u]
                 ;[$e.fFields.fArray, su]
                )
               )
             )

 }
 ;;;
 ;;; End ICU Additions
 ;;;
 ```