| --- |
| layout: default |
| title: ICU FAQ |
| nav_order: 6 |
| parent: Misc |
| --- |
| <!-- |
| © 2020 and later: Unicode, Inc. and others. |
| License & terms of use: http://www.unicode.org/copyright.html |
| --> |
| |
| # ICU FAQs |
| {: .no_toc } |
| |
| ## Contents |
| {: .no_toc .text-delta } |
| |
| 1. TOC |
| {:toc} |
| |
| --- |
| |
| ## Introduction to ICU |
| |
| #### What is ICU? |
| |
| ICU is a cross-platform, Unicode-based globalization library. It includes support |
| for locale-sensitive string comparison, date/time/number/currency/message |
| formatting, text boundary detection, character set conversion, and so on. |
| |
| #### Where can I get ICU? |
| |
| You can get ICU4C and ICU4J from <http://www.icu-project.org/download/>. |
| |
| **Why don't you build binaries for my platform?** |
| |
| There are so many compiler versions on so many platforms that we cannot build |
| binaries for all of them, or guarantee compatibility among them even on a single |
| platform. We therefore distribute only a limited number of binary versions of |
| ICU, but we will assist in building other versions from source. |
| |
| **Why don't you provide project files for my MSVC version (MSVC 2008, etc.)?** |
| |
| You can use the Cygwin build environment to build ICU from source against the |
| MSVC compiler. See the ICU4C Readme. |
| |
| #### How do I install the binary versions of ICU? |
| |
| * **Windows**: |
|     * The DLLs you may need for your application are located in |
|       **bin\\icuXX##.dll**, where "XX" is two letters (such as "uc" for the |
|       "common" library, "in" for the "i18n" library, etc.) and ## is the |
|       major and minor version number combined (such as **42** for **4.2**, |
|       **4.2**.0.1 or **4.2**.4). |
|     * Either place the DLLs in the same directory as your application's .EXE |
|       files, or set the PATH variable to point to the directory containing the |
|       ICU DLLs. |
|     * For compiling applications, add the "include" directory (the parent of |
|       the "unicode" and "layout" directories) to the include search path. |
|     * For linking applications, add the "lib" directory to the appropriate |
|       path. |
| * **Other Platforms**: |
|     * For other platforms, the .tgz file unpacks to a "/usr/local" type |
|       hierarchy. For system-wide installation, you can unpack all of the files |
|       into /usr/local/bin, /usr/local/include, etc. |
|     * The configuration script **/usr/local/bin/icu-config** or the similar |
|       Makefile include fragment **/usr/local/lib/icu/current/Makefile.inc** |
|       can be used in building applications. |
| |
| #### Can you help me build ICU4C for ... |
| |
| We can try. Make sure you read the latest "readme" and also the [ICU |
| Data](../icudata.md) section. You might also try [searching the icu-support |
| archives](http://site.icu-project.org/contacts), and then posting a question |
| there. Additionally, sites such as |
| [StackOverflow](http://stackoverflow.com/search?q=icu) may have helpful tips for |
| your topic. |
| |
| * **Android NDK** |
|     * Please try [searching the icu-support |
|       archives](http://site.icu-project.org/contacts) and also see |
|       [StackOverflow](http://stackoverflow.com/search?q=icu+android). |
| * **iPhone** |
|     * Please try [searching the icu-support |
|       archives](http://site.icu-project.org/contacts) and also see |
|       [StackOverflow](http://stackoverflow.com/search?q=icu+iphone). |
| |
| #### What is the ICU binary compatibility policy? |
| |
| Please see the section on |
| [binary compatibility](../design#icu-binary-compatibility) |
| in the [design chapter](../design.md). |
| |
| #### How is ICU licensed? |
| |
| The ICU license is intended to allow ICU to be included both in free software |
| projects and in proprietary or commercial products. |
| |
| Since ICU 58, ICU is covered by the |
| [Unicode license](http://www.unicode.org/copyright.html), which is very similar to |
| the previous ICU license. |
| |
| ICU 1.8.1–ICU 57 and ICU4J 1.3.1–ICU4J 57 are covered by the [ICU |
| license](https://github.com/unicode-org/icu/blob/release-57-1/icu4c/LICENSE), |
| a simple, permissive non-copyleft free software license, compatible with the GNU |
| GPL. The ICU license is identical to the version of the X license that was |
| formerly available at <http://www.x.org/Downloads_terms.html>. (This site no |
| longer exists, but can still be retrieved through internet archive services.) |
| |
| #### Can I use ICU from other languages besides C/C++ and Java? |
| |
| There are a number of wrappers available; please see the |
| [Related Projects](http://site.icu-project.org/related) page. |
| |
| #### How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version, or a new CLDR version? |
| |
| Our goal is for ICU upgrades to go smoothly. Here are some steps you can take to |
| prepare for an upgrade, or to make sure that your usage of ICU is |
| upgrade-friendly. |
| |
| * **API:** Ensure that you are not using draft APIs, which may change in |
| a future release. See the section on |
| [API compatibility](../design#icu-api-compatibility) in the |
| [design chapter](../design.md). |
| * **Unicode:** See the release notes for particular versions of Unicode to |
| ensure that your code is not affected by property changes or other |
| specification changes. |
| * **CLDR:** If your application has test cases which depend on specific |
| translations, these assumptions may become invalid if the translation of an |
| item changes, new support is added, or a country changes its currency. |
| Try not to depend on specific translations, or be prepared to change test |
| cases. Also, a newer version may support additional translations, |
| currencies, or types of calendars. |
| * **Building/Deploying your Application (ICU4C):** ICU4C usually builds with |
| symbol renaming (See: |
| [binary compatibility](../design#icu-binary-compatibility) |
| in the [design chapter](../design.md)). Be sure that you build your |
| application with the updated ICU header files, so that it will link against |
| the current ICU. Also, don't hard-code the names of ICU libraries in your |
| build scripts and projects. Where possible, link against just the |
| 'base name' such as `libicuuc.so` or `icuuc.lib` rather than a name |
| containing the version number such as `libicuuc.so.46` or |
| `icuuc46.dll`. |
| |
| ## Building and Testing ICU |
| |
| #### How do I build ICU? |
| |
| See the readme.html that is included with ICU. |
| |
| #### How do I get 32- or 64-bit versions of the ICU libraries? |
| |
| From ICU version 4.2 on, the configure script will build with the default bit |
| width of your platform. You can request 64 or 32 bits with the |
| **--with-library-bits=** option (e.g. |
| `runConfigureICU Linux --with-library-bits=64` or |
| `runConfigureICU MacOSX --with-library-bits=32`). |
| To attempt a 64-bit build where possible, use |
| **--with-library-bits=64else32**. |
| |
| #### How do I build an optimized, non-debug ICU? |
| |
| On Win32, choose the 'Release' configuration from the drop-down menu. On other |
| platforms, use the runConfigureICU script, which uses the configure script. The |
| runConfigureICU script uses the safest level of optimization for the ICU |
| libraries. If your platform is not specified, set the following environment |
| variables before running configure or runConfigureICU: **CFLAGS=-O CXXFLAGS=-O** |
| |
| #### Why am I getting so many test failures when I use "gmake check"? |
| |
| Please view the readme that is included with ICU. It has all the details on how |
| to build and test ICU, and it usually answers most problems. |
| |
| If you are using a compiler that hasn't been tested with ICU before, you may |
| have encountered an optimization bug with the compiler. On Unix platforms you |
| can specify **--disable-release** when you are using runConfigureICU (e.g. |
| `runConfigureICU --disable-release LinuxRedHat`). If this fixes your problem, it |
| is recommended that you report the optimization bug to the compiler |
| manufacturer. |
| |
| If neither of these fixes your problem, please send an e-mail to the [ICU4C |
| Support List](http://icu-project.org/contacts.html). |
| |
| #### How can I reduce the size of the ICU data library? |
| |
| Use the [Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835) |
| or see |
| [Customizing ICU's Data Library](../icudata#customizing-icus-data-library) |
| in the [ICU Data Management](../icudata.md) chapter of this User's Guide. |
| |
| #### Why am I seeing a small ( only a few K ) instead of a large ( several megabytes ) data shared library (icudt)? |
| #### Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure. |
| |
| ICU libraries must always link with the ICU data library. However, so that ICU |
| can bootstrap itself, it first builds a 'stub' data library, in |
| **icu\\source\\stubdata**, so that the tools can function. You should only use |
| this in production if you are NOT using DLL-mode data access, in which case you |
| are accessing ICU data as individual files, as an archive (.dat) file, or some |
| other means. Normally, you should be using the larger library built from |
| **icu\\source\\data**. If you see this issue after ICU has completed building, |
| re-run 'make' in **icu\\source\\data**, or the '**makedata**' project in Visual |
| Studio. |
| |
| #### Can I add or remove a converter from ICU? |
| |
| Yes. Please see [Customizing ICU's Data Library](../icudata#customizing-icus-data-library) |
| in the [ICU Data Management](../icudata.md) chapter of this User's Guide. You can also |
| get extra converters from <http://www.icu-project.org/charts/charset/> or use |
| the [ICU Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835) |
| tool. |
| |
| #### Why don't the makefiles work? |
| |
| You need GNU's make program version 3.8 or later, and you need to run the |
| runConfigureICU script, which is located in the `icu/source` directory. You may |
| be using a platform that ICU does not support. If the first two answers do not |
| apply to you, then you should send an e-mail to the |
| [ICU4C Support List](http://www.icu-project.org/contacts.html). |
| |
| Here are some places you can find gmake: |
| |
| 1. GNU: <http://www.gnu.org/software/make/> |
| |
| 2. Sun® Source/Binaries: <http://www.sunfreeware.com> |
| |
| 3. z/OS (OS/390) Source/Binaries: |
| <http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc> |
| |
| 4. IBM i (OS/400) Source/Binaries: |
| <http://www.ibm.com/servers/enable/site/porting/iseries/overview/gnu_utilities.html> |
| |
| Because every platform's make program behaves differently, we do not maintain |
| versions of our makefiles for other make programs. |
| |
| #### What version of the C++ iostream is used in ICU4C? |
| |
| ICU4C uses the latest available version of the iostream on the target platform. |
| Only the `io` library uses iostream. |
| |
| #### I only want to use the C APIs, do I need a C++ compiler? |
| |
| Yes, building ICU4C requires a C++ compiler. Large portions of ICU4C have always |
| been implemented in C++, and over time we are moving further in that direction. |
| We continue to support and add C APIs in order to provide binary-compatible |
| APIs. For the implementation, C++ is much better: |
| It is generally easier to work with, which reduces bugs and maintenance. It is |
| closer to Java, which is important for porting between ICU4C and ICU4J. We use |
| [RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) |
| (e.g., LocalPointer) to reduce opportunities for memory leaks, we use inline |
| functions and type-safe constants instead of #define, etc. However, we do not |
| use exceptions, and we do not use the Standard Template Library (STL), so |
| ICU4C's dependencies on the C++ library are minimal. See the new |
| [dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt) |
| and search for "group: cplusplus". |
| |
| As ICU does not use exceptions, the GCC option `-fno-exceptions` will reduce or |
| remove the dependencies on the standard C++ library. In |
| [GCC](http://gcc.gnu.org) 4.5 there is an option `-static-libstdc++` which will |
| remove C++ library dependencies. Visual Studio has the |
| [/MT option](http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=VS.100).aspx), |
| and other compilers may have similar options. See the |
| [How To Use ICU](../howtouseicu.md) page for related information on this topic. |
| |
| ## Features of ICU |
| |
| #### What computer languages does ICU support? |
| |
| ICU4C (ICU) is written in C and C++, and ICU4J is written in Java™. |
| |
| #### How are the APIs documented for deprecation? |
| |
| Please read the [ICU API compatibility](../design#icu-api-compatibility) |
| section in the [ICU Design](../design.md) chapter. |
| |
| #### What version of Unicode standard does ICU support? |
| |
| ICU version 65 supports Unicode version 12. |
| |
| The Unicode versions for older versions of ICU are listed on the ICU download |
| page, <http://www.icu-project.org/download/>. |
| |
| #### Does ICU support UTF-16 surrogates and Unicode supplementary characters? |
| |
| Yes. |
| |
| #### Does Java support UTF-16 surrogates and Unicode supplementary characters? |
| |
| Java 5 introduced support for Unicode supplementary characters. Java 1.4 and |
| earlier do not directly support them. |
| |
| #### How does ICU relate to Java's java.text.\* package? |
| |
| The International Components for Unicode are available both as a C/C++ library |
| and a Java class library. ICU provides internationalization utilities for |
| writing global applications in C, C++ or Java programming languages. ICU was |
| originally developed by the Unicode group at the IBM Globalization Center of |
| Competency in Cupertino, and ICU was contributed to Sun for inclusion into the |
| JDK 1.1. ICU4J includes enhanced versions of some of these contributed classes |
| plus additional classes that complement the classes in the JDK. |
| |
| ICU4C started as a C++ port of the original Java Internationalization classes. |
| These classes are now partially implemented in C, with largely parallel C and |
| C++ APIs. ICU4C and ICU4J continue to leapfrog each other with features and bug |
| fixes. Over time, features from ICU4J get added to the JDK as well. |
| |
| Both versions of ICU have a goal to implement the latest Unicode standard, |
| maintain a single portable source code base, and to make it easier for software |
| developers to create global applications. |
| |
| ## Using ICU |
| |
| #### Can I use any of the features of ICU without Unicode strings? |
| |
| No. In order to use the collation, text boundary analysis, formatting or other |
| ICU APIs, you must use Unicode strings. In order to get Unicode strings from |
| your native codepage, you can use the conversion API. |
| |
| #### How do I declare a Unicode string in ICU? |
| |
| Use the `U_STRING_DECL` and `U_STRING_INIT` macros or use the UnicodeString |
| class for C++. The base string type is `UChar *`. |
| |
| Even though most platforms declare wide strings as `wchar_t *` or `L""`, |
| that declaration is not portable because `sizeof(wchar_t)` |
| can be 1, 2 or 4, and the encoding may not even be Unicode. On the platforms |
| where `sizeof(wchar_t)` is 2 bytes, `UChar` is defined as `wchar_t`. In that |
| case you can use ICU's strings with 3rd party legacy functions; however, we |
| recommend always creating Unicode strings with the `U_STRING_DECL` and |
| `U_STRING_INIT` macros or the UnicodeString class, because they work the |
| same way on every platform. |
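
The portability concern can be illustrated in plain standard C++, without calling ICU. This is only a sketch: the names `greeting` and `wcharIsUtf16Sized` are made up for illustration, and it uses the C++11 `char16_t` type rather than ICU's `UChar` or the `U_STRING_DECL` macros.

```cpp
#include <string>

// A u"" literal is UTF-16 on every conforming C++11 compiler,
// so its contents do not depend on the platform.
const std::u16string greeting = u"Gr\u00FC\u00DFe";  // "Grüße", 5 code units

// wchar_t gives no such guarantee: its size (and encoding) varies by platform.
// This flag is true only where wchar_t happens to be 16 bits wide.
constexpr bool wcharIsUtf16Sized = (sizeof(wchar_t) == sizeof(char16_t));
```

On a platform where `wcharIsUtf16Sized` is false, an `L""` literal cannot be handed to code that expects 16-bit UTF-16 units.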
| |
| #### How is a Unicode string represented in ICU4C? |
| |
| A Unicode string is currently represented as UTF-16. The endianness of UTF-16 is |
| platform dependent. You can guarantee the endianness of UTF-16 by using a |
| converter. UTF-16 strings can be converted to other Unicode forms by using a |
| converter or with the UTF conversion macros. |
| |
| ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not support |
| surrogates, and UTF-16 does support surrogates. This means that UCS-2 only |
| supports UTF-16's Base Multilingual Plane (BMP). The notion of UCS-2 is |
| deprecated and dead. Unicode 2.0 in 1996 changed its default encoding to UTF-16. |
| |
| If you need to do a quick and easy conversion between UTF-16 and UTF-8, UTF-32 |
| or an encoding in `wchar_t`, you should take a look at unicode/ustring.h. In |
| that header file you will find `u_strToWCS`, `u_strFromWCS`, `u_strToUTF8`, |
| `u_strFromUTF8`, `u_strToUTF32` and `u_strFromUTF32` functions. These |
| functions are provided for your convenience instead of using the `ucnv_*` API. |
| |
| You can also take a look at the `UTF_*`, `UTF8_*`, `UTF16_*` and `UTF32_*` |
| macros, which are defined in |
| [unicode/utf.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf.h), |
| [unicode/utf8.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf8.h), |
| [unicode/utf16.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf16.h) |
| and [unicode/utf32.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf32.h). |
| These macros are helpful for programmers that need to manipulate and process |
| Unicode strings. |
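
To make the difference between BMP characters and surrogate pairs concrete, here is a minimal sketch in plain C++ that does not call ICU. The function name `appendUtf16` is hypothetical; in real code, the macros and `u_str*` functions mentioned above are the intended tools.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not an ICU API: appends one code point to a UTF-16 string.
// Returns false if cp is not a Unicode scalar value.
bool appendUtf16(std::u16string &out, std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false;  // out of range, or a surrogate code point
    }
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));  // BMP: a single 16-bit unit
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // lead surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // trail surrogate
    }
    return true;
}
```

For example, U+0041 is stored as the single unit 0x0041, while U+1F600 is stored as the pair 0xD83D 0xDE00. UCS-2 has no way to express the second case.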
| |
| #### How do I index into a UTF-16 string? |
| |
| Typically, indexes and offsets in strings count string units (code units), not |
| characters, even though C and Java call the unit type `char`. |
| |
| For example, in old-fashioned MBCS strings, you would count indexes and offsets |
| by bytes, not by the variable-width character count. In UTF-16, you do the same, |
| just count 16-bit units (in ICU: UChar). |
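
The distinction between counting units and counting characters can be sketched in plain C++ as follows. The helper names are hypothetical and the code does not call ICU; it assumes only that the string holds UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers, not ICU APIs.
inline bool isLead(char16_t u)  { return (u & 0xFC00) == 0xD800; }
inline bool isTrail(char16_t u) { return (u & 0xFC00) == 0xDC00; }

// Counts code points. By contrast, s.size() counts 16-bit code units,
// which is what indexes and offsets refer to.
std::size_t countCodePoints(const std::u16string &s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (isLead(s[i]) && i + 1 < s.size() && isTrail(s[i + 1])) {
            ++i;  // a well-formed pair occupies two units but is one code point
        }
    }
    return n;
}
```

For the string `u"a\U0001F600b"`, `size()` is 4 while `countCodePoints()` is 3, so the letter `b` sits at index 3, not index 2.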
| |
| #### What is the performance difference between UTF-8 and UTF-16? |
| |
| Most of the time, the memory throughput of the hard drive and RAM is the main |
| performance constraint. UTF-8 is 50% smaller than UTF-16 for US-ASCII, but UTF-8 |
| is 50% larger than UTF-16 for East and South Asian scripts. There is no memory |
| difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic. |
| |
| For processing Unicode data, UTF-16 is much easier to handle. You get a choice |
| between either one or two units per character, not a choice among four lengths. |
| UTF-16 also does not have illegal 16-bit unit values, while you might want to |
| check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are |
| less important and more benign. If you want to quickly convert small strings |
| between the different UTF encodings or get a UChar32 value, you can use the |
| macros provided in `utf.h` and its siblings `utf8.h` and `utf16.h`. For larger |
| or partial strings, please use the conversion API. |
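
The size comparison above follows directly from the encoding rules, as this small plain C++ sketch shows. The function names are hypothetical, and both functions assume their argument is a valid code point.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: bytes needed to encode one valid code point.
constexpr std::size_t utf8Bytes(std::uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

constexpr std::size_t utf16Bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one 16-bit unit, or a surrogate pair
}
```

U+0041 (ASCII) takes 1 byte in UTF-8 and 2 in UTF-16; U+03B1 (Greek) takes 2 in both; U+4E2D (CJK) takes 3 in UTF-8 and 2 in UTF-16; and a supplementary character such as U+1F600 takes 4 in both.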
| |
| #### How do the converters work? |
| |
| The converters act like a data stream. This means that the state of the last |
| character is saved in the converter after each call to the `ucnv_fromUnicode()` |
| and `ucnv_toUnicode()` functions. So if the source buffer ends in the middle of |
| a character, for example after the first half of a surrogate pair, the |
| converter remembers the partial character, and the next call writes out the |
| equivalent character to the destination buffer once the rest arrives. Please see the |
| [Conversion](../conversion/index.md) chapter of the User's Guide for details. |
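
The idea of state carried between calls can be sketched in plain C++. The class below is hypothetical and is not the ICU converter API: it only turns chunks of UTF-16 into code points, and substituting U+FFFD for unpaired surrogates is an assumption made for this sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch, not the ICU converter API: decodes chunks of UTF-16
// into code points and remembers an unfinished surrogate pair between calls.
class StreamingUtf16Decoder {
public:
    // Appends every code point completed by this chunk to `out`.
    void feed(const std::u16string &chunk, std::vector<std::uint32_t> &out) {
        for (char16_t u : chunk) {
            if (pendingLead_ != 0) {
                const std::uint32_t lead = pendingLead_;
                pendingLead_ = 0;
                if ((u & 0xFC00) == 0xDC00) {
                    // The pair is complete, possibly across two calls.
                    out.push_back(0x10000 + ((lead - 0xD800) << 10) + (u - 0xDC00));
                    continue;
                }
                out.push_back(0xFFFD);  // unpaired lead; u is handled below
            }
            if ((u & 0xFC00) == 0xD800) {
                pendingLead_ = u;       // wait for the trail unit
            } else if ((u & 0xFC00) == 0xDC00) {
                out.push_back(0xFFFD);  // unpaired trail
            } else {
                out.push_back(u);
            }
        }
    }

private:
    char16_t pendingLead_ = 0;  // state saved between calls
};
```

Feeding 0xD83D in one call produces no output; feeding 0xDE00 in the next call produces U+1F600. The sketch does not flush a lead surrogate left pending at the very end of the stream.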
| |
| #### What does a locale look like in ICU? |
| |
| ICU locales are lightweight: a locale is represented by just a string and |
| nothing more. Many platforms have numbers and other data structures to |
| represent a locale, but ICU has one simple platform independent string. |
| |
| ICU locales usually contain an ISO-639 language name (2-3 characters), an |
| ISO-3166 country name (2-3 characters), and a variant name which is user |
| specified. When a language or country is not represented by these standards, ICU |
| uses 3 characters to represent that part of the locale. All three parts are |
| separated by an underscore "_". For example, US English is "en_US", and German |
| in Germany with the Euro symbol is represented as "de_DE_EURO". Traditionally |
| the language part of the locale is lowercase, the country is uppercase and the |
| variant is uppercase. More details are available from the [Locale |
| Chapter](../locale/index.md) of this User's Guide. |
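
The three-part layout described above can be sketched in plain C++. The type `LocaleParts` and the function `splitLocaleId` are hypothetical names, not ICU APIs, and the sketch handles only the simple `language_COUNTRY_VARIANT` form; see the Locale chapter for the full syntax.

```cpp
#include <string>

// Hypothetical type, not an ICU API.
struct LocaleParts {
    std::string language;
    std::string country;
    std::string variant;
};

// Splits "language_COUNTRY_VARIANT" at the first two underscores.
// Parts that are absent are left empty.
LocaleParts splitLocaleId(const std::string &id) {
    const std::string::size_type npos = std::string::npos;
    LocaleParts parts;
    const std::string::size_type first = id.find('_');
    parts.language = id.substr(0, first);
    if (first == npos) {
        return parts;  // language only, e.g. "en"
    }
    const std::string::size_type second = id.find('_', first + 1);
    parts.country = id.substr(first + 1, second == npos ? npos : second - first - 1);
    if (second != npos) {
        parts.variant = id.substr(second + 1);
    }
    return parts;
}
```

With this sketch, `"de_DE_EURO"` splits into `de`, `DE` and `EURO`, and `"en_US"` splits into `en` and `US` with an empty variant.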
| |
| #### How is ICU versioned? |
| |
| Please read the [ICU Design](../design.md) chapter of the User's Guide. |
| |
| #### What is the relationship between ICU locale data and system locale data? |
| |
| There is no relationship. ICU is not dependent on the operating system for the |
| locale data. |
| |
| This also means that `uloc_setDefault()` does not affect the operating system. |
| The function `uloc_setDefault()` only sets ICU's default locale. Normally the |
| default locale for ICU is whatever the operating system says is the default |
| locale. |
| |
| #### How are errors handled in ICU? |
| |
| Since not all compilers can handle exceptions, we return an error from functions |
| with a `UErrorCode` parameter. The `UErrorCode` parameter of a function will |
| return any errors that occurred while it was executing. It's usually a good idea |
| to check for errors after calling a function by using the `U_SUCCESS` and |
| `U_FAILURE` macros. `U_SUCCESS` returns true when the function did run properly, |
| and `U_FAILURE` returns true when the function did NOT run properly. You may |
| handle specific errors from a function by checking the exact value of error. The |
| possible values of `UErrorCode` are located in |
| [utypes.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utypes.h) |
| of the common project. Before any function is called with a `UErrorCode`, it |
| must be initialized to `U_ZERO_ERROR`. |
| |
| Here is an example of `UErrorCode` being used. |
| |
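| ```c++ |
| #include <stdio.h> |
| #include "unicode/utypes.h" |
| |
| UErrorCode err = U_ZERO_ERROR;  // must be initialized before the call |
| callMyFunction(&err);           // placeholder for any function taking a UErrorCode* |
| if (U_FAILURE(err)) { |
|     puts("callMyFunction() Failed!"); |
| } |
| ``` |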
| |
| Please see the [ICU Design](../design.md) chapter for details. |
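
The following plain C++ sketch shows the same pattern from the side of the function being called, which is why the error code has to start out as "no error". All names here are stand-ins invented for illustration; real code uses `UErrorCode`, `U_ZERO_ERROR`, `U_SUCCESS` and `U_FAILURE` from `unicode/utypes.h`.

```cpp
// Stand-ins for illustration only, not ICU definitions.
enum SketchError { SKETCH_ZERO_ERROR = 0, SKETCH_ILLEGAL_ARGUMENT = 1 };

inline bool sketchFailure(SketchError e) { return e > SKETCH_ZERO_ERROR; }

// Hypothetical function following the pattern: do nothing if an earlier call
// already failed, otherwise report any problem through *err.
int halve(int value, SketchError *err) {
    if (sketchFailure(*err)) {
        return 0;  // a previous error is still set, so skip the work
    }
    if (value % 2 != 0) {
        *err = SKETCH_ILLEGAL_ARGUMENT;  // report the specific problem
        return 0;
    }
    return value / 2;
}
```

Because a function written this way returns early when the error code already signals failure, an uninitialized or stale error value would make every later call silently do nothing.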
| |
| #### With calendar classes, why are months 0-based? |
| |
| "I have been using ICU for its calendar classes, and have found it to be |
| excellent. That said, I am wondering why the decision was made to keep months |
| 0-based while almost all the other calendrical units (years, weeks of year, |
| weeks of month, date, days of year, days of week, days of week in month) are |
| 1-based? This has been the source of several bugs whenever the mind is slightly |
| less than razor sharp." --Contributor |
| |
| This was not our choice. We inherited it from the Java Calendar API, |
| unfortunately. |
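
One way to keep the off-by-one out of application code is to convert at the boundary and nowhere else. This is a minimal plain C++ sketch with hypothetical helper names; it assumes only that calendar month values are 0-based (January is 0) while the numbers people read and type are 1-based.

```cpp
// Hypothetical helpers, not ICU APIs.
// Calendar month fields are 0-based: January == 0, December == 11.
constexpr int toCalendarMonth(int humanMonth) { return humanMonth - 1; }

// Human-readable month numbers are 1-based: January == 1, December == 12.
constexpr int toHumanMonth(int calendarMonth) { return calendarMonth + 1; }
```

Where the API provides named month constants, using them instead of bare numbers avoids the conversion altogether.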
| |
| #### Is there a guideline for COBOL programs that want to use ICU? |
| |
| There is a COBOL/ICU guideline available since ICU 2.2. For more details, please |
| refer to the [COBOL section](../usefrom/cobol.md) of this User's Guide. |
| |
| #### Where can I get more information about using ICU? |
| |
| Please send an e-mail to the [ICU4C Support |
| List](http://www.icu-project.org/contacts.html). |