| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| |
| <html> |
| <head> |
| <meta name="GENERATOR" content="Microsoft FrontPage 4.0"> |
| <meta name="COPYRIGHT" content= |
| "Copyright (c) IBM Corporation and others. All Rights Reserved."> |
| <meta name="KEYWORDS" content= |
| "ICU; International Components for Unicode; what's new; readme; read me; introduction; downloads; downloading; building; installation;"> |
| <meta name="DESCRIPTION" content= |
| "The introduction to the International Components for Unicode with instructions on building, installation, usage and other information about ICU."> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| |
| <title>ReadMe for ICU</title> |
| <style type="text/css"> |
| h1 {border-width: 2px; border-style: solid; text-align: center; width: 100%; font-size: 200%; font-weight: bold} |
| h2 {margin-top: 3em; text-decoration: underline; page-break-before: always} |
| h2.TOC {page-break-before: auto} |
| h3 {margin-top: 2em; text-decoration: underline} |
| h4 {text-decoration: underline} |
| h5 {text-decoration: underline} |
| caption {font-weight: bold; text-align: left} |
| div.indent {margin-left: 2em} |
| ul.TOC {list-style-type: none} |
| samp {margin-left: 2em; border-style: groove; padding: 1em; display: block; background-color: #EEEEEE} |
| </style> |
| </head> |
| |
| <body lang="en-US"> |
| <h1>International Components for Unicode<br> |
| ICU 2.0 ReadMe</h1> |
| |
| <p>Version: 2001-Nov-16<br> |
| Copyright © 1995-2001 International Business Machines Corporation and |
| others. All Rights Reserved.</p> |
| <hr> |
| |
| <h2 class="TOC">Table of Contents</h2> |
| |
| <ul class="TOC"> |
| <li><a href="#Introduction">Introduction</a></li> |
| |
| <li><a href="#GettingStarted">Getting started</a></li> |
| |
| <li> |
| <a href="#News"> What is new in this release?</a> |
| |
| </li> |
| |
| <li><a href="#Download">How to Download the Source Code</a></li> |
| |
| <li><a href="#SourceCode">ICU Source Code Organization</a></li> |
| |
| <li> |
| <a href="#HowToBuild">How to Build And Install ICU</a> |
| |
| <ul class="TOC"> |
| <li><a href="#HowToBuildSupported">Supported Platforms</a></li> |
| |
| <li><a href="#HowToBuildWindows">Windows</a></li> |
| |
| <li><a href="#HowToBuildUnix">Unix</a></li> |
| |
| <li><a href="#HowToBuildOS390">OS/390 (zSeries)</a></li> |
| |
| <li><a href="#HowToBuildOS400">OS/400 (iSeries)</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#ImportantNotes">Important Notes About Using ICU</a> |
| |
| <ul class="TOC"> |
| <li><a href="#ImportantNotesWindows">Windows Platform</a></li> |
| |
| <li><a href="#ImportantNotesUnix">Unix Type Platforms</a></li> |
| |
| <li><a href="#ImportantNotesDefaultCP">Using the default codepage</a></li> |
| |
| <li><a href="#ImportantNotesDeprecatedAPI">Methods for enabling |
| deprecated APIs</a></li> |
| </ul> |
| </li> |
| |
| <li><a href="#PlatformDependencies">Platform Dependencies</a></li> |
| |
| </ul> |
| <hr> |
| |
| <h2><a name="Introduction" href="#Introduction">Introduction</a></h2> |
| |
| <p>Today's software market is a global one in which it is desirable to |
| develop and maintain one application (single source/single binary) that supports a wide variety of languages. |
| The International Components for Unicode (C/C++) provides tools to help write |
| platform-independent applications that are internationalized and localized, |
| with support for:</p> |
| |
| <ul> |
| <li>The latest version of the Unicode standard</li> |
| |
| <li>Character set conversions, with support for over 200 codepages</li> |
| |
| <li>Locale data for more than 160 locales</li> |
| |
| <li>Text collation (sorting) based on the Unicode Collation Algorithm |
| (=ISO 14651), customizable and tailored for national standards</li> |
| |
| <li>Transliteration services for script&lt;-&gt;script transliterations |
| and general text operations</li> |
| |
| <li>Resource bundles for storing and accessing localized information</li> |
| |
| <li>Date/Number/Message formatting and parsing of culture-specific |
| input/output formats</li> |
| |
| <li>Text boundary analysis for finding characters, word and sentence |
| boundaries</li> |
| </ul> |
| |
| <p>ICU has a sister project <a href="http://oss.software.ibm.com/icu4j/">ICU4J</a> |
| that extends the internationalization capabilities of Java to a level similar |
| to ICU. The ICU C/C++ project is also called ICU4C when a distinction is necessary.</p> |
| |
| <h2><a name="GettingStarted" href="#GettingStarted">Getting started</a></h2> |
| |
| <p>This document describes how to build and install ICU on your machine. |
| For other information about ICU please see the following table of links.<br> |
| The ICU homepage also links to related information about writing |
| internationalized software.</p> |
| |
| <table border="1" cellpadding="3" width="100%"> |
| <caption> |
| Here are some useful links regarding ICU and internationalization in |
| general. |
| </caption> |
| |
| <tr> |
| <td>ICU Homepage</td> |
| <td><a href="http://oss.software.ibm.com/icu/">http://oss.software.ibm.com/icu/</a></td> |
| </tr> |
| |
| <tr> |
| <td>ICU4J Homepage</td> |
| <td><a href="http://oss.software.ibm.com/icu4j/">http://oss.software.ibm.com/icu4j/</a></td> |
| </tr> |
| |
| <tr> |
| <td>FAQ - Frequently Asked Questions about ICU</td> |
| <td><a href="http://oss.software.ibm.com/icu/userguide/icufaq.html"> |
| http://oss.software.ibm.com/icu/userguide/icufaq.html</a></td> |
| </tr> |
| |
| <tr> |
| <td>ICU User's Guide</td> |
| <td><a href="http://oss.software.ibm.com/icu/userguide/"> |
| http://oss.software.ibm.com/icu/userguide/</a></td> |
| </tr> |
| |
| <tr> |
| <td>Download ICU Releases</td> |
| <td><a href="http://oss.software.ibm.com/icu/download/"> |
| http://oss.software.ibm.com/icu/download/</a></td> |
| </tr> |
| |
| <tr> |
| <td>API Documentation Online</td> |
| <td><a href="http://oss.software.ibm.com/icu/apiref/"> |
| http://oss.software.ibm.com/icu/apiref/</a></td> |
| </tr> |
| |
| <tr> |
| <td>Online ICU Demos</td> |
| <td><a href="http://oss.software.ibm.com/icu/demo/"> |
| http://oss.software.ibm.com/icu/demo/</a></td> |
| </tr> |
| |
| <tr> |
| <td>Contacts &amp; Bug Reports/Feature Requests</td> |
| <td><a href="http://oss.software.ibm.com/icu/archives/"> |
| http://oss.software.ibm.com/icu/archives/</a></td> |
| </tr> |
| </table> |
| |
| <p><strong>Important:</strong> Please make sure you understand the |
| <a href="license.html">Copyright and License Information</a>.</p> |
| |
| <h2><a name="News" href="#News"> What is new in this release?</a></h2> |
| |
| <p>The following list concentrates on changes that affect existing |
| applications migrating from previous ICU releases. For more news about this release, see the <a href="http://oss.software.ibm.com/icu/download/2.0/">ICU |
| 2.0 download page</a>.</p> |
| |
| <h3>Support for Unicode 3.1.1</h3> |
| |
| <p>ICU 2.0 has been upgraded to support <a href="http://www.unicode.org/unicode/standard/versions/Unicode3.1.1.html"> Unicode |
| 3.1.1</a>, which |
| includes the addition of 44,946 new encoded characters. |
| These characters cover several historic scripts, several sets of symbols, |
| and a very large collection of additional CJK ideographs.</p> |
| |
| <p>As part of this upgrade, a number of ICU services have been reviewed and |
| improved with regard to handling supplementary characters (surrogate |
| pairs). In particular, normalization has been revamped to support |
| supplementary characters and to improve performance.</p> |
| |
| <h3>Euro transition</h3> |
| |
| <p>Locale data for countries that are switching their national currencies to |
| the Euro is updated to use the Euro symbol and appropriate currency |
| formatting. The old data is available in _PREEURO locale variants. The _EURO |
| variant selector can still be used to unambiguously get Euro currency symbol |
| formatting. For some time around the transition, software should explicitly |
| specify the _PREEURO or _EURO variant to be sure of getting the intended |
| currency format.</p> |
| |
| <p>For more on this topic see the <a href="http://www.ibm.com/developerworks/unicode/library/u-euro/">developerWorks |
| article "Are you really ready for the Euro?"</a>.</p> |
| |
| <h3>API changes</h3> |
| |
| <p>Functions that take C-style string input arguments with const UChar *src |
| and int32_t srcLength now consistently treat srcLength==-1 to mean that the |
| input string is NUL-terminated; the length is then computed as u_strlen(src).</p> |
| |
| <p>Functions that take C-style string output arguments with UChar *dest and |
| int32_t destCapacity now handle NUL-termination of the output string |
| consistently. If the output length is equal to destCapacity, then dest is |
| filled with the output string without a terminating NUL and a warning code |
| (U_STRING_NOT_TERMINATED_WARNING) is set. For details about |
| string handling see the <a href="http://oss.software.ibm.com/icu/userguide/strings.html">User's |
| Guide Strings chapter</a>.</p> |
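| <p>The two string conventions above can be sketched in plain C++ without |
| ICU. The function <code>copyUnits()</code>, the <code>Status</code> enum and |
| its values are hypothetical names invented for this illustration; they are |
| not ICU APIs. The handling of a too-small buffer is also an assumption of |
| the sketch, not a statement about any particular ICU function.</p> |

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status values for this sketch (not ICU's error codes).
enum class Status { Ok, NotTerminatedWarning, BufferOverflow };

// Length of a NUL-terminated UTF-16 string, in code units.
int32_t unitLength(const char16_t *s) {
    int32_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Copies src to dest following the conventions described above:
//  - srcLength == -1 means src is NUL-terminated, so its length is computed;
//  - the output is NUL-terminated only if there is room for the terminator;
//  - if the output exactly fills dest, a warning is reported instead.
// Returns the full output length, even when dest is too small
// (assumption of this sketch: nothing is copied in that case).
int32_t copyUnits(const char16_t *src, int32_t srcLength,
                  char16_t *dest, int32_t destCapacity, Status &status) {
    if (srcLength == -1) {
        srcLength = unitLength(src);
    }
    if (srcLength > destCapacity) {
        status = Status::BufferOverflow;
        return srcLength;
    }
    for (int32_t i = 0; i < srcLength; ++i) {
        dest[i] = src[i];
    }
    if (srcLength < destCapacity) {
        dest[srcLength] = 0;                    // room for the terminator
        status = Status::Ok;
    } else {
        status = Status::NotTerminatedWarning;  // filled, not terminated
    }
    return srcLength;
}

// Small demonstration of both outcomes.
void demoCopyUnits() {
    Status status = Status::Ok;
    char16_t roomy[8];
    char16_t exact[3];
    assert(copyUnits(u"abc", -1, roomy, 8, status) == 3);
    assert(status == Status::Ok && roomy[3] == 0);
    assert(copyUnits(u"abc", -1, exact, 3, status) == 3);
    assert(status == Status::NotTerminatedWarning);
}
```

| <p>A caller that needs a terminated string would treat the warning as a |
| signal to retry with a buffer one unit larger than the returned length.</p> |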
| |
| <p>Some APIs that had been <i>deprecated</i> for a long time (more than a year) |
| have now been removed.<br> |
| Some other APIs have been marked as <i>deprecated</i> because they are |
| replaced by improved APIs; the newly deprecated APIs will be available for |
| another year. In particular, the C++ classes UnicodeConverter, Unicode, and |
| BiDi are deprecated in favor of the equally powerful C APIs.<br> |
| A few <i>draft</i> APIs have changed, especially for transliteration.</p> |
| |
| <p>APIs that take a rules or pattern string (for collation, transliteration, |
| message formats, etc.) now also take a <code>UParseError</code> structure that is filled |
| with useful debugging information when a rule syntax error is detected. This |
| makes it easier to find problems in large rule sets. As a result, the signatures |
| of some functions have changed. The old signatures will be available for |
| about a year by #defining a constant. See the affected header files for details.</p> |
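| <p>The idea of a parser that fills an error structure can be illustrated |
| with a minimal sketch. The <code>ParseErrorInfo</code> structure, its fields, |
| the <code>checkRules()</code> function, and the toy syntax (every non-empty |
| line must end with a semicolon) are all hypothetical; they are not the |
| layout of <code>UParseError</code> or the syntax of any ICU rule format.</p> |

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an error structure filled by a rule parser.
struct ParseErrorInfo {
    int32_t line = 0;     // 1-based line of the first error, 0 if none
    int32_t offset = -1;  // 0-based offset in that line, -1 if none
    std::string context;  // text of the offending line
};

// Toy syntax check: every non-empty line must end with ';'.
// Returns true on success; on failure, fills err and returns false.
bool checkRules(const std::string &rules, ParseErrorInfo &err) {
    int32_t lineNo = 0;
    std::string::size_type start = 0;
    while (start <= rules.size()) {
        std::string::size_type end = rules.find('\n', start);
        if (end == std::string::npos) {
            end = rules.size();
        }
        const std::string line = rules.substr(start, end - start);
        ++lineNo;
        if (!line.empty() && line.back() != ';') {
            err.line = lineNo;
            // Position where the terminating ';' is missing.
            err.offset = static_cast<int32_t>(line.size());
            err.context = line;
            return false;
        }
        start = end + 1;
    }
    return true;
}

// The caller can point at the exact line instead of rejecting all rules.
void demoCheckRules() {
    ParseErrorInfo err;
    assert(!checkRules("a > b;\nc > d\ne > f;", err));
    assert(err.line == 2 && err.offset == 5 && err.context == "c > d");
}
```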
| |
| <p>The C++ Normalizer class had a partially broken model for iterative |
| normalization; this is redone in a more consistent way. See the <a href="http://oss.software.ibm.com/icu/apiref/class_Normalizer.html">Normalizer |
| API documentation</a> for details.</p> |
| |
| <h3>Memory and resource cleanup</h3> |
| |
| <p>ICU is carefully tested for memory leaks. Some memory is held in internal |
| caches that do not normally get released during normal operation. These are |
| not leaks because ICU continues to use them as necessary.</p> |
| |
| <p>For memory leak testing, and for a small number of applications, it can |
| be useful to release all the memory that is allocated for |
| a library. ICU 2.0 supports this with a new function <code><a href="http://oss.software.ibm.com/icu/apiref/uclean_h.html">u_cleanup()</a></code> |
| that may be called after an application has released all ICU objects. <code>u_cleanup()</code> |
| will then release all of ICU's internal memory. The ICU libraries can then |
| even be unloaded cleanly without shutting down the process.</p> |
| |
| <h3>ICU versioning - C++ namespaces</h3> |
| |
| <p>Beginning with ICU 2.0, multiple releases of ICU can be used in the same |
| process. Together with an arbitrary number of post-2.0 releases, one pre-2.0 |
| release can be loaded and active.</p> |
| |
| <p>This is achieved by renaming all library exports to include a release |
| number suffix. Each global function and each class is renamed in this way |
| using a header file with #defines. For C++, if the compiler supports |
| namespaces, all ICU C++ classes are defined in the "icu" |
| namespace. If the compiler does not support namespaces, then the classes are |
| renamed instead. This change also reduces the chance of naming collisions |
| with other libraries.</p> |
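| <p>The renaming technique can be sketched as follows. The names |
| <code>demo_entry</code>, the <code>_2_0</code> suffix, and the |
| <code>demo</code> namespace are hypothetical and chosen only for |
| illustration; the actual ICU renaming header and suffix format may differ.</p> |

```cpp
#include <cassert>
#include <cstring>

// Hypothetical renaming header: each exported name gets a release suffix.
// User code keeps writing demo_entry(); the compiled symbol is
// demo_entry_2_0, so two releases can coexist in one process.
#define demo_entry demo_entry_2_0

// Written as demo_entry, compiled as demo_entry_2_0.
// __func__ reports the name the compiler actually sees.
const char *demo_entry() {
    return __func__;
}

// With namespace support, C++ classes live in a namespace
// instead of being renamed one by one.
namespace demo {
class Widget {
public:
    int value() const { return 42; }
};
}  // namespace demo

void demoRenaming() {
    assert(std::strcmp(demo_entry(), "demo_entry_2_0") == 0);
    assert(demo::Widget().value() == 42);
}
```

| <p>Because the rename happens in the preprocessor, application source code |
| does not change when it is recompiled against a newer release.</p> |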
| |
| <p>For details see the <a href="http://oss.software.ibm.com/icu/userguide/design.html">User's |
| Guide Design Chapter</a>.</p> |
| |
| <h3>Data loading changed</h3> |
| |
| <p>ICU data loading is simplified for most users. By default, the ICU build |
| creates a DLL/shared library that is linked directly with the common library |
| (<code>[lib]icuuc</code>). By placing all ICU libraries including the data |
| library into the same folder, ICU should start up and find its data |
| immediately. Dynamic loading of data from DLLs/shared libraries is not |
| supported any more.</p> |
| |
| <p>Before ICU 2.0, ICU did not itself link directly with its data library, |
| but some ICU applications did (like the Xerces XML parser) and called <code>udata_setCommonData()</code>. |
| This is not necessary any more in the default case.<br> |
| On the other hand, this same technique can now be used to efficiently load |
| application data (e.g., for its own localization). An application can build |
| a data DLL/library of its own, link it, and call the new API <code>udata_setAppData()</code>.</p> |
| |
| <p>For details on finding and loading ICU data and on options for portable, |
| common data files etc. see the <a href="http://oss.software.ibm.com/icu/userguide/icudata.html">User's |
| Guide ICU Data Chapter</a>.</p> |
| |
| <h3>Collation improvements</h3> |
| |
| <p>The performance of Japanese Katakana collation is improved, and the |
| Japanese collation is changed for conformance with the JIS X 4061 standard. |
| The improvement is in the handling of the length and iteration marks, making |
| the processing of regular letters faster.</p> |
| |
| <p>The JIS X 4061 standard specifies a 5-level sorting algorithm. Sorting |
| with all five levels according to JIS is |
| achieved in ICU 2.0 with the "identical" strength. The fifth level |
| distinguishes regular character codes from compatibility variants.</p> |
| |
| <p>There is special code to handle the fourth (quaternary) level of the JIS |
| standard, which distinguishes between Hiragana and Katakana letters. In ICU |
| 2.0, string comparisons (like ucol_strcoll) at this level are slow when the |
| "shifted" option is used, because ICU then generates complete sort keys for |
| both strings. This is not an issue if the |
| "shifted" option is not used, or if the string comparison is done |
| with fewer levels.</p> |
| |
| <p> |
| Quaternary strength, without the "shifted" option, is the default for Japanese collation in ICU 2.0.</p> |
| |
| <p>Three-level sorting (tertiary strength) and lower — if sufficient — is |
| faster even with "shifted" on (for string comparisons: <em>much</em> |
| faster in this case).</p> |
| |
| <h3>License Change (for ICU 1.8.1 and up)</h3> |
| |
| <p>The ICU projects (ICU4C and ICU4J) have changed their licenses from the |
| IPL (IBM Public License) to the X license. The X license is a non-viral and |
| recommended free software license that is compatible with the GNU GPL. |
| This is effective starting with release 1.8.1 of ICU4C and release |
| 1.3.1 of ICU4J. All previous ICU releases will continue to utilize the IPL. |
| New ICU releases will adopt the X license. The users of previous releases |
| of ICU will need to accept the terms and conditions of the X license in |
| order to adopt the new ICU releases.</p> |
| |
| <p>The main effect of the change is to provide GPL compatibility. The X |
| license is listed as GPL compatible; see the GNU page at <a href= |
| "http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses"> |
| http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses</a>.</p> |
| |
| <p>The text of the X license is available at <a href= |
| "http://www.x.org/terms.htm">http://www.x.org/terms.htm</a>. The IBM |
| version contains the essential text of the license, omitting the X-specific |
| trademarks and copyright notices.</p> |
| |
| <p>For more details please see the <a href= |
| "http://oss.software.ibm.com/icu/press.html">press announcement</a> and the |
| <a href="http://oss.software.ibm.com/icu/project_faq.html#license">Project |
| FAQ</a>.</p> |
| |
| <h3>Transliterator improvements</h3> |
| |
| <p>The transliterator service has undergone an extensive overhaul, in both |
| the rule-based engine and the built-in system rules. For a complete |
| description see the <a href="http://oss.software.ibm.com/icu/userguide/Transliteration.html">User's |
| Guide chapter on transliteration</a>.</p> |
| |
| <ul> |
| <li><b>New or rewritten rules:</b> <tt>Any-Accents</tt>, <tt> |
| Any-Publishing</tt>, <tt>Cyrillic-Latin</tt>*, <tt>Greek-Latin</tt>*, |
| <tt>Greek-Latin/UNGEGN</tt> (aka <tt>el-Latin</tt>), <tt> |
| Hiragana-Latin</tt>*, and <tt>Latin-Katakana</tt>*. New algorithmic rules |
| include <tt>Any-Name</tt>*, the normalization rules <tt>Any-NFC</tt>, |
| <tt>Any-NFKC</tt>, <tt>Any-NFD</tt>, and <tt>Any-NFKD</tt>, casing rules |
| <tt>Any-Upper</tt>, <tt>Any-Lower</tt>, and <tt>Any-Title</tt>. <tt> |
| Unicode-Hex</tt>* has been renamed <tt>Any-Hex</tt>*. <tt>Any-Remove</tt> |
| deletes its input. [*<em>applies to reverse rule as well</em>]</li> |
| |
| <li><b>Indic script rules:</b> Transliterators between Indic scripts and |
| from each script to and from Latin have been completely revised. Scripts |
| included are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, |
| Oriya, Tamil, and Telugu. Taking Bengali as an example, transliterators |
| <tt>Bengali-X</tt> and <tt>X-Bengali</tt> exist, where X is any of the |
| other listed Indic scripts, or Latin.</li> |
| |
| <li><b>Deleted rules:</b> <tt>UnicodeName-UnicodeChar</tt> has been |
| replaced by <tt>Any-Name</tt>*. <tt>Latin-Arabic</tt>* and <tt> |
| Latin-Hebrew</tt>* have been removed until they can be rewritten. <tt> |
| KeyboardEscape-Latin1</tt> has been replaced by <tt>Any-Accents</tt> and |
| <tt>Any-Publishing</tt>. <tt>Latin-Kana</tt>* has been replaced by <tt> |
| Latin-Katakana</tt>* and <tt>Latin-Hiragana</tt>*. [*<em>applies to |
| reverse rule as well</em>]</li> |
| |
| <li><b>ID syntax changes:</b> Transliterator IDs ignore case and |
| whitespace now. They now have the standard form <em> |
| [filter]source-target/variant</em>. The "<em>[filter]</em>" element is |
| optional; if present, it limits the characters that the transliterator |
| operates on. The "<em>source-</em>" element is optional; if omitted, it |
| is taken to be <tt>Any</tt>. The "<em>/variant</em>" element is also |
| optional; if present, it selects between different flavors of a related |
| set of transliterators, for example, <tt>Greek-Latin</tt> and <tt> |
| Greek-Latin/UNGEGN</tt>. The source, target, and variant specifiers are |
| case-insensitive strings of the form <tt> |
| /[_[:L:]][_[:L:][:N:]]*/</tt>.</li> |
| |
| <li> |
| <b>Locale support:</b> The source, target, or both may be locales. In |
| this case the transliterator rules will be looked up in the system |
| locale resource bundles. Rules are sought under three tags, listed |
| below. The text after the underscore in each tag is always |
| canonicalized to uppercase before lookup. <em>Note: The underscore is |
| currently omitted from ICU4C tags, but will be restored when |
| possible.</em> |
| |
| <ul> |
| <li><tt>TransliterateTo_<em>SCRIPT</em></tt>: Unidirectional rules |
| from the enclosing locale to another script or specifier.</li> |
| |
| <li><tt>TransliterateFrom_<em>SCRIPT</em></tt>: Unidirectional rules |
| from another script or specifier to the enclosing locale.</li> |
| |
| <li><tt>Transliterate_<em>SCRIPT</em></tt>: Bidirectional rules, with |
| the forward direction being To and the reverse direction being |
| From.</li> |
| </ul> |
| Lookup proceeds in the following order: |
| |
| <ul> |
| <li>In the dynamic registry: <em>source-target</em></li> |
| |
| <li>In the <em>source</em> locale: <tt> |
| TransliterateTo_<em>TARGET</em></tt> then <tt> |
| Transliterate_<em>TARGET</em></tt> (forward direction)</li> |
| |
| <li>In the <em>target</em> locale: <tt> |
| TransliterateFrom_<em>SOURCE</em></tt> then <tt> |
| Transliterate_<em>SOURCE</em></tt> (reverse direction)</li> |
| </ul> |
| If either the source or target specifier is not a locale then the |
| corresponding locale lookup is skipped. If either is a locale, then |
| locale fallback from <tt>aa_BB_CCC</tt> to <tt>aa_BB</tt> to <tt> |
| aa</tt> is performed (where <tt>aa</tt>, <tt>BB</tt>, and <tt>CCC</tt> |
| are the locale language, country, and variant). The final fallback is |
| from the specifier, whether it is a locale or not (e.g., script |
| abbreviation), to the long script name associated with that specifier. |
| If a tag lookup succeeds, the attached element should be a string array |
| of <i>2n</i> items where <i>n</i> >= 1. Each pair of strings is a |
| variant name and rule string. The variants are matched against the |
| requested variant. If no variant is specified then the first variant is |
| considered to match. |
| </li> |
| |
| <li><b>Filters on compound IDs:</b> A filter on a compound |
| transliterator can now be specified by giving a leading entry that |
| contains a filter and no transliterator ID. For example, "<tt>[abc]; |
| Latin-Katakana; Katakana-Hiragana</tt>" submits only the characters |
| contained in the UnicodeSet <tt>[abc]</tt> to the compound transliterator |
| <tt>Latin-Katakana; Katakana-Hiragana</tt>.</li> |
| |
| <li><b>Explicit reverse IDs:</b> Typically if a transliterator <tt> |
| A-B</tt> is formed, and its inverse is requested, the system tries to |
| create <tt>B-A</tt>. That is, the source and target are exchanged. In |
| some cases, the user may wish a different transliterator to be considered |
| the reverse. In order to do this, the reverse ID is specified in |
| parentheses immediately following the ID. For example, "<tt>A-B |
| (B-C)</tt>" is a transliterator <tt>A-B</tt> whose inverse is <tt> |
| B-C</tt>. If the ID of the inverse is requested, "<tt>B-C (A-B)</tt>" is |
| returned. The forward or reverse component may be empty, so |
| "<tt>(B-C)</tt>" and "<tt>A-B()</tt>" are legal IDs with <tt>Null</tt> |
| transliterator for the forward and reverse direction, respectively. This |
| is most useful in compounds where one element has no inverse or where a |
| different inverse from the standard inverse is desired. For example, |
| "<tt>Any-Lower(); Latin-Cyrillic</tt>".</li> |
| |
| <li><b>Quantifiers:</b> Transliterator rules may now contain quantifiers |
| '<tt>*</tt>', '<tt>+</tt>', and '<tt>?</tt>'. These indicate zero or |
| more, one or more, and zero or one matches, respectively. Quantifiers |
| apply to the last element, be it a single character, a UnicodeSet, a |
| segment definition, or a quote; the entire preceding element is repeated. |
| Quantifiers are implemented as greedy, non-backtracking matchers, unlike |
| their typical implementation in regular expressions. As a result, some |
| expressions that match in a traditional regular expression engine (e.g., |
| Perl) will not match in a transliterator. E.g., "[a-z]+ q > x;" will |
| <em>not</em> match "abcq", since the '<tt>+</tt>' quantifier consumes all |
| four characters.</li> |
| |
| <li><b>Dot character:</b> A new special character is recognized in rules, |
| '<tt>.</tt>' (U+002E). This character matches any character in the set |
| <tt>[^[:Zp:][:Zl:]\r\n$]</tt>. Note the trailing '<tt>$</tt>' in the set |
| pattern, which indicates that the ETHER character is <em>not</em> matched |
| by '<tt>.</tt>'.</li> |
| |
| <li><b>::ID blocks in rules:</b> Transliterator IDs may now be included |
| in rule sets. These may occur in two locations: as one contiguous block |
| before any other rules, and as one contiguous block after all rules. The |
| effect of placing <tt>::ID</tt>s into a rule set is to enclose the |
| rule-based transliterator within a compound transliterator containing the |
| indicated IDs. The <tt>::ID</tt> syntax is exactly the same as the |
| standard ID syntax, with the difference that each ID element is preceded |
| by the special token "<tt>::</tt>".</li> |
| |
| <li><b>Segment definitions more flexible:</b> Segment definitions may be |
| nested and are now unlimited in number. Prior to 2.0, segments could not |
| be nested and were limited to nine ($1 to $9).</li> |
| |
| <li><b>Variable range pragma:</b> A new pragma is supported. This follows |
| the syntax: <code>use variable range 0xE800 0xEFFF;</code> (Any two code |
| points may be specified.) The code points are specified as decimal |
| constants, octal constants with a leading '0', or hexadecimal constants |
| with a leading "0x". The given range is used internally for stand-in |
| characters during processing. The default range is <b>0xF000..0xF8FF</b>. |
| If a rule set explicitly uses characters in the default variable range, a |
| new range, not containing any characters in use in the rule set, must be |
| specified. <em>Note:</em> This is the first of several planned |
| pragmas.</li> |
| |
| <li><b>Factory method registration:</b> Factory methods (function |
| pointers in ICU4C; functor objects in ICU4J) may be registered against |
| transliterator IDs. This is generally more efficient than the |
| registration of singleton prototypes, since no actual transliterator |
| object need be created until the user requires one. See the <tt> |
| registerFactory()</tt> method in <tt>Transliterator</tt>.</li> |
| |
| <li><b>Filtering semantics changed for subclasses:</b> Subclasses now |
| need not concern themselves with filters. Instead, they may assume that |
| all characters received by <tt>handleTransliterate()</tt> have already |
| passed through the filter. This simplifies subclass code greatly.</li> |
| </ul> |
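| <p>The difference between greedy, non-backtracking quantifiers and a |
| traditional regular expression engine, described under "Quantifiers" above, |
| can be sketched in plain C++. The function names are hypothetical, |
| <code>std::regex</code> stands in for a backtracking engine such as Perl, |
| and the sketch assumes that the space in the rule "[a-z]+ q" is not |
| significant. It is not the transliterator's actual matching code.</p> |

```cpp
#include <cassert>
#include <regex>
#include <string>

// Matches the toy pattern "[a-z]+ q" the way the text describes quantifiers:
// the '+' takes every lowercase letter it can and never gives one back.
// (Assumes an ASCII-compatible execution character set.)
bool matchGreedyNoBacktrack(const std::string &text) {
    std::string::size_type i = 0;
    while (i < text.size() && text[i] >= 'a' && text[i] <= 'z') {
        ++i;                      // greedy: consume all of [a-z]+
    }
    if (i == 0) {
        return false;             // '+' needs at least one match
    }
    // No backtracking: 'q' must follow what was consumed and end the text.
    return i + 1 == text.size() && text[i] == 'q';
}

// The same pattern in a backtracking regular expression engine.
bool matchBacktracking(const std::string &text) {
    static const std::regex pattern("[a-z]+q");
    return std::regex_match(text, pattern);
}

void demoQuantifiers() {
    assert(matchBacktracking("abcq"));        // the engine gives back the 'q'
    assert(!matchGreedyNoBacktrack("abcq"));  // '+' already consumed the 'q'
}
```

| <p>Because 'q' is itself in <tt>[a-z]</tt>, the non-backtracking version |
| can never match here; a rule author would exclude 'q' from the set |
| instead.</p> |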
| |
| <h3><a name="NewsUnicodeSet">UnicodeSet Improvements</a></h3> |
| |
| <ul> |
| <li><b><tt>[:Any:]</tt> set:</b> The set <tt>[:Any:]</tt> matches all |
| Unicode code points, that is, U+0000..U+10FFFF.</li> |
| |
| <li><b><tt>\p{}</tt> syntax:</b> UnicodeSet now recognizes a Perlish |
| syntax for character properties. Any property designated as <tt> |
| [:Foo:]</tt> may equivalently be designated <tt>\p{Foo}</tt>.</li> |
| |
| <li><b>Short, medium, and long property names:</b> In addition to the |
| short property names, such as <tt>[:Ll:]</tt>, equivalent medium (e.g., |
| <tt>[:gc=Ll:]</tt>) and long (e.g., <tt> |
| [:GeneralCategory=LowercaseLetter:]</tt>) forms are recognized. See the |
| <a href= |
| "http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/unicodeset_properties.html"> |
| UnicodeSet Properties design document</a> for details. As of this |
| release, general categories, numeric value, and script are |
| supported.</li> |
| </ul> |
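| <p>The equivalence between the two property syntaxes is purely notational, |
| as the following sketch suggests. The helper <code>toPerlStyle()</code> is a |
| hypothetical name, works on plain strings only, and does not parse or |
| validate property names the way UnicodeSet does.</p> |

```cpp
#include <cassert>
#include <string>

// Rewrites a POSIX-style property pattern "[:Foo:]" as "\p{Foo}".
// Returns the input unchanged if it does not have the "[:...:]" shape.
std::string toPerlStyle(const std::string &pattern) {
    const std::string::size_type n = pattern.size();
    if (n < 5 || pattern.compare(0, 2, "[:") != 0 ||
        pattern.compare(n - 2, 2, ":]") != 0) {
        return pattern;
    }
    return "\\p{" + pattern.substr(2, n - 4) + "}";
}

void demoToPerlStyle() {
    assert(toPerlStyle("[:Ll:]") == "\\p{Ll}");
    assert(toPerlStyle("[abc]") == "[abc]");  // not a property pattern
}
```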
| |
| <hr> |
| |
| <h2><a name="Download" href="#Download">How to Download the Source Code</a></h2> |
| |
| <p>There are two ways to download ICU releases:</p> |
| |
| <ul> |
| <li><strong>Official Release Snapshot:</strong><br> |
| If you want to use ICU (as opposed to developing it), you should |
| download an official packaged version of the ICU source code. These |
| versions are tested more thoroughly than day-to-day development builds of |
| the system, and they are packaged in zip and tar files for convenient |
| download. These packaged files can be found at <a href= |
| "http://oss.software.ibm.com/icu/download/"> |
| http://oss.software.ibm.com/icu/download/</a>.<br> |
| The packaged snapshots are named <strong>icu-nnnn.zip</strong> or <strong> |
| icu-nnnn.tgz</strong>, where nnnn is the version number. The .zip file is |
| used for Windows platforms, while the .tgz file is preferred on most |
| other platforms.<br> |
| Please unzip this file. It will reconstruct the source directory, |
| including anonymous CVS control directories (see below).</li> |
| |
| <li> |
| <strong>CVS Source Repository:</strong><br> |
| If you are interested in developing features, patches, or bug fixes |
| for ICU, you should probably be working with the latest version of the |
| ICU source code. You will need to check the code out of our CVS |
| repository to ensure that you have the most recent version of all of |
| the files. See our <a href="http://oss.software.ibm.com/icu/develop/cvs.html">CVS |
| page</a> for details. |
| </li> |
| </ul> |
| |
| <h2><a name="SourceCode" href="#SourceCode">ICU Source Code Organization</a></h2> |
| <p>In the descriptions below, <strong><i>&lt;ICU&gt;</i></strong> is the full path name |
| of the icu directory - the top level directory from the distribution archives |
| - in your file system.</p> |
| |
| <table border="1" cellpadding="0" width="100%" summary=""> |
| <caption> |
| The following files describe the code drop. |
| </caption> |
| |
| <tr> |
| <td>readme.html</td> |
| |
| <td>Describes the International Components for Unicode (this file)</td> |
| </tr> |
| |
| <tr> |
| <td>license.html</td> |
| |
| <td>Contains the text of the ICU license</td> |
| </tr> |
| </table> |
| |
| <p><br> |
| </p> |
| |
| <table border="1" cellpadding="0" width="100%" summary=""> |
| <caption> |
| The following directories contain source code and data files. |
| </caption> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/common/</td> |
| |
| <td>The core Unicode and support functionality, such as resource |
| bundles, character properties, locales, codepage conversion, |
| normalization, and UnicodeString.</td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/i18n/</td> |
| |
| <td>Modules in i18n are generally the more data-driven, that is to say |
| resource bundle driven, components. These deal with higher level |
| internationalization issues such as formatting, collation, text break |
| analysis, and transliteration.</td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/test/intltest/</td> |
| |
| <td>A test suite including all C++ APIs. For information about running |
| the test suite, see the users' guide.</td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/test/cintltst/</td> |
| |
| <td>A test suite written in C, including all C APIs. For information |
| about running the test suite, see the users' guide.</td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/data/</td> |
| |
| <td> |
| This directory contains the source data in text format, which is |
| compiled into binary form during the ICU build process. The output |
| from these files is stored in <i>&lt;ICU&gt;</i>/source/data/build while awaiting |
| further packaging. |
| |
| <ul> |
| <li><b>unidata/</b> This directory contains the Unicode data files. |
| Please see <a href="http://www.unicode.org/"> |
| http://www.unicode.org/</a> for more information.</li> |
| |
| <li> |
| <p><b>Resource Bundle sources</b> .txt files containing ICU |
| language and culture-specific localization data. Two special |
| bundles are <b>root</b>, which is the fallback data and parent of |
| other bundles, and <b>index</b> which contains a list of |
| installed bundles. <b>resfiles.txt</b> contains the list of |
| resource bundle files.</p> |
| |
| <p>Also here are transliteration bundles, and the list of |
| installed transliteration files in <b>translit_index.txt</b>.</p> |
| |
| <p>All resource bundles are compiled into .res files. The <b> |
| ucmfiles.txt</b> file contains the list of converter files.</p> |
| </li> |
| |
| <li><b>Code page converter tables</b> .ucm files containing |
| mappings to and from Unicode. These are compiled into .cnv |
| files.</li> |
| |
| <li><b>convrtrs.txt</b> is the alias mapping table from various |
| converter name formats to ICU internal format and vice versa. It |
| produces cnvalias.dat.</li> |
| |
| <li><b>timezone.txt</b> is a generated file which is compiled into |
| tz.dat, containing time zone information.</li> |
| </ul> |
| </td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/data</td> |
| |
| <td>This directory is where the final, packaged version of the ICU |
| binary data ends up. The intermediate individual data |
| files (.res, .cnv) are kept in the subdirectory |
| "<i>&lt;ICU&gt;</i>/source/data/build" prior to packaging.</td> |
| </tr> |
| |
| <tr> |
| <td><i>&lt;ICU&gt;</i>/source/tools</td> |
| |
| <td>Tools for generating the data files. Data files are generated by |
| invoking <i><ICU></i>/source/data/build/makedata.bat on Win32 or |
| <i><ICU></i>/source/make on Unix.</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/source/samples</td> |
| |
| <td>Various sample programs that use ICU</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/source/extra</td> |
| |
| <td>Unsupported API additions. Currently, this directory contains the 'ustdio' |
| file I/O library.</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/source/layout</td> |
| |
| <td>Contains the ICU layout engine (not a rasterizer).</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/packaging<br> |
| <i><ICU></i>/debian</td> |
| |
| <td>These directories contain scripts and tools for packaging the final |
| ICU build for various release platforms.</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/source/config</td> |
| |
| <td>Contains helper makefiles for platform specific build commands. |
| Used by 'configure'.</td> |
| </tr> |
| |
| <tr> |
| <td><i><ICU></i>/source/allinone</td> |
| |
| <td>Contains top-level ICU project files, for instance to build all of |
| ICU under one MSVC project.</td> |
| </tr> |
| </table> |
| <!-- end of ICU structure ==================================== --> |
| |
| <h2><a name="HowToBuild" href="#HowToBuild">How To Build And Install ICU</a></h2> |
| |
| <h3><a name="HowToBuildSupported" href="#HowToBuildSupported">Supported Platforms</a></h3> |
| |
| <table border="1" cellpadding="3" summary=""> |
| <caption> |
| The following table shows how often ICU is tested on each platform and |
| compiler. |
| </caption> |
| |
| <tr> |
| <th>Operating system</th> |
| |
| <th>Compiler</th> |
| |
| <th>Testing frequency</th> |
| </tr> |
| |
| <tr> |
| <td>Windows 98/NT/2000</td> |
| |
| <td>Microsoft Visual C++ 6.0</td> |
| |
| <td>Reference platform</td> |
| </tr> |
| |
| <tr> |
| <td>Red Hat Linux 6.1</td> |
| |
| <td>gcc 2.95.2</td> |
| |
| <td>Reference platform</td> |
| </tr> |
| |
| <tr> |
| <td>AIX 4.3.3</td> |
| |
| <td>xlC 3.6.4</td> |
| |
| <td>Reference platform</td> |
| </tr> |
| |
| <tr> |
| <td>Solaris 2.6</td> |
| |
| <td>Workshop Pro CC 4.2</td> |
| |
| <td>Reference platform</td> |
| </tr> |
| |
| <tr> |
| <td>HP/UX 11.01</td> |
| |
| <td>aCC A.12.10</td> |
| |
| <td>Reference platform</td> |
| </tr> |
| |
| <tr> |
| <td>AIX 5.1.0 L</td> |
| |
| <td>Visual Age C++ 5.0</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>Solaris 2.7</td> |
| |
| <td>Workshop Pro CC 6.0</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>Solaris 2.6</td> |
| |
| <td>gcc 2.91.66</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>FreeBSD 4.4</td> |
| |
| <td>gcc 2.95.3</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>HP/UX 11.01</td> |
| |
| <td>CC A.03.10</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>OS/390 (zSeries)</td> |
| |
| <td>CC</td> |
| |
| <td>Regularly tested</td> |
| </tr> |
| |
| <tr> |
| <td>AS/400 (iSeries) V5R1</td> |
| |
| <td>iCC</td> |
| |
| <td>Rarely tested</td> |
| </tr> |
| |
| <tr> |
| <td>NetBSD, OpenBSD</td> |
| |
| <td> </td> |
| |
| <td>Rarely tested</td> |
| </tr> |
| |
| <tr> |
| <td>SGI/IRIX</td> |
| |
| <td> </td> |
| |
| <td>Rarely tested</td> |
| </tr> |
| |
| <tr> |
| <td>PTX</td> |
| |
| <td> </td> |
| |
| <td>Rarely tested</td> |
| </tr> |
| |
| <tr> |
| <td>OS/2</td> |
| |
| <td>Visual Age</td> |
| |
| <td>Rarely tested</td> |
| </tr> |
| |
| <tr> |
| <td>Macintosh</td> |
| |
| <td> </td> |
| |
| <td>Needs help to port</td> |
| </tr> |
| </table> |
| |
| <p><br> |
| </p> |
| |
| <p><strong>Key to testing frequency</strong></p> |
| |
| <dl> |
| <dt><i>Reference platform</i></dt> |
| |
| <dd>ICU will work on these platforms with these compilers</dd> |
| |
| <dt><i>Regularly tested</i></dt> |
| |
| <dd>ICU should work on these platforms with these compilers</dd> |
| |
| <dt><i>Rarely tested</i></dt> |
| |
| <dd>ICU has been ported to these platforms but may not have been tested |
| there recently</dd> |
| </dl> |
| |
| <h3><a name="HowToBuildWindows" href="#HowToBuildWindows">How To Build And Install On |
| Windows</a></h3> |
| |
| <p>Building International Components for Unicode requires:</p> |
| |
| <ul> |
| <li>Microsoft Windows NT 4.0 or later, or Windows 98 or later</li> |
| |
| <li>Microsoft Visual C++ 6.0 (Service Pack 2 is required for the |
| release build with maximum speed optimization).</li> |
| </ul> |
| |
| <p>The steps are:</p> |
| |
| <ol> |
| <li>Unzip the icu-XXXX.zip file into any convenient location. |
| Using command line zip, type "unzip -a icu-XXXX.zip -d |
| drive:\directory", or just use WinZip. |
| </li> |
| |
| <li>Be sure that the ICU binary directory, <i><ICU></i>\bin\, |
| is included in the <strong>PATH</strong> environment variable. |
| The tests will not work without the location of the ICU dll files |
| in the path.</li> |
| |
| <li>Set the <strong>TZ</strong> environment variable to <strong> |
| PST8PDT</strong>. The tests will not work in any other timezone.</li> |
| |
| <li>Open the "<i><ICU></i>\source\allinone\allinone.dsw" workspace |
| file in Microsoft Visual C++ 6.0. (This workspace includes |
| all the International Components for Unicode libraries, necessary ICU |
| building tools, and the intltest and cintltst test suite projects). |
| Please see the <a href="#HowToBuildWindowsCommandLine">note</a> below if you want to build from the command line |
| instead.</li> |
| |
| <li>Set the active project to the "all" project. To do this: Choose the |
| "Project" menu, and select "Set active project". In the submenu, select |
| the "all" project.</li> |
| |
| <li>Set the active configuration to "Win32 Debug" or "Win32 Release" (See |
| <a href="#HowToBuildWindowsConfig">note</a> below).</li> |
| |
| <li>Choose the "Build" menu and select "Rebuild All". If you want to |
| build the Debug and Release at the same time, see the <a href= |
| "#HowToBuildWindowsBatch">note</a> below.</li> |
| |
| <li>Run the C++ test suite, "intltest". To do this: set the active |
| project to "intltest", and press F5 to run it.</li> |
| |
| <li>Run the C test suite, "cintltst". To do this: set the active project |
| to "cintltst", and press F5 to run it.</li> |
| |
| <li>Make sure that both "cintltst" and "intltest" passed without any |
| errors. The return codes are non-zero when they do not pass. Visual C++ |
| displays the return codes in the Debug tab of the output window. When |
| "intltest" and "cintltst" return 0, everything is |
| installed correctly. You can press Ctrl+F5 on the test project to run the |
| test and see what error messages were displayed (if any tests |
| failed).</li> |
| |
| <li>Reset the <strong>TZ</strong> environment variable to its original |
| value, unless you plan on testing ICU any further.</li> |
| |
| <li>You are now able to develop applications with ICU.</li> |
| </ol> |
| |
| <p><a name="HowToBuildWindowsCommandLine"><strong>Using MSDEV At The |
| Command Line Note:</strong></a> You can build ICU from the command line. |
| Assuming that you have properly installed Microsoft Visual C++ to support |
| command line execution, you can run the following command, 'msdev |
| <i><ICU></i>\source\allinone\allinone.dsw /MAKE "ALL"'.</p> |
| |
| <p><a name="HowToBuildWindowsConfig"><strong>Setting Active Configuration |
| Note:</strong></a> To set the active configuration, two different |
| possibilities are:</p> |
| |
| <ul> |
| <li>Choose "Build" menu, select "Set Active Configuration", and select |
| "Win32 Release" or "Win32 Debug".</li> |
| |
| <li>Another way is to select "Customize" in the "Tools" menu, select the |
| "Toolbars" tab, enable "Build" instead of "Build Minibar", and click on |
| "Close". This will bring up a toolbar which you can move aside the other |
| permanent toolbars at the top of the MSVC window. The advantage is that |
| you now have an easy-to-reach pop-up menu that will always show the |
| currently selected active configuration. Or, you can drag the project and |
| configuration selections and drop them on the menu bar for later |
| selection.</li> |
| </ul> |
| |
| <p><a name="HowToBuildWindowsBatch"><strong>Batch Configuration |
| Note:</strong></a> If you want to build the Debug and Release |
| configurations at the same time, choose "Build" menu and select "Batch |
| Build..." instead (and mark all configurations as checked), then click the |
| button named "Rebuild All". The "all" project builds all the test |
| programs as well as the tools for generating binary locale data files. The |
| "makedata" project is run automatically to convert the locale data |
| files from text format into icudata.dll.</p> |
| |
| <h3><a name="HowToBuildUnix" href="#HowToBuildUnix">How To Build And Install On Unix</a></h3> |
| |
| <p>Building International Components for Unicode on Unix requires:</p> |
| |
| <ul> |
| <li>A UNIX C++ compiler (gcc, cc, xlc_r, etc.) installed on the target |
| machine.</li> |
| |
| <li>A recent version of GNU make (3.7+).</li> |
| </ul> |
| |
| <p>For the additional tools needed on OS/390, see the |
| <a href="#HowToBuildOS390">OS/390 build section</a> of this document.</p> |
| |
| <p>The steps are:</p> |
| |
| <ol> |
| <li>Decompress the icuXXXX.tar (or icuXXXX.tgz) file. For example, <tt>gunzip -d < icuXXXX.tgz | tar xvf -</tt></li> |
| |
| <li>Change directory to "icu/source".</li> |
| |
| <li>Make the build scripts executable: <tt>chmod +x runConfigureICU install-sh</tt></li> |
| |
| <li>Run the <a href="source/runConfigureICU">runConfigureICU</a> script |
| for your platform. If you are not using the runConfigureICU script or |
| your platform is not supported by the script, you need to set your CC, |
| CXX, CFLAGS and CXXFLAGS environment variables, and type "./configure". |
| You can type "./configure --help" to print the available options.</li> |
| |
| <li> |
| Type "gmake" to compile the libraries and all the data files. |
| |
| </li> |
| |
| <li>Optionally, type "gmake check" to verify the test suite. |
| <ul> |
| <li><b>Note:</b> You may have to set certain variables if you wish |
| to run test programs individually, that is, apart from "gmake check". |
| The <strong>TZ</strong> environment variable needs to be set to |
| <strong>PST8PDT</strong>. Also, the |
| environment variable <strong>ICU_DATA</strong> must be set to |
| the full pathname of the data directory, |
| to indicate where the locale data files and |
| conversion mapping tables are. The trailing "/" is required after |
| the directory name (e.g. "$Root/source/data/" will work, but the value |
| "$Root/source/data" is not acceptable). |
| <p>When running samples or other applications, ICU_DATA only needs to be |
| set if the data is not installed (such as via 'gmake install') into the |
| default location.</p></li> |
| |
| </ul> |
| |
| </li> |
| |
| <li>Type "gmake install" to install.</li> |
| |
| |
| |
| </ol> |
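| |
| <p>The steps above can be sketched as a small shell script. This is only an |
| illustration and not part of ICU: the function names, the archive path and |
| the platform argument passed to runConfigureICU are assumptions, so check |
| the runConfigureICU script itself for the platform names it accepts.</p> |

```shell
#!/bin/sh
# Sketch of the Unix build steps. All function names here are hypothetical.

# Print a data directory path with the trailing "/" that ICU_DATA requires.
icu_data_dir() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Unpack, configure, build, test and install ICU.
# $1: path to the .tgz archive; $2: platform name for runConfigureICU.
# The body runs in a subshell so the caller's directory is not changed.
build_icu() (
    archive=$1
    platform=$2
    gunzip -c "$archive" | tar xvf - || exit 1
    cd icu/source || exit 1
    chmod +x runConfigureICU install-sh || exit 1
    ./runConfigureICU "$platform" || exit 1
    gmake || exit 1
    gmake check || exit 1
    gmake install
)

# Set the environment needed to run an individual test program by hand.
# $1: the ICU root directory (the "$Root" of the note above).
icu_test_env() {
    TZ=PST8PDT
    ICU_DATA=$(icu_data_dir "$1/source/data")
    export TZ ICU_DATA
}
```

| <p>Nothing runs until a function is called, for example |
| <tt>build_icu icuXXXX.tgz <i>platform</i></tt>. The helper icu_data_dir |
| exists only to guard against the missing trailing "/" described in the |
| note above.</p> |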
| |
| <p>Some platforms use package management tools to control the installation |
| and uninstallation of files on the system, as well as the integrity of the |
| system configuration. You may want to check if ICU can be packaged for your |
| package management tools by looking into the "packaging" directory. (Please |
| note that if you are using a snapshot of ICU from CVS, it is probable that |
| the packaging scripts or related files are not up to date with the contents |
| of ICU at this time, so use them with caution.)</p> |
| |
| <h3><a name="HowToBuildOS390" href="#HowToBuildOS390">OS/390 (zSeries) Platform</a></h3> |
| |
| <p>If you are building on the OS/390 UNIX System Services platform, it is |
| important that you understand a few details:</p> |
| |
| <ul> |
| <li>The GNU utilities gmake and gzip/gunzip are needed and can be |
| obtained for OS/390 from <a href= |
| "http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc"> |
| z/OS Unix - Tools and Toys</a>. Documentation on these tools can be found |
| in the <a href= |
| "http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg245944.html"> |
| Open Source Software for OS/390 UNIX</a> Redbook.</li> |
| |
| <li> |
| Encoding considerations: The source code assumes that it is compiled |
| with codepage ibm-1047 (to be exact, the UNIX System Services variant |
| of it). The pax command converts all of the source code files from |
| ASCII to codepage ibm-1047 (USS) EBCDIC. However, some files are binary |
| files and must not be converted, or must be converted back to their |
| original state. You can use the <a href="as_is\os390\unpax-icu.sh"> |
| unpax-icu.sh</a> script to do this automatically: it unpacks the tar |
| file and converts all the necessary files. The files that must not be |
| converted to ibm-1047 are the following: |
| |
| <ul> |
| <li>All UTF-8 files</li> |
| |
| <li>icu/data/*.brk</li> |
| |
| <li>icu/source/test/testdata/uni-text.bin</li> |
| |
| <li>icu/source/test/testdata/th18057.txt</li> |
| </ul> |
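| A file that was converted by mistake can be converted back using iconv. |
| Write the result to a temporary file first, because redirecting the output |
| onto the input file would truncate it before it is read:<br> |
| <code>iconv -f IBM-1047 -t ISO8859-1 uni-text.bin > uni-text.tmp<br> |
| mv uni-text.tmp uni-text.bin</code> |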
| </li> |
| |
| <li> |
| DLL directories and the LIBPATH setting: Building and testing ICU needs |
| the ICU libraries on the LIBPATH. In other words, the LIBPATH should |
| contain (each path prepended with the root directory that contains the |
| icu directory): |
| |
| <ul> |
| <li>icu/source/common</li> |
| |
| <li>icu/source/i18n</li> |
| |
| <li>icu/source/tools/ctestfw</li> |
| |
| <li>icu/source/tools/toolutil</li> |
| |
| <li>icu/source/extra/ustdio</li> |
| </ul> |
| </li> |
| |
| <li> |
| <p>OS/390 supports both native S/390 hexadecimal floating point and |
| (with Version 2.6 and later) IEEE binary floating point. This is a |
| compile time option. Applications built with IEEE should use ICU dlls |
| that are built with IEEE (and vice versa). The environment variable |
| IEEE390=1 will cause the OS/390 version of ICU to be built with IEEE |
| floating point. The default is native hexadecimal floating point.<br> |
| <em>Important:</em> Currently (ICU 1.4.2), native floating point |
| support is sufficient for codepage conversion, resource bundle and |
| UnicodeString operations, but the Format APIs, especially ChoiceFormat, |
| require IEEE binary floating point.</p> |
| |
| <p>Examples for configuring ICU:<br> |
| Debug build: <code>IEEE390=1 ./configure</code><br> |
| Release build: <code>CFLAGS=-2 IEEE390=1 ./configure</code></p> |
| </li> |
| |
| <li>Since the default make on OS/390 is not gmake, the pkgdata tool |
| requires that the "make" command is aliased to your installed version of |
| gmake.</li> |
| |
| <li>The makedep executable that is used with the OS/390 ICU build process |
| is not shipped with ICU. It is available at the <a href= |
| "http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc"> |
| z/OS Unix - Tools and Toys</a> site. The PATH environment variable should |
| be updated to contain the location of this executable prior to build. |
| Alternatively, makedep may be moved into an existing PATH directory.</li> |
| |
| <li>To run all of the tests for ICU, use "gmake check". When running |
| individual tests of the test suite, set the TZ environment variable to |
| PST8PDT (for example, with <code>export TZ="PST8PDT"</code>) so that time |
| zone comparisons are correct.</li> |
| </ul> |
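| |
| <p>The LIBPATH and TZ settings from the list above could be prepared along |
| the following lines. This is a sketch under assumptions: icu_libpath is a |
| hypothetical helper, not something shipped with ICU, and the root directory |
| used in the example is a placeholder.</p> |

```shell
#!/bin/sh
# Sketch only: icu_libpath is a hypothetical helper, not part of ICU.

# Print a LIBPATH value made of the ICU library directories, each
# prefixed with the root directory ($1) that contains the icu directory.
icu_libpath() {
    root=$1
    result=
    for dir in common i18n tools/ctestfw tools/toolutil extra/ustdio; do
        # Add a ":" separator only when something is already in the list.
        result="${result:+$result:}$root/icu/source/$dir"
    done
    printf '%s\n' "$result"
}

# Example use before building or testing (the root "$HOME" is an assumption):
#   LIBPATH="$(icu_libpath "$HOME")${LIBPATH:+:$LIBPATH}"
#   TZ="PST8PDT"
#   export LIBPATH TZ
```

| <p>Keeping the directory list in one place makes it easier to keep the |
| LIBPATH in step with the list above if the set of libraries changes.</p> |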
| |
| <h4>OS/390 Batch (PDS) support</h4> |
| |
| <p>By default, ICU builds its libraries into the HFS. However, there is a |
| 390-specific switch to build some libraries into PDS files. The switch is |
| the environment variable OS390BATCH; if it is set, the following libraries |
| are built into PDS files: libicuuc<i>XX</i>.dll, libicudt<i>XX</i>e.dll, |
| libicudt<i>XX</i>e_390.dll, and libtestdata.dll. Turning on OS390BATCH does |
| not turn off the normal HFS build, thus the HFS dlls will always be |
| created.</p> |
| |
| <p>The names of the PDS files are determined by the values of the |
| environment variables LOADMOD and LOADEXP. These variables must contain |
| the target PDS names whenever the OS390BATCH variable is set. LOADMOD is |
| the library (.dll) target dataset and LOADEXP is the side deck (.x) target |
| dataset.</p> |
| |
| <p>The PDS member names are as follows:</p> |
| <pre> |
| <samp>IXMICUUC --> libicuuc<i>XX</i>.dll |
| IXMICUDA --> libicudt<i>XX</i>e.dll |
| IXMICUD1 --> libicudt<i>XX</i>e_390.dll |
| IXMICUTE --> libtestdata.dll</samp> |
| </pre> |
| |
| <p>Example PDS attributes are as follows:</p> |
| <pre> |
| <samp>Data Set Name . . . : <i>USER</i>.ICU.LOAD |
| General Data |
| Management class. . : **None** |
| Storage class . . . : BASE |
| Volume serial . . . : TSO007 |
| Device type . . . . : 3390 |
| Data class. . . . . : LOAD |
| Organization . . . : PO |
| Record format . . . : U |
| Record length . . . : 0 |
| Block size . . . . : 32760 |
| 1st extent cylinders: 40 |
| Secondary cylinders : 59 |
| Data set name type : PDS |
| |
| Data Set Name . . . : <i>USER</i>.ICU.EXP |
| General Data |
| Management class. . : **None** |
| Storage class . . . : BASE |
| Volume serial . . . : TSO007 |
| Device type . . . . : 3390 |
| Data class. . . . . : **None** |
| Organization . . . : PO |
| Record format . . . : FB |
| Record length . . . : 80 |
| Block size . . . . : 3200 |
| 1st extent cylinders: 3 |
| Secondary cylinders : 3 |
| Data set name type : PDS</samp> |
| </pre> |
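| |
| <p>Because LOADMOD and LOADEXP must be set whenever OS390BATCH is set, it |
| may help to check the environment before starting a build. The following |
| is a minimal sketch, not part of ICU: check_pds_env is a hypothetical |
| function, it treats an empty OS390BATCH as unset (an assumption about how |
| the build reads the variable), and the value 1 in the example is arbitrary. |
| The dataset names are taken from the example attributes above.</p> |

```shell
#!/bin/sh
# Sketch only: check_pds_env is a hypothetical helper, not part of ICU.

# Return 0 if the PDS build settings are consistent: either OS390BATCH is
# empty or unset, or both LOADMOD and LOADEXP name a target dataset.
check_pds_env() {
    if [ -z "${OS390BATCH:-}" ]; then
        return 0
    fi
    if [ -z "${LOADMOD:-}" ] || [ -z "${LOADEXP:-}" ]; then
        echo "OS390BATCH is set, so LOADMOD and LOADEXP must both be set" >&2
        return 1
    fi
    return 0
}

# Example use before running gmake:
#   OS390BATCH=1
#   LOADMOD=USER.ICU.LOAD
#   LOADEXP=USER.ICU.EXP
#   export OS390BATCH LOADMOD LOADEXP
#   check_pds_env || exit 1
```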
| |
| <h3><a name="HowToBuildOS400" href="#HowToBuildOS400">OS/400 (iSeries) Platform</a></h3> |
| |
| <p>ICU Reference Release 1.8.1 contains partial support for the 400 |
| platform, but additional work by the user is currently needed to get it to |
| build properly. A future release of ICU should work out-of-the-box under |
| OS/400.</p> |
| |
| <ul> |
| <li> |
| Requirements: |
| |
| <ul> |
| <li>QSHELL interpreter installed (install base option 30, operating |
| system)</li> |
| |
| <li>QShell Utilities, PRPQ 5799-XEH (not required for V4R5)</li> |
| |
| <li>ILE C++ for AS/400, PRPQ 5799-GDW (the latest cum package and PTF |
| SF62241 must be installed)</li> |
| |
| <li>GNU facilities (You can get the GNU facilities for OS/400 from <a |
| href="http://www.as400.ibm.com/developer/porting/gnu_utilities.html"> |
| http://www.as400.ibm.com/developer/porting/gnu_utilities.html</a>).</li> |
| </ul> |
| <!-- end requirements --> |
| </li> |
| |
| <li> |
| Build environment setup: |
| |
| <ol> |
| <li> |
| Create the AS/400 target library. This library will be the target for |
| the resulting modules, programs and service programs. You will |
| specify this library on the OUTPUTDIR environment variable in step |
| 2.<br> |
| |
| <pre> |
| <samp>CRTLIB LIB(<i>libraryname</i>)</samp> |
| </pre> |
| <br> |
| </li> |
| |
| <li> |
| Set up the following environment variables in your build process |
| (use the <i>libraryname</i> from the previous step) |
| <pre> |
| <samp>ADDENVVAR ENVVAR(ICU_DATA) VALUE('/icu/source/data') |
| ADDENVVAR ENVVAR(CC) VALUE('/usr/bin/icc') |
| ADDENVVAR ENVVAR(CXX) VALUE('/usr/bin/icc') |
| ADDENVVAR ENVVAR(MAKE) VALUE('/usr/bin/gmake') |
| ADDENVVAR ENVVAR(OUTPUTDIR) VALUE('<i>libraryname</i>')</samp> |
| </pre> |
| <i>libraryname</i> identifies the target AS/400 library for *module, |
| *pgm and *srvpgm objects.<br> |
| <br> |
| </li> |
| |
| <li>Add QCXXN to your build process library list. This allows |
| CRTCPPMOD, which is used by the icc compiler, to be resolved.</li> |
| |
| <li> |
| In order to get the tests to run correctly, the QUTCOFFSET needs to |
| be set to the Pacific Time Zone offset.<br> |
| <br> |
| To check your QUTCOFFSET: |
| <pre> |
| <samp>DSPSYSVAL SYSVAL(QUTCOFFSET)</samp> |
| </pre> |
| <br> |
| To change your QUTCOFFSET:<br> |
| <pre> |
| <samp>CHGSYSVAL SYSVAL(QUTCOFFSET) VALUE('-0800')</samp> |
| </pre> |
| You should change -0800 to -0700 for daylight saving time.<br> |
| <br> |
| </li> |
| |
| <li>Run 'CHGJOB CCSID(37)'</li> |
| |
| <li>Run 'QSH'</li> |
| |
| <li>Run gunzip on the ICU source code compressed tar archive |
| (icu-<i>X</i>-<i>Y</i>.tar.gz or icu-<i>X</i>-<i>Y</i>.tgz).</li> |
| |
| <li>Run unpax-icu.sh on the tar file from the ICU download page.</li> |
| |
| <li>Change your current directory to icu/source.</li> |
| |
| <li> |
| Configure the Makefiles with the AS/400 configure script from the |
| ICU download page. <strong>Note:</strong> Verify that the mh-os400 |
| configure file is used. |
| |
| <ul> |
| <li>Run 'configure --host=as400-os400'</li> |
| |
| <li>The 'clean' and 'install' targets will not work without |
| changes because of symbolic links. To delete the target module, |
| program, or service programs replace <tt>rm -rf</tt> with |
| <strong>$(RMV)</strong>, and in the library installation targets |
| (install-library) change <tt>$(INSTALL)</tt> to <strong><tt> |
| $(INSTALL-S)</tt></strong>.</li> |
| </ul> |
| </li> |
| |
| <li>Run 'gmake -e'. The '-e' option is needed to pick up the |
| compilers from the environment variables.</li> |
| |
| <li>Run 'gmake -e check' to run the tests.</li> |
| </ol> |
| <!-- end build environment --> |
| </li> |
| </ul> |
| |
| <h2><a name="ImportantNotes" href="#ImportantNotes">Important Notes About Using ICU</a></h2> |
| |
| <h3><a name="ImportantNotesWindows" href="#ImportantNotesWindows">Windows Platform</a></h3> |
| |
| <p>If you are building on the Win32 platform, it is important that you |
| understand a few of the following build details.</p> |
| |
| <h4>DLL directories and the PATH |
| setting</h4> |
| |
| <p>As delivered, the International Components for Unicode build as several |
| DLLs which are placed in the "<i><ICU></i>\bin" directory. You must add this |
| directory to the PATH environment variable in your system, or any |
| executables you build will not be able to access International Components |
| for Unicode libraries. Alternatively, you can copy the DLL files into a |
| directory already in your PATH, but we do not recommend this: you can end |
| up with multiple copies of the DLL and use the wrong one.</p> |
| |
| <h4><a name="ImportantNotesWindowsPath">Changing your PATH</a></h4> |
| |
| <ul> |
| <li><strong>Windows 2000</strong>: Use the System Icon in the Control |
| Panel. Pick the "Advanced" tab. Select the "Environment Variables..." |
| button. Select the variable PATH in the lower box, and select the lower |
| "Edit..." button. In the "Variable Value" box, append the string |
| ";<i><ICU></i>\bin" to the end of the path string. If there is nothing there, |
| just type in "<i><ICU></i>\bin". Click the Set button, then the OK button.</li> |
| |
| <li><strong>Windows NT</strong>: Use the System Icon in the Control |
| Panel. Pick the "Environment" tab, and select the variable PATH in the |
| lower box. In the "value" box, append the string ";<i><ICU></i>\bin" at the end |
| of the path string. If there is nothing there, just type in "<i><ICU></i>\bin". |
| Click the Set button, then the OK button.</li> |
| |
| <li><strong>Windows 95/98/ME</strong>: Edit the autoexec.bat, and add the |
| following line to the end of file, "SET PATH=%PATH%;<i><ICU></i>\bin"</li> |
| </ul> |
| |
| <p>Note: when packaging a Windows application for distribution and |
| installation on user systems, copies of the ICU dlls should |
| be included with the application, and installed for exclusive use |
| by the application. This is the only way to ensure that your application |
| is running with the same version of ICU, built with exactly the same |
| options, that you developed and tested with. Refer to Microsoft's |
| guidelines on the usage of dlls, or search for the phrase "dll hell" |
| on <a href="http://msdn.microsoft.com/">msdn.microsoft.com</a>.</p> |
| |
| <h4>Linking with Runtime |
| libraries</h4> |
| |
| <p>All the DLLs link with the C runtime library "Debug Multithreaded DLL" |
| or "Multithreaded DLL." (This is changed through the Project Settings |
| dialog, on the C/C++ tab, under Code Generation.) It is important that any |
| executable or other DLL you build which uses the International Components |
| for Unicode DLLs links with these runtime libraries as well. If you do not |
| do this, you will get random memory errors when you run the executable.<br> |
| </p> |
| |
| <h3><a name="ImportantNotesUnix" href="#ImportantNotesUnix">Unix Type Platform</a></h3> |
| |
| <p>If you are building on a Unix platform, it is important that you add the |
| location of your ICU libraries (including the data library) to your |
| LD_LIBRARY_PATH environment variable. The ICU libraries may not link or |
| load properly without doing this.</p> |
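| |
| <p>One way to extend LD_LIBRARY_PATH is sketched below. The helper name |
| prepend_path and the directory /usr/local/lib are assumptions for |
| illustration; use the directory where your ICU libraries were actually |
| installed. The helper avoids leaving an empty entry in the path when the |
| variable was previously unset, since on many systems an empty entry is |
| treated as the current directory.</p> |

```shell
#!/bin/sh
# Sketch only: prepend_path is a hypothetical helper, not part of ICU.

# Print the search path $2 (possibly empty) with directory $1 prepended,
# without producing an empty path component.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Example use (the library directory is an assumption):
#   LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
#   export LD_LIBRARY_PATH
```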
| |
| <h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP">Using the default codepage</a></h3> |
| |
| <p>ICU has code to determine the default codepage of the system or process. |
| This default codepage can be used to convert <code>char *</code> strings to |
| and from Unicode.</p> |
| |
| <p>Depending on system design, setup and APIs, it may not always be possible |
| to find a default codepage that fully works as expected. For example,</p> |
| |
| <ul> |
| <li>On Windows there are three encodings in use at the same time. Unicode |
| (UTF-16) is always used inside of Windows, while for <code>char *</code> |
| encodings there are two classes, called "ANSI" and |
| "OEM" codepages. ICU will use the ANSI codepage. Note that the |
| OEM codepage is used by default for console window output.</li> |
| <li>On some Unix-type systems, non-standard names are used for encodings, |
| or non-standard encodings are used altogether. Although ICU supports 200 |
| encodings in its standard build and many more aliases for them, it will |
| not be able to recognize such non-standard names.</li> |
| <li>Some systems do not have a notion of a system or process codepage, and |
| may not have APIs for that.</li> |
| </ul> |
| <p>If you have a means of detecting a default codepage name that is more |
| appropriate for your application, then you should set that name with <code>ucnv_setDefaultName()</code> |
| as the first ICU function call. This makes sure that the internally cached |
| default converter will be instantiated from your preferred name.</p> |
| |
| <p>Starting in ICU 2.0, when a converter for the default codepage cannot be opened, a |
| fallback default codepage name and converter will be used. On most platforms, this will be |
| US-ASCII. For OS/390 (z/OS), ibm-1047-s390 is the default fallback |
| codepage. For AS/400 (iSeries), ibm-37 is the default fallback codepage. |
| This default fallback codepage is used when the operating system is using |
| a non-standard name for a default codepage, or the converter was not |
| packaged with ICU. The feature allows ICU to run in unusual |
| computing environments without completely failing.</p> |
| |
| <h3><a name="ImportantNotesDeprecatedAPI" href="#ImportantNotesDeprecatedAPI">Methods for enabling deprecated |
| APIs</a></h3> |
| |
| <h4>C</h4> |
| |
| <p>Some deprecated C APIs can be enabled without recompiling the ICU |
| libraries. This can be achieved by defining certain symbols before |
| including the ICU header files. For example, to enable the deprecated C |
| APIs for formatting:</p> |
| <pre> |
| <samp>#ifndef U_USE_DEPRECATED_FORMAT_API |
| #   define U_USE_DEPRECATED_FORMAT_API 1 |
| #endif |
| |
| #include &lt;stdio.h&gt; |
| #include "unicode/utypes.h"   /* UErrorCode, U_FAILURE, u_errorName */ |
| #include "unicode/udat.h" |
| |
| int main(){ |
|     UDateFormat *def, *fr; |
|     UErrorCode status = U_ZERO_ERROR; |
| |
|     /* Open a French date format using the full time style. */ |
|     fr = udat_open(UDAT_FULL, UDAT_DEFAULT, "fr_FR", NULL, 0, &status); |
|     if(U_FAILURE(status)){ |
|         printf("Error creating the French date format using full time style\n %s\n", |
|                u_errorName(status)); |
|         return 1; |
|     } |
|     /* The locale is given explicitly as "en_US" here. Opening the default |
|        date format and then treating it as "en_US" would fail on a machine |
|        where the default locale is NOT "en_US". */ |
|     def = udat_open(UDAT_SHORT, UDAT_SHORT, "en_US", NULL, 0, &status); |
|     if(U_FAILURE(status)){ |
|         printf("Error creating the en_US date format\n %s\n", |
|                u_errorName(status)); |
|         udat_close(fr); |
|         return 1; |
|     } |
| |
|     /* Release both formats when they are no longer needed. */ |
|     udat_close(def); |
|     udat_close(fr); |
|     return 0; |
| }</samp> |
| </pre> |
| |
| <h4>C++</h4> |
| |
| <p>Deprecated C++ APIs cannot be enabled without recompiling ICU libraries. |
| Every service has a specific symbol that should be defined to enable the |
| deprecated API of that service. For example, to enable the deprecated APIs |
| in the Transliteration service, the U_USE_DEPRECATED_TRANSLITERATOR_API |
| symbol should be defined before compiling ICU.</p> |
| |
| <h2><a name="PlatformDependencies" href="#PlatformDependencies">Platform Dependencies</a></h2> |
| |
| <p>The platform dependencies have been mostly isolated into the following |
| files in the common library. This information can be useful if you are |
| porting ICU to a new platform.</p> |
| |
| <ul> |
| <li> |
| <strong>unicode/platform.h.in</strong> (autoconf'ed platforms)<br> |
| <strong>unicode/p<i>XXXX</i>.h</strong> (others: pwin32.h, pmacos.h, |
| ..): Platform-dependent typedefs and defines:<br> |
| <br> |
| |
| |
| <ul> |
| <li>XP_CPLUSPLUS for C++ only.</li> |
| |
| <li>TRUE and FALSE, UBool, int8_t, int16_t etc.</li> |
| |
| <li>U_EXPORT and U_IMPORT for specifying dynamic library import and |
| export</li> |
| </ul> |
| <br> |
| </li> |
| |
| <li> |
| <strong>unicode/putil.h, putil.c</strong>: implementations of various |
| platform-dependent functions:<br> |
| <br> |
| |
| |
| <ul> |
| <li>uprv_isNaN, uprv_isInfinite, uprv_getNaN and uprv_getInfinity for |
| handling special floating point values.</li> |
| |
| <li>uprv_tzset, uprv_timezone, uprv_tzname and time for getting |
| platform specific time and timezone information.</li> |
| |
| <li>u_getDataDirectory for getting the default data directory.</li> |
| |
| <li>uprv_getDefaultLocaleID for getting the default locale |
| setting.</li> |
| |
| <li>uprv_getDefaultCodepage for getting the default codepage |
| encoding.</li> |
| </ul> |
| <br> |
| </li> |
| |
| <li> |
| <strong>umutex.h, umutex.c</strong>: Code for doing synchronization in |
| multithreaded applications. If you wish to use International Components |
| for Unicode in a multithreaded application, you must provide a |
| synchronization primitive that the classes can use to protect their |
| global data against simultaneous modifications. See the users' guide for |
| more information.<br> |
| <br> |
| |
| |
| <ul> |
| <li>We supply sample implementations for WinNT, Win95, Win98, |
| Sun/Solaris, RedHat/Linux, HP-UX and for AIX on an RS/6000.</li> |
| </ul> |
| <br> |
| </li> |
| |
| <li> |
| <strong>umapfile.h, umapfile.c</strong>: functions for mapping |
| or otherwise reading or loading files into memory. All access |
| by ICU to data from files makes use of these functions. |
| <br> <br> |
| </li> |
| |
| <li>For the Intltest test suite, intltest.cpp in |
| "icu/source/test/intltest/" contains the method pathnameInContext, which |
| must also be adapted to any new platform.</li> |
| |
| <li>Using platform-specific #ifdef macros is highly discouraged outside |
| of the scope of these files. When the source code gets updated in the |
| future, these #ifdef's can cause testing problems for your platform.</li> |
| </ul> |
| |
| <p>It is possible to build each library individually. They must be built in |
| the following order:<br> |
| </p> |
| |
| <ol> |
| <li>stubdata</li> |
| |
| <li>common</li> |
| |
| <li>i18n</li> |
| |
| <li>toolutil</li> |
| |
| <li>makeconv</li> |
| |
| <li>genrb</li> |
| |
| <li>gentz</li> |
| |
| <li>genccode</li> |
| |
| <li>gennames</li> |
| |
| <li>genuca</li> |
| |
| <li>gennorm</li> |
| |
| <li>makedata (a project on Windows, or source/data/Makefile on Unix)</li> |
| |
| <li>ctestfw, intltest and cintltst, if you want to run the test |
| suite.</li> |
| </ol> |
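| |
| <p>On Unix, the build order above could be followed with a loop such as |
| the one below. This is a sketch, not part of ICU: the function names are |
| hypothetical, and the directory paths relative to icu/source (for example |
| tools/makeconv or test/cintltst) are assumptions about the source layout |
| that should be checked against your copy of the source tree. It also |
| assumes that 'configure' has already generated a Makefile in each |
| directory.</p> |

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of ICU.

# Print the directories, relative to icu/source, in the required build
# order. The paths are assumptions about the ICU source layout.
icu_build_order() {
    printf '%s\n' \
        stubdata common i18n \
        tools/toolutil tools/makeconv tools/genrb tools/gentz \
        tools/genccode tools/gennames tools/genuca tools/gennorm \
        data \
        tools/ctestfw test/intltest test/cintltst
}

# Run make in each directory in order, stopping at the first failure.
# $1: path to icu/source; $2: make command to use (defaults to gmake).
build_in_order() {
    src=$1
    make_cmd=${2:-gmake}
    for dir in $(icu_build_order); do
        echo "building $dir"
        "$make_cmd" -C "$src/$dir" || return 1
    done
}

# Example use from the directory that contains icu/:
#   build_in_order icu/source
```

| <p>The last three directories are only needed if you want to run the test |
| suite, so they can be removed from the list otherwise.</p> |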
| |
| <hr> |
| |
| <p>Copyright © 1997-2001 International Business Machines Corporation |
| and others. All Rights Reserved.<br> |
| IBM Center for Emerging Technologies Silicon Valley,<br> |
| 10275 N De Anza Blvd., Cupertino, CA 95014<br> |
| All rights reserved.</p> |
| </body> |
| </html> |
| |