<!DOCTYPE html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>ICU Coding Guidelines</title>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
<h1>International Components for Unicode</h1>
<h2>ICU Coding Guidelines</h2>
<hr>
<p>This page describes guidelines for writing code for the International
Components for Unicode and how to add that code to the project.</p>
<ul>
<li><a href="#coding">General Guidelines</a></li>
<li><a href="#addfiles">Adding files to ICU</a></li>
<li><a href="#tools">Build Tools</a></li>
</ul>
<a name="coding"></a>
<h2 align="center">General Guidelines</h2>
<ul>
<li>Constants (#define, enum items, const): uppercase. <code>UBREAKITERATOR_DONE</code>,
<code>UBIDI_DEFAULT_LTR</code>, <code>ULESS</code>.</li>
<li>Variables and functions: mixed-case, starting with lowercase. <code>getLength()</code>.</li>
<li>Types (class, struct, enum, union): mixed-case, starting with uppercase.
<code>class DateFormatSymbols</code>.</li>
<li>Accessors: we use <code>getProperty()</code> and <code>setProperty()</code> functions.</li>
<li>Length and indexed access: <code>getLength()</code>, <code>getSomethingAt(index/offset)</code>.</li>
<li>Where we return a number of items, the function is named <code>countItems()</code>,
<i>not</i> <code>getItemCount()</code>. This applies even if the implementation of that
member function does not actually need to <i>count</i>.</li>
<li>Ranges of indexes: we specify a range of indexes with <i>start</i>
and <i>limit</i> parameters, or with parameters whose names end in those words. Such
a range contains the indexes from start to limit-1, i.e., it is an interval
that is left-closed and right-open. Mathematically, [start..limit[, also written [start, limit).</li>
<li>When a function takes a buffer (pointer) and a length argument with
a default value that makes the function determine the length of the input
itself (for text, by calling u_strlen()), that default value should be
-1. Any other negative or undefined value constitutes an error.</li>
<li>Primitive types: they are defined by utypes.h or a header file that
it includes. The most common ones are uint8_t, uint16_t, uint32_t, int8_t,
int16_t, int32_t, UTextOffset (signed), and UErrorCode.</li>
<li>File names (.h, .c, .cpp, data files, etc.): 8.3 format, all lowercase.</li>
<li>Language extensions and standards: do not use any features (language
extensions or library functions) that are proprietary and do not work
on all C or C++ compilers.<br>
For example, in Microsoft Visual C++, go to Project Settings (Alt+F7) -&gt; All
Configurations -&gt; C/C++ -&gt; Customize, and check Disable Language Extensions.</li>
<li>Tabs and indentation: do not use tab characters (\x09); save files with spaces
instead.<br>
Indentation is 4 spaces.</li>
<li>Documentation: We use Javadoc-style in-file documentation with <a href="http://www.zib.de/Visual/software/doc++/index.html">doc++</a>.
A printed copy of the doc++ manual is in our "library", and our source files
use its format.</li>
<li>Do not place multiple statements on one line, and do not put the body
of an if() or a loop on the same line as its head.</li>
<li>Placement of curly braces {}: each of us subscribes to a different
philosophy. Please place them reasonably and consistently. When you modify
a file, use the style of that file instead of
mixing in your favorite style.<br>
The one thing that we ask for is that if() and loop bodies always have
curly braces.</li>
<li>Function declarations should have one line with the return type and
all the import, extern, and export declarations, followed by the function name
and signature at the beginning of the next line.
<pre>
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
</pre>
</li>
<li>Error codes: The ICU API functions for both C and C++ take a
pointer or a reference to a UErrorCode wherever things can possibly
go wrong.
<p>In C, this is a pointer, and it must be checked for <code>NULL</code>.
In C++, this is a reference. In both cases, the function must first check
whether the error code already indicates a failure:</p>
<pre>
U_CAPI const UBiDiLevel * U_EXPORT2
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
    UTextOffset start, length;

    if(pErrorCode==NULL || U_FAILURE(*pErrorCode)) {
        return NULL;
    } else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return NULL;
    }

    ...
    return result;
}
</pre>
In a chain of function calls, this check keeps an API function from operating
on invalid data after an earlier call has failed, and it relieves the caller
from checking the error code after each call.</li>
<li>Decide whether your module is part of the "common" or the "i18n" API
collection. Use the appropriate macros, like <code>U_COMMON_IMPLEMENTATION</code>,
<code>U_I18N_IMPLEMENTATION</code>, <code>U_COMMON_API</code>, <code>U_I18N_API</code>.
See <code>utypes.h</code>.</li>
<li>If we have the same module in C and in C++, then there will be two
header files, one for each language, even if one uses the other. For
example, ubidi.h for C and bidi.h for C++.</li>
<li>
<p>Platform dependencies are dealt with in the header files that <code>utypes.h</code>
includes. They are <code>platform.h</code> and its more specific cousins
like <code>pwin32.h</code> for Windows, which define basic types,
and <code>putil.h</code>, which defines platform utilities.</p>
<p><strong>Important: </strong>Outside of these files, and a small number
of implementation files that depend on platform differences (like <code>umutex.c</code>),
<i>no</i> ICU source code may have <i>any</i> <code>#ifdef <i>OperatingSystemName</i></code>
directives.</p>
</li>
<li>Mutual-exclusion (mutex) blocks must not contain function calls.
This prevents deadlocks from occurring later, for example when a called
function is changed to acquire a mutex of its own. Keep as little code
inside a mutex block as possible to minimize the performance degradation
from blocked threads.
</li>
</ul>
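<p>The following C sketch shows several of the conventions above working together in one function. It uses a <code>UErrorCode</code> pointer that is checked for <code>NULL</code> and for an earlier failure, a length of -1 that means "NUL-terminated", and a [start, limit) range. It also follows the brace rule and the two-line declaration format. The module prefix <code>ucnt</code> and the functions <code>ucnt_countChars()</code> and <code>ucnt_strlen()</code> are hypothetical. The simplified <code>UChar</code>, <code>UErrorCode</code>, and <code>U_FAILURE()</code> definitions stand in for the real ones in <code>utypes.h</code> so that the example compiles on its own. For the same reason, <code>U_CAPI</code>/<code>U_EXPORT2</code> are omitted.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for definitions that ICU provides in utypes.h. */
typedef uint16_t UChar;
typedef enum UErrorCode {
    U_ZERO_ERROR = 0,
    U_ILLEGAL_ARGUMENT_ERROR,
    U_INDEX_OUTOFBOUNDS_ERROR
} UErrorCode;
#define U_FAILURE(x) ((x) > U_ZERO_ERROR)

/* Length of a NUL-terminated UChar string; plays the role of u_strlen(). */
static int32_t
ucnt_strlen(const UChar *s) {
    int32_t length = 0;
    while (s[length] != 0) {
        ++length;
    }
    return length;
}

/*
 * Hypothetical API function: counts how often c occurs in s[start..limit[.
 * length == -1 means "s is NUL-terminated, determine its length";
 * any other negative length is an error.
 */
int32_t
ucnt_countChars(const UChar *s, int32_t length,
                int32_t start, int32_t limit,
                UChar c, UErrorCode *pErrorCode) {
    int32_t i;
    int32_t count = 0;

    /* Check the pointer, then do nothing if an earlier call already failed. */
    if (pErrorCode == NULL || U_FAILURE(*pErrorCode)) {
        return 0;
    }
    if (s == NULL || length < -1) {
        *pErrorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    if (length == -1) {
        length = ucnt_strlen(s);
    }
    /* The range is left-closed and right-open: start <= i < limit. */
    if (start < 0 || start > limit || limit > length) {
        *pErrorCode = U_INDEX_OUTOFBOUNDS_ERROR;
        return 0;
    }
    for (i = start; i < limit; ++i) {
        if (s[i] == c) {
            ++count;
        }
    }
    return count;
}
```

<p>A caller can pass the same <code>UErrorCode</code> through several such calls and test <code>U_FAILURE()</code> once at the end. After the first failure, each later call returns immediately without looking at its other arguments.</p>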
<h3>C Guidelines</h3>
<ul>
<li>Since C has no classes to subdivide the namespace, we use prefixes
to avoid name collisions. Some of those prefixes contain a 3- (or sometimes
4-) letter module identifier. Very general names like <code>u_charDirection()</code>
don't have a module identifier in their prefix.
<ul>
<li>For POSIX replacements, we prefix the (all lowercase) POSIX function
name with "u_": <code>u_strlen()</code>.</li>
<li>For other API functions, we prepend a 'u', the module identifier
(if appropriate), and an underscore '_' to the <i>mixed-case</i>
function name: <code>u_charDirection()</code>, <code>ubidi_setPara()</code>.</li>
<li>For types (struct, enum, union), we prepend a "U", often "U&lt;module
identifier&gt;", directly to the type name, without an underscore: <code>UComparisonResult</code>.</li>
<li>For <code>#define</code>d constants and macros, we prepend a
"U_", often "U&lt;module identifier&gt;_", with an underscore to the
uppercase macro name: <code>U_ZERO_ERROR</code>, <code>U_SUCCESS()</code>.</li>
</ul>
</li>
<li>Function declarations need to be of the form <code>U_CAPI return-type
U_EXPORT2</code> to satisfy all compilers' needs.</li>
<li>Functions that roughly correspond to constructors and destructors are
called umod_open() and umod_close(), where <i>umod</i> stands for the module prefix.
<pre>
U_CAPI UBiDi * U_EXPORT2
ubidi_open(void);

U_CAPI UBiDi * U_EXPORT2
ubidi_openSized(UTextOffset maxLength, UTextOffset maxRunCount);

U_CAPI void U_EXPORT2
ubidi_close(UBiDi *pBiDi);
</pre>
</li>
<li>In cases like BreakIterator and NumberFormat, use an enum selector instead
of several different 'open' APIs for the different kinds of instances.</li>
<li>File names begin with a "u".</li>
<li>For memory allocation in C implementation files for ICU, the functions/macros
in <code>cmemory.h</code> must be used.</li>
</ul>
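<p>The following minimal sketch shows how the C naming conventions and the open/close pattern fit together, using a hypothetical module with the prefix <code>ucnt</code>. The names <code>UCounter</code>, <code>UCounterType</code>, <code>UCNT_UP</code>, <code>UCNT_INITIAL_VALUE</code>, and the <code>ucnt_*()</code> functions are invented for this example. The sketch uses <code>malloc()</code>/<code>free()</code> only so that it compiles without ICU; real ICU C implementation files must use <code>cmemory.h</code>. A single <code>ucnt_open()</code> with an enum selector takes the place of separate open functions for each kind of counter.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical module with the module prefix "ucnt". */

/* Constant: "U" + module identifier + "_", uppercase. */
#define UCNT_INITIAL_VALUE 0

/* Types: "U" prefix directly on the type name, without an underscore. */
typedef enum UCounterType {
    UCNT_UP,   /* each step adds 1 */
    UCNT_DOWN  /* each step subtracts 1 */
} UCounterType;

typedef struct UCounter {
    UCounterType type;
    int32_t value;
} UCounter;

/* Plays the role of a constructor; the enum selects the kind of counter. */
UCounter *
ucnt_open(UCounterType type) {
    /* Sketch only: ICU implementation files use cmemory.h instead of malloc(). */
    UCounter *counter = malloc(sizeof(UCounter));
    if (counter != NULL) {
        counter->type = type;
        counter->value = UCNT_INITIAL_VALUE;
    }
    return counter;
}

void
ucnt_step(UCounter *counter) {
    if (counter == NULL) {
        return;
    }
    if (counter->type == UCNT_UP) {
        ++counter->value;
    } else {
        --counter->value;
    }
}

int32_t
ucnt_getValue(const UCounter *counter) {
    return counter != NULL ? counter->value : 0;
}

/* Plays the role of a destructor; like free(), it accepts NULL. */
void
ucnt_close(UCounter *counter) {
    free(counter);
}
```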
<h3>C++ Guidelines</h3>
<ul>
<li>Classes and their members do not need a "U" or any other prefix.</li>
<li>Class APIs need to be declared like <code>class U_I18N_API SimpleDateFormat</code>
or like <code>class U_COMMON_API UCharCharacterIterator</code>.</li>
<li>Class member functions should only be declared, not implemented inline,
in the class declaration. Inline implementations may follow the
class declaration in the same file.</li>
<li>To check whether the compiler is compiling C++, we use <code>XP_PLUSPLUS</code>,
not <code>__cplusplus</code>.</li>
<li>We do not use exceptions, and we do not use templates (at least not in
the API).</li>
<li>File names do not begin with a "u".</li>
<li>
<h4>Adoption of objects:</h4>
Some constructors and factory functions take pointers to objects that
they <i>adopt</i>. This means that the newly created object will contain
a pointer to the adoptee and take over ownership and lifecycle control.
If an error occurs while the new object is being created (and therefore in
the code that adopts an object), the semantics used within ICU must be <i>adopt-on-call</i>
(as opposed to, e.g., adopt-on-success):
<ul>
<li>General: A constructor or factory function that adopts an object
does so in all cases, even if an error occurs and a <code>UErrorCode</code>
is set. This means that either the adoptee is deleted immediately
or its pointer is stored in the new object. The former case is most
common when the constructor or factory function is called and the
<code>UErrorCode</code> already indicates a failure. In the latter
case, the new object must delete the adoptee when it is itself deleted,
regardless of whether the constructor was successful.</li>
<li>Constructors: The code that creates the object with the <code>new</code>
operator must check the pointer returned by <code>new</code>
and delete any adoptees if it is <code>0</code>, because in that case the
constructor was never called. (Typically, the <code>UErrorCode</code> must then
be set to <code>U_MEMORY_ALLOCATION_ERROR</code>.)</li>
<li>Factory functions (<code>createInstance()</code>): The factory
function must set <code>U_MEMORY_ALLOCATION_ERROR</code> and delete
any adoptees if it cannot allocate the new object. If the construction
of the object fails for any other reason, the factory function must delete
the object, which in turn deletes its adoptees. As a result, a factory
function always returns either a valid object and a successful <code>UErrorCode</code>,
or a <code>0</code> pointer and a failure <code>UErrorCode</code>.<br>
Example:
<pre>
Calendar*
Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
    if(U_FAILURE(errorCode)) {
        delete zone;   // adopt-on-call: delete the adoptee even when failing early
        return 0;
    }
    // since the Locale isn't specified, use the default locale
    Calendar* c = new GregorianCalendar(zone, Locale::getDefault(), errorCode);
    if(c == 0) {
        errorCode = U_MEMORY_ALLOCATION_ERROR;
        delete zone;   // the constructor never ran, so nobody else owns zone
    } else if(U_FAILURE(errorCode)) {
        delete c;      // the Calendar destructor deletes the adopted zone
        c = 0;
    }
    return c;
}
</pre>
</li>
</ul>
</li>
<li>For memory allocation in C++ implementation files for ICU, the standard
<code>new</code> and <code>delete</code> operators (or the C functions/macros
in <code>cmemory.h</code>) must be used.</li>
</ul>
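<p>The adopt-on-call rules can be exercised with a small, self-contained C++ sketch. <code>Zone</code> and <code>Widget</code> are hypothetical classes, not ICU APIs. The simplified <code>UErrorCode</code> enum and <code>U_FAILURE()</code> macro stand in for those in <code>utypes.h</code>. <code>Zone</code> counts its live instances, which makes it possible to check that the adoptee is deleted exactly once on every path. Unlike the <code>Calendar</code> example, the sketch uses <code>new (std::nothrow)</code>, because in standard C++ a plain <code>new</code> reports allocation failure by throwing <code>std::bad_alloc</code> rather than by returning 0.</p>

```cpp
#include <new>

// Simplified stand-ins for ICU's UErrorCode and U_FAILURE() from utypes.h.
enum UErrorCode {
    U_ZERO_ERROR = 0,
    U_ILLEGAL_ARGUMENT_ERROR,
    U_MEMORY_ALLOCATION_ERROR
};
#define U_FAILURE(x) ((x) > U_ZERO_ERROR)

// Hypothetical adoptee that counts its live instances.
class Zone {
public:
    Zone();
    ~Zone();
    static int liveCount;
};

int Zone::liveCount = 0;

Zone::Zone() {
    ++liveCount;
}

Zone::~Zone() {
    --liveCount;
}

// Hypothetical class that adopts a Zone with adopt-on-call semantics.
class Widget {
public:
    // Stores the adoptee before anything can fail.
    Widget(Zone *adoptedZone, int initialCapacity, UErrorCode &errorCode);
    ~Widget();
    int getCapacity() const;
    // Returns a valid object and success, or 0 and a failure code.
    static Widget *createInstance(Zone *adoptedZone, int initialCapacity,
                                  UErrorCode &errorCode);
private:
    Widget(const Widget &);             // not copyable: would double-delete
    Widget &operator=(const Widget &);  // declared but never defined
    Zone *zone;
    int capacity;
};

Widget::Widget(Zone *adoptedZone, int initialCapacity, UErrorCode &errorCode)
        : zone(adoptedZone), capacity(initialCapacity) {
    // The adoptee is already stored, so ~Widget() deletes it on every path.
    if (U_FAILURE(errorCode)) {
        return;
    }
    if (initialCapacity < 0) {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
    }
}

Widget::~Widget() {
    delete zone;
}

int Widget::getCapacity() const {
    return capacity;
}

Widget *
Widget::createInstance(Zone *adoptedZone, int initialCapacity,
                       UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        delete adoptedZone;  // adopt-on-call: delete even though we fail
        return 0;
    }
    // std::nothrow makes new return 0 on failure instead of throwing.
    Widget *w = new (std::nothrow) Widget(adoptedZone, initialCapacity, errorCode);
    if (w == 0) {
        errorCode = U_MEMORY_ALLOCATION_ERROR;
        delete adoptedZone;  // the constructor never ran
    } else if (U_FAILURE(errorCode)) {
        delete w;  // ~Widget() deletes the adoptee
        w = 0;
    }
    return w;
}
```

<p>After a failed <code>createInstance()</code> call, the caller owns nothing and must not delete the <code>Zone</code> it passed in. After a successful call, deleting the returned <code>Widget</code> also deletes the adoptee.</p>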
<a name="addfiles"></a>
<h2 align="center">Adding files to ICU</h2>
<h3>Adding .c, .cpp, and .h files</h3>
<p>To add compilable files to ICU, you need not only to add them
to the source code control system in the appropriate folder, but also
to add them to the build environment.</p>
<p>The first step is to choose one of the ICU libraries:</p>
<ol>
<li>The <em>common</em> library provides mostly low-level utilities and
basic APIs that often do not make use of Locales. Examples are APIs
that deal with character properties, the Locale APIs themselves, and
ResourceBundle APIs.</li>
<li>The <em>i18n</em> library provides APIs that depend on and use Locales,
such as those for collation and formatting, which are most useful for internationalized
user input and output.</li>
</ol>
<p>Put the source code files into the folder <code>icu/source/<i>library-name</i></code>.</p>
<p>Then add them to the build system:</p>
<ul>
<li>
<p>For most platforms, add the expected .o files to the <code>OBJECTS</code> variable
in <code>icu/source/<i>library-name</i>/Makefile.in</code>.</p>
<p>Add the <i>public</i> header files to the <code>HEADERS</code> variable.</p>
</li>
<li>For Microsoft Visual C++ 6.0, add all the source code files to <code>icu/source/<i>library-name</i>/<i>library-name</i>.dsp</code>.
If you don't have Visual C++, try adding the file names to the project
file manually; it is a text file, and this part should be fairly obvious.</li>
</ul>
<p>You also need to add test code to <code>icu/source/test/cintltst</code>
for C APIs and to <code>icu/source/test/intltest</code> for C++ APIs.</p>
<p>All the API functions must be called by the test code (100% API coverage),
and at least 85% of the implementation code should be exercised by the
tests (&gt;=85% code coverage).</p>
<ol>
<li>For C, create test code using the <code>log_err()</code>, <code>log_info()</code>,
and <code>log_verbose()</code> APIs from <code>cintltst.h</code> (which
uses <code>ctest.h</code>), and check it into the appropriate folder.</li>
<li>To get your C test code called, register its top-level
function and a descriptive test module path with the test system by calling
<code>addTest()</code>. The function that makes the call to <code>addTest()</code>
ultimately has to be called by <code>addAllTests()</code> in <code>calltest.c</code>.
Groups of tests typically have a common <code>addGroup()</code> function
that calls <code>addTest()</code> for the test functions in its group,
according to the common part of the test module path.</li>
<li>Add that test code to the build system, too. Modify <code>Makefile.in</code>
and the appropriate <code>.dsp</code> file as for the library code.</li>
</ol>
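<p>The registration steps above can be sketched in C. This is a hypothetical, self-contained illustration. <code>log_err()</code>, <code>addTest()</code>, <code>TestNode</code>, and the <code>runTests()</code> runner are simplified stand-ins written only so that the sketch compiles without ICU; the real declarations in <code>ctest.h</code> and <code>cintltst.h</code> differ. The module <code>ucnt</code>, its test functions, and the <code>ucnt/...</code> paths are also invented, and <code>addUCntTest()</code> plays the role of the <code>addGroup()</code> function described above. What carries over is the structure: test functions report failures with <code>log_err()</code>, a group function registers them with <code>addTest()</code> under a common path, and <code>addAllTests()</code> calls the group function.</p>

```c
#include <stdarg.h>
#include <stdio.h>

/* Simplified stand-ins; the real ones live in ctest.h and cintltst.h. */
typedef void (*TestFunctionPtr)(void);

#define MAX_TESTS 16

/* Here a TestNode is only a flat list of (path, function) pairs. */
typedef struct TestNode {
    int count;
    const char *paths[MAX_TESTS];
    TestFunctionPtr tests[MAX_TESTS];
} TestNode;

static int errorCount = 0;

static void
log_err(const char *pattern, ...) {
    va_list args;
    ++errorCount;
    va_start(args, pattern);
    vfprintf(stderr, pattern, args);
    va_end(args);
}

static void
addTest(TestNode **root, TestFunctionPtr test, const char *path) {
    TestNode *node = *root;
    if (node->count < MAX_TESTS) {
        node->paths[node->count] = path;
        node->tests[node->count] = test;
        ++node->count;
    }
}

/* Hypothetical function under test. */
static int
ucnt_max(int a, int b) {
    return a > b ? a : b;
}

/* Test functions report failures with log_err(). */
static void
TestMax(void) {
    if (ucnt_max(2, 7) != 7) {
        log_err("FAIL: ucnt_max(2, 7) returned %d\n", ucnt_max(2, 7));
    }
}

static void
TestMaxEqual(void) {
    if (ucnt_max(5, 5) != 5) {
        log_err("FAIL: ucnt_max(5, 5) returned %d\n", ucnt_max(5, 5));
    }
}

/* Group function: registers the group's tests under the common path "ucnt/". */
void
addUCntTest(TestNode **root) {
    addTest(root, &TestMax, "ucnt/TestMax");
    addTest(root, &TestMaxEqual, "ucnt/TestMaxEqual");
}

/* In cintltst, addAllTests() in calltest.c would call each group function. */
void
addAllTests(TestNode **root) {
    addUCntTest(root);
}

/* Stand-in runner: runs every registered test and returns the error count. */
static int
runTests(const TestNode *root) {
    int i;
    for (i = 0; i < root->count; ++i) {
        root->tests[i]();
    }
    return errorCount;
}
```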
<a name="tools"></a>
<h2 align="center">Build Tools</h2>
<p>We use the following tools to build ICU:</p>
<ul>
<li>GNU make version 3.76.1 and up.</li>
<li>GNU C compiler (gcc) version 2.95.2 and up. (egcs has been merged back into gcc.)</li>
<li>autoconf version 2.13 and up.</li>
<li>m4 version 1.4 and up, which autoconf requires.</li>
<li>Platform-specific compilers as listed with the supported platforms.<br>
(For a complete list of supported platforms, see the <a href="http://oss.software.ibm.com/icu/tech_faq.html">ICU
Technical FAQ</a>.)</li>
<li>On Windows: Microsoft Visual C++ 6.0 with the latest Service Packs.</li>
</ul>
</body>
</html>