blob: 8cb4a12336bb6ba5a66a0c798b6a3efb941243ab [file] [log] [blame]
<html>
<!DOCTYPE html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>ICU Coding Guidelines</title>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
<h1>International Components for Unicode</h1>
<h2>ICU Coding Guidelines</h2>
<hr>
<p>This page describes guidelines for writing code for the International
Components for Unicode and how to add it to the project.</p>
<ul>
<li><a href="#coding">General Guidelines</a></li>
<li><a href="#addfiles">Adding files to ICU</a></li>
<li><a href="#tools">Build Tools</a></li>
</ul>
<a name="coding"></a>
<h2 align="center">General Guidelines</h2>
<ul>
<li>Constants (#define, enum items, const): uppercase. <code>UBREAKITERATOR_DONE</code>,
<code>UBIDI_DEFAULT_LTR</code>, <code>ULESS</code>.</li>
<li>Variables and functions: mixed-case, starting with lowercase. <code>getLength()</code>.</li>
<li>Types (class, struct, enum, union): mixed-case, starting with uppercase.
<code>class DateFormatSymbols</code>.</li>
<li>We use getProperty() and setProperty() functions.</li>
<li>getLength(), getSomethingAt(index/offset).</li>
<li>Where we return a number of items, it is <code>countItems()</code>
- <i>not</i> getItemCount() (even if we do not need to actually <i>count</i>
in the implementation of that member function).</li>
<li>Ranges of indexes: we specify a range of indexes by having <i>start</i>
and <i>limit</i> parameters, with names or suffixes like that. Such
a range contains indexes from start to limit-1, i.e., it is an interval
that is left-closed and right-open. Mathematically, [start..limit[.</li>
<li>Functions that take a buffer (pointer) and a length argument with
a default value so that the function determines the length of the input
itself (for text, calling u_strlen()), that default value should be
-1. Any other negative or undefined value constitutes an error.</li>
<li>Primitive types: they are defined by utypes.h or a header file that
it includes. The most common ones are uint8_t, uint16_t, uint32_t, int8_t,
int16_t, int32_t, UTextOffset (signed), and UErrorCode.</li>
<li>File names (.h, .c, .cpp, data files, etc.): 8.3, all lowercase.</li>
<li>Language extensions and standards: do not use any features (language
extensions or library functions) that are proprietary and do not work
on all C or C++ compilers.<br>
For example, in Microsoft Visual C++, you should go to Project Settings(alt-f7)->All
Configurations->C/C++->Customize, and set Disable Language Extensions.</li>
<li>Tabs and indentation: no tab characters (\x09), save files with spaces
instead.<br>
Indentation is of size 4.</li>
<li>Documentation: We use Javadoc-style in-file documentation with <a href="http://www.doxygen.org">doxygen</a>.
We have a printed version of the manual in our "library", and our files
use this.</li>
<li>You should not place multiple statements into one line. You should
not have if() or loop heads followed by their bodies on the same line.</li>
<li>Placements of curly braces {}: each of us subscribes to different
philosophies. Please try to do it reasonably and consistently. It is
a good idea to use the style of a file when you modify it, instead of
mixing in your favorite style.<br>
The one thing that we ask for is to not have if() and loop bodies without
curly braces.</li>
<li>Function declarations should have one line with the return type and
all the import, extern, and export declarations, and the function name
and signature at the beginning of the next line.
<pre>
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
</pre>
</li>
<li>Always use <code>static</code> for variables, functions, and constants
that are not exported explicitely by a header file. Some platforms are
confused if non-static symbols are not explictely declared extern, and
will not be able to build ICU and link against it. </li>
<li>Error codes: The ICU API functions for both C and C++ are using a
pointer or a reference to UErrorCode whereever we expect things to possibly
go wrong.
<p>In C, this is a pointer, and it must be checked for <code>NULL</code>.
In C++, this is a reference. In both cases, it must be checked for
an error code already being in there:</p>
<pre>
U_CAPI const UBiDiLevel * U_EXPORT2
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
UTextOffset start, length;
if(pErrorCode==NULL || U_FAILURE(*pErrorCode)) {
return NULL;
} else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
*pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
return NULL;
}
...
return result;
}
</pre>
This prevents the API function from doing anything on data that is not
valid in a chain of function calls and relieves the caller from checking
the error code after each call.</li>
<li>Decide whether your module is part of the "common" or the "i18n" API
collection. Use the appropriate macros, like <code>U_COMMON_IMPLEMENTATION</code>,
<code>U_I18N_IMPLEMENTATION</code>, <code>U_COMMON_API</code>, <code>U_I18N_API</code>.
See <code>utypes.h</code>.</li>
<li>If we have the same module in C and in C++, then there will be two
header files, one for each language, even if one uses the other. For
example, ubidi.h for C and bidi.h for C++.</li>
<li>
<p>Platform dependencies are dealt with in the header files that <code>utypes.h</code>
includes. They are <code>platform.h</code> and its more specific cousins
like <code>pwin32.h</code> for Windows, which define basic types,
and <code>putil.h</code>, which defines platform utilities.</p>
<p></p>
<strong>Important: </strong>Outside of these files, and a small number
of implementation files that depend on platform differences (like <code>umutex.c</code>),
<i>no</i> ICU source code may have <i>any</i> <code>#ifdef <i>OperatingSystemName</i></code>
instructions.
<p></p>
</li>
<li> For mutual-exclusion (mutex) blocks, there should be no function
calls within a mutex block. The idea behind this is to prevent deadlocks
from occuring later. There should be as little code inside a mutex block
as possible to minimize the performance degradation from blocked threads.
</li>
</ul>
<h3>C Guidelines</h3>
<ul>
<li>Since we don't have classes to subdivide the namespace, we use prefixes
to avoid name collisions. Some of those prefixes contain a 3- (or sometimes
4-) letter module identifier. Very general names like <code>u_charDirection()</code>
don't have a module identifier in their prefix.
<ul>
<li>For POSIX replacements, we prepend the (all lowercase) POSIX function
name with "u_": <code>u_strlen()</code>.</li>
<li>For other API functions, we prepend a 'u', the module identifier
(if appropriate), and an underscore '_', followed by the <i>mixed-case</i>
function name: <code>u_charDirection()</code>, <code>ubidi_setPara()</code>.</li>
<li>For types (struct, enum, union), we prepend a "U", often "U&lt;module
identifier>" directly to the typename, without an underscore. <code>UComparisonResult</code>.</li>
<li>For <code>#define</code>d constants and macros , we prepend a
"U_", often "U&lt;module identifier>_" with an underscore to the
uppercase macro name. <code>U_ZERO_ERROR</code>, <code>U_SUCCESS()</code>.</li>
</ul>
</li>
<li>Function declarations need to be of the form <code>U_CAPI return-type
U_EXPORT2</code> to satisfy all compilers' needs.</li>
<li>Functions that roughly compare to constructors and destructors are
called umod_open() and umod_close().
<pre>
U_CAPI UBiDi * U_EXPORT2
ubidi_open();
U_CAPI UBiDi * U_EXPORT2
ubidi_openSized(UTextOffset maxLength, UTextOffset maxRunCount);
U_CAPI void U_EXPORT2
ubidi_close(UBiDi *pBiDi);
</pre>
</li>
<li>In cases like BreakIterator and NumberFormat, instead of having several
different 'open' APIs for each kind of instances, use an enum selector.</li>
<li>File names begin with a "u".</li>
<li>For memory allocation in C implementation files for ICU, the functions/macros
in <code>cmemory.h</code> must be used. </li>
<li>Do not use C++ style comments in C files and in headers that will be included in C files.
Some of the supported platforms are allergic to C++ style comments in C files.</li>
</ul>
<h3>C++ Guidelines</h3>
<ul>
<li>Classes and their members do not need a "U" or any other prefix.</li>
<li>Class APIs need to be declared like <code>class U_I18N_API SimpleDateFormat</code>
or like <code>class U_COMMON_API UCharCharacterIterator</code>.</li>
<li>Class member functions should be only declared, not inline-implemented,
in the class declaration. Inline implementations may follow after the
class declaration in the same file.</li>
<li>We are using <code>XP_PLUSPLUS</code> to make sure the compiler does
C++, not <code>__cplusplus</code>.</li>
<li>We do not use exceptions, and we do not use templates (at least on
the API).</li>
<li>File names do not begin with a "u".</li>
<li>
<h4>Adoption of objects:</h4>
Some constructors and factory functions take pointers to objects that
they <i>adopt</i>. This means that the newly created object will contain
a pointer to the adoptee and takes over ownership and lifecycle control.
If an error occurs while creating the new object - and thus in the code
that adopts an object - then the semantics used within ICU must be <i>adopt-on-call</i>
(as opposed to, e.g., adopt-on-success):
<ul>
<li>General: A constructor or factory function that adopts an object
does so in all cases, even if an error occurs and a <code>UErrorCode</code>
is set. This means that either the adoptee is deleted immediately
or its pointer is stored in the new object. The former case is most
common when the constructor or factory function is called and the
<code>UErrorCode</code> already indicates a failure. In the latter
case, the new object must take care of deleting the adoptee once
it is deleted itself regardless of whether the constructor was successful.</li>
<li>Constructors: The code that creates the object with the <code>new</code>
operator must check the resulting pointer returned by <code>new</code>
and delete any adoptees if it is <code>0</code> because the constructor
was not called. (Typically, a <code>UErrorCode</code> must be set
to <code>U_MEMORY_ALLOCATION_ERROR</code>.)</li>
<li>Factory functions (<code>createInstance()</code>): The factory
function must set a <code>U_MEMORY_ALLOCATION_ERROR</code> and delete
any adoptees if it cannot allocate the new object. If the construction
of the object fails otherwise, then the factory function must delete
it - and it in turn must delete its adoptees. As a result, a factory
function always returns either a valid object and a successful <code>UErrorCode</code>,
or a <code>0</code> pointer and a failure <code>UErrorCode</code>.<br>
Example:
<pre>
Calendar*
Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
if(U_FAILURE(errorCode)) {
delete zone;
return 0;
}
// since the Locale isn't specified, use the default locale
Calendar* c = new GregorianCalendar(zone, Locale::getDefault(), errorCode);
if(c == 0) {
errorCode = U_MEMORY_ALLOCATION_ERROR;
delete zone;
} else if(U_FAILURE(errorCode)) {
delete c;
c = 0;
}
return c;
}
</pre>
</ul>
</li>
<li>For memory allocation in C++ implementation files for ICU, the standard
<code>new</code> and <code>delete</code> operators (or the C functions/macros
in <code>cmemory.h</code>) must be used. </li>
<li>Iterations through <code>for()</code> loops should not use declarations
in the first part of the loop. The scoping of such declarations
has gone through two revisions about this, and some compilers do not
comply to the latest scoping. Declarations of loop variables should be
outside these loops. </li>
</ul>
<a name="addfiles"></a>
<h2 align="center">Adding files to ICU</h2>
<h3>Adding .c, .cpp, and .h files</h3>
<p>In order to add compilable files to ICU, you not only need to add them
to the source code control system in the appropriate folder, but also
add them to the build environment.</p>
<p>The first step is to choose one of the ICU libraries:</p>
<ol>
<li>The <em>common</em> library provides mostly low-level utilities and
basic APIs that often do not make use of Locales. Examples are APIs
that deal with character properties, the Locale APIs themselves, and
ResourceBundle APIs.</li>
<li>The <em>i18n</em> library provides Locale-dependent and -using APIs,
like for collation and formatting, that are most useful for internationalized
user input and output.</li>
</ol>
<p>Put the source code files into the folder <code>icu/source/<i>library-name</i></code>.</p>
<p>Then add them to the build system:
<ul>
<li>
<p>For most platforms, add the expected .o files to <code>icu/source/<i>library-name</i>/Makefile.in</code>,
to the <code>OBJECTS</code> variable.</p>
<p>Add the <i>public</i> header files to the <code>HEADERS</code> variable.</p>
</li>
<li>For Microsoft Visual C++ 6.0, add all the source code files to <code>icu/source/<i>library-name</i>/<i>library-name</i>.dsp</code>.
If you don't have Visual C++, then try to add the filenames to the project
file manually; it is a text file, and this part should be fairly obvious.</li>
</ul>
<p></p>
<p>You also need to add test code to <code>icu/source/test/cintltest</code>
for C APIs and to <code>icu/source/test/intltest</code> for C++ APIs.</p>
<p>All the API functions must be called by the test code (100% API coverage),
and at least 85% of the implementation code should be exercised by the
tests (>=85% code coverage).</p>
<ol>
<li>For C, create test code using the <code>log_err()</code>, <code>log_info()</code>,
and <code>log_verbose()</code> APIs from <code>cintltst.h</code> (which
uses <code>ctest.h</code>), and check it into the appropriate folder.</li>
<li>In order to get your C test code called, you have to add its toplevel
function and a descriptive test module path to the test system by calling
<code>addTest()</code>. The function that makes the call to <code>addTest()</code>
ultimately has to be called by <code>addAllTests()</code> in <code>calltest.c</code>.
Groups of tests typically have a common <code>addGroup()</code> function
that calls <code>addTest()</code> for the test functions in its group,
according to the common part of the test module path.</li>
<li>Add that test code to the build system, too. Modify <code>Makefile.in</code>
and the appropriate <code>.dsp</code> file like for the library code.</li>
</ol>
<a name="tools"></a>
<h2 align="center">Build Tools</h2>
<p>We are using the following tools to build ICU:</p>
<ul>
<li>GNU make version 3.76.1 and up.</li>
<li>GNU cc version 2.95.2 and up. (egcs merged back into gcc.)</li>
<li>autoconf version 2.13 and up.</li>
<li>autoconf needs m4 version 1.4 and up.</li>
<li>Platform-specific compilers as listed with the supported platforms.<br>
(For a complete list of supported platforms, see the <a href="http://oss.software.ibm.com/icu/tech_faq.html">ICU
Technical FAQ</a>.)</li>
<li>On Windows: Microsoft Visual C++ 6.0 with the latest Service Packs.</li>
</ul>
</body>
</html>