ICU-20088 Move User Guide to Markdown

See #919
diff --git a/docs/processes/rules_update.md b/docs/processes/rules_update.md
index 7cf7674..df6bfbd 100644
--- a/docs/processes/rules_update.md
+++ b/docs/processes/rules_update.md
@@ -110,7 +110,7 @@
     For this example, the rule file is `icu4c/source/data/brkitr/rules/char.txt`.
     (If the change is for word or line break, which have multiple rule files for tailorings, only update the root file at this time.)
 
-    Start by looking at how existing similar rules are being handled, and also refer to the ICU user guide section on [Break Rules](http://userguide.icu-project.org/boundaryanalysis/break-rules) for an explanation of rule syntax and behavior.
+    Start by looking at how existing similar rules are being handled, and also refer to the ICU user guide section on [Break Rules](../userguide/boundaryanalysis/break-rules.md) for an explanation of rule syntax and behavior.
 
     The transformation from UAX or CLDR style rules to ICU rules can be non-trivial. Sources of difficulties include:
 
diff --git a/docs/userguide/boundaryanalysis/break-rules.md b/docs/userguide/boundaryanalysis/break-rules.md
new file mode 100644
index 0000000..03dec09
--- /dev/null
+++ b/docs/userguide/boundaryanalysis/break-rules.md
@@ -0,0 +1,437 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Break Rules
+
+## Introduction
+
+ICU locates boundary positions within text by means of rules, which are a form
+of regular expressions. The form of the rules is similar, but not identical,
+to the boundary rules from the Unicode specifications
+[[UAX-14](https://unicode.org/reports/tr14/),
+[UAX-29](https://unicode.org/reports/tr29/)], and there is a reasonably close
+correspondence between the two.
+
+Taken as a set, the ICU rules describe how to move forward to the next boundary,
+starting from a known boundary.
+ICU includes rules for the standard boundary types (word, line, etc.).
+Applications may also create customized break iterators from their own rules.
+
+ICU's built-in rules are located at
+[icu/icu4c/source/data/brkitr/rules/](https://github.com/unicode-org/icu/tree/master/icu4c/source/data/brkitr/rules).
+These can serve as examples when writing your own rules, and as a starting point
+for customizations.
+
+### Rule Tutorial
+
+Rules most commonly describe a range of text that should remain together,
+unbroken. For example, this rule
+
+    [\p{Letter}]+;
+
+matches a run of one or more letters, and would cause them to remain unbroken.
+
+The part within `[`brackets`]` follows normal ICU [UnicodeSet pattern
+syntax](../strings/unicodeset.md).
+
+The qualifier, '`+`' in this case, can be one of
+
+| Qualifier | Meaning                  |
+| --------- | ------------------------ |
+| empty     | Match exactly once       |
+| `?`       | Match zero or one time   |
+| `+`       | Match one or more times  |
+| `*`       | Match zero or more times |
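+
+The sketch below shows one way to try a rule set from C++. It assumes ICU4C headers and libraries are installed and a C++11 compiler is used. It compiles the rules with the `RuleBasedBreakIterator` constructor and collects every boundary with `first()`/`next()`. The helper name `boundaries()` is hypothetical, chosen for illustration; it is not an ICU function.
+
+```cpp
+#include <unicode/rbbi.h>
+#include <unicode/unistr.h>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::RuleBasedBreakIterator;
+using icu::UnicodeString;
+
+// Hypothetical helper (not part of ICU): compile `rules` into a break
+// iterator and return every boundary it finds in `text`. Returns an
+// empty vector if the rules fail to compile.
+std::vector<int32_t> boundaries(const UnicodeString& rules,
+                                const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UParseError parseError;  // on failure, .line and .offset locate the error
+    UErrorCode status = U_ZERO_ERROR;
+    RuleBasedBreakIterator bi(rules, parseError, status);
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi.setText(text);  // `text` must outlive the iteration
+    for (int32_t p = bi.first(); p != BreakIterator::DONE; p = bi.next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+
+// Usage, with the one-rule set from the tutorial:
+//   boundaries(UnicodeString(u"[\\p{Letter}]+;"), UnicodeString(u"ab cd"))
+// returns {0, 2, 3, 5}.
+```
+
+In this usage example no rule matches the space. ICU then advances past it one character at a time, so the space ends up with a boundary on each side.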
+
+#### Variables
+
+A variable names a set or rule sub-expression. Variables are useful for
+documenting what something represents, and for simplifying complex expressions
+by breaking them up.
+
+"Variable" is something of a misnomer: variables cannot be reassigned, and
+behave more like named constant expressions.
+
+They start with a '`$`', both in the definition and use.
+
+    # Variable Definition
+    $ASCIILetNum = [A-Za-z0-9];
+    # Variable Use
+    $ASCIILetNum+;
+
+#### Comments and Semicolons
+
+'`#`' begins a comment, which extends to the end of a line.
+
+Comments may stand alone, or appear after another statement on a line.
+
+All rule statements or expressions are terminated by semicolons.
+
+#### Chained Matching
+
+Most ICU rule sets use the concept of "chained matching". The idea is that a
+complete match can be composed from multiple pieces, with each piece coming from
+an individual rule of a rule set.
+
+This idea is unique to ICU break rules; it is not a concept found in other
+regular-expression-based matchers. Some of the Unicode standard break rules
+would be difficult to implement without it.
+
+Consider this example:
+
+    !!chain;
+    $word_char = [\p{Letter}];
+    $word_joiner = [_-];
+    $word_char+;
+    $word_char $word_joiner $word_char;
+
+These rules will match "`abc`", "`hello_world`", "`hi-there`" and
+"`a-bunch_of-joiners-here`", each as a single unit.
+
+They will not match the whole of "`-abc`", "`multiple__joiners`" or "`tail-`".
+
+A full match is composed of pieces or submatches, possibly from different rules,
+with adjacent submatches linked by at least one overlapping character.
+
+In the example below, matching "`hello_world`",
+
+* '`1`' shows matches of the first rule, `$word_char+`
+
+* '`2`' shows matches of the second rule, `$word_char $word_joiner $word_char`
+
+      hello_world
+      11111 11111
+          222
+
+There is an overlap of the matched regions, which causes the chaining mechanism
+to join them into a single overall match.
+
+The mechanism is a good match to, for example, [Unicode's word break
+rules](http://www.unicode.org/reports/tr29/#Word_Boundary_Rules), where rules
+WB5 through WB13 combine to piece together longer words from multiple short
+segments.
+
+`!!chain;` enables chaining in a rule set. It is disabled by default for backward
+compatibility: very old versions of ICU did not support chaining, and it was
+originally introduced as an option.
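+
+The chaining example above can be checked with a short sketch. It makes the same assumption as before, that ICU4C is available, and the helper `chainedBoundaries()` is a hypothetical name. The joiner set is written `[_\-]` here only to make the literal hyphen explicit.
+
+```cpp
+#include <unicode/rbbi.h>
+#include <unicode/unistr.h>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::RuleBasedBreakIterator;
+using icu::UnicodeString;
+
+// The chaining example from this section.
+static const char16_t kChainRules[] =
+    u"!!chain;\n"
+    u"$word_char = [\\p{Letter}];\n"
+    u"$word_joiner = [_\\-];\n"
+    u"$word_char+;\n"
+    u"$word_char $word_joiner $word_char;\n";
+
+// Hypothetical helper (not part of ICU): boundaries that the chained rules
+// find in `text`, or an empty vector if the rules fail to compile.
+std::vector<int32_t> chainedBoundaries(const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UnicodeString rules(kChainRules);
+    UParseError parseError;
+    UErrorCode status = U_ZERO_ERROR;
+    RuleBasedBreakIterator bi(rules, parseError, status);
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi.setText(text);
+    for (int32_t p = bi.first(); p != BreakIterator::DONE; p = bi.next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+
+// chainedBoundaries(UnicodeString(u"hello_world")) -> {0, 11}: one chained match.
+// chainedBoundaries(UnicodeString(u"tail-"))       -> {0, 4, 5}: the trailing
+// joiner has no letter after it, so it is not joined to "tail".
+```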
+
+#### Parentheses and Alternation
+
+Rule expressions can contain parentheses and '`|`' operators, representing
+alternation or "or" operations. This follows conventional regular expression
+behavior.
+
+For example, the following would match a simplified identifier:
+
+    $Letter ($Letter | $Digit)*;
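+
+`$Letter` and `$Digit` are not predefined, so a rule set using this rule must define them. The sketch below supplies one hypothetical pair of definitions, `[\p{Letter}]` and `[\p{Nd}]`, and assumes ICU4C as before. The helper `identifierBoundaries()` is also a hypothetical name.
+
+```cpp
+#include <unicode/rbbi.h>
+#include <unicode/unistr.h>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::RuleBasedBreakIterator;
+using icu::UnicodeString;
+
+// Hypothetical variable definitions plus the identifier rule.
+static const char16_t kIdentifierRules[] =
+    u"$Letter = [\\p{Letter}];\n"
+    u"$Digit  = [\\p{Nd}];\n"
+    u"$Letter ($Letter | $Digit)*;\n";
+
+// Hypothetical helper (not part of ICU): boundaries the identifier rule finds.
+std::vector<int32_t> identifierBoundaries(const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UnicodeString rules(kIdentifierRules);
+    UParseError parseError;
+    UErrorCode status = U_ZERO_ERROR;
+    RuleBasedBreakIterator bi(rules, parseError, status);
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi.setText(text);
+    for (int32_t p = bi.first(); p != BreakIterator::DONE; p = bi.next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+
+// identifierBoundaries(UnicodeString(u"x1 2y")) -> {0, 2, 3, 4, 5}:
+// "x1" is one identifier; "2" cannot start one, so it stands alone.
+```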
+
+#### String and Character Literals
+
+Similarly to common regular expressions, literal characters that do not have
+other special meaning represent themselves. So the rule
+
+    Hello;
+
+would match the literal input "`Hello`".
+
+In practice, nearly all break rules are composed from `[`sets`]` based on Unicode
+character properties; literal characters in rules are very rare.
+
+To prevent random typos in rules from being treated as literals, use this
+option:
+
+    !!quoted_literals_only;
+
+With the option, the naked `Hello` becomes a rule syntax error, while the quoted
+`'Hello'` (using single quotes, as in the syntax table below) still matches the
+literal text "Hello".
+
+`!!quoted_literals_only` is strongly recommended for all rule sets. The random
+typo problem is very real, and surprisingly hard to recognize and debug.
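+
+The sketch below assumes ICU4C and only checks whether each rule set compiles. `rulesCompile()` is a hypothetical helper. The sketch is meant to show that the naked literal is rejected once the option is set, while the quoted form is still accepted.
+
+```cpp
+#include <unicode/rbbi.h>
+#include <unicode/unistr.h>
+
+using icu::RuleBasedBreakIterator;
+using icu::UnicodeString;
+
+// Hypothetical helper (not part of ICU): true if `rules` compile.
+bool rulesCompile(const UnicodeString& rules) {
+    UParseError parseError;
+    UErrorCode status = U_ZERO_ERROR;
+    RuleBasedBreakIterator bi(rules, parseError, status);
+    return U_SUCCESS(status);
+}
+
+// rulesCompile(UnicodeString(u"Hello;"))                          -> true
+// rulesCompile(UnicodeString(u"!!quoted_literals_only; Hello;"))   -> false
+// rulesCompile(UnicodeString(u"!!quoted_literals_only; 'Hello';")) -> true
+```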
+
+#### Explicit Break Rules
+
+A rule containing a slash (`/`) will force a boundary when it matches, even when
+other rules or chaining would otherwise lead to a longer match. Also called Hard
+Break Rules, these have the form
+
+    pre-context / post-context;
+
+where the pre-context and post-context look like normal break rules. Both
+contexts are required, and neither may allow a zero-length match. There should
+be no overlap between characters that end a match of the pre-context and those
+that begin a match of the post-context.
+
+Chaining into a hard break rule operates normally. There is no chaining out of a
+hard break rule; when the post-context matches a break is forced immediately.
+
+Note: future versions of ICU may loosen the restrictions on explicit break
+rules. The behavior of rules with missing or overlapping contexts is subject to
+change.
+
+#### Chaining Control
+
+Chaining into a rule can be disallowed by beginning that rule with a '`^`'. Rules
+so marked can begin a match after a preceding boundary or at the start of text,
+but cannot extend a match via chaining from another rule.
+
+~~The !!LBCMNoChain; statement modifies chaining behavior by preventing chaining
+from one rule to another from occurring on any character whose Line Break
+property is Combining Mark. This option is subject to change or removal, and
+should not be used in general. Within ICU, it is used only with the line break
+rules. We hope to replace it with something more general.~~
+
+> :point_right: **Note**: `!!LBCMNoChain` is deprecated, and will be removed completely from a future
+version of ICU.
+
+## Rule Status Values
+
+Break rules can be tagged with a number, which is called the *rule status*.
+After a boundary has been located, the status number of the specific rule that
+determined the boundary position is available to the application through the
+function `getRuleStatus()`.
+
+For the predefined word boundary rules, status values are available to
+distinguish between boundaries associated with words, numbers, and those around
+spaces or punctuation. Similarly for line break boundaries, status values
+distinguish between mandatory line endings (new line characters) and break
+opportunities that are appropriate points for line wrapping. Refer to the ICU
+API documentation for the C header file `ubrk.h` or to Java class
+`RuleBasedBreakIterator` for a complete list of the predefined boundary
+classifications.
+
+When creating custom sets of break rules, integer status values can be
+associated with boundary rules in whatever way will be convenient for the
+application. There is no need to remain restricted to the predefined values and
+classifications from the standard rules.
+
+It is possible for more than one rule in a set of break rules to produce the
+same boundary in an input text. In this event, `getRuleStatus()` will
+return the numerically largest status value from the matching rules, and the
+alternate function `getRuleStatusVec()` will return a vector of the values from
+all of the matching rules.
+
+In the source form of the break rules, status numbers appear at the end of a
+rule, and are enclosed in `{`braces`}`.
+
+Hard break rules that also have a status value place the status at the end, for
+example
+
+    pre-context / post-context {1234};
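+
+The sketch below reads status values back from a custom rule set, assuming ICU4C. The two rules and their values (10 and 20) are hypothetical. They overlap on lowercase words, so both rules match the same boundary there. `statusAfter()` and `BoundaryStatus` are hypothetical names.
+
+```cpp
+#include <unicode/rbbi.h>
+#include <unicode/unistr.h>
+#include <vector>
+
+using icu::RuleBasedBreakIterator;
+using icu::UnicodeString;
+
+// Two hypothetical rules; both match runs of lowercase ASCII letters.
+static const char16_t kStatusRules[] =
+    u"[\\p{Letter}]+ {10};\n"
+    u"[a-z]+ {20};\n";
+
+struct BoundaryStatus {
+    int32_t largest = 0;        // getRuleStatus(): largest matching value
+    std::vector<int32_t> all;   // getRuleStatusVec(): all matching values
+};
+
+// Hypothetical helper (not part of ICU): status values for the first
+// boundary after position `pos` in `text`.
+BoundaryStatus statusAfter(const UnicodeString& text, int32_t pos) {
+    BoundaryStatus result;
+    UnicodeString rules(kStatusRules);
+    UParseError parseError;
+    UErrorCode status = U_ZERO_ERROR;
+    RuleBasedBreakIterator bi(rules, parseError, status);
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi.setText(text);
+    bi.following(pos);
+    result.largest = bi.getRuleStatus();
+    int32_t values[8];
+    int32_t count = bi.getRuleStatusVec(values, 8, status);
+    if (U_SUCCESS(status)) {
+        result.all.assign(values, values + count);
+    }
+    return result;
+}
+
+// For "abc ABC": the boundary after "abc" has statuses {10, 20}
+// (largest 20); the boundary after "ABC" has only 10.
+```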
+
+### Word Dictionaries
+
+For some languages that don't normally use spaces between words, break iterators
+are able to supplement the rules with dictionary based breaking. Some languages,
+Thai or Lao, for example, use a dictionary for both word and line breaking.
+Others, such as Japanese, use a dictionary for word breaking, but not for line
+breaking.
+
+To enable dictionary use,
+
+1. The break rules must select, as unbroken chunks, ranges of text to be passed
+   off to the word dictionary for further subdivision.
+2. The break rules must define a character class named `$dictionary` that
+   contains the characters (letters) to be handled by the dictionary.
+
+The dictionary implementation, on receiving a range of text, will map it to a
+specific dictionary based on script, and then delegate to that dictionary for
+subdividing the range into words.
+
+See, for example, this snippet from the [line break
+rules](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/rules/line.txt):
+
+    #   Dictionary character set, for triggering language-based break engines. Currently
+    #   limited to LineBreak=Complex_Context (SA).
+    $dictionary = [$SA];
+
+## Rule Options
+
+| Option          | Description |
+| --------------- | ----------- |
+| `!!chain`       |  Enable rule chaining. Default is no chaining. |
+| `!!forward`     |  The rules that follow are for forward iteration. Forward rules are now the only type of rules needed or used.   |
+
+### Deprecated Rule Options
+
+| Deprecated Option          | Description |
+| --------------- | ----------- |
+| ~~`!!reverse`~~     | ~~*[deprecated]* The rules that follow are for reverse iteration. No longer needed; any rules in a Reverse rule section are ignored.~~ |
+| ~~`!!safe_forward`~~ | ~~*[deprecated]* The rules that follow are for safe forward iteration. No longer needed; any rules in such a section are ignored.~~ |
+| ~~`!!safe_reverse`~~ | ~~*[deprecated]* The rules that follow are for safe reverse iteration. No longer needed; any rules in such a section are ignored.~~ |
+| ~~`!!LBCMNoChain`~~ | ~~*[deprecated]* Disable chaining when the overlap character matches `\p{Line_Break=Combining_Mark}`~~ |
+
+## Rule Syntax
+
+Here is the syntax for the boundary rules. (The EBNF Syntax is given below.)
+
+| Rule Name | Rule Values | Notes |
+| ---------- | ----------- | ----- |
+| rules | statement+ | |
+| statement | assignment \| rule \| control |
+| control | (`!!forward` \| `!!reverse` \| `!!safe_forward` \| `!!safe_reverse` \| `!!chain`) `;`
+| assignment | variable `=` expr `;` | 5 |
+| rule | `^`? expr (`{`number`}`)? `;` | 8,9 |
+| number | [0-9]+ | 1 |
+| break-point | `/` | 10 |
+| expr | expr-q \| expr `\|` expr \| expr expr | 3 |
+| expr-q | term \| term `*` \| term `?` \| term `+` |
+| term | rule-char \| unicode-set \| variable \| quoted-sequence \| `(` expr `)` \| break-point |
+| rule-special | *any printing ascii character except letters or numbers* \| white-space |
+| rule-char | *any non-escaped character that is not rule-special* \| `.` \| *any escaped character except* `\p` *or* `\P` |
+| variable | `$` name-start-char name-char* | 7 |
+| name-start-char | `_` \| \p{L} |
+| name-char | name-start-char \| \\p{N} |
+| quoted-sequence | `'` *(any char except single quote or line terminator or two adjacent single quotes)*+ `'` |
+| escaped-char | *See “Character Quoting and Escaping” in the [UnicodeSet](../strings/unicodeset.md) chapter* |
+| unicode-set | See [UnicodeSet](../strings/unicodeset.md) | 4 |
+| comment | unescaped `#` (*any char except new-line*)\* new-line | 2 |
+| s | unescaped \p{Z}, tab, LF, FF, CR, NEL | 6 |
+| new-line | LF, CR, NEL | 2 |
+
+### Rule Syntax Notes
+
+1. The number associated with a rule that actually determined a break position
+   is available to the application after the break has been returned. These
+   numbers are *not* Perl regular expression repeat counts.
+
+2. Comments are recognized and removed before the rest of the rules are parsed.
+   They may appear wherever a space would be allowed (and are ignored).
+
+3. The implicit concatenation of adjacent terms has higher precedence than the
+   `|` operation. "`ab|cd`" is interpreted as "`(ab)|(cd)`", not as "`a(b|c)d`" or
+   "`(((ab)|c)d)`"
+
+4. The syntax for [unicode-set](../strings/unicodeset.md) is defined (and parsed) by the `UnicodeSet` class.
+   It is not repeated here.
+
+5. For `$`variables that will be referenced from inside of a `UnicodeSet`, the
+   definition must consist only of a Unicode Set. For example, when variable `$a`
+   is used in a rule like `[$a$b$c]`, then this definition of `$a` is ok:
+   “`$a=[:Lu:];`” while this one “`$a=abcd;`” would cause an error when `$a` was
+   used.
+
+6. Spaces are allowed nearly anywhere, and are not significant unless escaped.
+   Exceptions to this are noted.
+
+7. No spaces are allowed within a variable name. The variable name `$dictionary`
+   is special. If defined, it must be a Unicode Set, the characters of which
+   will trigger the use of word dictionary based boundaries.
+
+8. A leading `^` on a rule prevents chaining into that rule. It can only match
+   immediately after a preceding boundary, or at the start of text.
+
+9. `{`nnn`}` appearing at the end of a rule is a Rule Status number, not a repeat
+   count as it would be with conventional regular expression syntax.
+
+10. A `/` in a rule specifies a hard break point. If the rule matches, a
+    boundary will be forced at the position of the `/` within the match.
+
+### EBNF Syntax used for the RBBI rules syntax description
+
+| syntax | description |
+| -- | ------------------------- |
+| a? | zero or one instance of a |
+| a+ | one or more instances of a |
+| a* | zero or more instances of a |
+| a \| b | either a or b, but not both |
+| `a` "`a`" | the literal string between the quotes or displayed as `monospace` |
+
+## Planned Changes and Removed or Deprecated Rule Features
+
+1. Reverse rules could formerly be indicated by beginning them with an
+   exclamation `!`. This syntax is deprecated, and will be removed from a
+   future version of ICU.
+
+2. `!!LBCMNoChain` was a global option that specified that characters with the
+   line break property Combining Mark (`CM`) would not participate in rule
+   chaining. This option was always considered internal, is deprecated and will
+   be removed from a future version of ICU.
+
+3. Naked rule characters. Plain text, in the context of a rule, is treated as
+   literal text to be matched, much like normal regular expressions. This turns
+   out to be very error prone, has been the source of bugs in released versions
+   of ICU, and is not useful in implementing normal text boundary rules. A
+   future version will reject literal text that is not escaped.
+
+4. Exact reverse rules and safe forward rules: planned changes to the break
+   engine implementation will remove the need for exact reverse rules and safe
+   forward rules.
+
+5. `{bof}` and `{eof}`, appearing within `[`sets`]`, match the beginning or ending of
+   the input text, respectively. This is an internal (not documented) feature
+   that will probably be removed in a future version of ICU. They are currently
+   used by the standard rules for word, line and sentence breaking. An
+   alternative is probably needed. The existing implementation is incomplete.
+
+## Additional Sample Code
+
+**C/C++**: See
+[icu/source/samples/break/](https://github.com/unicode-org/icu/tree/master/icu4c/source/samples/break/)
+in the ICU source distribution for code samples showing the use of ICU boundary
+analysis.
+
+## Details about Dictionary-Based Break Iteration
+
+> :point_right: **Note**: This section was originally written in August 2012.
+> It is probably out of date; for example, `brkfiles.mk` does not exist anymore.
+
+Certain Unicode characters have a "dictionary" bit set in the break iteration
+rules, and text made up of these characters cannot be handled by the rules-based
+break iteration code for lines or words. Rather, they must be handled by a
+dictionary-based approach. The ICU approach is as follows:
+
+Once the Dictionary bit is detected, the set of characters with that bit is
+handed off to "dictionary code." This code then inspects the characters more
+carefully, and splits them by script (Thai, Khmer, Chinese, Japanese, Korean).
+If text in this script has not yet been handled, it loads the appropriate
+dictionary from disk, and initializes a specialized "BreakEngine" class for that
+script.
+
+There are three such specialized classes: Thai, Khmer and CJK.
+
+Thai and Khmer use very similar approaches. They look through a dictionary that
+is not weighted by word frequency, and attempt to find the longest total "match"
+that can be made in the text.
+
+For Chinese and Japanese text, on the other hand, we have a unified dictionary
+(due to the fact that both use some of the same characters, it is difficult to
+distinguish them) that contains information about word frequencies. The
+algorithm to match text then uses dynamic programming to find the set of breaks
+it considers "most likely" based on the frequency of the words created by the
+breaks. This algorithm could also be used for Thai and Khmer, but we do not have
+sufficient data to do so. This algorithm could also be used for Korean, but once
+again we do not have the data to do so.
+
+Code of interest is in `source/common/dictbe.{h, cpp}`, `source/common/brkeng.{h,
+cpp}`, `source/common/dictionarydata.{h, cpp}`. The dictionaries use the `BytesTrie`
+and `UCharsTrie` as their data store. The binary form of these dictionaries is
+produced by the `gendict` tool, which has source in `source/tools/gendict`.
+
+In order to add new dictionary implementations, a few changes have to be made.
+First, create a new subclass of `DictionaryBreakEngine` or
+`LanguageBreakEngine` in `dictbe.cpp` that implements your algorithm. Then, in
+`brkeng.cpp`, add logic to create this dictionary break engine when text in the
+appropriate script is encountered; this should take no more than about three
+lines of code. Lastly, add the correct data file. If your data is to be
+represented as a `.dict` file (as is recommended, and in fact required if you
+don't want to make substantial code changes to the engine loader), simply add a
+file in the format expected by `gendict` to the `source/data/brkitr` directory,
+and add its name to the `BRK_DICT_SOURCE` list in
+`source/data/brkitr/brkfiles.mk`. This will cause your dictionary (say,
+`foo.txt`) to be added as a `UCharsTrie` dictionary named `foo.dict`. If you
+want your dictionary to be a `BytesTrie` dictionary, you will need to specify a
+transform within the `Makefile`. To do so, find the parts of
+`source/data/Makefile.in` and `source/data/makedata.mak` that deal with
+`thaidict.dict` and `khmerdict.dict`, and add a similar set of lines for your
+script. Finally, in `source/data/brkitr/root.txt`, add a line of the following
+form to the `dictionaries{}` section:
+
+    shortscriptname:process(dependency){"dictionaryname.dict"}
+
+For example, for Katakana:
+
+    Kata:process(dependency){"cjdict.dict"}
+
+Make sure to add appropriate tests for the new implementation.
diff --git a/docs/userguide/boundaryanalysis/index.md b/docs/userguide/boundaryanalysis/index.md
new file mode 100644
index 0000000..3003c5b
--- /dev/null
+++ b/docs/userguide/boundaryanalysis/index.md
@@ -0,0 +1,529 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Boundary Analysis
+
+## Overview of Text Boundary Analysis
+
+Text boundary analysis is the process of locating linguistic boundaries while
+formatting and handling text. Examples of this process include:
+
+1. Locating appropriate points to word-wrap text to fit within specific margins
+   while displaying or printing.
+
+2. Locating the beginning of a word that the user has selected.
+
+3. Counting characters, words, sentences, or paragraphs.
+
+4. Determining how far to move the text cursor when the user hits an arrow key
+    (Some characters require more than one position in the text store and some
+    characters in the text store do not display at all).
+
+5. Making a list of the unique words in a document.
+
+6. Figuring out if a given range of text contains only whole words.
+
+7. Capitalizing the first letter of each word.
+
+8. Locating a particular unit of the text (For example, finding the third word
+    in the document).
+
+The `BreakIterator` classes were designed to support these kinds of tasks. The
+BreakIterator objects maintain a location between two characters in the text.
+This location will always be a text boundary. Clients can move the location
+forward to the next boundary or backward to the previous boundary. Clients can
+also check if a particular location within a source text is on a boundary or
+find the boundary which is before or after a particular location.
+
+## Four Types of BreakIterator
+
+ICU `BreakIterator`s can be used to locate the following kinds of text boundaries:
+
+1. Character Boundary
+
+2. Word Boundary
+
+3. Line-break Boundary
+
+4. Sentence Boundary
+
+Each type of boundary is found in accordance with the rules specified by Unicode
+Standard Annex #29, *Unicode Text Segmentation*
+(<https://unicode.org/reports/tr29/>) or Unicode Standard Annex #14, *Unicode
+Line Breaking Algorithm* (<https://unicode.org/reports/tr14/>).
+
+### Character Boundary
+
+The character-boundary iterator locates the boundaries according to the rules
+defined in <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>.
+These boundaries try to match what a user would think of as a "character"—a
+basic unit of a writing system for a language—which may be more than just a
+single Unicode code point.
+
+The letter `Ä`, for example, can be represented in Unicode either with a single
+code-point value or with two code-point values (one representing the `A` and
+another representing the umlaut `¨`). The character-boundary iterator will treat
+either representation as a single character.
+
+End-user characters, as described above, are also called grapheme clusters, in
+an attempt to limit the confusion caused by multiple meanings for the word
+"character".
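+
+The sketch below compares the two representations of `Ä`, assuming ICU4C and the root locale. `characterBoundaries()` is a hypothetical helper.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <memory>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+// Hypothetical helper (not part of ICU): grapheme cluster ("user character")
+// boundaries in `text`.
+std::vector<int32_t> characterBoundaries(const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> bi(
+        BreakIterator::createCharacterInstance(Locale::getRoot(), status));
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi->setText(text);
+    for (int32_t p = bi->first(); p != BreakIterator::DONE; p = bi->next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+
+// "Äb" with precomposed U+00C4:              {0, 1, 2}
+// "Äb" as U+0041 U+0308 (A + combining ¨):   {0, 2, 3}
+```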
+
+### Word Boundary
+
+The word-boundary iterator locates the boundaries of words, for purposes such as
+double click selection or "Find whole words" operations.
+
+Word boundaries are identified according to the rules in
+<https://www.unicode.org/reports/tr29/#Word_Boundaries>, supplemented by word
+dictionaries for text in languages written without spaces, such as Chinese,
+Japanese, Thai or Khmer. The rules used for locating word breaks take into
+account the alphabets and conventions used by different languages.
+
+Here's an example of a sentence, showing the boundary locations that will be
+identified by a word break iterator:
+
+> :point_right: **Note**: TODO: An example needs to be added here.
+
+### Line-break Boundary
+
+The line-break iterator locates positions that would be appropriate points to
+wrap lines when displaying the text. The boundary rules are defined in
+<https://www.unicode.org/reports/tr14/>.
+
+This example shows the differences in the break locations produced by word and
+line break iterators:
+
+> :point_right: **Note**: TODO: An example needs to be added here.
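+
+The sketch below runs word and line break iterators over the same short string, assuming ICU4C and the `en_US` locale. `boundariesOf()` and `Kind` are hypothetical names. For "`Hello, world!`" the word iterator reports boundaries at 0, 5, 6, 7, 12 and 13, around each word, the comma, the space and the exclamation mark. The line iterator reports only 0, 7 and 13, because a line may be wrapped only after the space.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <memory>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+enum class Kind { Word, Line };  // hypothetical selector for this sketch
+
+// Hypothetical helper (not part of ICU): boundaries of the given kind.
+std::vector<int32_t> boundariesOf(Kind kind, const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> bi(
+        kind == Kind::Word
+            ? BreakIterator::createWordInstance(Locale::getUS(), status)
+            : BreakIterator::createLineInstance(Locale::getUS(), status));
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi->setText(text);
+    for (int32_t p = bi->first(); p != BreakIterator::DONE; p = bi->next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+```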
+
+### Sentence Boundary
+
+A sentence-break iterator locates sentence boundaries according to the rules
+defined in <https://www.unicode.org/reports/tr29/#Sentence_Boundaries>.
+
+## Dictionary-Based BreakIterator
+
+Some languages are written without spaces, and word and line breaking requires
+more than rules over character sequences. ICU provides dictionary support for
+word boundaries in Chinese, Japanese, Thai, Lao, Khmer and Burmese.
+
+Use of the dictionaries is automatic when text in one of the dictionary
+languages is encountered. There is no separate API, and no extra programming
+steps required by applications making use of the dictionaries.
+
+## Usage
+
+To locate boundaries in a document, create a `BreakIterator` using the
+`BreakIterator::create***Instance` family of methods in C++, where "`***`" is
+`Character`, `Word`, `Line` or `Sentence` depending on the type of iterator
+wanted, or the `ubrk_open()` function in C. These factory methods also take a
+parameter that specifies the locale for the language of the text to be processed.
+
+The behavior of the resulting `BreakIterator` may be specialized in some way for
+that locale, but for most locales the default break iterator behavior is used.
+
+Applications also may register customized BreakIterators for use in specific
+locales. Once such a break iterator has been registered, any requests for break
+iterators for that locale will return copies of the registered break iterator.
+
+ICU may cache service instances. Therefore, registration should be done during
+startup, before opening services by locale ID.
+
+In the general usage model, applications will use the following basic steps to
+analyze a piece of text for boundaries:
+
+1. Create a `BreakIterator` with the desired behavior
+
+2. Use the `setText()` method to set the iterator to analyze a particular piece
+   of text.
+
+3. Locate the desired boundaries using the appropriate combination of `first()`,
+   `last()`, `next()`, `previous()`, `preceding()`, and `following()` methods.
+
+The `setText()` method can be called more than once, allowing reuse of a
+BreakIterator on new pieces of text. Because the creation of a `BreakIterator` can
+be relatively time-consuming, it makes good sense to reuse them when practical.
+
+The iterator always points to a boundary position between two characters. The
+numerical value of the position, as returned by `current()`, is the zero-based
+index of the character following the boundary. Thus a position of zero
+represents a boundary preceding the first character of the text, and a position
+of one represents a boundary between the first and second characters.
+
+The `first()` and `last()` methods reset the iterator's current position to the
+beginning or end of the text (the beginning and the end are always considered
+boundaries). The `next()` and `previous()` methods advance the iterator one boundary
+forward or backward from the current position. If the `next()` or `previous()`
+methods run off the beginning or end of the text, they return `DONE`. The `current()`
+method returns the current position.
+
+The `following()` and `preceding()` methods are used for random access, to move the
+iterator to an arbitrary position within the text. `following(n)` moves the
+iterator to the first boundary after position `n`, and `preceding(n)` moves it
+to the last boundary before `n`. Neither method leaves the iterator at `n`
+itself, even if `n` is in fact a boundary position.
+
+`isBoundary()` returns true if the specified position is a boundary.
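+
+The sketch below exercises the random-access methods on a word iterator, assuming ICU4C. `wordNeighbors()` and `Neighbors` are hypothetical names. In particular, it shows that `following(5)` returns 6, not 5, even though 5 is itself a boundary.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <memory>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+struct Neighbors {
+    int32_t before = BreakIterator::DONE;  // preceding(pos)
+    int32_t after = BreakIterator::DONE;   // following(pos)
+    bool atBoundary = false;               // isBoundary(pos)
+};
+
+// Hypothetical helper (not part of ICU): random-access queries around `pos`.
+Neighbors wordNeighbors(const UnicodeString& text, int32_t pos) {
+    Neighbors n;
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> bi(
+        BreakIterator::createWordInstance(Locale::getUS(), status));
+    if (U_FAILURE(status)) {
+        return n;
+    }
+    bi->setText(text);
+    n.before = bi->preceding(pos);
+    n.after = bi->following(pos);
+    n.atBoundary = bi->isBoundary(pos);
+    return n;
+}
+
+// Word boundaries of "Hello world" are 0, 5, 6 and 11:
+//   wordNeighbors(text, 2) -> before 0, after 5, atBoundary false
+//   wordNeighbors(text, 5) -> before 0, after 6, atBoundary true
+```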
+
+### Thread Safety
+
+`BreakIterator`s are not thread safe. This is inherent in their design: break
+iterators are stateful, holding a reference to and position in the text, meaning
+that a single instance cannot operate in parallel on multiple texts.
+
+For concurrent break iteration, each thread must use its own break iterator.
+These can be obtained by creating separate break iterators of the desired type,
+or by initially creating a master break iterator and then creating a clone for
+each thread.
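+
+The sketch below shows the clone-per-thread approach. It assumes ICU4C, C++14, and a build with thread support (for example `-pthread`). `countInParallel()` is a hypothetical helper. The prototype is cloned on the creating thread and each worker owns its clone, so no iterator is ever touched by two threads.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <cstddef>
+#include <memory>
+#include <thread>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+// Count the boundaries `bi` finds in `text`; `bi` must not be shared.
+static int32_t countBoundaries(BreakIterator& bi, const UnicodeString& text) {
+    bi.setText(text);
+    int32_t count = 0;
+    for (int32_t p = bi.first(); p != BreakIterator::DONE; p = bi.next()) {
+        ++count;
+    }
+    return count;
+}
+
+// Hypothetical helper (not part of ICU): count word boundaries in each text
+// on its own thread, giving every thread a private clone of one prototype.
+std::vector<int32_t> countInParallel(const std::vector<UnicodeString>& texts) {
+    std::vector<int32_t> counts(texts.size(), 0);
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> prototype(
+        BreakIterator::createWordInstance(Locale::getUS(), status));
+    if (U_FAILURE(status)) {
+        return counts;
+    }
+    std::vector<std::thread> workers;
+    for (std::size_t i = 0; i < texts.size(); ++i) {
+        // Clone on this thread; the worker takes ownership of its copy.
+        std::unique_ptr<BreakIterator> own(prototype->clone());
+        workers.emplace_back([&counts, &texts, i, bi = std::move(own)] {
+            counts[i] = countBoundaries(*bi, texts[i]);  // distinct slot per thread
+        });
+    }
+    for (std::thread& worker : workers) {
+        worker.join();
+    }
+    return counts;
+}
+```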
+
+### Line Breaking Strictness, a CSS Property
+
+CSS has the concept of "[Line Breaking
+Strictness](https://www.w3.org/TR/css-text-3/#line-break-property)". This
+property specifies the strictness of line-breaking rules applied within an
+element: especially how wrapping interacts with punctuation and symbols. ICU
+line break iterators can choose a strictness using locale tags:
+
+| Locale       | Behavior    |
+| ------------ | ----------- |
+| `en@lb=strict` <br/> `ja@lb=strict`  | Breaks text using the most stringent set of line-breaking rules |
+| `en@lb=normal` <br/> `ja@lb=normal`  | Breaks text using the most common set of line-breaking rules. |
+| `en@lb=loose`  <br/> `ja@lb=loose`   | Breaks text using the least restrictive set of line-breaking rules. Typically used for short lines, such as in newspapers. |
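+
+The sketch below compares strict and loose line breaking on Japanese text, assuming ICU4C. `lineBreakAt()` is a hypothetical helper. The text is U+3042 HIRAGANA LETTER A followed by U+3041 HIRAGANA LETTER SMALL A. Small kana have the line break class Conditional Japanese Starter (`CJ`). Strict rules do not let a line start with one, while the loose rules do.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <memory>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+// Hypothetical helper (not part of ICU): does a line break iterator for
+// `localeId` allow a break at offset `pos` of `text`?
+bool lineBreakAt(const char* localeId, const UnicodeString& text, int32_t pos) {
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> bi(
+        BreakIterator::createLineInstance(Locale(localeId), status));
+    if (U_FAILURE(status)) {
+        return false;
+    }
+    bi->setText(text);
+    return bi->isBoundary(pos);
+}
+
+// lineBreakAt("ja@lb=strict", UnicodeString(u"\u3042\u3041"), 1) -> false
+// lineBreakAt("ja@lb=loose",  UnicodeString(u"\u3042\u3041"), 1) -> true
+```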
+
+### Sentence Break Filters
+
+Sentence breaking can return false positives (sentence boundaries reported in
+incorrect positions) in the presence of abbreviations. For example, consider the
+sentence
+
+> In the meantime Mr. Weston arrived with his small ship.
+
+Default sentence break shows a false boundary following the "Mr."
+
+ICU includes lists of common abbreviations that can be used to filter out these
+false sentence boundaries. Filtering is enabled by the presence of the `ss`
+locale tag when creating the break iterator.
+
+| Locale           | Behavior                                                |
+| ---------------- | ------------------------------------------------------- |
+| `en`             |  no filtering                                           |
+| `en@ss=standard` |  Filter based on common English language abbreviations. |
+| `es@ss=standard` |  Filter with common Spanish abbreviations.              |
+
+Abbreviation lists are available (as of ICU 64) for English, German, Spanish,
+French, Italian and Portuguese.
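+
+The sketch below runs the example sentence through an unfiltered and a filtered sentence iterator, assuming ICU4C and its English abbreviation data. `sentenceBoundaries()` is a hypothetical helper. The unfiltered `en` iterator should report a boundary at offset 20, just after "Mr. ", and the `en@ss=standard` iterator should not.
+
+```cpp
+#include <unicode/brkiter.h>
+#include <unicode/locid.h>
+#include <unicode/unistr.h>
+#include <memory>
+#include <vector>
+
+using icu::BreakIterator;
+using icu::Locale;
+using icu::UnicodeString;
+
+// Hypothetical helper (not part of ICU): sentence boundaries for `localeId`.
+std::vector<int32_t> sentenceBoundaries(const char* localeId,
+                                        const UnicodeString& text) {
+    std::vector<int32_t> result;
+    UErrorCode status = U_ZERO_ERROR;
+    std::unique_ptr<BreakIterator> bi(
+        BreakIterator::createSentenceInstance(Locale(localeId), status));
+    if (U_FAILURE(status)) {
+        return result;
+    }
+    bi->setText(text);
+    for (int32_t p = bi->first(); p != BreakIterator::DONE; p = bi->next()) {
+        result.push_back(p);
+    }
+    return result;
+}
+
+// For "In the meantime Mr. Weston arrived with his small ship." (55 units):
+//   sentenceBoundaries("en", s)             -> {0, 20, 55}
+//   sentenceBoundaries("en@ss=standard", s) -> {0, 55}
+```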
+
+## Accuracy
+
+ICU's break iterators are based on the default boundary rules described in the
+Unicode Standard Annexes [14](https://www.unicode.org/reports/tr14/) and
+[29](https://www.unicode.org/reports/tr29/). These are relatively
+simple boundary rules that can be implemented efficiently, and are sufficient
+for many purposes and languages. However, some languages and applications will
+require a more sophisticated linguistic analysis of the text in order to find
+boundaries with good accuracy. Such an analysis is not directly available from
+ICU at this time.
+
+Break Iterators based on custom, user-supplied boundary rules can be created and
+used by applications with requirements that are not met by the standard default
+boundary rules.
+
+## BreakIterator Boundary Analysis Examples
+
+### Print out all the word-boundary positions in a UnicodeString
+
+**In C++:**
+
+```c++
+void listWordBoundaries(const UnicodeString& s) {
+    UErrorCode status = U_ZERO_ERROR;
+    BreakIterator* bi = BreakIterator::createWordInstance(Locale::getUS(), status);
+    if (U_FAILURE(status)) {
+        delete bi;
+        return;
+    }
+    bi->setText(s);
+    int32_t p = bi->first();
+    while (p != BreakIterator::DONE) {
+        printf("Boundary at position %d\n", p);
+        p = bi->next();
+    }
+    delete bi;
+}
+```
+
+**In C:**
+
+```c
+void listWordBoundaries(const UChar* s, int32_t len) {
+    UBreakIterator* bi;
+    int32_t p;
+    UErrorCode err = U_ZERO_ERROR;
+    bi = ubrk_open(UBRK_WORD, 0, s, len, &err);
+    if (U_FAILURE(err)) return;
+    p = ubrk_first(bi);
+    while (p != UBRK_DONE) {
+        printf("Boundary at position %d\n", p);
+        p = ubrk_next(bi);
+    }
+    ubrk_close(bi);
+}
+```
+
+### Get the boundaries of the word that contains a double-click position
+
+**In C++:**
+
+```c++
+void wordContaining(BreakIterator& wordBrk,
+        int32_t idx,
+        const UnicodeString& s,
+        int32_t& start,
+        int32_t& end) {
+    // This function assumes that an appropriate BreakIterator is
+    // stored in an object or a global variable somewhere. When
+    // possible, avoid calling create() and delete in a function
+    // of this nature.
+    if (s.isEmpty())
+        return;
+    wordBrk.setText(s);
+    start = wordBrk.preceding(idx + 1);
+    end = wordBrk.next();
+    // NOTE: for this and similar operations, use preceding() and next()
+    // as shown here, not following() and previous(). preceding() is
+    // faster than following() and next() is faster than previous()
+    // NOTE: By using preceding(idx + 1) above, we're adopting the convention
+    // that if the double-click comes right on top of a word boundary, it
+    // selects the word that _begins_ on that boundary (preceding(idx) would
+    // instead select the word that _ends_ on that boundary).
+}
+```
+
+**In C:**
+
+```c
+void wordContaining(UBreakIterator* wordBrk,
+    int32_t idx,
+    const UChar* s,
+    int32_t sLen,
+    int32_t* start,
+    int32_t* end,
+    UErrorCode* err) {
+    if (wordBrk == NULL || s == NULL || start == NULL || end == NULL) {
+        *err = U_ILLEGAL_ARGUMENT_ERROR;
+        return;
+    }
+    ubrk_setText(wordBrk, s, sLen, err);
+    if (U_SUCCESS(*err)) {
+        *start = ubrk_preceding(wordBrk, idx + 1);
+        *end = ubrk_next(wordBrk);
+    }
+}
+```
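+
+You can check the `preceding(idx + 1)` / `next()` convention described in the
+C++ comments without ICU. The sketch below models the break iterator as a sorted
+list of boundary offsets. This is an assumption made for illustration. The helper
+`wordAt` is hypothetical and requires `0 <= idx < text length`.
+
+```c++
+#include <algorithm>
+#include <cstdint>
+#include <utility>
+#include <vector>
+
+// Toy model of wordContaining(): `boundaries` stands in for a word
+// BreakIterator and must be sorted, start with 0, and end with the text
+// length. Requires 0 <= idx < text length.
+std::pair<int32_t, int32_t> wordAt(const std::vector<int32_t>& boundaries,
+                                   int32_t idx) {
+    // Analog of preceding(idx + 1): the last boundary at or before idx.
+    auto it = std::upper_bound(boundaries.begin(), boundaries.end(), idx);
+    int32_t start = *(it - 1);
+    // Analog of next(): the first boundary after start.
+    int32_t end = *it;
+    return {start, end};
+}
+```
+
+Take "hello world", with boundaries {0, 5, 6, 11}. A click at offset 2 selects
+[0, 5). A click at offset 5, exactly on a boundary, selects the "word" that
+begins there, [5, 6), which is the space.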
+
+### Check for Whole Words
+
+Use the following to check if a range of text is a "whole word":
+
+**In C++:**
+
+```c++
+UBool isWholeWord(BreakIterator& wordBrk,
+    const UnicodeString& s,
+    int32_t start,
+    int32_t end) {
+    if (s.isEmpty())
+        return FALSE;
+    wordBrk.setText(s);
+    if (!wordBrk.isBoundary(start))
+        return FALSE;
+    return wordBrk.isBoundary(end);
+}
+```
+
+**In C:**
+
+```c
+UBool isWholeWord(UBreakIterator* wordBrk,
+    const UChar* s,
+    int32_t sLen,
+    int32_t start,
+    int32_t end,
+    UErrorCode* err) {
+    UBool result = FALSE;
+    if (wordBrk == NULL || s == NULL) {
+        *err = U_ILLEGAL_ARGUMENT_ERROR;
+        return FALSE;
+    }
+    ubrk_setText(wordBrk, s, sLen, err);
+    if (U_SUCCESS(*err)) {
+        result = ubrk_isBoundary(wordBrk, start) && ubrk_isBoundary(wordBrk, end);
+    }
+    return result;
+}
+```
+
+### Count the words in a document (C++ only)
+
+```c++
+int32_t countWords(RuleBasedBreakIterator& bi, const UnicodeString& s) {
+    bi.setText(s);
+    int32_t count = 0;
+    for (int32_t p = bi.first(); p != BreakIterator::DONE; p = bi.next()) {
+        // getRuleStatus() describes the text preceding boundary p.
+        // UBRK_WORD_NONE means that text is not a word or number
+        // (spaces, punctuation, and the like), so it is not counted.
+        if (bi.getRuleStatus() != UBRK_WORD_NONE) {
+            ++count;
+        }
+    }
+    return count;
+}
+```
+
+The function `getRuleStatus()` returns an enum giving additional information on
+the text preceding the last break position found. Using this value, it is
+possible to distinguish between numbers, words, words containing kana
+characters, words containing ideographic characters, and non-word characters,
+such as spaces or punctuation. The sample uses the break status value to filter
+out, and not count, boundaries associated with non-word characters.
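+
+The word-break status values are defined as ranges rather than single
+constants, so rules may assign any value within a range. The sketch below
+copies the `UWordBreak` values from `unicode/ubrk.h` (`UBRK_WORD_NONE` = 0
+through `UBRK_WORD_IDEO_LIMIT` = 500) under local names so that it compiles
+without ICU. In real code, use the ICU constants directly. The helper
+`wordCategory` is hypothetical.
+
+```c++
+#include <cstdint>
+#include <string>
+
+// Local copies of ICU's UWordBreak values (unicode/ubrk.h). Each category
+// covers the half-open range [TAG, TAG_LIMIT).
+enum : int32_t {
+    WORD_NONE = 0,     WORD_NONE_LIMIT = 100,
+    WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200,
+    WORD_LETTER = 200, WORD_LETTER_LIMIT = 300,
+    WORD_KANA = 300,   WORD_KANA_LIMIT = 400,
+    WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500
+};
+
+// Classify a value returned by getRuleStatus() for a word break iterator.
+std::string wordCategory(int32_t status) {
+    if (status >= WORD_NUMBER && status < WORD_NUMBER_LIMIT) return "number";
+    if (status >= WORD_LETTER && status < WORD_LETTER_LIMIT) return "letter";
+    if (status >= WORD_KANA && status < WORD_KANA_LIMIT) return "kana";
+    if (status >= WORD_IDEO && status < WORD_IDEO_LIMIT) return "ideographic";
+    return "none";  // spaces, punctuation, and other non-word text
+}
+```
+
+Because of these ranges, a test such as `status >= UBRK_WORD_NONE_LIMIT` is a
+slightly more robust filter for the word-counting sample above than comparing
+against `UBRK_WORD_NONE` alone.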
+
+### Word-wrap a document (C++ only)
+
+The sample function below wraps a paragraph so that each line is at most 72
+characters long. The function fills in an array passed in by the caller with the
+starting offset of each line in the paragraph. It also fills in a second array
+with the number of trailing white space characters on each line. For
+simplicity, it assumes that an outside process has already broken the document
+into paragraphs, so that every string passed to the function has a single
+newline, at the end only.
+
+```c++
+int32_t wrapParagraph(const UnicodeString& s,
+                   const Locale& locale,
+                   int32_t lineStarts[],
+                   int32_t trailingwhitespace[],
+                   int32_t maxLines,
+                   UErrorCode &status) {
+
+    int32_t        numLines = 0;
+    int32_t        p, q;
+    const int32_t MAX_CHARS_PER_LINE = 72;
+    UChar          c;
+
+    BreakIterator *bi = BreakIterator::createLineInstance(locale, status);
+    if (U_FAILURE(status)) {
+        delete bi;
+        return 0;
+    }
+    bi->setText(s);
+
+
+    p = 0;
+    while (p < s.length()) {
+        // jump ahead in the paragraph by the maximum number of
+        // characters that will fit
+        q = p + MAX_CHARS_PER_LINE;
+
+        // if this puts us on a white space character, a control character
+        // (which includes newlines), or a non-spacing mark, seek forward
+        // and stop on the next character that is not any of these things
+        // since none of these characters will be visible at the end of a
+        // line, we can ignore them for the purposes of figuring out how
+        // many characters will fit on the line)
+        while (q < s.length()) {
+            c = s[q];
+            if (!(u_isspace(c)
+                  || u_charType(c) == U_CONTROL_CHAR
+                  || u_charType(c) == U_NON_SPACING_MARK)) {
+                break;
+            }
+            ++q;
+        }
+
+        // then locate the last legal line-break decision at or before
+        // the current position ("at or before" is what causes the "+ 1").
+        // If the rest of the paragraph fits, the line simply ends at the
+        // end of the text.
+        if (q >= s.length()) {
+            q = s.length();
+        } else {
+            q = bi->preceding(q + 1);
+        }
+
+        // if this causes us to wind back to (or before) where we started,
+        // then the line has no legal line-break positions. Break the line
+        // at the maximum number of characters
+        if (q <= p) {
+            lineStarts[numLines] = p;
+            trailingwhitespace[numLines] = 0;
+            p += MAX_CHARS_PER_LINE;
+            ++numLines;
+        }
+        // otherwise, we got a good line-break position. Record the start of this
+        // line (p) and then seek back from the end of this line (q) until you find
+        // a non-white space character (same criteria as above) and
+        // record the number of white space characters at the end of the
+        // line in the other results array
+        else {
+            lineStarts[numLines] = p;
+            int32_t nextLineStart = q;
+
+            for (q--; q > p; q--) {
+                c = s[q];
+                if (!(u_isspace(c)
+                       || u_charType(c) == U_CONTROL_CHAR
+                       || u_charType(c) == U_NON_SPACING_MARK)) {
+                    break;
+                }
+            }
+            trailingwhitespace[numLines] = nextLineStart - q - 1;
+            p = nextLineStart;
+            ++numLines;
+        }
+        if (numLines >= maxLines) {
+            break;
+        }
+    }
+    delete bi;
+    return numLines;
+}
+```
+
+Most text editors would not break lines based on the number of characters on a
+line. Even with a monospaced font, there are still many Unicode characters that
+are not displayed and therefore should be filtered out of the calculation. With
+a proportional font, character widths are added up until a maximum line width is
+exceeded or an end of the paragraph marker is reached.
+
+Trailing white space does not need to be counted in the line-width measurement
+because it does not need to be displayed at the end of a line. The sample code
+above returns an array of trailing white space values because an external
+rendering process needs to be able to measure the length of the line (without
+the trailing white space) to justify the lines. For example, if the text is
+right-justified, the invisible white space would be drawn outside the margin.
+The line would actually end with the last visible character.
+
+In either case, the basic principle is to jump ahead in the text to the location
+where the line would break (without taking word breaks into account). Then, move
+backwards using the `preceding()` method to find the last legal breaking position
+before that location. Iterating straight through the text with the `next()`
+method will generally be slower.
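+
+You can try the same jump-ahead-then-back-up strategy without ICU. In this
+sketch, the only legal break positions are assumed to be the offsets just after a
+space, standing in for a line `BreakIterator`. Widths are counted in
+characters, and `lineStarts` is a hypothetical name.
+
+```c++
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Toy model of the jump-ahead-then-back-up strategy. For illustration, a
+// legal break position is any offset just after a space; a real program
+// would ask a line BreakIterator instead. Requires width > 0.
+std::vector<std::size_t> lineStarts(const std::string& text, std::size_t width) {
+    std::vector<std::size_t> starts;
+    std::size_t p = 0;
+    while (p < text.size()) {
+        starts.push_back(p);
+        std::size_t q = p + width;  // jump ahead by the maximum line width
+        if (q >= text.size()) {
+            break;  // the rest of the paragraph fits on this line
+        }
+        // Spaces at the jump target need not fit, so skip over them.
+        while (q < text.size() && text[q] == ' ') {
+            ++q;
+        }
+        // Analog of preceding(q + 1): back up to the last legal break at or
+        // before q that lies after p.
+        std::size_t brk = q;
+        while (brk > p && text[brk - 1] != ' ') {
+            --brk;
+        }
+        // No legal break on this line: force one at the maximum width.
+        p = (brk > p) ? brk : p + width;
+    }
+    return starts;
+}
+```
+
+For "the quick brown fox" and a width of 10, the lines start at offsets 0 and 10
+("the quick " and "brown fox"). A string with no spaces is force-broken every
+`width` characters, like the forced-break branch of `wrapParagraph()`.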
+
+## ICU BreakIterator Data Files
+
+The source code for the ICU break rules for the standard boundary types is
+located in the directory
+[icu4c/source/data/brkitr/rules](https://github.com/unicode-org/icu/tree/master/icu4c/source/data/brkitr/rules).
+These files will be built, and the corresponding binary state tables
+incorporated into ICU's data, by the standard ICU4C build process.
+
+The dictionary word lists used by word break, and for some languages, line break
+are in
+[icu4c/source/data/brkitr/dictionaries](https://github.com/unicode-org/icu/tree/master/icu4c/source/data/brkitr/dictionaries).
+
+The same data is used by both ICU4C and ICU4J. In the normal ICU build process,
+the source data is processed into a binary form using ICU4C, and the resulting
+binary tables are incorporated into ICU4J.
diff --git a/docs/userguide/collation/api.md b/docs/userguide/collation/api.md
new file mode 100644
index 0000000..36d979d
--- /dev/null
+++ b/docs/userguide/collation/api.md
@@ -0,0 +1,696 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation API Details
+
+This section describes some of the usage conventions for the ICU Collation
+Service API.
+
+## Collator Instantiation
+
+To use the Collation Service, you must instantiate a `Collator`. The
+Collator defines the properties and behavior of the sort ordering. The Collator
+can be repeatedly referenced until all collation activities have been performed.
+The Collator can then be closed and removed.
+
+### Instantiating the Predefined Collators
+
+ICU comes with a large set of predefined collators that are suited for
+specific locales. Most of the ICU locales have a predefined collator. In the worst
+case, the CLDR default set of rules,
+which is mostly equivalent to the UCA default ordering (DUCET), is used.
+The default sort order itself is designed to work well for many languages.
+(For example, there are no tailorings for the standard sort orders for
+English, German, French, etc.)
+
+To instantiate a predefined collator, use the APIs `ucol_open`, `createInstance` and
+`getInstance` in C, C++ and Java code, respectively. The C API takes a locale ID
+(or language tag) string argument, C++ takes a Locale object, and Java takes a
+Locale or ULocale.
+
+For some languages, multiple collation types are available; for example,
+"de-u-co-phonebk" / "de@collation=phonebook". They can be enumerated via
+`Collator::getKeywordValuesForLocale()`. See also the list of available collation
+tailorings in the online [ICU Collation
+Demo](http://demo.icu-project.org/icu-bin/collation.html).
+
+Starting with ICU 54, collation attributes can be specified via locale keywords
+as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in
+language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive.
+
+See the [LDML Collation spec, Collation
+Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
+and the [data
+file](https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml) listing
+the valid collation keywords and their values. (The deprecated attributes
+kh/colHiraganaQuaternary and vt/variableTop are not supported.)
+
+For the [old locale extension
+syntax](http://www.unicode.org/reports/tr35/tr35.html#Old_Locale_Extension_Syntax),
+the data file's alias names are used (first alias, if defined, otherwise the
+name): "de@collation=phonebook;colCaseLevel=yes;kv=space"
+
+For the language tag syntax, the non-alias names are used, and "true" values can
+be omitted: "de-u-co-phonebk-kc-kv-space"
+
+This example demonstrates the instantiation of a collator.
+
+**C:**
+
+```C
+UErrorCode status = U_ZERO_ERROR;
+UCollator *coll = ucol_open("en_US", &status);
+if(U_SUCCESS(status)) {
+    /* close the collator*/
+    ucol_close(coll);
+}
+```
+
+**C++:**
+
+```C++
+UErrorCode status = U_ZERO_ERROR;
+Collator *coll = Collator::createInstance(Locale("en", "US"), status);
+if(U_SUCCESS(status)) {
+    //close the collator
+    delete coll;
+}
+```
+
+**Java:**
+
+```Java
+Collator col = null;
+try {
+    col = Collator.getInstance(Locale.US);
+} catch (Exception e) {
+    System.err.println("English collation creation failed.");
+    e.printStackTrace();
+}
+```
+
+### Instantiating Collators Using Custom Rules
+
+If the ICU predefined collators are not appropriate for your intended usage, you
+can define your own set of rules and instantiate a collator that uses them. For
+more details, please see [the section on collation
+customization](customization/index.md).
+
+This example demonstrates the instantiation of a collator from custom rules.
+
+**C:**
+
+```C
+UErrorCode status = U_ZERO_ERROR;
+U_STRING_DECL(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
+UCollator *coll;
+
+U_STRING_INIT(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
+coll = ucol_openRules(rules, -1, UCOL_ON, UCOL_DEFAULT_STRENGTH, NULL, &status);
+if(U_SUCCESS(status)) {
+    /* close the collator*/
+    ucol_close(coll);
+}
+```
+
+**C++:**
+
+```C++
+UErrorCode status = U_ZERO_ERROR;
+UnicodeString rules(u"&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E");
+Collator *coll = new RuleBasedCollator(rules, status);
+if(U_SUCCESS(status)) {
+    //close the collator
+    delete coll;
+}
+```
+
+**Java:**
+
+```Java
+RuleBasedCollator coll = null;
+String ruleset = "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E";
+try {
+    coll = new RuleBasedCollator(ruleset);
+} catch (Exception e) {
+    System.err.println("Customized collation creation failed.");
+    e.printStackTrace();
+}
+```
+
+## Compare
+
+Two of the most used functions in the ICU collation API, `ucol_strcoll` and `ucol_getSortKey`, have counterparts in both the Win32 and ANSI/POSIX APIs:
+
+ICU C             | ICU C++                     | ICU Java                   | ANSI/POSIX | WIN32
+----------------- | --------------------------- | -------------------------- | ---------- | -----
+`ucol_strcoll`    | `Collator::compare`         | `Collator.compare`         | `strcoll`  | `CompareString`
+`ucol_getSortKey` | `Collator::getSortKey`      | `Collator.getCollationKey` | `strxfrm`  | `LCMapString`
+&nbsp;            | `Collator::getCollationKey` | &nbsp;                     | &nbsp;     |
+
+For more sophisticated usage, such as user-controlled language-sensitive text
+searching, an iterating interface to collation is provided. Please refer to the
+section below on `CollationElementIterator` for more details.
+
+The `ucol_strcoll` function compares one pair of strings at a time. Comparing two
+strings is much faster than calculating sort keys for both of them. However, if
+comparisons are done repeatedly on a very large number of strings, generating
+and storing sort keys can improve performance. In all other cases (such as a
+quick sort or bubble sort of a moderately-sized list of strings), comparing
+strings directly works very well.
+
+The C API used for comparing two strings is `ucol_strcoll`. It requires two
+`UChar *` strings and their lengths as parameters, as well as a pointer to a valid
+`UCollator` instance. The result is a `UCollationResult` constant, which can be one
+of `UCOL_LESS`, `UCOL_EQUAL` or `UCOL_GREATER`.
+
+The C++ API offers the method `Collator::compare` with several overloads.
+Acceptable input arguments are `UChar *` with length of strings, or `UnicodeString`
+instances. The result is a member of the `UCollationResult` or `EComparisonResult` enums.
+
+The Java API provides the method `Collator.compare` with one overload. Acceptable
+input arguments are Strings or Objects. The result is an int value, which is
+less than zero if source is less than target, zero if source and target are
+equal, or greater than zero if source is greater than target.
+
+There are also several convenience functions and methods returning a boolean
+value, such as `ucol_greater`, `ucol_greaterOrEqual`, `ucol_equal` (in C),
+`Collator::greater`, `Collator::greaterOrEqual`, `Collator::equal` (in C++) and
+`Collator.equals` (in Java).
+
+### Examples
+
+**C:**
+
+```C
+UChar *s [] = { /* list of Unicode strings */ };
+int32_t listSize = (int32_t)(sizeof(s)/sizeof(s[0]));
+UErrorCode status = U_ZERO_ERROR;
+UCollator *coll = ucol_open("en_US", &status);
+int32_t i, j;
+if(U_SUCCESS(status)) {
+  /* bubble sort into ascending collation order */
+  for(i=listSize-1; i>=1; i--) {
+    for(j=0; j<i; j++) {
+      if(ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_GREATER) {
+        UChar *tmp = s[j];
+        s[j] = s[j+1];
+        s[j+1] = tmp;
+      }
+    }
+  }
+  ucol_close(coll);
+}
+```
+
+**C++:**
+
+```C++
+UnicodeString s [] = { /* list of Unicode strings */ };
+int32_t listSize = (int32_t)(sizeof(s)/sizeof(s[0]));
+UErrorCode status = U_ZERO_ERROR;
+Collator *coll = Collator::createInstance(Locale("en", "US"), status);
+int32_t i, j;
+if(U_SUCCESS(status)) {
+  // bubble sort into ascending collation order
+  for(i=listSize-1; i>=1; i--) {
+    for(j=0; j<i; j++) {
+      if(coll->compare(s[j], s[j+1], status) == UCOL_GREATER) {
+        std::swap(s[j], s[j+1]);  // from <utility>
+      }
+    }
+  }
+  delete coll;
+}
+```
+
+**Java:**
+
+```Java
+String s [] = { /* list of Unicode strings */ };
+try {
+    Collator coll = Collator.getInstance(Locale.US);
+    // bubble sort into ascending collation order
+    for (int i = s.length - 1; i >= 1; i--) {
+        for (int j = 0; j < i; j++) {
+            if (coll.compare(s[j], s[j+1]) > 0) {
+                String tmp = s[j];
+                s[j] = s[j+1];
+                s[j+1] = tmp;
+            }
+        }
+    }
+} catch (Exception e) {
+    System.err.println("English collation creation failed.");
+    e.printStackTrace();
+}
+```
+
+## GetSortKey
+
+The C API provides the `ucol_getSortKey` function, which requires (apart from a
+pointer to a valid `UCollator` instance), an original `UChar` pointer, together with
+its length. It also requires a pointer to a receiving buffer and its length.
+
+The C++ API provides the `Collator::getSortKey` method with similar parameters as
+the C version. It also provides `Collator::getCollationKey`, which produces a
+`CollationKey` object instance (a wrapper around a sort key).
+
+The Java API provides only the `Collator.getCollationKey` method, which produces a
+`CollationKey` object instance (a wrapper around a sort key).
+
+Sort keys are generally only useful in databases or other circumstances where
+function calls are extremely expensive. See [Sortkeys vs
+Comparison](concepts.md#sortkeys-vs-comparison).
+
+### Sort Key Features
+
+ICU writes sort keys as sequences of bytes.
+
+Each sort key ends with one 00 byte and does not contain any other 00 byte. The
+terminating 00 byte is included in the length of the sort key as returned by the
+API (unlike any other ICU API where terminating NUL bytes or characters are not
+counted as part of the length).
+
+Sort key byte sequences must be compared with an unsigned-byte comparison, as
+with `strcmp()`.
+
+Comparing the sort keys of two strings from the same collator yields the same
+ordering as using the collator to compare the two strings directly. That is, the
+sign of `strcmp(coll.getSortKey(str1), coll.getSortKey(str2))` matches the result
+of `coll.compare(str1, str2)`.
+
+Sort keys from different collators (different locale or strength or any other
+attributes/settings) are not comparable.
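+
+A toy two-level sort key can illustrate the byte-level properties above. This is
+not ICU's sort key format. The sketch assumes printable ASCII input, a
+case-insensitive first level, and a lowercase-before-uppercase second level. The
+functions `toySortKey`, `compareKeys` and `toyCompare` are hypothetical. The
+sketch shows how three properties make sort key comparison agree with direct
+comparison: the unique 00 terminator, the low 01 level separator, and unsigned
+byte comparison.
+
+```C++
+#include <cctype>
+#include <cstddef>
+#include <cstring>
+#include <string>
+
+// Toy sort key: level-1 bytes, 01 separator, level-2 bytes, 00 terminator.
+// Printable ASCII never produces 00 or 01 inside a level.
+std::string toySortKey(const std::string& s) {
+    std::string key;
+    for (unsigned char c : s) {
+        key += static_cast<char>(std::tolower(c));  // level 1: ignore case
+    }
+    key += '\x01';  // level separator
+    for (unsigned char c : s) {
+        key += std::isupper(c) ? '\x06' : '\x05';  // level 2: lowercase first
+    }
+    key += '\0';  // terminator, included in key.size()
+    return key;
+}
+
+// Compare keys as unsigned bytes up to the terminator, as strcmp() does.
+int compareKeys(const std::string& k1, const std::string& k2) {
+    return std::strcmp(k1.c_str(), k2.c_str());
+}
+
+// Direct comparison with the same two-level rules: <0, 0 or >0.
+int toyCompare(const std::string& a, const std::string& b) {
+    std::string la, lb;
+    for (unsigned char c : a) la += static_cast<char>(std::tolower(c));
+    for (unsigned char c : b) lb += static_cast<char>(std::tolower(c));
+    if (la != lb) {
+        return la < lb ? -1 : 1;  // the first level decides
+    }
+    for (std::size_t i = 0; i < a.size(); ++i) {
+        bool ua = std::isupper(static_cast<unsigned char>(a[i])) != 0;
+        bool ub = std::isupper(static_cast<unsigned char>(b[i])) != 0;
+        if (ua != ub) {
+            return ua ? 1 : -1;  // second level: lowercase sorts first
+        }
+    }
+    return 0;
+}
+```
+
+The separator 01 is lower than any first-level byte, so "A" sorts before "ab" in
+both comparisons, even though "A" has the higher second-level weight.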
+
+Sort keys can be "merged" as described in [UTS #10 Merging Sort
+Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys), via
+`ucol_mergeSortkeys()` or Java `CollationKey.merge()`.
+
+*   Since CLDR 1.9/ICU 4.6, the same effect can be achieved by concatenating
+    strings with U+FFFE between them. The concatenation has the same sort order
+    as the merged sort keys.
+*   However, it is not guaranteed that the sort key of the concatenated strings
+    is the same as the merged result of the individual sort keys. (That is,
+    merge(getSortKey(str1), getSortKey(str2)) may differ from getSortKey(str1 +
+    '\\uFFFE' + str2).)
+*   In particular, a future version of ICU is likely to generate shorter sort
+    keys when concatenating strings with U+FFFE between them (by using
+    compression across the U+FFFE weights).
+*   *The recommended way to achieve "merged" sorting is via strings with
+    U+FFFE.*
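+
+Plain byte strings show why concatenating with a low-weight separator gives
+"merged" (field-by-field) order. In this sketch, byte 0x01 stands in for U+FFFE,
+which CLDR collation gives a primary weight lower than that of any other
+character. Plain `std::string` comparison stands in for a collator, and
+`joinFields` is a hypothetical helper.
+
+```C++
+#include <string>
+
+// Join two fields, optionally with a separator that sorts below every
+// other byte used in the fields (0x01 standing in for U+FFFE).
+std::string joinFields(const std::string& first, const std::string& second,
+                       bool useSeparator) {
+    return useSeparator ? first + '\x01' + second : first + second;
+}
+```
+
+With the separator, ("ab", "z") sorts before ("abc", "a") because its first field
+is a prefix of the other. Without the separator, "abz" sorts after "abca" because
+'z' is compared against 'c'.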
+
+Any further analysis or parsing of sort keys is not supported.
+
+Sort keys will change from one ICU version to another; therefore, if sort keys
+are stored in a database or other persistent storage, then each upgrade requires
+their regeneration.
+
+*   The details of the underlying data change with every Unicode and CLDR
+    version.
+*   Sort keys are also subject to enhancements and bug fixes in the builder and
+    implementation code.
+*   On the other hand, the sort *order* is much more stable. It is subject to
+    deliberate changes to the default Unicode collation order, which is kept
+    quite stable, and subject to deliberate changes in CLDR data as new data is
+    added and feedback on existing data is taken into account.
+
+Implementation notes: (Not supported as permanent constraints on sort keys)
+
+Byte 02 was unique as a merge separator for some versions of ICU before ICU 53.
+Since ICU 53, 02 is also used in regular collation weights where there
+is no conflict (to expand the number of available short weights).
+
+Byte 01 has been unique as a level separator. This is not strictly necessary for
+non-primary levels. (A level's compressible "common" weight as its level
+separator would yield shorter sort keys.) However, the current implementation of
+`ucol_mergeSortkeys()` relies on it. (Also, test code currently examines sort keys
+for finding the strength of a comparison difference.) This may change in the
+future, especially if `ucol_mergeSortkeys()` were to become deprecated.
+
+Level separators are likely to be equivalent to single-byte weights (possibly
+compressible): Multi-byte level separators would noticeably lengthen sort keys
+for short strings.
+
+The byte values used in several ICU versions for sort keys and collation
+elements are documented in the [“Special Byte Values” design
+doc](http://site.icu-project.org/design/collation/bytes) on the ICU site.
+
+### Sort Key Output Buffer
+
+`ucol_getSortKey()` can operate in 'preflighting' mode, which returns the amount
+of memory needed to store the resulting sort key. This mode is automatically
+activated if the output buffer size passed is set to zero. Should the sort key
+become longer than the buffer provided, the function again slips into
+preflighting mode. The overall performance is then poorer than if the function
+had been called with a zero output buffer. If the size of the sort key returned
+is greater than the size of the buffer provided, the content of the result
+buffer is undefined. In that case, the result buffer can be reallocated to its
+proper size and the sort key generator function can be called again.
+
+The best way to generate a series of sort keys is to do the following:
+
+1.  Create a big temporary buffer on the stack. Typically, this buffer is
+    allocated only once, and reused with every sort key generated. There is no
+    need to keep it as small as possible. A recommended size for the temporary
+    buffer is four times the length of the longest string processed.
+
+2.  Start the loop. Call `ucol_getSortKey()` to find out how big the sort key
+    buffer should be, and fill in the temporary buffer at the same time.
+
+3.  If the temporary buffer is too small, allocate or reallocate more space.
+    Fill in the sort key values in the overflow buffer.
+
+4.  Allocate the sort key buffer with the size returned by `ucol_getSortKey()` and
+    call memcpy to copy the sort key content from the temp buffer to the sort
+    key buffer.
+
+5.  Loop back to step 2 until you are done.
+
+6.  Delete the overflow buffer if you created one.
+
+### Example
+
+```C
+/* processSortKey() is a placeholder for application code, e.g. copying the key */
+void GetSortKeys(const UCollator* coll, const UChar* const* source,
+                 int32_t arrayLength)
+{
+  uint8_t buffer[1000]; /* stack buffer, reused for every sort key */
+  uint8_t* currBuffer = buffer;
+  int32_t bufferLen = (int32_t)sizeof(buffer);
+  int32_t expectedLen = 0;
+  int32_t i;
+
+  for (i = 0; i < arrayLength; ++i) {
+    expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
+    if (expectedLen > bufferLen) {
+      /* the buffer was too small: grow it, then generate the key again */
+      uint8_t* newBuffer = (currBuffer == buffer)
+          ? (uint8_t*)malloc(expectedLen)
+          : (uint8_t*)realloc(currBuffer, expectedLen);
+      if (newBuffer == NULL) {
+        break; /* out of memory */
+      }
+      currBuffer = newBuffer;
+      bufferLen = expectedLen;
+      ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
+    }
+    processSortKey(i, currBuffer, expectedLen);
+  }
+
+  if (currBuffer != buffer) {
+    free(currBuffer);
+  }
+}
+```
+
+> :point_right: **Note** Although the API allows you to call
+> `ucol_getSortKey` with `NULL` to see what the
+> sort key length is, it is strongly recommended that you NOT determine the length
+> first, then allocate and fill the sort key buffer. If you do, it requires twice
+> the processing since computing the length has to do the same calculation as
+> actually getting the sort key. Instead, the example shown above uses a stack buffer.
+
+### Using Iterators for String Comparison
+
+ICU4C's `ucol_strcollIter` API allows for comparing two strings that are supplied
+as character iterators (`UCharIterator`). This is useful when you need to compare
+differently encoded strings using `strcoll`. In that case, converting the strings
+first would probably be wasteful, since `strcoll` usually gives the result
+before whole strings are processed. This API is implemented only as a C function
+in ICU4C. There are no equivalent C++ or ICU4J functions.
+
+```C
+...
+/* we are arriving with two char*: utf8Source and utf8Target, with their
+* lengths in utf8SourceLen and utf8TargetLen
+*/
+    UCharIterator sIter, tIter;
+    uiter_setUTF8(&sIter, utf8Source, utf8SourceLen);
+    uiter_setUTF8(&tIter, utf8Target, utf8TargetLen);
+    compareResultUTF8 = ucol_strcollIter(myCollation, &sIter, &tIter, &status);
+...
+```
+
+### Obtaining Partial Sort Keys
+
+When using different sort algorithms, such as radix sort, sometimes it is useful
+to process strings only as much as needed to feed into the sorting algorithm.
+For that purpose, ICU provides the `ucol_nextSortKeyPart` API, which also takes
+character iterators. This API allows for iterating over subsequent pieces of an
+uncompressed sort key. Between calls to the API you need to save a 64-bit state.
+Following is an example of simulating a string compare function using the partial
+sort key API. Your usage model is bound to look much different.
+
+```C
+static UCollationResult compareUsingPartials(UCollator *coll,
+                                             const UChar source[], int32_t sLen,
+                                             const UChar target[], int32_t tLen,
+                                             int32_t pieceSize, UErrorCode *status) {
+  int32_t partialSKResult = 0;
+  UCharIterator sIter, tIter;
+  uint32_t sState[2], tState[2];
+  int32_t sSize = pieceSize, tSize = pieceSize;
+  uint8_t sBuf[16384], tBuf[16384];
+  if(pieceSize > 16384) {
+    *status = U_BUFFER_OVERFLOW_ERROR;
+    return UCOL_EQUAL;
+  }
+  *status = U_ZERO_ERROR;
+  sState[0] = 0; sState[1] = 0;
+  tState[0] = 0; tState[1] = 0;
+  while(sSize == pieceSize && tSize == pieceSize && partialSKResult == 0) {
+    uiter_setString(&sIter, source, sLen);
+    uiter_setString(&tIter, target, tLen);
+    sSize = ucol_nextSortKeyPart(coll, &sIter, sState, sBuf, pieceSize, status);
+    tSize = ucol_nextSortKeyPart(coll, &tIter, tState, tBuf, pieceSize, status);
+    if(U_FAILURE(*status)) {
+      return UCOL_EQUAL;
+    }
+    /* compare only the bytes that both pieces actually contain */
+    partialSKResult = memcmp(sBuf, tBuf, sSize < tSize ? sSize : tSize);
+    if(partialSKResult == 0) {
+      /* a shorter (final) piece means that sort key ended first */
+      partialSKResult = sSize - tSize;
+    }
+  }
+
+  if(partialSKResult < 0) {
+      return UCOL_LESS;
+  } else if(partialSKResult > 0) {
+    return UCOL_GREATER;
+  } else {
+    return UCOL_EQUAL;
+  }
+}
+```
+
+### Other Examples
+
+A longer example is presented in the 'Examples' section. Here is an illustration
+of the usage model.
+
+**C:**
+
+```C
+#define MAX_BUFFER_SIZE 10000
+#define MAX_LIST_LENGTH 5
+
+/* qsort() comparator for an array of sort key pointers. Sort keys are
+   NUL-terminated byte strings that compare correctly with strcmp(). */
+static int compareKeys(const void *a, const void *b) {
+  return strcmp(*(const char * const *)a, *(const char * const *)b);
+}
+
+/* inside a function: */
+const char *text[MAX_LIST_LENGTH] = {
+   "Quick",
+   "fox",
+   "Moving",
+   "trucks",
+   "riddle"
+};
+UChar s[MAX_LIST_LENGTH][20];
+uint8_t temp[MAX_BUFFER_SIZE]; /* large temporary buffer, reused */
+uint8_t *keys[MAX_LIST_LENGTH] = { NULL };
+int32_t i, length;
+UErrorCode status = U_ZERO_ERROR;
+UCollator *coll;
+
+for (i = 0; i < MAX_LIST_LENGTH; i++) {
+   u_uastrcpy(s[i], text[i]);
+}
+coll = ucol_open("en_US", &status);
+if (U_SUCCESS(status)) {
+  for (i = 0; i < MAX_LIST_LENGTH; i++) {
+    length = ucol_getSortKey(coll, s[i], -1, temp, MAX_BUFFER_SIZE);
+    keys[i] = (uint8_t *)malloc(length);
+    if (keys[i] == NULL) {
+      break; /* out of memory */
+    }
+    if (length <= MAX_BUFFER_SIZE) {
+      /* the key fit into the temporary buffer: copy it */
+      memcpy(keys[i], temp, length);
+    } else {
+      /* the temporary buffer was too small: generate the key again */
+      ucol_getSortKey(coll, s[i], -1, keys[i], length);
+    }
+  }
+  if (i == MAX_LIST_LENGTH) {
+    qsort(keys, MAX_LIST_LENGTH, sizeof(keys[0]), compareKeys);
+  }
+  for (i = 0; i < MAX_LIST_LENGTH; i++) {
+    free(keys[i]);
+  }
+  ucol_close(coll);
+}
+```
+
+**C++:**
+
+```C++
+#define MAX_LIST_LENGTH 5
+const UnicodeString s [] = {
+  "Quick",
+  "fox",
+  "Moving",
+  "trucks",
+  "riddle"
+};
+CollationKey keys[MAX_LIST_LENGTH];
+UErrorCode status = U_ZERO_ERROR;
+Collator *coll = Collator::createInstance(Locale("en_US"), status);
+if(U_SUCCESS(status)) {
+  for(int32_t i=0; i<MAX_LIST_LENGTH; i++) {
+    coll->getCollationKey(s[i], keys[i], status);
+  }
+  // std::sort() is from <algorithm>; CollationKey::compareTo() orders
+  // the keys the same way the collator orders the original strings
+  std::sort(keys, keys + MAX_LIST_LENGTH,
+            [&status](const CollationKey &a, const CollationKey &b) {
+              return a.compareTo(b, status) == UCOL_LESS;
+            });
+  delete coll;
+}
+```
+
+**Java:**
+
+```Java
+String s [] = {
+  "Quick",
+  "fox",
+  "Moving",
+  "trucks",
+  "riddle"
+};
+CollationKey keys[] = new CollationKey[s.length];
+try {
+    Collator coll = Collator.getInstance(Locale.US);
+    for (int i = 0; i < s.length; i ++) {
+        keys[i] = coll.getCollationKey(s[i]);
+    }
+
+    Arrays.sort(keys);
+}
+catch (Exception e) {
+    System.err.println("Error creating English collator");
+    e.printStackTrace();
+}
+```
+
+## CollationElementIterator
+
+A collation element iterator can only be used in one direction. This is
+established at the time of the first call to retrieve a collation element. Once
+`ucol_next` (C), `CollationElementIterator::next` (C++) or
+`CollationElementIterator.next` (Java) has been invoked,
+`ucol_previous` (C), `CollationElementIterator::previous` (C++) or
+`CollationElementIterator.previous` (Java) should not be used (and vice versa).
+The direction can be changed immediately after `ucol_reset` (C),
+`CollationElementIterator::reset` (C++) or `CollationElementIterator.reset`
+(Java) is called, or when the iterator reaches the end of the string while
+traversing it.
+
+Once `ucol_next` reaches the end of the string buffer, it returns
+`UCOL_NULLORDER`, and every subsequent call to `ucol_next` also returns
+`UCOL_NULLORDER`. The same applies to `ucol_previous` at the start of the string.
+
+An example of how iterators are used is the Boyer-Moore search implementation,
+which can be found in the samples section.
+
+### API Example
+
+**C:**
+
+```C
+UErrorCode          status = U_ZERO_ERROR;
+UCollator          *coll = ucol_open("en_US", &status);
+UChar               text[20];
+UCollationElements *collelemitr;
+int32_t             collelem;
+
+u_uastrcpy(text, "text");
+collelemitr = ucol_openElements(coll, text, -1, &status);
+if (U_SUCCESS(status)) {
+  do {
+    collelem = ucol_next(collelemitr, &status);
+  } while (U_SUCCESS(status) && collelem != UCOL_NULLORDER);
+}
+
+ucol_closeElements(collelemitr);
+ucol_close(coll);
+```
+
+**C++:**
+
+```C++
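+UErrorCode    status = U_ZERO_ERROR;
+Collator      *coll = Collator::createInstance(Locale::getUS(), status);
+// createCollationElementIterator() is a RuleBasedCollator method.
+RuleBasedCollator *rbc = dynamic_cast<RuleBasedCollator *>(coll);
+if (U_SUCCESS(status) && rbc != nullptr) {
+  UnicodeString text("text");
+  CollationElementIterator *collelemitr = rbc->createCollationElementIterator(text);
+  int32_t collelem = 0;
+  do {
+    collelem = collelemitr->next(status);
+  } while (U_SUCCESS(status) && collelem != CollationElementIterator::NULLORDER);
+  delete collelemitr;
+}
+delete coll;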
+```
+
+**Java:**
+
+```Java
+try {
+    RuleBasedCollator coll = (RuleBasedCollator)Collator.getInstance(Locale.US);
+    String text = "text";
+    CollationElementIterator collelemitr = coll.getCollationElementIterator(text);
+    int collelem = 0;
+    do {
+        collelem = collelemitr.next();
+    } while (collelem != CollationElementIterator.NULLORDER);
+} catch (Exception e) {
+    System.err.println("Error in collation iteration");
+    e.printStackTrace();
+}
+```
+
+## Setting and Getting Attributes
+
+The general attribute setting APIs are `ucol_setAttribute` (in C) and
+`Collator::setAttribute` (in C++). These APIs take an attribute name and an
+attribute value. If the name and the value pass a syntax and range check, the
+property of the collator is changed. If the name and value do not pass a syntax
+and range check, however, the state is not changed and the error code variable
+is set to an error condition. The Java version does not provide general
+attribute setting APIs; instead, each attribute has its own setter API of
+the form `RuleBasedCollator.setATTRIBUTE_NAME(arguments)`.
+
+The attribute getting APIs are `ucol_getAttribute` (C) and `Collator::getAttribute`
+(C++). Both APIs require an attribute name as an argument and return an
+attribute value if a valid attribute name was supplied. If a valid attribute
+name was not supplied, however, they return an undefined result and set the
+error code. Similarly to the setter APIs for the Java version, no generic getter
+API is provided. Each attribute has its own getter API of the form
+`RuleBasedCollator.getATTRIBUTE_NAME()` in the Java version.
+
+## References
+
+1.  Ken Whistler, Markus Scherer: "Unicode Technical Standard #10, Unicode Collation
+    Algorithm" (<http://www.unicode.org/unicode/reports/tr10/>)
+
+2.  ICU Design doc: "Collation v2" (<http://site.icu-project.org/design/collation/v2>)
+
+3.  Mark Davis: "ICU Collation Design Document"
+    (<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/master/design/collation/ICU_collation_design.htm>)
+
+4.  The Unicode Standard, chapter 5, "Implementation Guidelines"
+    (<http://www.unicode.org/unicode/uni2book/ch05.pdf>)
+
+5.  Laura Werner: "Efficient text searching in Java: Finding the right string in
+    any language"
+    (<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>)
+
+6.  Mark Davis, Martin Dürst: "Unicode Standard Annex #15: Unicode Normalization
+    Forms" (<http://www.unicode.org/unicode/reports/tr15/>).
diff --git a/docs/userguide/collation/architecture.md b/docs/userguide/collation/architecture.md
new file mode 100644
index 0000000..16c78a4
--- /dev/null
+++ b/docs/userguide/collation/architecture.md
@@ -0,0 +1,562 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation Service Architecture
+
+This section describes the design principles, architecture and coding
+conventions of the ICU Collation Service.
+
+## Collator
+
+To use the Collation Service, a Collator must first be instantiated. A
+Collator is a data structure or object that maintains all of the property
+and state information necessary to define and support the specific collation
+behavior provided. Examples of properties described in the Collator are the
+locale, whether normalization is to be performed, and how many levels of
+collation are to be evaluated. Examples of the state information described in
+the Collator include the direction of a Collation Element Iterator (forward
+or backward) and the status of the last API executed.
+
+The Collator is instantiated either by referencing a locale or by defining a
+custom set of rules (a tailoring).
+
+The Collation Service uses the paradigm:
+
+1.  Open a Collator,
+
+2.  Use while necessary,
+
+3.  Close the Collator.
+
+Collator instances cannot be shared among threads. Instead, open a separate
+collator for each thread, or clone an existing one: the safe clone function
+clones collators in a thread-safe fashion.
+
+The Collation Service follows the ICU conventions for locale designation
+when opening collators:
+
+1.  NULL means the default locale.
+
+2.  The empty locale name ("") means the root locale.
+
+The Collation Service also adheres to the ICU conventions described in the
+"[ICU Architectural Design](../design.md)" section of the User Guide.
+In particular:
+
+1.  The standard error code convention is usually followed. (Functions that do
+    not take an error code parameter omit it for backward compatibility.)
+
+2.  The string length convention is followed: when passing a `UChar *`, the
+    length is required in a separate argument. If -1 is passed for the length,
+    it is assumed that the string is zero terminated.
+
+### Collation locale and keyword handling
+
+When a collator is created from a locale, the collation service (like all ICU
+services) must map the requested locale to the localized collation data
+available to ICU at the time. It does so using the standard ICU locale fallback
+mechanism. See the fallback section of the [locale
+chapter](../locale/index.md) for more details.
+
+If you pass a regular locale in, like "en_US", the collation service first
+searches with fallback for the "collations/default" key. The first such key it finds
+has an associated string value; this is the keyword name for the collation
+that is the default for this locale. If the search falls all the way back to the
+root locale, the collation service uses the "collations/default" key there,
+which has the value "standard".
+
+If the locale has a keyword, like "de-u-co-phonebk" or "de@collation=phonebook", the
+collation service searches with fallback for "collations/phonebook". If the
+search is successful, the collation service uses the string value it finds to
+instantiate a Collator. If the search fails because no such key is present in
+any of ICU's locale data (e.g., "de@collation=funky"), the service returns a
+collator implementing the default tailoring of the locale.
+If the fallback goes all the way to the root locale, then
+the returned `UErrorCode` is `U_USING_DEFAULT_WARNING`.
+
+## Input values for collation
+
+Collation deals with processing strings. ICU generally requires that all
+strings be in UTF-16 format, and that any required conversion be
+done before ICU functions are used. In the case of collation, there are APIs
+that can also take instances of character iterators (`UCharIterator`)
+or UTF-8 directly.
+
+Theoretically, character iterators can iterate strings
+in any encoding. ICU currently provides character iterator implementations for
+UTF-8 and UTF-16BE (useful when processing data from a big-endian platform on a
+little-endian machine). Note, however, that using iterators with the
+collation APIs has a performance cost. Use them only when it is
+not desirable to convert whole strings before the operation, such as in a
+string compare function, which often stops before reaching the end of the strings.
+
+## Collation Elements
+
+As discussed in the introduction, there are many possible orderings for sorted
+text, depending on language and other factors. Ideally, there is a way to
+describe each ordering as a set of rules for calculating numeric values for each
+string of text. The collation process then becomes one of simply comparing these
+numeric values.
+
+This essentially describes the way the Collation Service works. To implement
+a particular sort ordering, the relationships among characters and
+character sequences are derived first. For example, a Spanish ordering defines the
+letter sequence "CH" to be between the letters "C" and "D". As also discussed in
+the introduction, ordering strings properly requires comparing base
+letters separately from accents. Letter case
+must also be considered separately from either base letters or accents. Any
+ordering specification language must provide a way to define the relationships
+between characters or character sequences on multiple levels. ICU supports this
+by using "<" to describe a relationship at the primary level, using "<<" to
+describe a relationship at the secondary level, and using "<<<" to describe a
+relationship at the tertiary level. Here are some example usages:
+
+Symbol | Example  | Description
+------ | -------- | -----------
+`<`    | `c < ch` | Make a primary (base letter) difference between "c" and the character sequence "ch"
+`<<`   | `a << ä` | Make a secondary (accent) difference between "a" and "ä"
+`<<<`  | `a <<< A` | Make a tertiary difference between "a" and "A"
+
+A more complete description of the ordering specification symbols and their
+meanings is provided in the section on Collation Tailoring.
+
+Once a sort ordering is defined by specifying the desired relationships between
+characters and character sequences, ICU can convert these relationships to a
+series of numerical values (one for each level) that satisfy these same
+relationships.
+
+This series of numeric values, representing the relative weighting of a
+character or character sequence, is called a Collation Element (CE).
+One possible encoding of a Collation Element is a 32-bit value consisting of
+a 16-bit primary weight, an 8-bit secondary weight,
+2 case bits, and a 6-bit tertiary weight.
+
+The sort weight of a string is represented by the collation elements of its
+component characters and character sequences. For example, the sort weight of
+the string "apple" would consist of its component Collation Elements, as shown
+here:
+
+"apple" | "apple" Collation Elements
+------- | --------------------------
+a       | `[1900.05.05]`
+p       | `[3700.05.05]`
+p       | `[3700.05.05]`
+l       | `[2F00.05.05]`
+e       | `[2100.05.05]`
+
+In this example, the letter "a" has a 16-bit primary weight of 1900 (hex), an
+8-bit secondary weight of 05 (hex), and a combined 8-bit case-tertiary weight of
+05 (hex).
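+
+The layout just described can be made concrete with a small sketch that packs
+and unpacks such a 32-bit value. The `CollationElement` struct and the
+`packCE`/`unpackCE` functions are hypothetical names for illustration; they
+model only the example encoding from this section, not ICU's internal CE format.
+
+```C++
+#include <cassert>
+#include <cstdint>
+
+// Hypothetical field layout from the text: bits 31..16 primary,
+// bits 15..8 secondary, bits 7..6 case, bits 5..0 tertiary.
+struct CollationElement {
+  uint16_t primary;
+  uint8_t secondary;
+  uint8_t caseBits;  // 0..3
+  uint8_t tertiary;  // 0..63
+};
+
+uint32_t packCE(const CollationElement &ce) {
+  assert(ce.caseBits < 4 && ce.tertiary < 64);
+  return (uint32_t{ce.primary} << 16) | (uint32_t{ce.secondary} << 8) |
+         (uint32_t{ce.caseBits} << 6) | uint32_t{ce.tertiary};
+}
+
+CollationElement unpackCE(uint32_t ce) {
+  return CollationElement{static_cast<uint16_t>(ce >> 16),
+                          static_cast<uint8_t>((ce >> 8) & 0xFF),
+                          static_cast<uint8_t>((ce >> 6) & 0x3),
+                          static_cast<uint8_t>(ce & 0x3F)};
+}
+```
+
+For example, `unpackCE(0x19000505)` yields primary 0x1900, secondary 0x05,
+case bits 0 and tertiary 0x05, matching the "a" row in the table above.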
+
+String comparison is performed by comparing the collation elements of each
+string. The primary weights are compared first. If a difference is found, that
+difference determines the relationship between the two strings. If no
+differences are found, the secondary weights are compared, and so forth.
+
+With ICU it is possible to specify how many levels should be compared. For some
+applications, it can be desirable to compare only primary levels or to compare
+only primary and secondary levels.
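+
+The level-by-level comparison described above can be sketched as follows. It
+assumes CEs with a 16-bit primary, an 8-bit secondary and an 8-bit combined
+case/tertiary weight, as in the "apple" table. `CE` and `compareByLevels` are
+hypothetical; the sketch skips zero weights at each level and lets a shorter
+weight sequence sort first, which simplifies what ICU actually does.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+struct CE {
+  uint16_t primary;
+  uint8_t secondary;
+  uint8_t tertiary;  // combined case and tertiary weight
+};
+
+// Weight of `ce` at comparison level 1, 2 or 3.
+static unsigned weightAt(const CE &ce, int level) {
+  return level == 1 ? ce.primary : level == 2 ? ce.secondary : ce.tertiary;
+}
+
+// Compares two CE sequences level by level up to `strength`
+// (1 = primary, 2 = secondary, 3 = tertiary). Returns -1, 0 or 1.
+int compareByLevels(const std::vector<CE> &a, const std::vector<CE> &b,
+                    int strength) {
+  assert(strength >= 1 && strength <= 3);
+  for (int level = 1; level <= strength; ++level) {
+    std::size_t i = 0, j = 0;
+    for (;;) {
+      // Zero weights are ignorable at this level.
+      while (i < a.size() && weightAt(a[i], level) == 0) ++i;
+      while (j < b.size() && weightAt(b[j], level) == 0) ++j;
+      if (i == a.size() || j == b.size()) {
+        if (i == a.size() && j == b.size()) break;  // equal at this level
+        return i == a.size() ? -1 : 1;  // the shorter sequence sorts first
+      }
+      unsigned wa = weightAt(a[i], level), wb = weightAt(b[j], level);
+      if (wa != wb) return wa < wb ? -1 : 1;
+      ++i;
+      ++j;
+    }
+  }
+  return 0;
+}
+```
+
+With the "apple" weights and "äpple" (which adds the `[0000.9D.05]` CE for
+U+0308 shown in the Collation Element Iterator section), the strings compare
+equal at primary strength, and "apple" sorts first once secondary weights are
+considered.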
+
+## Sort Keys
+
+If a string is to be compared thousands or millions of times,
+it can be more efficient to use sort keys.
+Sort keys are useful in situations where a large amount of data is indexed
+and frequently searched. The sort key is generated once and used in subsequent
+comparisons, rather than repeatedly generating the string's Collation Elements.
+The comparison of sort keys is a very efficient and simple binary compare of strings of
+unsigned bytes.
+
+An important property of ICU sort keys is that you can obtain the same results
+by comparing 2 strings as you do by comparing the sort keys of the 2 strings
+(provided that the same ordering and related collation attributes are used).
+
+An ICU sort key is a pre-processed sequence of bytes generated from a Unicode
+string. The weights for each comparison level are concatenated, separated by a
+"0x01" byte between levels.
+The entire sequence is terminated with a 0x00 byte for convenience in C APIs.
+(This 0x00 terminator is counted in the sort key length —
+unlike regular strings where the NUL terminator is excluded from the string length.)
+
+ICU actually compresses the sort keys so that they take the
+minimum storage in memory and in databases.
+
+<!-- TODO: (diagram was missing in Google Sites already)
+    The diagram below represents an uncompressed sort key in ICU for ease of understanding.  -->
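+
+To make the layout concrete, the following sketch builds an uncompressed
+three-level sort key as described above: the weights of each level, 0x01
+between levels, and 0x00 at the end. `buildSortKey` is hypothetical and omits
+ICU's compression. It assumes every weight byte it writes is 0x03 or higher,
+so that weights never collide with the reserved 0x00-0x02 bytes, and it drops
+the trailing zero byte of a primary weight such as 0x1900.
+
+```C++
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+struct CE {
+  uint16_t primary;
+  uint8_t secondary;
+  uint8_t tertiary;  // combined case and tertiary weight
+};
+
+// Builds: primary bytes, 0x01, secondary bytes, 0x01, tertiary bytes, 0x00.
+// Zero weights are skipped at every level.
+std::vector<uint8_t> buildSortKey(const std::vector<CE> &ces) {
+  std::vector<uint8_t> key;
+  for (const CE &ce : ces) {
+    if (ce.primary == 0) continue;
+    key.push_back(static_cast<uint8_t>(ce.primary >> 8));
+    if ((ce.primary & 0xFF) != 0) {
+      key.push_back(static_cast<uint8_t>(ce.primary & 0xFF));
+    }
+  }
+  key.push_back(0x01);  // level separator
+  for (const CE &ce : ces) {
+    if (ce.secondary != 0) key.push_back(ce.secondary);
+  }
+  key.push_back(0x01);  // level separator
+  for (const CE &ce : ces) {
+    if (ce.tertiary != 0) key.push_back(ce.tertiary);
+  }
+  key.push_back(0x00);  // terminator, counted in the key length
+  return key;
+}
+```
+
+For "apple" the result is 18 bytes: five primary bytes, 0x01, five secondary
+bytes, 0x01, five tertiary bytes and the 0x00 terminator. Comparing two such
+keys as unsigned byte strings (for example with `std::vector`'s `operator<` or
+`memcmp`) orders them the same way as comparing their CEs level by level.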
+
+### Sort key size
+
+One of the more important issues when considering using sort keys is the sort
+key size. Unfortunately, it is very hard to give a fast exact answer to the
+following question: "What is the maximum size for sort keys generated for
+strings of size X". This problem is twofold:
+
+1.  The maximum size of the sort key depends on the size of the collation
+    elements that are used to build it. The size of collation elements varies
+    greatly and depends both on the alphabet in question and on the locale used.
+
+2.  Compression is used in building sort keys. Most 'regular' sequences of
+    characters produce very compact sort keys.
+
+If one is to assume the worst case and use too-big buffers, a lot of space will
+be wasted. However, if you use too-small buffers, you will lose performance if
+generated sort keys are longer than supplied buffers too often
+(and you have to reallocate for each of those).
+A good strategy
+for this problem would be to manually manage a large buffer for storing sort keys
+and keep a list of indices to sort keys in this buffer (see the "large buffers"
+[Collation Example](examples.md#using-large-buffers-to-manage-sort-keys)
+for more details).
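+
+The following is an independent C++ sketch of the same idea, not the code from
+that example. `KeyGenerator`, `SortKeyPool` and `toyKeyFromText` are
+hypothetical. In real code the generator would wrap `ucol_getSortKey()`, which
+returns the full key length even when the supplied buffer is too small; the
+toy generator here simply uses the text bytes as a stand-in key.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <cstring>
+#include <functional>
+#include <string>
+#include <vector>
+
+// Writes the key into dest if it fits in `capacity` bytes and always
+// returns the full key length, terminator included.
+using KeyGenerator = std::function<int32_t(uint8_t *dest, int32_t capacity)>;
+
+// Stores many sort keys back to back in one buffer, plus their offsets.
+class SortKeyPool {
+ public:
+  // Generates one key into the pool and returns its index.
+  std::size_t add(const KeyGenerator &generate) {
+    if (buffer_.size() - used_ < kMinSpare) {
+      buffer_.resize(buffer_.size() * 2 + kMinSpare);
+    }
+    int32_t spare = static_cast<int32_t>(buffer_.size() - used_);
+    int32_t length = generate(buffer_.data() + used_, spare);
+    if (length > spare) {
+      // Rare case: grow once and generate the key again.
+      buffer_.resize(buffer_.size() + static_cast<std::size_t>(length));
+      length = generate(buffer_.data() + used_, length);
+    }
+    offsets_.push_back(used_);
+    used_ += static_cast<std::size_t>(length);
+    return offsets_.size() - 1;
+  }
+
+  // Pointers are invalidated by add(), which is why offsets are stored.
+  const uint8_t *key(std::size_t index) const {
+    return buffer_.data() + offsets_[index];
+  }
+  std::size_t count() const { return offsets_.size(); }
+
+ private:
+  static constexpr std::size_t kMinSpare = 64;
+  std::vector<uint8_t> buffer_;
+  std::vector<std::size_t> offsets_;
+  std::size_t used_ = 0;
+};
+
+// Toy generator for illustration only: the "key" is the text plus a 0x00.
+KeyGenerator toyKeyFromText(const std::string &text) {
+  return [text](uint8_t *dest, int32_t capacity) {
+    int32_t length = static_cast<int32_t>(text.size()) + 1;
+    if (length <= capacity) std::memcpy(dest, text.c_str(), length);
+    return length;
+  };
+}
+```
+
+Keeping offsets instead of pointers lets the pool grow its single buffer
+without invalidating the index; a sort then only has to reorder the small
+offset list.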
+
+Here are some rules of thumb; please do not rely on them. If you are looking
+at the East Asian locales, you probably want to go with 5 bytes per code point.
+For Thai, 3 bytes per code point should be sufficient. For all the other locales
+(mostly Latin and Cyrillic), you should be fine with 2 bytes per code point.
+These values are based on average lengths of sort keys generated with tertiary
+strength. If you need quaternary or identical strength (you should not), add 3
+bytes per code point to each of these.
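+
+The rules of thumb above can be turned into a starting buffer size, sketched
+below with a hypothetical `estimateSortKeyCapacity` function and `ScriptGroup`
+enum. The extra byte for the 0x00 terminator is an assumption added here.
+Because these numbers are averages, code must still handle keys that turn out
+longer, for example by growing the buffer and generating the key again.
+
+```C++
+#include <cassert>
+#include <cstddef>
+
+// Hypothetical grouping that mirrors the rules of thumb above.
+enum class ScriptGroup { kEastAsian, kThai, kOther };
+
+// Starting buffer size for a sort key of `codePoints` code points.
+std::size_t estimateSortKeyCapacity(std::size_t codePoints, ScriptGroup group,
+                                    bool quaternaryOrIdentical) {
+  std::size_t perCodePoint = group == ScriptGroup::kEastAsian ? 5
+                             : group == ScriptGroup::kThai    ? 3
+                                                              : 2;
+  if (quaternaryOrIdentical) perCodePoint += 3;  // extra levels
+  return codePoints * perCodePoint + 1;          // + 1 for the 0x00 terminator
+}
+```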
+
+### Partial sort keys
+
+In some cases, most notably when implementing [radix
+sorting](http://en.wikipedia.org/wiki/Radix_sort), it is useful to produce only
+parts of sort keys at a time. ICU4C 2.6+ provides an API that allows producing
+parts of sort keys (`ucol_nextSortKeyPart` API). These sort keys may or may not be
+compressed; that is, they may or may not be compatible with regular sort keys.
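+
+Radix sorting consumes keys one byte position at a time, which is why producing
+only the next part of each key can be enough. The following sketch of a
+most-significant-digit radix sort works on complete byte-string keys instead;
+it only illustrates that access pattern. `msdRadixSort` is a hypothetical
+helper and is not built on `ucol_nextSortKeyPart`.
+
+```C++
+#include <array>
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <utility>
+#include <vector>
+
+using Key = std::vector<uint8_t>;
+
+// Byte of `key` at `depth`, or -1 once the key has ended,
+// so that a key sorts before any longer key it is a prefix of.
+static int byteAt(const Key &key, std::size_t depth) {
+  return depth < key.size() ? key[depth] : -1;
+}
+
+// Sorts keys[lo, hi), which already agree on bytes [0, depth).
+static void radixSortRange(std::vector<Key> &keys, std::size_t lo,
+                           std::size_t hi, std::size_t depth) {
+  if (hi - lo < 2) return;
+  std::array<std::vector<Key>, 257> buckets;  // "ended" + 256 byte values
+  for (std::size_t i = lo; i < hi; ++i) {
+    std::size_t bucket = static_cast<std::size_t>(byteAt(keys[i], depth) + 1);
+    buckets[bucket].push_back(std::move(keys[i]));
+  }
+  std::size_t pos = lo;
+  for (std::size_t b = 0; b < buckets.size(); ++b) {
+    std::size_t start = pos;
+    for (Key &key : buckets[b]) keys[pos++] = std::move(key);
+    // Keys that have ended are all equal; the others need the next byte.
+    if (b > 0) radixSortRange(keys, start, pos, depth + 1);
+  }
+}
+
+void msdRadixSort(std::vector<Key> &keys) {
+  radixSortRange(keys, 0, keys.size(), 0);
+}
+```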
+
+### Merging sort keys
+
+Sometimes, it is useful to be able to merge sort keys. One example is having
+separate sort keys for first and last names. If you need to perform an operation
+that requires a sort key generated on the whole name, instead of concatenating
+strings and regenerating sort keys, you should merge the sort keys. The merging
+is done by merging the corresponding levels while inserting a terminator between
+merged parts. The reserved sort key byte value for the merge terminator is 0x02.
+For more details see [UCA section 1.6, Merging Sort
+Keys](http://www.unicode.org/reports/tr10/#Interleaved_Levels).
+
+*   C API: unicode/ucol.h `ucol_mergeSortkeys()`
+*   Java API: `com.ibm.icu.text.CollationKey merge(CollationKey source)`
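+
+The merge can be sketched on uncompressed keys as follows. `mergeSortKeys` is a
+hypothetical helper, not `ucol_mergeSortkeys()` itself. It assumes levels are
+separated by 0x01 and each key ends with 0x00; when one key has more levels than
+the other, the extra levels are appended without a merge terminator.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+using Bytes = std::vector<uint8_t>;
+
+// Splits a key into its levels (0x01 separators, 0x00 terminator).
+static std::vector<Bytes> splitLevels(const Bytes &key) {
+  std::vector<Bytes> levels(1);
+  for (uint8_t b : key) {
+    if (b == 0x00) break;
+    if (b == 0x01) {
+      levels.emplace_back();
+    } else {
+      levels.back().push_back(b);
+    }
+  }
+  return levels;
+}
+
+// Merges two keys level by level, with 0x02 between parts of the same level.
+Bytes mergeSortKeys(const Bytes &a, const Bytes &b) {
+  std::vector<Bytes> la = splitLevels(a), lb = splitLevels(b);
+  std::size_t n = la.size() > lb.size() ? la.size() : lb.size();
+  Bytes out;
+  for (std::size_t i = 0; i < n; ++i) {
+    if (i > 0) out.push_back(0x01);  // level separator
+    if (i < la.size()) out.insert(out.end(), la[i].begin(), la[i].end());
+    if (i < la.size() && i < lb.size()) out.push_back(0x02);  // merge terminator
+    if (i < lb.size()) out.insert(out.end(), lb[i].begin(), lb[i].end());
+  }
+  out.push_back(0x00);
+  return out;
+}
+```
+
+For instance, merging `19 1B 1D 01 05 05 05 01 91 05 05 00` with
+`1F 21 23 01 05 05 05 01 91 05 05 00` yields
+`19 1B 1D 02 1F 21 23 01 05 05 05 02 05 05 05 01 91 05 05 02 91 05 05 00`.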
+
+CLDR 1.9/ICU 4.6 and later map U+FFFE to a special collation element that is
+intended to allow concatenating strings like firstName+\\uFFFE+lastName to yield
+the same results as merging their individual sort keys.
+This has been fully implemented in ICU since version 53.
+
+### Generating bounds for a sort key (prefix matching)
+
+Having sort keys for strings allows for easy creation of bounds - sort keys that
+are guaranteed to be smaller or larger than any sort key from a given range. For
+example, if bounds are produced for a sortkey of string "smith", strings between
+upper and lower bounds with one level would include "Smith", "SMITH", "sMiTh".
+Two kinds of upper bounds can be generated - the first one will match only
+strings of equal length, while the second one will match all the strings with
+the same initial prefix.
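+
+The following sketch shows how the lower bound and the two kinds of upper bound
+can be derived from an uncompressed sort key. `makeBound` and `Bound` are
+hypothetical names, and the byte values follow the conventions used in this
+section. The lower bound ends right after the requested levels. The upper bound
+appends 0x02, which is above the 0x01 level separator but below any weight byte
+(assuming weights start at 0x03). The long upper bound appends 0xFF 0xFF, a
+maximal primary that also admits longer strings with the same prefix, assuming
+their next primary byte is below 0xFF. ICU4C exposes bound generation through
+`ucol_getBound()`; this sketch is not its implementation.
+
+```C++
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+enum class Bound { kLower, kUpper, kUpperLong };
+
+// Builds a bound from the first `levels` levels of an uncompressed key
+// whose levels are separated by 0x01 and which ends with 0x00.
+std::vector<uint8_t> makeBound(const std::vector<uint8_t> &key, int levels,
+                               Bound type) {
+  assert(levels >= 1);
+  std::vector<uint8_t> bound;
+  for (uint8_t byte : key) {
+    if (byte == 0x00) break;                   // end of key
+    if (byte == 0x01 && --levels == 0) break;  // stop before the next level
+    bound.push_back(byte);
+  }
+  if (type == Bound::kUpper) {
+    bound.push_back(0x02);  // matches only strings of equal length
+  } else if (type == Bound::kUpperLong) {
+    bound.push_back(0xFF);  // maximal primary: also covers longer strings
+    bound.push_back(0xFF);
+  }
+  bound.push_back(0x00);
+  return bound;
+}
+```
+
+With one level and made-up weights for "smith", a key for "Smith" (same
+primaries, different tertiary weight) falls between the lower and upper bounds,
+while "smithy" falls above the upper bound but below the long upper bound.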
+
+CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with the maximum
+primary weight, so that for example the string "smith\\uFFFF" can be used as the
+upper bound rather than modifying the sort key for "smith".
+
+## Collation Element Iterator
+
+The collation element iterator is used for traversing Unicode string collation
+elements one at a time. It can be used to implement language-sensitive text
+search algorithms like Boyer-Moore.
+
+For most applications, the two API categories, compare and sort key, are
+sufficient. Most people do not need to manipulate collation elements directly.
+
+Example:
+
+Consider iterating over "apple" and "äpple". Here are sequences of collation
+elements:
+
+String 1 | String 1 Collation Elements
+-------- | ---------------------------
+a        | `[1900.05.05]`
+p        | `[3700.05.05]`
+p        | `[3700.05.05]`
+l        | `[2F00.05.05]`
+e        | `[2100.05.05]`
+
+String 2 | String 2 Collation Elements
+-------- | ---------------------------
+a        | `[1900.05.05]`
+\\u0308  | `[0000.9D.05]`
+p        | `[3700.05.05]`
+p        | `[3700.05.05]`
+l        | `[2F00.05.05]`
+e        | `[2100.05.05]`
+
+The resulting CEs are typically masked according to the desired strength, and
+zero CEs are discarded. In the above example, masking with 0xFFFF0000 (for primary strength)
+clears the secondary and tertiary weights, so the CE for U+0308 becomes zero and is
+discarded. Both strings then produce the same sequence of primary weights, and the
+collator declares a match. For more details see the
+paper "Efficient text searching in Java™: Finding the right string in any
+language" by Laura Werner (
+<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>).
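+
+The masking and discarding step can be sketched on 32-bit CEs in the layout
+described earlier in this chapter. The mask constants and function names are
+hypothetical. A real search, such as the Boyer-Moore approach mentioned above,
+would run a matcher over these masked sequences instead of comparing whole
+strings.
+
+```C++
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+constexpr uint32_t kPrimaryMask = 0xFFFF0000u;
+constexpr uint32_t kSecondaryMask = 0xFFFFFF00u;
+constexpr uint32_t kTertiaryMask = 0xFFFFFFFFu;
+
+// Masks each CE to the desired strength and drops CEs that become zero.
+std::vector<uint32_t> maskCEs(const std::vector<uint32_t> &ces, uint32_t mask) {
+  std::vector<uint32_t> out;
+  for (uint32_t ce : ces) {
+    uint32_t masked = ce & mask;
+    if (masked != 0) out.push_back(masked);
+  }
+  return out;
+}
+
+// Two CE sequences match at a strength if their masked forms are equal.
+bool matchesAtStrength(const std::vector<uint32_t> &a,
+                       const std::vector<uint32_t> &b, uint32_t mask) {
+  return maskCEs(a, mask) == maskCEs(b, mask);
+}
+```
+
+With the CEs from the tables above, "apple" and "äpple" match under
+`kPrimaryMask` but not under `kSecondaryMask`, because the U+0308 CE keeps its
+0x9D secondary weight.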
+
+## Collation Attributes
+
+The Collation Service has a number of attributes whose values can be changed
+at run time. These attributes affect both the functionality and the
+performance of the Collation Service. This section describes these
+attributes and, where possible, their performance impact. Performance
+indications are only approximate and timings may vary significantly depending on
+the CPU, compiler, etc.
+
+Although string comparison by ICU and comparison of each string's sort key give
+the same results, attribute settings can impact the execution time of each
+method differently. To be precise in the discussion of performance, this section
+refers to the API employed in the measurement. The `ucol_strcoll` function is the
+API for string comparison. The `ucol_getSortKey` function is used to create sort
+keys.
+
+> :point_right: **Note** There is a special attribute value, `UCOL_DEFAULT`,
+> that can be used to set any attribute to its default value
+> (which is inherited from the UCA and the tailoring).
+
+### Attribute Types
+
+#### Strength level
+
+Collation strength, or the maximum collation level used for comparison, is set
+by using the `UCOL_STRENGTH` attribute. Valid values are:
+
+1.  `UCOL_PRIMARY`
+
+2.  `UCOL_SECONDARY`
+
+3.  `UCOL_TERTIARY` (default)
+
+4.  `UCOL_QUATERNARY`
+
+5.  `UCOL_IDENTICAL`
+
+#### French collation
+
+The `UCOL_FRENCH_COLLATION` attribute determines whether to sort the secondary
+differences in reverse order. Valid values are:
+
+1.  `UCOL_OFF` (default): compares secondary differences in the order they appear
+    in the string.
+
+2.  `UCOL_ON`: causes secondary differences to be considered in reverse order, as
+    it is done in the French language.
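+
+The effect of `UCOL_FRENCH_COLLATION` on the secondary level can be sketched as
+follows. The weights are made up (0x05 for unaccented letters, 0x8A for an acute
+accent, 0x8E for a circumflex), and `compareSecondaries` is a hypothetical
+helper. It shows only the reversal of the secondary level and assumes that the
+strings being compared already have equal primary weights.
+
+```C++
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+// Compares two sequences of non-zero secondary weights. With `backwards`
+// set, they are compared from the end of the string toward the start.
+// Returns -1, 0 or 1.
+int compareSecondaries(std::vector<uint8_t> a, std::vector<uint8_t> b,
+                       bool backwards) {
+  if (backwards) {
+    std::reverse(a.begin(), a.end());
+    std::reverse(b.begin(), b.end());
+  }
+  if (a < b) return -1;
+  if (b < a) return 1;
+  return 0;
+}
+```
+
+With these weights, forward order gives cote < coté < côte < côté, while
+backward order gives cote < côte < coté < côté, the traditional French result.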
+
+#### Normalization mode
+
+The `UCOL_NORMALIZATION_MODE` attribute, or its alias `UCOL_DECOMPOSITION_MODE`,
+controls whether text normalization is performed on the input strings. Valid
+values are:
+
+1.  `UCOL_OFF` (default): turns off normalization check
+
+2.  `UCOL_ON` : normalization is checked and the collator performs normalization
+    if it is needed.
+
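+The following table shows which of several character sequences are in FCD,
+NFC or NFD form ("Y" means yes). With normalization mode off, correct results
+are guaranteed only for input in FCD form.
+
+Sequence              | FCD | NFC | NFD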
+--------------------- | --- | --- | ---
+A-ring                | Y   | Y   |
+Angstrom              | Y   |     |
+A + ring              | Y   |     | Y
+A + grave             | Y   | Y   |
+A-ring + grave        | Y   |     |
+A + cedilla + ring    | Y   |     | Y
+A + ring + cedilla    |     |     |
+A-ring + cedilla      |     | Y   |
+
+With normalization mode turned on, the `ucol_strcoll` function slows down by 10%.
+In addition, the time to generate a sort key also increases by about 25%.
+
+#### Alternate handling
+
+This attribute allows shifting of the variable characters (usually spaces and
+punctuation, in the UCA also most symbols) from the primary to the quaternary
+strength level. This is set by using the `UCOL_ALTERNATE_HANDLING` attribute. For
+details see [UCA: Variable
+Weighting](http://www.unicode.org/reports/tr10/#Variable_Weighting), [LDML:
+Collation
+Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
+and [“Ignore Punctuation” Options](customization/ignorepunct.md).
+
+1.  `UCOL_NON_IGNORABLE` (CLDR/ICU default): variable characters are treated like
+    all other characters
+
+2.  `UCOL_SHIFTED` (UCA default): all the variable characters will be ignored at
+    the primary, secondary and tertiary levels and their primary strengths will
+    be shifted to the quaternary level.
+
+#### Case Ordering
+
+Some conventions require uppercase letters to sort before lowercase ones, while
+others require the opposite. This behavior is controlled by the
+`UCOL_CASE_FIRST` attribute. The case difference in the UCA is contained in the tertiary
+weights along with other appearance characteristics (like circling of letters).
+The case-first attribute allows for emphasizing of the case property of the
+letters by reordering the tertiary weights with either uppercase-first or
+lowercase-first. This difference gets the most significant bit in the weight.
+Valid values for this attribute are:
+
+1.  `UCOL_OFF` (default): leave tertiary weights unaffected
+
+2.  `UCOL_LOWER_FIRST`: causes lowercase letters and uncased characters to sort
+    before uppercase
+
+3.  `UCOL_UPPER_FIRST` : causes uppercase letters to sort first
+
+The case-first attribute does not affect the performance substantially.
+
+#### Case level
+
+When this attribute is set, an additional level is formed between the secondary
+and tertiary levels, known as the Case Level. The case level is used to
+distinguish large and small Japanese Kana characters. Case level could also be
+used in other situations, for example to distinguish certain Pinyin characters.
+Case level is controlled by the `UCOL_CASE_LEVEL` attribute. Valid values for this
+attribute are:
+
+1.  `UCOL_OFF` (default): no additional case level
+
+2.  `UCOL_ON` : adds a case level
+
+#### Hiragana Quaternary
+
+*This setting is deprecated and ignored in recent versions of ICU.*
+
+Hiragana Quaternary can be set to `UCOL_ON`, in which case Hiragana code points
+will sort before everything else on the quaternary level. If set to `UCOL_OFF`
+Hiragana letters are treated the same as all the other code points. This setting
+can be changed at run time using the `UCOL_HIRAGANA_QUATERNARY_MODE` attribute.
+You probably won't need to use it.
+
+#### Variable Top
+
+Variable Top is a boundary which decides whether the code points will be treated
+as variable (shifted to quaternary level in the **shifted** mode) or
+non-ignorable. Special APIs are used for setting of variable top. It can
+basically be set either to a codepoint or a primary strength value.
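+
+To show how variable top interacts with the **shifted** option, here is a sketch
+of the UCA "shifted" variable weighting (see the UCA link under Alternate
+handling). Variable CEs keep only their primary weight, moved to the quaternary
+level. Primary-ignorable CEs that follow a variable CE become completely
+ignorable. All other non-ignorable CEs get the maximal quaternary weight 0xFFFF.
+The struct names and the weights and variable top value in the usage below are
+hypothetical, and this is not ICU's implementation.
+
+```C++
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+struct CE {  // three-level collation element
+  uint16_t primary;
+  uint8_t secondary;
+  uint8_t tertiary;
+};
+
+struct ShiftedCE {  // the same element with a quaternary weight added
+  uint16_t primary;
+  uint8_t secondary;
+  uint8_t tertiary;
+  uint16_t quaternary;
+};
+
+// A CE is variable if its primary weight is non-zero and <= variableTop.
+std::vector<ShiftedCE> applyShifted(const std::vector<CE> &ces,
+                                    uint16_t variableTop) {
+  std::vector<ShiftedCE> out;
+  bool afterVariable = false;
+  for (const CE &ce : ces) {
+    if (ce.primary == 0 && ce.secondary == 0 && ce.tertiary == 0) {
+      out.push_back({0, 0, 0, 0});  // completely ignorable
+    } else if (ce.primary != 0 && ce.primary <= variableTop) {
+      out.push_back({0, 0, 0, ce.primary});  // variable: primary moves to level 4
+      afterVariable = true;
+    } else if (ce.primary == 0 && afterVariable) {
+      out.push_back({0, 0, 0, 0});  // e.g. an accent on a variable character
+    } else {
+      out.push_back({ce.primary, ce.secondary, ce.tertiary, 0xFFFF});
+      if (ce.primary != 0) afterVariable = false;
+    }
+  }
+  return out;
+}
+```
+
+With a variable top of 0x0500, a hyphen with primary 0x0209 becomes
+`[0000.00.00.0209]`, an accent directly after it becomes completely ignorable,
+and letters keep their first three weights with quaternary 0xFFFF.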
+
+## Performance
+
+ICU collation is designed to be fast, small and customizable. Several techniques
+are used to enhance the performance:
+
+1.  Providing optimized processing for Latin characters.
+
+2.  Comparing strings incrementally and stopping at the first significant
+    difference.
+
+3.  Tuning to eliminate unnecessary file access or memory allocation.
+
+4.  Providing efficient preflight functions that allow fast sort key size
+    generation.
+
+5.  Using a single, shared copy of UCA in memory for the read-only default sort
+    order. Only small tailoring tables are kept in memory for locale-specific
+    customization.
+
+6.  Compressing sort keys efficiently.
+
+7.  Making the sort order data-driven.
+
+In general, the best performance from the Collation Service is expected by
+doing the following:
+
+1.  After opening a collator, keep and reuse it until done. Do not open new
+    collators for the same sort order. (Note the restriction on
+    multi-threading.)
+
+2.  Use `ucol_strcoll` etc. when comparing strings. If it is necessary to
+    compare strings thousands or millions of times,
+    create the sort keys first and compare the sort keys instead.
+    Generating the sort keys of two strings is about 5-10
+    times slower than just comparing them directly.
+
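+3.  Follow the best practice guidelines for generating sort keys. Do not call
+    `ucol_getSortKey` twice to first size the key and then allocate the sort key
+    buffer and repeat the call to the function to fill in the buffer. Instead,
+    call it once with a reasonably large buffer, and only if the returned length
+    is larger than the buffer, grow the buffer and call it again.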
+
+### Performance and Storage Implications of Attributes
+
+Most people use the default attributes when comparing strings or when creating
+sort keys. When they do want to customize the ordering, the most common options
+are the following :
+
+`UCOL_ALTERNATE_HANDLING == UCOL_SHIFTED`\
+Used to ignore space and punctuation characters
+
+`UCOL_ALTERNATE_HANDLING == UCOL_SHIFTED` **and** `UCOL_STRENGTH == UCOL_QUATERNARY`\
+Used to ignore the space and punctuation characters except when there are no previous letter, accent, or case/variable differences.
+
+`UCOL_CASE_FIRST == UCOL_LOWER_FIRST` **or** `UCOL_CASE_FIRST == UCOL_UPPER_FIRST`\
+Used to change the ordering of upper vs. lower case letters (as
+well as small vs. large kana)
+
+`UCOL_CASE_LEVEL == UCOL_ON` **and** `UCOL_STRENGTH == UCOL_PRIMARY`\
+Used to ignore only the accent differences.
+
+`UCOL_NORMALIZATION_MODE == UCOL_ON`\
+Forces a normalization check on all input. This
+is used if the input text may not be in FCD form.
+
+`UCOL_FRENCH_COLLATION == UCOL_OFF`\
+This is only useful for languages like French and Catalan that may turn this attribute on.
+(It is the default only for Canadian French ("fr-CA").)
+
+In String Comparison, most of these options have little or no effect on
+performance. The only noticeable one is normalization, which can cost 10%-40% in
+performance.
+
+For Sort Keys, most of these options either leave the storage alone or reduce
+it. Shifting can reduce the storage by about 10%-20%; case level + primary-only
+can reduce it by about 20%-40%. Turning off French collation can reduce the storage
+by about 38%, but only for languages like French and Catalan that turn it on by
+default. On the other hand, using Shifted + Quaternary can increase the storage by
+10%-15%. (The Identical Level also increases the length, but this option is not
+recommended.)
+
+> :point_right: **Note** All of the above numbers are based on
+> tests run on a particular machine, with a particular set of data.
+> (The data for each language is a large number of names
+> in that language in the format `<first name>, <last name>`.)
+> The performance and storage may vary, depending on the particular computer,
+> operating system, and data.
+
+## Versioning
+
+Sort keys are often stored on disk for later reuse. A common example is the use
+of keys to build indexes in databases. When comparing keys, it is important to
+know that both keys were generated by the same algorithms and weightings.
+Otherwise, identical strings with keys generated on two different dates, for
+example, might compare as unequal. Sort keys can be affected by new versions of
+ICU or its data tables, new sort key formats, or changes to the Collator.
+Starting with release 1.8.1, ICU provides a versioning mechanism to identify the
+version information of the following, among others:
+
+1.  The run-time executable
+
+2.  The collation element content
+
+3.  The Unicode/UCA database
+
+4.  The tailoring table
+
+The version information of a Collator is a 32-bit integer. If a new version of ICU
+has changes affecting the content of collation elements, the version information
+will be changed. In that case, using the new version of the ICU collator
+requires regenerating any saved or stored sort keys.
+
+However, it is possible to modify ICU code or data without changing relevant version numbers,
+so it is safer to regenerate sort keys any time after any part of ICU has been updated.
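+
+Persisted keys can record the versions they were generated with, so that a
+later run can detect when they must be regenerated. The sketch below is
+hypothetical (`SortKeyStoreHeader`, `mustRegenerateSortKeys`). In ICU4C, the
+version values would typically come from `ucol_getVersion()` and
+`u_getVersion()`, which fill in a 4-byte `UVersionInfo`. Following the advice
+above, any change in either version is treated as requiring regeneration.
+
+```C++
+#include <array>
+#include <cassert>
+#include <cstdint>
+
+using VersionInfo = std::array<uint8_t, 4>;  // same shape as UVersionInfo
+
+// Hypothetical header stored alongside persisted sort keys.
+struct SortKeyStoreHeader {
+  VersionInfo collatorVersion;  // e.g. filled in from ucol_getVersion()
+  VersionInfo icuVersion;       // e.g. filled in from u_getVersion()
+};
+
+// Stored keys are reused only if both versions match exactly.
+bool mustRegenerateSortKeys(const SortKeyStoreHeader &stored,
+                            const SortKeyStoreHeader &current) {
+  return stored.collatorVersion != current.collatorVersion ||
+         stored.icuVersion != current.icuVersion;
+}
+```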
+
+Since ICU4C 1.8.1,
+it is possible to build your program so that it uses more than one version of
+ICU (only in C/C++, not in Java). Therefore, you could use the current version
+for the features you need and use the older version for collation.
+
+## Programming Examples
+
+See the [Collation Examples](examples.md) chapter for an example of how to
+compare and create sort keys with the default locale in C, C++ and Java.
diff --git a/docs/userguide/collation/concepts.md b/docs/userguide/collation/concepts.md
new file mode 100644
index 0000000..c8468b5
--- /dev/null
+++ b/docs/userguide/collation/concepts.md
@@ -0,0 +1,814 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation Concepts
+
+The previous section demonstrated many of the requirements imposed on string
+comparison routines that try to correctly collate strings according to
+conventions of more than a hundred different languages, written in many
+different scripts. This section describes the principles and architecture behind
+the ICU Collation Service.
+
+## Sortkeys vs Comparison
+
+Sort keys are most useful in databases, where the overhead of calling a function
+for each comparison is very large.
+
+Generating a sort key from a Collator is many times more expensive than doing a
+compare with the Collator (for common use cases). That holds when both functions
+are called from Java or C. So for those languages, unless there is a very large
+number of comparisons, it is better to call the compare function.
+
+Here is an example, with a little back-of-the-envelope calculation. Let's
+suppose that with a given language on a given platform, the compare function
+is 100 times faster than sort key generation: if one compare costs CP, then
+generating one sort key costs SP = 100 \* CP. Suppose also that you are doing a
+binary search of a list with 1,000 elements, and that one binary comparison of
+two sort keys costs BP. A binary search of 1,000 elements takes about
+log2(1000) ≈ 10 comparisons, giving:
+
+compare: 10 \* CP
+
+sortkey: 1 \* SP + 10 \* BP = 100 \* CP + 10 \* BP
+
+Even if BP is free, compare would be better. The two approaches only break even
+where log2(n) = 100, that is, for a list of about 2^100 elements.
+
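+The arithmetic above can be written out as a few functions. This restates the
+back-of-the-envelope model only (the function names are hypothetical), taking
+the number of comparisons in a binary search as ceil(log2(n)); it does not
+measure anything.
+
+```C++
+#include <cassert>
+#include <cmath>
+
+// Comparisons needed by a binary search over n elements.
+double comparisonsForSearch(double n) { return std::ceil(std::log2(n)); }
+
+// cp = cost of one compare call.
+double costWithCompare(double n, double cp) {
+  return comparisonsForSearch(n) * cp;
+}
+
+// sp = cost of generating one sort key, bp = cost of one binary comparison.
+double costWithSortKey(double n, double sp, double bp) {
+  return sp + comparisonsForSearch(n) * bp;
+}
+
+// Value of log2(n) at which both approaches cost the same (needs bp < cp).
+double breakEvenLog2(double cp, double sp, double bp) {
+  assert(bp < cp);
+  return sp / (cp - bp);
+}
+```
+
+With CP = 1, SP = 100 and BP = 0, a search over 1,000 elements costs 10 with
+compare and 100 with sort keys, and the break-even point is log2(n) = 100.
+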
+But even this calculation is only a rough guide. First, the binary comparison is
+not completely free. Second, the performance of the compare function varies
+radically with the source data. We optimized for maximizing performance of
+collation in sorting and binary search, so comparing strings that are "close" is
+optimized to be much faster than comparing strings that are "far away". That
+optimization is important because normal sort/lookup operations compare close
+strings far more often: think of binary search, where the last few comparisons
+are always with the closest strings.
+
+## Comparison Levels
+
+In general, when comparing and sorting objects, some properties can take
+precedence over others. For example, in geometry, you might consider first the
+number of sides a shape has, followed by the number of sides of equal length.
+This causes triangles to be sorted together, then rectangles, then pentagons,
+etc. Within each category, the shapes would be ordered according to whether they
+had 0, 2, 3 or more sides of the same length. However, this is not the only way
+the shapes can be sorted. For example, it might be preferable to sort shapes by
+color first, so that all red shapes are grouped together, then blue, etc.
+Another approach would be to sort the shapes by the amount of area they enclose.
+
+Similarly, character strings have properties, some of which can take precedence
+over others. There is more than one way to prioritize the properties.
+
+For example, a common approach is to distinguish characters first by their
+unadorned base letter (for example, without accents, vowels or tone marks), then
+by accents, and then by the case of the letter (upper vs. lower). Ideographic
+characters might be sorted by their component radicals and then by the number of
+strokes it takes to draw the character.
+An alternative ordering would be to sort these characters by strokes first and
+then by their radicals.
+
+The ICU Collation Service supports many levels of comparison (named "Levels",
+but also known as "Strengths"). Having these categories enables ICU to sort
+strings precisely according to local conventions. In addition, because the
+levels can be employed selectively, searching for a string in text can be
+performed with various matching conditions.
+
+Performance optimizations have been made for ICU collation with the default
+level settings. Performance-specific impacts are discussed in the Performance
+section below.
+
+Following is a list of the names for each level and an example usage:
+
+1.  Primary Level: Typically, this is used to denote differences between base
+    characters (for example, "a" < "b"). It is the strongest difference. For
+    example, dictionaries are divided into different sections by base character.
+    This is also called the level-1 strength.
+
+2.  Secondary Level: Accents in the characters are considered secondary
+    differences (for example, "as" < "às" < "at"). Other differences between
+    letters can also be considered secondary differences, depending on the
+    language. A secondary difference is ignored when there is a primary
+    difference anywhere in the strings. This is also called the level-2
+    strength.
+
+    Note: In some languages (such as Danish), certain accented letters are
+    considered to be separate base characters. In most languages, however, an
+    accented letter only has a secondary difference from the unaccented version
+    of that letter.
+
+3.  Tertiary Level: Upper and lower case differences in characters are
+    distinguished at the tertiary level (for example, "ao" < "Ao" < "aò"). In
+    addition, a variant of a letter differs from the base form on the tertiary
+    level (such as "A" and "Ⓐ"). Another example is the difference between large
+    and small Kana. A tertiary difference is ignored when there is a primary or
+    secondary difference anywhere in the strings. This is also called the
+    level-3 strength.
+
+4.  Quaternary Level: When punctuation is ignored (see
+    [Ignoring Punctuation](#ignoring-punctuation)) at levels 1-3, an additional
+    level can be used to distinguish words with and without punctuation (for
+    example, "a-b" < "ab" < "aB"). This difference is ignored when there is a
+    primary, secondary or tertiary difference. This is also known as the
+    level-4 strength. The quaternary level should only be used if ignoring
+    punctuation is required or when processing Japanese text (see
+    [Hiragana Processing](#hiragana-processing)).
+
+5.  Identical Level: When all other levels are equal, the identical level is
+    used as a tiebreaker. The Unicode code point values of the NFD form of each
+    string are compared at this level, just in case there is no difference at
+    levels 1-4. For example, Hebrew cantillation marks are only distinguished
+    at this level. This level should be used sparingly, because strings that
+    differ only in their code point values are extremely rare. Using this
+    level substantially decreases the performance for both incremental
+    comparison and sort key generation (as well as increasing the sort key
+    length). It is also known as the level-5 strength.
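+
+The precedence rule (a stronger-level difference anywhere in the strings outranks
+any weaker one) can be checked with a small sketch. It uses the JDK's
+`java.text.Collator` as a stand-in for ICU. Canonical decomposition is enabled so
+that precomposed accented letters are compared by their accents. The JDK collator
+offers only primary, secondary, tertiary and identical strengths, so the
+quaternary level is not shown. The class and method names are hypothetical.
+
+```java
+import java.text.Collator;
+import java.util.Arrays;
+import java.util.Locale;
+
+/** A stronger-level difference anywhere outranks any weaker-level difference. */
+class LevelPrecedence {
+    static Collator tertiaryCollator() {
+        Collator c = Collator.getInstance(Locale.ROOT);
+        c.setStrength(Collator.TERTIARY);
+        // Decompose precomposed letters so their accents count as secondary differences.
+        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
+        return c;
+    }
+
+    static String[] sorted(String... words) {
+        String[] copy = words.clone();
+        Arrays.sort(copy, tertiaryCollator());
+        return copy;
+    }
+
+    public static void main(String[] args) {
+        // The grave accent decides "as" vs "a-grave s" (primary-equal) but not
+        // "a-grave s" vs "at", which already differ at the primary level.
+        System.out.println(Arrays.toString(sorted("at", "\u00E0s", "as")));
+        // Case decides "ao" vs "Ao", but the accent in "a o-grave" outranks
+        // the case difference between it and "Ao".
+        System.out.println(Arrays.toString(sorted("a\u00F2", "Ao", "ao")));
+    }
+}
+```
+
+The two calls should reproduce the orders "as" < "às" < "at" and
+"ao" < "Ao" < "aò" from the list above.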
+
+## Backward Secondary Sorting
+
+Some languages require words to be ordered on the secondary level according to
+the *last* accent difference, as opposed to the *first* accent difference. This
+was previously the default for all French locales, based on some French
+dictionary ordering traditions, but is currently only applicable to Canadian
+French (locale **fr_CA**), for conformance with the [Canadian sorting
+standard](http://www.unicode.org/reports/tr10/#CanStd). The difference in
+ordering is only noticeable for a small number of pairs of real words. For more
+information see [UCA: Contextual
+Sensitivity](http://www.unicode.org/reports/tr10/#Contextual_Sensitivity).
+
+Example:
+
+Forward secondary | Backward secondary
+----------------- | ------------------
+cote              | cote
+coté              | côte
+côte              | coté
+côté              | côté
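+
+The difference between the two columns can be reproduced with a toy model. It
+assumes the words are already equal at the primary level (same base letters). It
+assigns each character a secondary weight of 1 if it is "é" or "ô" and 0
+otherwise. It then compares those weights either from the start or from the end
+of the word. This is an illustration only, not how ICU computes secondary
+weights. The class name is hypothetical.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** Toy model: forward vs. backward secondary order for primary-equal words. */
+class BackwardSecondary {
+    /** Secondary weight per character: 1 for e-acute or o-circumflex, else 0. */
+    static int[] secondaryWeights(String word) {
+        int[] w = new int[word.length()];
+        for (int i = 0; i < word.length(); i++) {
+            char ch = word.charAt(i);
+            w[i] = (ch == '\u00E9' || ch == '\u00F4') ? 1 : 0;
+        }
+        return w;
+    }
+
+    /** Compares weights from the start, or from the end when backward is true. */
+    static int compareSecondary(String a, String b, boolean backward) {
+        int[] wa = secondaryWeights(a);
+        int[] wb = secondaryWeights(b);
+        int n = Math.min(wa.length, wb.length);
+        for (int k = 0; k < n; k++) {
+            int x = backward ? wa[wa.length - 1 - k] : wa[k];
+            int y = backward ? wb[wb.length - 1 - k] : wb[k];
+            if (x != y) return Integer.compare(x, y);
+        }
+        return Integer.compare(wa.length, wb.length);
+    }
+
+    static List<String> sort(List<String> words, boolean backward) {
+        List<String> copy = new ArrayList<>(words);
+        copy.sort((a, b) -> compareSecondary(a, b, backward));
+        return copy;
+    }
+
+    public static void main(String[] args) {
+        List<String> words = Arrays.asList("c\u00F4t\u00E9", "cot\u00E9", "c\u00F4te", "cote");
+        System.out.println("forward:  " + sort(words, false));
+        System.out.println("backward: " + sort(words, true));
+    }
+}
+```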
+
+## Contractions
+
+A contraction is a sequence of two or more letters that is treated as a single
+letter in sorting.
+
+For example, in the traditional Spanish sorting order, "ch" is considered a
+single letter. All words that begin with "ch" sort after all other words
+beginning with "c", but before words starting with "d".
+
+Other examples of contractions are "ch" in Czech, which sorts after "h", and
+"lj" and "nj" in Croatian and Latin Serbian, which sort after "l" and "n"
+respectively.
+
+Example:
+
+Order without contraction | Order with contraction "lj" sorting after letter "l"
+------------------------- | ----------------------------------------------------
+la                        | la
+li                        | li
+lj                        | lk
+lja                       | lz
+ljz                       | lj
+lk                        | lja
+lz                        | ljz
+ma                        | ma
+
+Contracting sequences such as the above are not very common in most languages.
+
+> :point_right: **Note** Since ICU 2.2, and as required by the UCA,
+> if a completely ignorable code point
+> appears in text in the middle of a contraction, it will not break the contraction.
+> For example, in Czech sorting, cU+0000h will sort as if it were ch.
+
+## Expansions
+
+If a letter sorts as if it were a sequence of more than one letter, it is called
+an expansion.
+
+For example, in German phonebook sorting (de@collation=phonebook or BCP 47
+de-u-co-phonebk), "ä" sorts as though it were equivalent to the sequence "ae."
+All words starting with "ä" will sort between words starting with "ad" and words
+starting with "af".
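+
+The effect of this expansion on primary-level order can be sketched by rewriting
+"ä" as "ae" before comparing. This toy ignores the accent and case levels, which
+still distinguish "ä" from "ae" in a real collator. It is not how ICU implements
+expansions, and the class name is hypothetical.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** Toy primary-level model of the German phonebook expansion a-umlaut -> "ae". */
+class ExpansionDemo {
+    /** Rewrites the expanding letter so that a plain comparison sees "ae". */
+    static String primaryForm(String word) {
+        return word.replace("\u00E4", "ae");
+    }
+
+    static List<String> sort(List<String> words) {
+        List<String> copy = new ArrayList<>(words);
+        copy.sort((a, b) -> primaryForm(a).compareTo(primaryForm(b)));
+        return copy;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(sort(Arrays.asList("afrika", "\u00E4hnlich", "adler")));
+    }
+}
+```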
+
+In the case of Unicode encoding, characters can often be represented either as
+pre-composed characters or in decomposed form. For example, the letter "à" can
+be represented in its decomposed form (a + U+0300 COMBINING GRAVE ACCENT) and in
+its pre-composed form (U+00E0). Most applications do not want to distinguish
+text by the way it is encoded. A search for "à" should find all instances of the
+letter, regardless of whether the instance is in pre-composed or decomposed
+form. Therefore, either form of the letter must result in the same sort
+ordering. The architecture of the ICU Collation Service supports this.
+
+## Contractions Producing Expansions
+
+It is possible to have contractions that produce expansions.
+
+One example occurs in Japanese, where the vowel with a prolonged sound mark is
+treated as equivalent to the long vowel version:
+
+カアー<<< カイー and\
+キイー<<< キイー
+
+> :point_right: **Note** Since ICU 2.0, Japanese tailoring uses
+> [prefix analysis](http://www.unicode.org/reports/tr35/tr35-collation.html#Context_Sensitive_Mappings)
+> instead of contractions producing expansions.
+
+## Normalization
+
+In the section on expansions, we discussed that text in Unicode can often be
+represented in either pre-composed or decomposed forms. There are other types of
+equivalences possible with Unicode, including Canonical and Compatibility. The
+process of Normalization ensures that text is written in a predictable way so
+that searches are not made unnecessarily complicated by having to match on
+equivalences. Not all text is normalized, however, so it is useful to have a
+collation service that can address text that is not normalized, and do so
+efficiently.
+
+The ICU Collation Service handles un-normalized text properly, producing the
+same results as if the text were normalized.
+
+In practice, most data that is encountered is in normalized or semi-normalized
+form already. The ICU Collation Service is designed so that it can process a
+wide range of normalized or un-normalized text without a need for normalization
+processing. When a case is encountered that requires normalization, the ICU
+Collation Service drops into code specific to this purpose. This maximizes
+performance for the majority of text that does not require normalization.
+
+In addition, if the text is known with certainty not to contain un-normalized
+text, then even the overhead of checking for normalization can be eliminated.
+The ICU Collation Service has the ability to turn Normalization Checking either
+on or off. If Normalization Checking is turned off, it is the user's
+responsibility to ensure that all text is already in the appropriate form. This
+is true in a great majority of the world's languages, so normalization checking
+is turned off by default for most locales.
+
+If the text requires normalization processing, Normalization Checking should be
+on. Any language that uses multiple combining characters such as Arabic, ancient
+Greek, Hebrew, Hindi, Thai or Vietnamese either requires Normalization Checking
+to be on, or the text to go through a normalization process before collation.
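+
+The JDK's `java.text.Collator` has a comparable switch, `setDecomposition`. It is
+used below as a stand-in for ICU's normalization setting; ICU4J's
+`com.ibm.icu.text.Collator` has a method with the same name. With canonical
+decomposition enabled, a precomposed "à" and "a" plus U+0300 compare as equal.
+So do two orderings of the same combining marks, which matters for text with
+multiple combining characters. The class name is hypothetical.
+
+```java
+import java.text.Collator;
+import java.util.Locale;
+
+/** JDK analogue of turning normalization checking on for collation. */
+class NormalizationDemo {
+    /** Compares with canonical decomposition (NFD) applied to both inputs. */
+    static int compareWithDecomposition(String a, String b) {
+        Collator c = Collator.getInstance(Locale.ROOT);
+        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
+        return c.compare(a, b);
+    }
+
+    public static void main(String[] args) {
+        // Precomposed a-grave vs. "a" + COMBINING GRAVE ACCENT.
+        System.out.println(compareWithDecomposition("\u00E0", "a\u0300"));
+        // Circumflex + dot below vs. dot below + circumflex: NFD puts them in
+        // canonical order, so both spellings become the same sequence.
+        System.out.println(compareWithDecomposition("a\u0302\u0323", "a\u0323\u0302"));
+    }
+}
+```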
+
+For more information about Normalization-related reordering, please see
+[Unicode Technical Note #5](http://www.unicode.org/notes/tn5/) and
+[UAX #15](http://www.unicode.org/reports/tr15/).
+
+> :point_right: **Note** ICU supports two modes of normalization: on and off.
+> The java.text.\* classes also offer a compatibility decomposition mode, which is not supported in ICU.
+
+## Ignoring Punctuation
+
+In some cases, punctuation can be ignored while searching or sorting data. For
+example, this enables a search for "biweekly" to also return instances of
+"bi-weekly". In other cases, it is desirable for punctuated text to be
+distinguished from text without punctuation, but to have the text sort close
+together.
+
+These two behaviors can be accomplished if there is a way for a character to be
+ignored on all levels except for the quaternary level. If this is the case, then
+two strings which compare as identical on the first three levels (base letter,
+accents, and case) are then distinguished at the fourth level based on their
+punctuation (if any). If the comparison function ignores differences at the
+fourth level, then strings that differ by punctuation only are compared as
+equal.
+
+The following table shows the results of sorting a list of terms in 3 different
+ways. In the first column, punctuation characters (space " ", and hyphen "-")
+are not ignored (" " < "-" < "b"). In the second column, punctuation characters
+are ignored in the first 3 levels and compared only in the fourth level. In the
+third column, punctuation characters are ignored in the first 3 levels and the
+fourth level is not considered, so punctuated terms are equivalent to the
+identical terms without punctuation.
+
+For more options and details see the [“Ignore Punctuation”
+Options](customization/ignorepunct.md) page.
+
+Non-ignorable | Ignorable and Quaternary strength | Ignorable and Tertiary strength
+------------- | --------------------------------- | -------------------------------
+black bird    | black bird                        | **black bird**
+black Bird    | black-bird                        | **black-bird**
+black birds   | blackbird                         | **blackbird**
+black-bird    | black Bird                        | black Bird
+black-Bird    | black-Bird                        | black-Bird
+black-birds   | blackBird                         | blackBird
+blackbird     | black birds                       | black birds
+blackBird     | black-birds                       | black-birds
+blackbirds    | blackbirds                        | blackbirds
+
+> :point_right: **Note** In the last column, strings that differ only in punctuation
+> are compared as equal by ICU Collator; the first such group is shown in bold.\
+> Since ICU 2.2 and as prescribed by the UCA, primary ignorable code points that
+> follow shifted code points will be completely ignored. This means that an accent
+> following a space will compare as if it were a space alone.
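+
+The second column of the table can be reproduced by a toy model of shifted
+("ignorable") punctuation. The model handles only ASCII letters, spaces and
+hyphens. It skips space and hyphen on levels 1 and 3 and ignores accents. On
+level 4 it gives space and hyphen low weights and every other character a high
+weight. These weights and the class name are assumptions for illustration, not
+ICU's implementation.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** Toy model of "shifted" punctuation: ignored on levels 1-3, weighed on level 4. */
+class ShiftedPunctuation {
+    static boolean isVariable(char ch) {
+        return ch == ' ' || ch == '-';
+    }
+
+    /** Level 1: letters only, case folded. */
+    static String primary(String s) {
+        StringBuilder sb = new StringBuilder();
+        for (char ch : s.toCharArray()) {
+            if (!isVariable(ch)) sb.append(Character.toLowerCase(ch));
+        }
+        return sb.toString();
+    }
+
+    /** Level 3: '0' for lowercase, '1' for uppercase, letters only. */
+    static String tertiary(String s) {
+        StringBuilder sb = new StringBuilder();
+        for (char ch : s.toCharArray()) {
+            if (!isVariable(ch)) sb.append(Character.isUpperCase(ch) ? '1' : '0');
+        }
+        return sb.toString();
+    }
+
+    /** Level 4: space < hyphen < any non-variable character. */
+    static String quaternary(String s) {
+        StringBuilder sb = new StringBuilder();
+        for (char ch : s.toCharArray()) {
+            sb.append(ch == ' ' ? '1' : ch == '-' ? '2' : '9');
+        }
+        return sb.toString();
+    }
+
+    /** Compares up to the given strength (3 = tertiary, 4 = quaternary). */
+    static int compare(String a, String b, int strength) {
+        int c = primary(a).compareTo(primary(b));
+        if (c != 0) return c;
+        c = tertiary(a).compareTo(tertiary(b));
+        if (c != 0 || strength < 4) return c;
+        return quaternary(a).compareTo(quaternary(b));
+    }
+
+    static List<String> sort(List<String> words, int strength) {
+        List<String> copy = new ArrayList<>(words);
+        copy.sort((x, y) -> compare(x, y, strength));
+        return copy;
+    }
+
+    public static void main(String[] args) {
+        List<String> input = Arrays.asList("black bird", "black Bird", "black birds",
+            "black-bird", "black-Bird", "black-birds", "blackbird", "blackBird", "blackbirds");
+        System.out.println(sort(input, 4));
+    }
+}
+```
+
+At strength 3, `compare("black bird", "blackbird", 3)` returns 0, matching the
+equal groups of the third column.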
+
+## Case Ordering
+
+The tertiary level is used to distinguish text by case, by small versus large
+Kana, and other letter variants as noted above.
+
+Some applications prefer to emphasize case differences so that words starting
+with the same case sort together. Some Japanese applications require the
+difference between small and large Kana be emphasized over other tertiary
+differences.
+
+The UCA does not provide means to separate out either case or Kana differences
+from the remaining tertiary differences. However, the ICU Collation Service has
+two options that help customize case and/or Kana differences. Both options
+are turned off by default.
+
+### CaseFirst
+
+The Case-first option makes case the most significant part of the tertiary
+level. Primary and secondary levels are unaffected. With this option, words
+starting with the same case sort together. The Case-first option can be set to
+make either lowercase sort before
+uppercase or uppercase sort before lowercase.
+
+Note: The case-first option does not constitute a separate level; it is simply a
+reordering of the tertiary level.
+
+ICU makes use of the following three case categories for sorting:
+
+1.  uppercase: "ABC"
+
+2.  mixed case: "Abc", "aBc"
+
+3.  normal (lowercase or no case): "abc", "123"
+
+Mixed case is always sorted between uppercase and normal case when the
+"case-first" option is set.
+
+### CaseLevel
+
+The Case Level option makes a separate level for case differences. This is an
+extra level positioned between secondary and tertiary. The case level is used in
+Japanese to make the difference between small and large Kana more important than
+the other tertiary differences. It also can be used to ignore other tertiary
+differences, or even secondary differences. This is especially useful in
+matching. For example, if the strength is set to primary only (level-1) and the
+case level is turned on, the comparison ignores accents and tertiary differences
+except for case. The contents of the case level are affected by the case-first
+option.
+
+The case level is independent from the strength of comparison. It is possible to
+have a collator set to primary strength with the case level turned on. This
+provides for comparison that takes into account the case differences, while at
+the same time ignoring accents and tertiary differences other than case. This
+may be used in searching.
+
+Example:
+
+**Case-first off, Case level off**
+
+apple\
+ⓐⓟⓟⓛⓔ\
+Abernathy\
+ⒶⒷⒺⓇⓃⒶⓉⒽⓎ\
+ähnlich\
+Ähnlichkeit
+
+**Lowercase-first, Case level off**
+
+apple\
+ⓐⓟⓟⓛⓔ\
+ähnlich\
+Abernathy\
+ⒶⒷⒺⓇⓃⒶⓉⒽⓎ\
+Ähnlichkeit
+
+**Uppercase-first, Case level off**
+
+Abernathy\
+ⒶⒷⒺⓇⓃⒶⓉⒽⓎ\
+Ähnlichkeit\
+apple\
+ⓐⓟⓟⓛⓔ\
+ähnlich
+
+**Lowercase-first, Case level on**
+
+apple\
+Abernathy\
+ⓐⓟⓟⓛⓔ\
+ⒶⒷⒺⓇⓃⒶⓉⒽⓎ\
+ähnlich\
+Ähnlichkeit
+
+**Uppercase-first, Case level on**
+
+Abernathy\
+apple\
+ⒶⒷⒺⓇⓃⒶⓉⒽⓎ\
+ⓐⓟⓟⓛⓔ\
+Ähnlichkeit\
+ähnlich
+
+## Script Reordering
+
+Script reordering allows scripts and some other groups of characters to be moved
+relative to each other. This reordering is done on top of the DUCET/CLDR
+standard collation order. Reordering can specify groups to be placed at the
+start and/or the end of the collation order.
+
+By default, reordering codes specified for the start of the order are placed in
+the order given after several special non-script blocks. These special groups of
+characters are space, punctuation, symbol, currency, and digit. Script groups
+can be intermingled with these special non-script groups if those special groups
+are explicitly specified in the reordering.
+
+The special code `others` stands for any script that is not explicitly mentioned
+in the list. Anything that is after others will go at the very end of the list
+in the order given. For example, `[Grek, others, Latn]` will result in an
+ordering that puts all scripts other than Greek and Latin between them.
+
+### Examples:
+
+Note: All examples below use the string equivalents for the scripts and reorder
+codes that would be used in collator rules. The script and reorder code
+constants that would be used in API calls will be different.
+
+**Example 1:**\
+set reorder code - `[Grek]`\
+result - `[space, punctuation, symbol, currency, digit, Grek, others]`
+
+**Example 2:**\
+set reorder code - `[Grek]`\
+result - `[space, punctuation, symbol, currency, digit, Grek, others]`
+
+followed by: set reorder code - `[Hani]`\
+result - `[space, punctuation, symbol, currency, digit, Hani, others]`
+
+That is, setting a reordering always modifies
+the DUCET/CLDR order, replacing whatever was previously set, rather than adding
+on to it. In order to cumulatively modify an ordering, you have to retrieve the
+existing ordering, modify it, and then set it.
+
+**Example 3:**\
+set reorder code - `[others, digit]`\
+result - `[space, punctuation, symbol, currency, others, digit]`
+
+**Example 4:**\
+set reorder code - `[space, Grek, punctuation]`\
+result - `[symbol, currency, digit, space, Grek, punctuation, others]`
+
+**Example 5:**\
+set reorder code - `[Grek, others, Hani]`\
+result - `[space, punctuation, symbol, currency, digit, Grek, others, Hani]`
+
+**Example 6:**\
+set reorder code - `[Grek, others, Hani, symbol, Tglg]`\
+result - `[space, punctuation, currency, digit, Grek, others, Hani, symbol, Tglg]`
+
+followed by:\
+set reorder code - `[NONE]`\
+result - DUCET/CLDR
+
+**Example 7:**\
+set reorder code - `[Grek, others, Hani, symbol, Tglg]`\
+result - `[space, punctuation, currency, digit, Grek, others, Hani, symbol, Tglg]`
+
+followed by:\
+set reorder code - `[DEFAULT]`\
+result - original reordering for the locale which may or may not be DUCET/CLDR
+
+**Example 8:**\
+set reorder code - `[Grek, others, Hani, symbol, Tglg]`\
+result - `[space, punctuation, currency, digit, Grek, others, Hani, symbol, Tglg]`
+
+followed by:\
+set reorder code - `[]`\
+result - original reordering for the locale which may or may not be DUCET/CLDR
+
+**Example 9:**\
+set reorder code - `[Hebr, Phnx]`\
+result - error
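+
+Examples 1 and 3 to 6 follow a simple pattern, sketched below as a model (not
+ICU code). Special groups that are not mentioned keep their default order at the
+front. The requested codes follow in the order given, and `others` is appended
+when it is not listed. The model does not handle `DEFAULT`, `NONE`, empty lists
+or invalid combinations such as Example 9. The class and method names are
+hypothetical.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** Simplified model of how a reorder-code list expands into group order. */
+class ReorderModel {
+    static final List<String> SPECIAL_GROUPS =
+        Arrays.asList("space", "punctuation", "symbol", "currency", "digit");
+
+    static List<String> resolve(List<String> codes) {
+        List<String> result = new ArrayList<>();
+        // Unmentioned special groups stay at the front in their default order.
+        for (String group : SPECIAL_GROUPS) {
+            if (!codes.contains(group)) result.add(group);
+        }
+        // Requested codes follow in the order given.
+        result.addAll(codes);
+        // Everything not mentioned goes at the end unless placed explicitly.
+        if (!codes.contains("others")) result.add("others");
+        return result;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(resolve(Arrays.asList("space", "Grek", "punctuation")));
+    }
+}
+```
+
+Because `resolve` builds a new list from scratch, calling it again with
+`[Hani]` yields no trace of an earlier `[Grek]`. This mirrors the replace-not-add
+behavior of Example 2.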
+
+Beginning with ICU 55, scripts only reorder together if they are primary-equal,
+for example Hiragana and Katakana.
+
+ICU 4.8-54:
+
+*   Scripts were reordered in groups, each normally starting with a [Recommended
+    Script](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts).
+*   Reorder codes moved as a group (were “equivalent”) if their scripts shared a
+    primary-weight lead byte.
+*   For example, Hebr and Phnx were “equivalent” reordering codes and were
+    reordered together. Their order relative to each other could not be changed.
+*   Only one code out of any group could be reordered, not several codes from
+    the same group.
+
+## Sorting of Japanese Text (JIS X 4061)
+
+Japanese standard JIS X 4061 requires two changes to the collation procedures:
+special processing of Hiragana characters and (for performance reasons) prefix
+analysis of text.
+
+### Hiragana Processing
+
+JIS X 4061 standard requires more levels than provided by the UCA. To offer
+conformant sorting order, ICU uses the quaternary level to distinguish between
+Hiragana and Katakana. Hiragana symbols are given smaller values than Katakana
+symbols on quaternary level, thus causing Hiragana sequences to sort before
+corresponding Katakana sequences.
+
+### Prefix Analysis
+
+Another characteristic of sorting according to JIS X 4061 is a large number
+of contractions followed by expansions (see
+[Contractions Producing Expansions](#contractions-producing-expansions)).
+This causes all the Hiragana and Katakana code points to be treated as
+contractions, which reduces performance. The solution we adopted introduces the
+prefix concept, which allows us to improve the performance of Japanese sorting.
+More about this can be found in the [customization
+chapter](customization/index.md).
+
+## Thai/Lao reordering
+
+UCA requires that certain Thai and Lao prevowels be reordered with a code point
+following them. This option is always on in the ICU implementation, as
+prescribed by the UCA.
+
+This rule takes effect when:
+
+1.  A Thai vowel in the range U+0E40-U+0E44 precedes a Thai consonant in the
+    range U+0E01-U+0E2E, or
+
+2.  A Lao vowel in the range U+0EC0-U+0EC4 precedes a Lao consonant in the
+    range U+0E81-U+0EAE.
+
+In these cases the vowel is placed after the consonant for collation purposes.
+
+> :point_right: **Note** There is a difference between java.text.\* classes and ICU in regard to Thai
+> reordering. Java.text.\* classes allow tailorings to turn off reordering by
+> using the '!' modifier. ICU ignores the '!' modifier and always reorders Thai
+> prevowels.
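+
+The swap itself can be sketched as follows, using only the code point ranges
+listed above. This is a simplified illustration with a hypothetical class name.
+It does not model the rest of the collation process.
+
+```java
+/** Swaps a Thai or Lao prevowel with the following consonant of the same script. */
+class PrevowelReorder {
+    static boolean isThaiPrevowel(char c)  { return c >= '\u0E40' && c <= '\u0E44'; }
+    static boolean isThaiConsonant(char c) { return c >= '\u0E01' && c <= '\u0E2E'; }
+    static boolean isLaoPrevowel(char c)   { return c >= '\u0EC0' && c <= '\u0EC4'; }
+    static boolean isLaoConsonant(char c)  { return c >= '\u0E81' && c <= '\u0EAE'; }
+
+    static String reorder(String s) {
+        char[] chars = s.toCharArray();
+        for (int i = 0; i + 1 < chars.length; i++) {
+            char vowel = chars[i];
+            char next = chars[i + 1];
+            boolean thai = isThaiPrevowel(vowel) && isThaiConsonant(next);
+            boolean lao = isLaoPrevowel(vowel) && isLaoConsonant(next);
+            if (thai || lao) {
+                chars[i] = next;      // consonant first
+                chars[i + 1] = vowel; // vowel second, for collation purposes
+                i++;                  // do not re-examine the moved vowel
+            }
+        }
+        return new String(chars);
+    }
+
+    public static void main(String[] args) {
+        String swapped = reorder("\u0E40\u0E01");
+        System.out.printf("U+%04X U+%04X%n", (int) swapped.charAt(0), (int) swapped.charAt(1));
+    }
+}
+```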
+
+## Space Padding
+
+In many database products, fields are padded to a fixed length. To get correct
+results, the input to a Collator should omit any superfluous trailing padding
+spaces. The problem arises with contractions, expansions, or normalization.
+Suppose that there are two fields, one containing "aed" and the other with "äd".
+German phonebook sorting (de@collation=phonebook or BCP 47 de-u-co-phonebk) will
+compare "ä" as if it were "ae" (on a primary level), so the two strings are
+equal at the primary level. But if both fields are padded with spaces to a
+length of 4, then they differ at the primary level, since the second will
+compare as if it were one character longer. In other words, when you start with
+strings 1 and 2
+
+1  | a  | e  | d         | \<space\>
+-- | -- | -- | --------- | ---------
+2  | ä  | d  | \<space\> | \<space\>
+
+they end up being compared on a primary level as if they were 1' and 2'
+
+1' | a  | e  | d  | \<space\> | &nbsp;
+-- | -- | -- | -- | --------- | ---------
+2' | a  | e  | d  | \<space\> | \<space\>
+
+Since 2' has an extra character (the extra space), it counts as having a primary
+difference when it shouldn't. The correct result occurs when the trailing
+padding spaces are removed, as in 1" and 2"
+
+1" | a  | e  | d
+-- | -- | -- | --
+2" | a  | e  | d
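+
+A minimal sketch of the remedy strips trailing U+0020 padding before comparing.
+The JDK's `java.text.Collator` stands in for ICU here. It has no German phonebook
+tailoring, so the example instead uses "rôle" written two ways. One spelling uses
+a decomposed accent (5 code units) and the other a precomposed "ô" (4 code
+units). They collate alike but receive different amounts of padding in a
+6-character field. The class and method names are hypothetical.
+
+```java
+import java.text.Collator;
+import java.util.Locale;
+
+/** Strips trailing space padding from fixed-width fields before collating. */
+class PaddedFieldCompare {
+    static String stripPadding(String field) {
+        int end = field.length();
+        while (end > 0 && field.charAt(end - 1) == ' ') end--;
+        return field.substring(0, end);
+    }
+
+    static int compareFields(Collator collator, String a, String b) {
+        return collator.compare(stripPadding(a), stripPadding(b));
+    }
+
+    public static void main(String[] args) {
+        Collator c = Collator.getInstance(Locale.ROOT);
+        c.setStrength(Collator.PRIMARY);
+        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
+        // Both fields are 6 units wide: 5 units + 1 space, and 4 units + 2 spaces.
+        System.out.println(compareFields(c, "ro\u0302le ", "r\u00F4le  "));
+    }
+}
+```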
+
+## Collator naming scheme
+
+***Starting with ICU 54, the following naming scheme and its API functions are
+deprecated.*** Use ucol_open() with language tag collation keywords instead (see
+[Collation API Details](api.md)). For example,
+ucol_open("de-u-co-phonebk-ka-shifted", &errorCode) for German Phonebook order
+with "ignore punctuation" mode.
+
+When collating or matching text, a number of attributes can be used to affect
+the desired result. The following describes the attributes, their values, their
+effects, their normal usage, and the string comparison performance and sort key
+length implications. It also includes single-letter abbreviations for both the
+attributes and their values. These abbreviations allow a 'short-form'
+specification of a set of collation options, such as "UCA4.0.0_AS_LSV_S", which
+can be used to specify that the desired options are: UCA version 4.0.0; ignore
+spaces, punctuation and symbols; use Swedish linguistic conventions; compare
+case-insensitively.
+
+A number of attribute values are common across different attributes; these
+include **Default** (abbreviated as D), **On** (O), and **Off** (X). Unless
+otherwise stated, the examples use the UCA alone with default settings.
+
+> :point_right: **Note** In order to achieve uniqueness, a collator name always
+> has the attribute abbreviations sorted.
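+
+The sorting rule can be sketched with a hypothetical helper that assembles a
+short-form name from attribute abbreviations and values. It only illustrates the
+ordering. It is not ICU's parser or generator, and it does not validate the
+values.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+
+/** Hypothetical helper: joins abbreviation+value pairs in sorted order. */
+class ShortFormName {
+    static String build(Map<String, String> attributes) {
+        StringBuilder sb = new StringBuilder();
+        // TreeMap iterates keys in sorted order, giving a unique name per setting.
+        for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
+            if (sb.length() > 0) sb.append('_');
+            sb.append(e.getKey()).append(e.getValue());
+        }
+        return sb.toString();
+    }
+
+    public static void main(String[] args) {
+        Map<String, String> attrs = new HashMap<>();
+        attrs.put("S", "2");  // Strength = secondary
+        attrs.put("L", "SV"); // Locale = Swedish
+        attrs.put("A", "S");  // Alternate = Shifted
+        System.out.println(build(attrs));
+    }
+}
+```
+
+Whatever order the attributes are added in, the helper yields `AS_LSV_S2`.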
+
+### Main References
+
+1.  For a full list of supported locales in ICU, see [Locale
+    Explorer](http://demo.icu-project.org/icu-bin/locexp), which also contains
+    an on-line demo showing sorting for each locale. The demo allows you to try
+    different attribute values, to see how they affect sorting.
+
+2.  To see tabular results for the UCA table itself, see the [Unicode Collation
+    Charts](http://www.unicode.org/charts/collation/).
+
+3.  For the UCA specification, see [UTS #10: Unicode Collation
+    Algorithm](http://www.unicode.org/reports/tr10/).
+
+4.  For more detail on the precise effects of these options, see [Collation
+    Customization](customization/index.md).
+
+#### Collator Naming Attributes
+
+Attribute              | Abbreviation | Possible Values
+---------------------- | ------------ | ---------------
+Locale                 | L            | \<language\>
+Script                 | Z            | \<script\>
+Region                 | R            | \<region\>
+Variant                | V            | \<variant\>
+Keyword                | K            | \<keyword\>
+&nbsp;                 | &nbsp;       | &nbsp;
+Strength               | S            | 1, 2, 3, 4, I, D
+Case_Level             | E            | X, O, D
+Case_First             | C            | X, L, U, D
+Alternate              | A            | N, S, D
+Variable_Top           | T            | \<hex digits\>
+Normalization Checking | N            | X, O, D
+French                 | F            | X, O, D
+Hiragana               | H            | X, O, D
+
+#### Collator Naming Attribute Descriptions
+
+The **Locale** attribute is typically the most
+important attribute for correct sorting and matching, according to the user
+expectations in different countries and regions. The default UCA ordering will
+only sort a few languages such as Dutch and Portuguese correctly ("correctly"
+meaning according to the normal expectations for users of the languages).
+Otherwise, you need to supply the locale to UCA in order to properly collate
+text for a given language. Thus a locale needs to be supplied so as to choose a
+collator that is correctly **tailored** for that locale. The choice of a locale
+will automatically preset the values for all of the attributes to something that
+is reasonable for that locale. Thus most of the time the other attributes do not
+need to be explicitly set. In some cases, the choice of locale will make a
+difference in string comparison performance and/or sort key length.
+
+In short attribute names,
+`<language>_<script>_<region>_<variant>@collation=<keyword>` is
+represented by: `L<language>_Z<script>_R<region>_V<variant>_K<keyword>`. Not
+all the elements are required. Valid values for locale elements are general
+valid values for RFC 3066 locale naming.
+
+**Example:**\
+**Locale="sv" (Swedish)** "Kypper" < "Köpfe"\
+**Locale="de" (German)** "Köpfe" < "Kypper"
+
+The **Strength** attribute determines whether accents or
+case are taken into account when collating or matching text. (In writing
+systems without case or accents, it controls similarly important features.) The
+default strength setting usually does not need to be changed for collating
+(sorting), but often needs to be changed when **matching** (e.g. SELECT). The
+possible values include Default (D), Primary (1), Secondary (2), Tertiary (3),
+Quaternary (4), and Identical (I).
+
+For example, people may choose to ignore accents or ignore accents and case when
+searching for text.
+
+Almost all characters are distinguished by the first three levels, and in most
+locales the default value is thus Tertiary. However, if Alternate is set to be
+Shifted, then the Quaternary strength (4) can be used to break ties among
+whitespace, punctuation, and symbols that would otherwise be ignored. If very
+fine distinctions among characters are required, then the Identical strength (I)
+can be used (for example, Identical Strength distinguishes between the
+**Mathematical Bold Small A** and the **Mathematical Italic Small A**. For more
+examples, look at the cells with white backgrounds in the collation charts).
+However, using levels higher than Tertiary (such as the Identical strength)
+results in significantly longer sort keys, and slower string comparison
+performance for equal strings.
+
+**Example:**\
+**S=1** role = Role = rôle\
+**S=2** role = Role < rôle\
+**S=3** role < Role < rôle
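+
+These strength settings can be tried with the JDK's `java.text.Collator`, used
+here as a stand-in for ICU; ICU4J's `Collator` class has a `setStrength` method
+as well. Canonical decomposition is enabled so that the precomposed "ô" is
+compared by its accent. The class and method names are hypothetical.
+
+```java
+import java.text.Collator;
+import java.util.Locale;
+
+/** Compares "role", "Role" and "role" with an o-circumflex at three strengths. */
+class StrengthDemo {
+    static Collator collator(int strength) {
+        Collator c = Collator.getInstance(Locale.ROOT);
+        c.setStrength(strength);
+        // Decompose the precomposed o-circumflex so the accent is a secondary difference.
+        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
+        return c;
+    }
+
+    public static void main(String[] args) {
+        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
+        String[] names = {"S=1", "S=2", "S=3"};
+        for (int i = 0; i < strengths.length; i++) {
+            Collator c = collator(strengths[i]);
+            System.out.println(names[i]
+                + "  role vs Role: " + Integer.signum(c.compare("role", "Role"))
+                + "  Role vs accented: " + Integer.signum(c.compare("Role", "r\u00F4le")));
+        }
+    }
+}
+```
+
+The signs printed should mirror the example above. At S=1 all three compare
+equal, at S=2 only the accent counts, and at S=3 case counts as well.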
+
+The **Case_Level** attribute is used when ignoring accents
+**but not** case. In such a situation, set Strength to be Primary, and
+Case_Level to be On. In most locales, this setting is Off by default. There is a
+small string comparison performance and sort key impact if this attribute is set
+to be On.
+
+**Example:**\
+**S=1, E=X** role = Role = rôle\
+**S=1, E=O** role = rôle < Role
+
+The **Case_First** attribute is used to control whether
+uppercase letters come before lowercase letters or vice versa, in the absence of
+other differences in the strings. The possible values are Uppercase_First (U)
+and Lowercase_First (L), plus the standard Default and Off. There is almost no
+difference between the Off and Lowercase_First options in terms of results, so
+typically users will not use Lowercase_First: only Off or Uppercase_First.
+(People interested in the detailed differences between X and L should consult
+the [Collation Customization](customization/index.md).)
+Specifying either L or U won't affect string comparison performance, but will
+affect the sort key length.
+
+**Example:**\
+**C=X or C=L** "china" < "China" < "denmark" < "Denmark"\
+**C=U** "China" < "china" < "Denmark" < "denmark"
+
+The **Alternate** attribute is used to control the handling of
+the so-called **variable** characters in the UCA: whitespace, punctuation and
+symbols. If Alternate is set to Non-Ignorable (N), then differences among these
+characters are of the same importance as differences among letters. If Alternate
+is set to Shifted (S), then these characters are of only minor importance. The
+Shifted value is often used in combination with Strength set to Quaternary. In
+such a case, white-space, punctuation, and symbols are considered when comparing
+strings, but only if all other aspects of the strings (base letters, accents,
+and case) are identical. If Alternate is not set to Shifted, then there is no
+difference between a Strength of 3 and a Strength of 4.
+
+For more information and examples, see
+[Variable_Weighting](http://www.unicode.org/reports/tr10/#Variable_Weighting) in
+the UCA.
+
+The reason the Alternate values are not simply On and Off is that
+additional Alternate values may be added in the future.
+
+The UCA option
+**Blanked** is expressed with Strength set to 3, and Alternate set to Shifted.
+
+The default for most locales is Non-Ignorable. If Shifted is selected, it may be
+slower if there are many strings that are the same except for punctuation; sort
+key length will not be affected unless the strength level is also increased.
+
+**Example:**\
+**S=3, A=N** di Silva < Di Silva < diSilva < U.S.A. < USA\
+**S=3, A=S** di Silva = diSilva < Di Silva < U.S.A. = USA\
+**S=4, A=S** di Silva < diSilva < Di Silva < U.S.A. < USA
+
+The **Variable_Top** attribute is only meaningful if the
+Alternate attribute is not set to Non-Ignorable. In such a case, it controls
+which characters count as ignorable. The \<hex\> value specifies the character
+whose primary weight (in UCA order) is the highest one still considered
+ignorable.
+
+Thus, for example, if a user wanted white-space to be ignorable, but not any
+visible characters, then they would use the value Variable_Top=0020 (space).
+The hex digits should specify only a single character. All characters of the
+same primary weight are equivalent, so Variable_Top=3000 (ideographic space) has
+the same effect as Variable_Top=0020.
+
+This setting (alone) has little impact on string comparison performance; setting
+it lower or higher will make sort keys slightly shorter or longer respectively.
+
+**Example:**\
+**S=3, A=S** di Silva = diSilva < U.S.A. = USA\
+**S=3, A=S, T=0020** di Silva = diSilva < U.S.A. < USA
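+
+To make the Alternate, Strength, and Variable_Top examples concrete, here is a
+toy Python model. It is an illustration only, not ICU's implementation. It
+assumes that a character's primary weight is the code point of its lowercase
+form, that the tertiary weight is just an uppercase flag (there are no accents,
+so the secondary level is omitted), and that a character is variable when it is
+not alphanumeric and its code point is at most `variable_top`. The function
+`toy_key` and its parameters are hypothetical.
+
+```python
+REGULAR_Q = 0x110000  # quaternary weight for regular characters: above any code point
+
+
+def toy_key(s, alternate="non-ignorable", strength=3, variable_top=0x2F):
+    """Toy multi-level sort key illustrating Alternate and Variable_Top."""
+    primary, tertiary, quaternary = [], [], []
+    for c in s:
+        is_variable = not c.isalnum() and ord(c) <= variable_top
+        if is_variable and alternate == "shifted":
+            # "Shift" the variable character's primary weight down to level 4.
+            quaternary.append(ord(c))
+            continue
+        primary.append(ord(c.lower()))
+        tertiary.append(1 if c.isupper() else 0)
+        quaternary.append(REGULAR_Q)
+    levels = [tuple(primary), tuple(tertiary)]
+    # The quaternary level only matters for Shifted with strength >= 4.
+    if strength >= 4 and alternate == "shifted":
+        levels.append(tuple(quaternary))
+    return tuple(levels)
+
+
+words = ["USA", "U.S.A.", "diSilva", "Di Silva", "di Silva"]
+print(sorted(words, key=toy_key))  # S=3, A=N
+# ['di Silva', 'Di Silva', 'diSilva', 'U.S.A.', 'USA']
+print(sorted(words, key=lambda w: toy_key(w, "shifted", 4)))  # S=4, A=S
+# ['di Silva', 'diSilva', 'Di Silva', 'U.S.A.', 'USA']
+# S=3, A=S: the space is ignored, so these keys are equal.
+print(toy_key("di Silva", "shifted") == toy_key("diSilva", "shifted"))  # True
+# T=0020: only the space is variable, so "." now counts on the primary level.
+print(toy_key("U.S.A.", "shifted", variable_top=0x20)
+      < toy_key("USA", "shifted", variable_top=0x20))  # True
+```
+
+Because the keys are compared level by level, a quaternary difference (the
+position of a shifted space or period) only matters when all higher levels tie.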
+
+The **Normalization** setting determines whether
+text is thoroughly normalized or not in comparison. Even if the setting is off
+(which is the default for many locales), text as represented in common usage
+will compare correctly (for details, see [UTN
+#5](http://www.unicode.org/notes/tn5/)). Only if the accent marks are in
+non-canonical order will there be a problem. If the setting is On, then the best
+results are guaranteed for all possible text input. There is a medium string
+comparison performance cost if this attribute is On, depending on the frequency
+of sequences that require normalization. There is no significant effect on sort
+key length. If the input text is known to be in NFD or NFKD normalization forms,
+there is no need to enable this Normalization option.
+
+**Example:**\
+**N=X** ä = a + ◌̈ < ä + ◌̣ < ạ + ◌̈\
+**N=O** ä = a + ◌̈ < ä + ◌̣ = ạ + ◌̈
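+
+The N=O result relies on canonical equivalence: both strings decompose to the
+same sequence once their combining marks are put into canonical order. Python's
+standard `unicodedata` module can show this directly. This illustrates NFD
+itself, not ICU's collation code.
+
+```python
+import unicodedata
+
+s1 = "\u00E4\u0323"  # ä followed by COMBINING DOT BELOW
+s2 = "\u1EA1\u0308"  # ạ followed by COMBINING DIAERESIS
+
+print(s1 == s2)  # False: different code point sequences
+nfd1 = unicodedata.normalize("NFD", s1)
+nfd2 = unicodedata.normalize("NFD", s2)
+print(nfd1 == nfd2)  # True: the strings are canonically equivalent
+# Canonical ordering puts the dot below (ccc=220) before the diaeresis (ccc=230).
+print([f"U+{ord(c):04X}" for c in nfd1])  # ['U+0061', 'U+0323', 'U+0308']
+```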
+
+Some **French** dictionary ordering traditions sort strings with
+different accents from the back of the string. This attribute is automatically
+set to On for the Canadian French locale (fr_CA). Users normally would not need
+to explicitly set this attribute. There is a string comparison performance cost
+when it is set On, but sort key length is unaffected.
+
+**Example:**\
+**F=X** cote < coté < côte < côté\
+**F=O** cote < côte < coté < côté
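+
+The effect of backwards secondary ordering can be sketched with a toy two-level
+key in Python. This is not ICU's algorithm: it assumes that the base letters form
+the primary level, that each letter carries at most one accent, and that
+comparing the combining marks by code point is good enough for this particular
+example. The function name `toy_french_key` is hypothetical.
+
+```python
+import unicodedata
+
+
+def toy_french_key(word, backwards=False):
+    """Toy key: (base letters, per-letter accents), optionally accents reversed."""
+    base, accents = [], []
+    for ch in unicodedata.normalize("NFD", word):
+        if unicodedata.combining(ch):
+            accents[-1] = ch  # attach the mark to the preceding base letter
+        else:
+            base.append(ch)
+            accents.append("")  # "" means no accent and sorts before any mark
+    if backwards:
+        accents.reverse()  # French: compare secondary differences from the end
+    return ("".join(base), tuple(accents))
+
+
+words = ["côté", "cote", "côte", "coté"]
+print(sorted(words, key=toy_french_key))
+# ['cote', 'coté', 'côte', 'côté']
+print(sorted(words, key=lambda w: toy_french_key(w, backwards=True)))
+# ['cote', 'côte', 'coté', 'côté']
+```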
+
+Compatibility with JIS X 4061 requires the introduction of an
+additional level to distinguish **Hiragana** and Katakana characters. If
+compatibility with that standard is required, then set this attribute On and
+set the strength to at least Quaternary.
+
+This attribute is an implementation detail of the CLDR Japanese tailoring. The
+implementation might change to use a different mechanism to achieve the same
+Japanese sort order. Since ICU 50, this attribute is not settable any more.
+
+**Example:**\
+**H=X, S=4** きゅう = キュウ < きゆう = キユウ\
+**H=O, S=4** きゅう < キュウ < きゆう < キユウ
+
+> :point_right: **Note** If attributes are not overridden in the collator name,
+> they are assumed to be the same as for the given locale.
+> For example, a collator opened with an empty
+> string has the same attribute settings as **AN_CX_EX_FX_HX_KX_NX_S3_T0000**.
+
+### Summary of Value Abbreviations
+
+Value         | Abbreviation
+------------- | ------------
+Default       | D
+On            | O
+Off           | X
+Primary       | 1
+Secondary     | 2
+Tertiary      | 3
+Quaternary    | 4
+Identical     | I
+Shifted       | S
+Non-Ignorable | N
+Lower-First   | L
+Upper-First   | U
diff --git a/docs/userguide/collation/customization/ignorepunct.md b/docs/userguide/collation/customization/ignorepunct.md
new file mode 100644
index 0000000..6ec6605
--- /dev/null
+++ b/docs/userguide/collation/customization/ignorepunct.md
@@ -0,0 +1,161 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# “Ignore Punctuation” Options
+
+By default, spaces and punctuation characters add primary (base character)
+differences. Such characters sort less-than digits and letters. For example, the
+default collation yields “De Anza” < “de-luge” < “deanza”.
+
+UCA/CLDR/ICU provide several options for “ignore punctuation” collation
+settings, also known as Variable Weighting or Alternate Handling. These options
+change the sorting behavior of “variable” characters algorithmically. “Variable”
+characters are those with low (but non-zero) primary weights up to a threshold,
+the “variable top”. By default, CLDR and ICU treat spaces and punctuation as
+variable. (This can be changed via API.) In the DUCET, most symbols are variable as well.
+
+## Non-Ignorable
+
+The default behavior in CLDR & ICU, shown above, is to not ignore punctuation
+(alternate=non-ignorable) but to map variable characters to their normal primary
+collation elements.
+
+All of the following options cause variable characters to be ignored on levels
+1..3. Only when strings compare equal up to the tertiary level may variable
+characters make a difference, depending on the options.
+
+See also
+
+*   [UCA: Variable
+    Weighting](http://www.unicode.org/reports/tr10/#Variable_Weighting)
+*   [LDML: Setting
+    Options](http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-collation.html#Setting_Options)
+
+Here is an overview of the sorting results with these options.
+
+Non-ignorable | Blanked      | Shifted | Shift-Trimmed | Variable-After
+------------- | ------------ | ------- | ------------- | --------------
+delug         | delug        | delug   | delug         | delug
+de-luge       | de-luge      | de-luge | *deluge*      | *deluge*
+delu-ge       | delu-ge (*)  | delu-ge | de-luge       | deluge-
+*deluge*      | *deluge* (*) | *deluge* | delu-ge     | delu-ge
+Deluge        | deluge- (*)  | deluge-  | deluge-     | de-luge
+deluge-       | Deluge       | Deluge   | Deluge      | Deluge
+
+Items with (*) compare equal to the preceding ones, and their relative order
+is arbitrary. These only occur in the Blanked column. This table shows the
+results of a stable sort algorithm with the non-ignorable column as input.
+
+## Blanked
+
+The simplest option is to “ignore punctuation” completely, as if all variable
+characters (and following combining marks) had been removed from the input
+strings before comparing them.
+
+For example: “De Anza” = “De-Anza” = “DeAnza”.
+
+In ICU, this option is selected with alternate=shifted and
+strength=primary|secondary|tertiary. (ICU does not support Blanked combined with
+strength=identical.)
+
+The implementation “blanks” out all weights of the variable characters’
+collation elements.
+
+*With all of the following options, variable characters are ignored on levels
+1..3 but add distinctions on level 4 (quaternary level).*
+
+## Shifted
+
+Among strings that compare tertiary-equal, that is, they contain the same
+letters, accents and casing:
+
+*   Sorts all variable characters less-than (before) regular characters.
+*   Appending a variable character makes a string sort *greater-than* the string
+    without it.
+*   *Inserting* a variable character makes a string sort *less-than* the string
+    without it.
+*   Inserting a variable character *earlier* in a string makes it sort
+    *less-than* inserting the variable character *later* in the string.
+
+The result is similar to [Merging Sort
+Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys) (with shorter
+prefixes sorting less-than longer ones), like in last-name+first-name sorting,
+except only among tertiary-equal strings.
+
+For example: “de-luge” < “delu-ge” < “deluge” < “deluge-”.
+
+In ICU, this option is selected with alternate=shifted and
+strength=quaternary|identical.
+
+The implementation “shifts” the primary weight p of the collation element \[p,
+s, t, q\] of each variable character down three levels: \[0, 0, 0, p\]. Regular
+characters with primary collation elements get a high quaternary weight, higher
+than that of any variable character.
+
+Note that this behavior is different from collation on the secondary and tertiary
+levels, because normal collation elements get low secondary & tertiary weights
+but high quaternary weights. Adding an accent difference anywhere makes a string
+sort greater-than the string without it, and adding an accent difference earlier
+makes it sort greater-than adding it later. For example, “deanza” < “deanzä” <
+“deänza” < “dëanza”. (Compare the ‘ä’/‘ë’ positions here with the ‘-’ positions
+above.)
+
+## Shift-Trimmed
+
+*Note: This method is not currently implemented in ICU.*
+
+Among strings that compare tertiary-equal:
+
+*   Sorts variable characters sometimes less-than, sometimes greater-than
+    regular characters.
+*   Inserting a variable character anywhere makes a string sort *greater-than*
+    the string without it. (The string without variable characters gets an empty
+    quaternary level.)
+*   Inserting a variable character *earlier* in a string makes it sort
+    *less-than* inserting the variable character *later* in the string.
+
+For example: “deluge” < “de-luge” < “delu-ge” < “deluge-”.
+
+The Shift-Trimmed method works like Shifted, except that *trailing*
+high-quaternary weights (from regular characters) are removed (trimmed).
+Compared with Shifted, the Shift-Trimmed method sorts strings without variable
+characters before ones with variable characters added, rather than producing the
+equivalent of [Merging Sort
+Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys).
+
+Shift-Trimmed is more complicated to implement than all of the other options:
+When comparing strings, a lookahead (or equivalent) is needed to determine
+whether a non-variable character gets a zero quaternary weight (if no variables
+follow) or a high quaternary weight (if at least one variable follows). When
+building sort keys, trailing high/common quaternary weights are trimmed (backed
+out) at the end of the quaternary level.
+
+## Variable-After
+
+*Note: This method is not currently implemented in ICU.*
+
+Among strings that compare tertiary-equal:
+
+*   Sorts all variable characters greater-than (after) regular characters.
+*   Inserting a variable character anywhere makes a string sort *greater-than*
+    the string without it. (Like Shift-Trimmed.)
+*   Inserting a variable character *earlier* in a string makes it sort
+    *greater-than* inserting the variable character *later* in the string. (Like
+    accent differences.)
+
+For example: “deluge” < “deluge-” < “delu-ge” < “de-luge”.
+
+The implementation “shifts” the primary weight p of the collation element \[p,
+s, t, q\] of each variable character down three levels: \[0, 0, 0, p\]. Regular
+characters with primary collation elements get a *low* quaternary weight,
+*lower* than that of any variable character. This is consistent with collation
+on secondary and tertiary levels but unlike [Merging Sort
+Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys).
+
+This method extends the [UCA well-formedness condition
+2](http://www.unicode.org/reports/tr10/#WF2) to apply to quaternary weights.
+(UCA versions before UCA 6.2 did not limit WF2 to secondary & tertiary weights,
+which meant that several of the Variable Weighting options technically created
+ill-formed quaternary weights.)
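+
+The quaternary-level behavior of the options on this page can be reproduced with
+a toy Python model for the tertiary-equal example strings. It is an
+illustration, not ICU code. It assumes that the hyphen and space are the only
+variable characters, uses the remaining characters as a stand-in for levels
+1..3, and uses arbitrary small integers as quaternary weights. The names
+`toy_key` and `quaternary` and the mode strings are hypothetical, and ICU itself
+implements only Blanked and Shifted.
+
+```python
+VARIABLE = {"-", " "}  # toy assumption: the only variable characters
+LOW, VAR, HIGH = 0, 1, 2  # toy quaternary weights
+
+
+def quaternary(s, mode):
+    if mode == "blanked":
+        return ()  # all weights of variable characters are blanked out
+    if mode == "variable-after":
+        # Regular characters get a quaternary weight below any variable one.
+        return tuple(VAR if c in VARIABLE else LOW for c in s)
+    # Shifted: regular characters get a weight above any variable one.
+    weights = [VAR if c in VARIABLE else HIGH for c in s]
+    if mode == "shift-trimmed":
+        while weights and weights[-1] == HIGH:
+            weights.pop()  # trim trailing high weights
+    return tuple(weights)
+
+
+def toy_key(s, mode):
+    levels_1_to_3 = "".join(c for c in s if c not in VARIABLE)
+    return (levels_1_to_3, quaternary(s, mode))
+
+
+words = ["deluge-", "deluge", "delu-ge", "de-luge"]
+for mode in ["shifted", "shift-trimmed", "variable-after"]:
+    print(mode, sorted(words, key=lambda w: toy_key(w, mode)))
+# shifted ['de-luge', 'delu-ge', 'deluge', 'deluge-']
+# shift-trimmed ['deluge', 'de-luge', 'delu-ge', 'deluge-']
+# variable-after ['deluge', 'deluge-', 'delu-ge', 'de-luge']
+```
+
+With `mode="blanked"` all four strings get identical keys, which matches the
+Blanked behavior of treating them as equal.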
diff --git a/docs/userguide/collation/customization/index.md b/docs/userguide/collation/customization/index.md
new file mode 100644
index 0000000..2a1c874
--- /dev/null
+++ b/docs/userguide/collation/customization/index.md
@@ -0,0 +1,1059 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation Customization
+
+ICU uses the [CLDR root collation
+order](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation)
+as a default starting point for ordering. (The CLDR root collation is based on
+the [UCA
+DUCET](http://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table).)
+Not all languages have sorting sequences that correspond with the root collation
+order because no single sort order can simultaneously encompass the specifics of
+all the languages. In particular, languages that share a script may sort the
+same letters differently.
+
+Therefore, ICU provides a data-driven, flexible, and run-time-customizable
+mechanism called "tailoring". Tailoring overrides the default order of code
+points and the values of the ICU Collation Service attributes.
+
+## Collation Rule
+
+A `RuleBasedCollator` is built from a rule string which changes the sort order of
+some characters and strings relative to the default order. An empty string (or
+one with only white space and comments) results in a collator that behaves like
+the root collator.
+
+A tailoring is specified via a string containing a set of rules. ICU implements
+the (CLDR) [LDML collation rule
+syntax](http://www.unicode.org/reports/tr35/tr35-collation.html#Rules); see that
+specification for more details.
+
+Each rule contains a string of ordered characters that starts with an **anchor
+point** or a **reset value**. For example, the rule `"&a < g"` places "g"
+after "a" and before "b", and "a" itself does not change place. This rule has the
+following sorting consequences:
+
+Without rule | With rule
+------------ | ---------
+Abernathy    | Abernathy
+apple        | apple
+bird         | green
+Boston       | bird
+Graham       | Boston
+green        | Graham
+
+Note that only the word that starts with "g" has changed place. Every word that
+starts with a letter sorted after "a"/"A" in the root order now sorts after
+"green", including "Graham": the rule moves only lowercase "g", so "G" would have
+to be tailored separately, such as with `"&a < g <<< G"`.
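+
+The effect of `"&a < g"` can be imitated with a toy Python model. This is only
+an illustration under simplifying assumptions, not how ICU builds tailorings:
+each ASCII letter pair gets a primary weight (a/A=10, b/B=20, ...) with gaps
+between them, case is a tertiary tie-breaker, and the rule gives "g" (but not
+"G") a new primary weight just after that of "a". The helper names are
+hypothetical.
+
+```python
+import string
+
+
+def root_primaries():
+    """Toy root order: a/A=10, b/B=20, ..., z/Z=260, leaving gaps for tailoring."""
+    weights = {}
+    for i, letter in enumerate(string.ascii_lowercase):
+        weights[letter] = weights[letter.upper()] = (i + 1) * 10
+    return weights
+
+
+def tailor_after(weights, reset, char):
+    """Toy version of "&reset < char": give char a primary just after reset's."""
+    tailored = dict(weights)
+    tailored[char] = weights[reset] + 1  # toy: only one char per reset fits here
+    return tailored
+
+
+def sort_key(word, weights):
+    primary = tuple(weights[c] for c in word)
+    tertiary = tuple(1 if c.isupper() else 0 for c in word)  # lowercase first
+    return (primary, tertiary)
+
+
+words = ["green", "Graham", "Boston", "bird", "apple", "Abernathy"]
+root = root_primaries()
+tailored = tailor_after(root, "a", "g")
+print(sorted(words, key=lambda w: sort_key(w, root)))
+# ['Abernathy', 'apple', 'bird', 'Boston', 'Graham', 'green']
+print(sorted(words, key=lambda w: sort_key(w, tailored)))
+# ['Abernathy', 'apple', 'green', 'bird', 'Boston', 'Graham']
+```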
+
+This is a simple example of a tailoring rule. Tailoring rules consist of
+zero or more rules and zero or more options. There must be at least one rule or
+at least one option. The rule syntax is discussed in more detail in the
+following sections.
+
+Note that the tailoring rules override the UCA ordering. In addition, if a
+character is reordered, it automatically reorders any other equivalent
+characters. For example, if the rule "&e<a" is used to reorder "a" in the list,
+"á" is also greater than "é".
+
+## Syntax
+
+The following table summarizes the basic syntax necessary for most usages:
+
+Symbol | Example&nbsp; | Description
+------ | ------------- | ----------------------------------
+`<`    | `a < b`       | Identifies a primary (base letter) difference between "a" and "b"
+`<<`   | `a << ä`      | Signifies a secondary (accent) difference between "a" and "ä"
+`<<<`  | `a<<<A`       | Identifies a tertiary difference between "a" and "A"
+`<<<<` | `か<<<<カ`     | Identifies a quaternary difference between "か" and "カ". (New in ICU 53.)
+`=`    | `x = y`       | Signifies no difference between "x" and "y".
+`&`    | `&Z`          | Instructs ICU to reset at this letter. These rules will be relative to this letter from here on, but will not affect the position of Z itself.
+
+> :point_right: **Note**: ICU permits up to three quaternary relations in a row
+> (except for intervening "=" identity relations).
+
+> :point_right: **Note**: In releases prior to 1.8,
+> ICU used the notations `;` to represent secondary relations and `,` to represent tertiary relations.
+> Starting in release 1.8, use `<<` symbols to represent secondary relations and
+> `<<<` symbols to represent tertiary relations.
+> Rules that use the `;` and `,` notations are still processed by ICU for compatibility;
+> also, some of the data used for tailoring to particular locales
+> has not yet been updated to the new syntax.
+> However, one should consider these symbols deprecated.
+
+> :point_right: **Note**: See the [LDML collation rule syntax](http://www.unicode.org/reports/tr35/tr35-collation.html#Rules)
+> and [Properties and ICU Rule Syntax](../../strings/properties.md) for
+> information regarding syntax characters.
+
+Repeated use of the same relation can be abbreviated, for example
+`&a <* bcd-gp-s` for `&a < b < c < d < e < f < g < p < q < r < s`.
+For details see the
+[LDML collation spec, section
+Orderings](http://www.unicode.org/reports/tr35/tr35-collation.html#Orderings).
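+
+The abbreviated form can be expanded mechanically. The Python sketch below is a
+hypothetical helper, not part of ICU; it handles only the simple cases shown in
+the example (single characters and `x-y` ranges, with no escapes or quoting).
+
+```python
+def expand_star_rule(reset, chars, relation="<"):
+    """Expand e.g. ("a", "bcd-gp-s") into "&a < b < c < ... < s" (toy helper)."""
+    expanded = []
+    i = 0
+    while i < len(chars):
+        if i + 2 < len(chars) and chars[i + 1] == "-":
+            # A range like "d-g" stands for every code point from d through g.
+            start, end = ord(chars[i]), ord(chars[i + 2])
+            expanded.extend(chr(cp) for cp in range(start, end + 1))
+            i += 3
+        else:
+            expanded.append(chars[i])
+            i += 1
+    return f"&{reset} {relation} " + f" {relation} ".join(expanded)
+
+
+print(expand_star_rule("a", "bcd-gp-s"))
+# &a < b < c < d < e < f < g < p < q < r < s
+```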
+
+### Escaping Rules
+
+Most characters can be used as parts of rules. However, whitespace
+characters are skipped over, and all ASCII characters that are not digits or
+letters are considered to be part of the syntax. In order to use these characters in
+rules, they need to be escaped. Escaping can be done in several ways:
+
+*   Single characters can be escaped using backslash **\\** (U+005C).
+
+*   Strings can be escaped by putting them between single quotes **'like
+    this'**.
+
+*   The single quote (ASCII apostrophe) can be quoted using two single quotes
+    **''**, both inside and outside single-quote-escaped strings.
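+
+A small Python helper can apply the quoting rules above when building rule
+strings programmatically. It is a hypothetical sketch, not an ICU API. It always
+quotes the whole string, even when only some of its characters need escaping.
+
+```python
+def quote_for_rules(text):
+    """Wrap text in single quotes, doubling any apostrophe inside it."""
+    return "'" + text.replace("'", "''") + "'"
+
+
+print(quote_for_rules("a-b"))   # 'a-b'
+print(quote_for_rules("it's"))  # 'it''s'
+print("&z < " + quote_for_rules("#"))  # &z < '#'
+```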
+
+### Simple Tailoring Examples
+
+Serbian (Latin) or Croatian: `& C < č <<< Č < ć <<< Ć`
+
+This rule is needed because the root collation order usually treats an accent
+as a secondary difference from the base character. This rule ensures that 'č'
+and 'ć' are treated as separate base letters.
+
+UCA             | Tailoring: `& C < č <<< Č < ć <<< Ć`
+--------------- | --------------
+CUKIĆ RADOJICA  | CUKIĆ RADOJICA
+ČUKIĆ SLOBODAN  | CUKIĆ SVETOZAR
+CUKIĆ SVETOZAR  | CURIĆ MILOŠ
+ČUKIĆ ZORAN     | CVRKALJ ÐURO
+CURIĆ MILOŠ     | ČUKIĆ SLOBODAN
+ĆURIĆ MILOŠ     | ČUKIĆ ZORAN
+CVRKALJ ÐURO    | ĆURIĆ MILOŠ
+
+Serbian (Latin) or Croatian: `& Ð < dž <<< Dž <<< DŽ`
+
+This rule is an example of a contraction. "D" alone is sorted after "C" and "Ž"
+is sorted after "Z", but "DŽ", due to the tailoring rule, is treated as a single
+letter that gets sorted after "Đ" and before "E" ("Đ" sorts as a base letter
+after "D" in the UCA). Another thing to note in this example is capitalization
+of the letter "DŽ". There are three versions, since all three can legally appear
+in text. The fourth version "dŽ" is omitted since it does not occur.
+
+UCA      | Tailoring: `& Ð < dž <<< Dž <<< DŽ`
+-------- | ---------
+dan      | dan
+dubok    | dubok
+džabe    | đak
+džin     | džabe
+Džin     | džin
+DŽIN     | Džin
+đak      | DŽIN
+Evropa   | Evropa
+
+Danish: `&V <<< w <<< W`
+
+The letter 'W' is sorted after 'V', but is treated as a tertiary difference
+similar to the difference between 'v' and 'V'.
+
+UCA | `&V <<< w <<< W`
+--- | ----------------
+va  | va
+Va  | Va
+VA  | VA
+vb  | wa
+Vb  | Wa
+VB  | WA
+vz  | vb
+Vz  | Vb
+VZ  | VB
+wa  | wb
+Wa  | Wb
+WA  | WB
+wb  | vz
+Wb  | Vz
+WB  | VZ
+wz  | wz
+Wz  | Wz
+WZ  | WZ
+
+### Default Options
+
+ICU implements the [LDML collation
+options/settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options).
+See that specification for more information.
+
+The tailoring inherits all the attribute values from the root collator unless
+they are explicitly redefined in the tailoring. The following summarizes
+the option settings. Default options are **in emphasis**.
+
+#### alternate
+- **`[alternate non-ignorable]`**
+- `[alternate shifted]`
+
+Sets the default value of the UCOL_ALTERNATE_HANDLING attribute. If
+set to shifted, variable code points will be ignored on the primary, secondary,
+and tertiary levels.
+For details see the [“Ignore Punctuation” Options](ignorepunct.md) page.
+
+#### maxVariable
+- **`[maxVariable punct]`**
+- `[maxVariable space]`
+
+Sets the variable top to the top of the specified
+reordering group. (New in ICU 53.) All code points with primary weights less
+than or equal to the variable top will be considered variable, and thus affected
+by the alternate handling.
+
+#### variable top
+(deprecated)
+- `& X < [variable top]`
+
+Sets the default value for the variable top. All the code points with primary
+weights less than the variable top will be considered variable.
+*Changing the variable top via this rule syntax is deprecated since ICU 53.*
+It has been replaced by the maxVariable option.
+
+#### normalization
+- **`[normalization off]`**
+- `[normalization on]`
+
+Turns on or off the UCOL_NORMALIZATION_MODE attribute.
+If set to on, a quick check and any necessary normalization are performed.
+
+#### strength
+- `[strength 1]`
+- `[strength 2]`
+- **`[strength 3]`**
+- `[strength 4]`
+- `[strength I]`
+
+Sets the default strength for the collator.
+
+#### backwards
+- `[backwards 2]`
+
+Sets the default value of the UCOL_FRENCH_COLLATION attribute. If set to on,
+weights on the secondary level will be reversed.
+
+#### caseLevel
+- **`[caseLevel off]`**
+- `[caseLevel on]`
+
+Turns on or off the UCOL_CASE_LEVEL attribute. If set to on, a
+level consisting only of case characteristics is inserted in front of the
+tertiary level. To ignore accents but take case into account, set strength to
+primary and case level to on.
+
+#### caseFirst
+- **`[caseFirst off]`**
+- `[caseFirst upper]`
+- `[caseFirst lower]`
+
+Sets the value for the UCOL_CASE_FIRST attribute. If set to
+upper, causes upper case to sort before lower case. If set to lower, lower case
+will sort before upper case. Useful for locales that have an already supported
+ordering but require different order of cases. Affects case and tertiary levels.
+
+#### numericOrdering
+- **`[numericOrdering off]`**
+- `[numericOrdering on]`
+
+Turns on or off the UCOL_NUMERIC_COLLATION attribute. If
+set to on, then sequences of decimal digits (gc=Nd) sort by their numeric value.
+
+#### hiraganaQ
+(deprecated)
+- **`[hiraganaQ off]`**
+- `[hiraganaQ on]`
+
+Controls special treatment of Hiragana code points on
+quaternary level. If turned on, Hiragana code points will get lower values than
+all the other non-variable code points. Strength must be greater than or equal
+to quaternary if you want this attribute to take effect.
+*hiraganaQ is deprecated since ICU 50.* It was an implementation detail of the
+Japanese tailoring. In CLDR 25/ICU 53, the Japanese tailoring expresses the
+differences between Hiragana and Katakana via explicit quaternary (`<<<<`)
+relations.
+
+#### suppressContractions
+- `[suppressContractions [Љ-ґ]]`
+
+Removes context-sensitive mappings (contractions and prefix/context-before mappings)
+associated with each of the code points in the given UnicodeSet. It works on the
+current set of rules: It removes mappings from the root collation as well as
+from previous rules.
+
+This is the only way to *remove* mappings: The rule syntax otherwise only adds
+and overrides mappings. This special command is used in CLDR tailoring data to
+remove Cyrillic root collation contractions that are not necessary in several
+languages.
+
+#### optimize
+- `[optimize [Ά-ώ]]`
+
+Performance optimization for the code points in the UnicodeSet.
+In ICU, where tailoring data only contains the
+mappings that are different from the root collation (otherwise the data would be
+too large), falling back to root collation mappings for the rest of Unicode is
+slightly slower. The optimize command copies mappings for additional characters
+into the tailoring data.
+
+#### reorder
+followed by one or more reorder codes
+- `[reorder Grek Hani space]` 
+
+Reorders scripts relative to each other and relative to a special set of
+non-script blocks (space, punctuation, symbol, currency, and digit). The default
+order is the same as in the DUCET and in the CLDR root collator.
+
+----
+
+A tailoring that consists only of options is also valid and has the same basic
+ordering as the root collation. For example, the Greek tailoring has option
+settings only: `[normalization on][reorder Grek]`
+
+(The examples in this chapter might refer to older versions of data for
+particular languages. Check CLDR or ICU for actual, current tailorings.)
+
+The following tailoring example reorders uppercase and lowercase and uses
+backwards-secondary ordering:
+
+```
+[caseFirst upper]
+[backwards 2]
+& C < č , Č
+& G < ģ , Ģ
+& I < y, Y
+& K < ķ , Ķ
+& L < ļ , Ļ
+& N < ņ , Ņ
+& S < š , Š
+& Z < ž , Ž
+```
+
+#### Values for Reorder Codes
+
+Reordering Group                         | Rule Value
+---------------------------------------- | ----------
+Unicode white space characters           | space
+Unicode punctuation                      | punct
+Unicode symbols except currency symbols  | symbol
+Unicode currency symbols                 | currency
+Unicode decimal digits                   | digit
+Unicode scripts not mentioned ("others") | Zzzz (= Unknown script)
+
+In addition, ISO **4-letter script codes** can be used. Codes for scripts that
+do not have Unicode characters (according to the Unicode Script property values)
+are ignored.
+
+Limitations of ICU 4.8-52 (these were lifted in later versions, except that
+`Kore` is still not usable because it refers to multiple scripts that do not
+sort primary-equal):
+
+*   For Chinese, use script code `Hani`, *not* `Hans` or `Hant`.
+*   For Japanese, use both `Kana` and `Hani` (*not* `Hira`).
+*   For Korean, use both `Hang` and `Hani` (*not* `Kore`).
+
+#### Semantics of a List of Reorder Codes
+
+This section is relevant for both the `[reorder ...]` rule syntax and the
+`Collator.setReorderCodes()` API.
+
+For an introduction and examples see the section “Script Reordering” in the
+[Collation Concepts chapter](../concepts.md).
+
+On the API, the special groups are represented with `Collator.ReorderCodes`
+(`UColReorderCode`) values rather than `UScript` (`UScriptCode`) values.
+
+In ICU 4.8-54, not every script could be reordered independently. CLDR and ICU
+supported reordering of groups of scripts, each of which started with one of the
+[Recommended
+Scripts](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts). A
+script that is not Recommended always moved together with the Recommended Script
+that precedes it in DUCET order. (Hiragana sorts together with Katakana, Coptic
+with Greek, etc.) ICU allowed any one script of a (Recommended Script +
+DUCET-following) group in the `[reorder]` list, moving the whole set of scripts
+together. However, it was strongly recommended that only Recommended Scripts be
+used.
+
+Beginning with ICU 55, scripts only reorder together if they are primary-equal,
+for example Hiragana and Katakana.
+
+Zyyy=Common and Zinh=Inherited cannot be reordered.
+
+The special code Zzzz (= Unknown script = `UScript.UNKNOWN` =
+`Collator.ReorderCodes.OTHERS` = "others") stands for any script that is not
+explicitly mentioned in the list of reordering codes. If Zzzz is mentioned in
+the list, then any groups and scripts mentioned later in the list will go at the
+very end of the reordering, in the order given. If Zzzz is not mentioned, then
+all scripts that are not explicitly listed follow at the end in DUCET order.
+
+The special reorder code `Collator.ReorderCodes.NONE` (= `UScript.UNKNOWN`), when
+used alone (same as `[reorder Zzzz]` or not specifying a `[reorder]` rule in a
+tailoring), will remove any reordering for this collator. The result of setting
+no reordering will be to use the DUCET/CLDR order.
+
+On the API (not applicable to rule syntax), the special reorder code
+`Collator.ReorderCodes.DEFAULT` (= `UScript.INHERITED`) will reset the reordering
+for the collator to its default order. The default reordering may be the
+DUCET/CLDR order or may be a reordering that was specified when this collator
+was created from resource data or from rules. The DEFAULT code must be the sole
+code supplied when it is used.
+
+For details see the [section “Collation Reordering” in the LDML collation
+spec](http://www.unicode.org/reports/tr35/tr35-collation.html#Script_Reordering).
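+
+The list semantics described above can be sketched in Python for scripts only.
+This is a toy model, not ICU code: it ignores the special groups (space, punct,
+etc.), the fact that primary-equal scripts such as Hiragana and Katakana move
+together, and the DEFAULT and NONE API codes. The script order used below is an
+illustrative assumption, and `apply_reorder` is a hypothetical name.
+
+```python
+def apply_reorder(ducet_order, codes):
+    """Return the resulting script order for a list of reorder codes (toy)."""
+    if "Zzzz" in codes:
+        i = codes.index("Zzzz")
+        front, back = codes[:i], codes[i + 1:]  # scripts after Zzzz go at the very end
+    else:
+        front, back = list(codes), []
+    listed = set(front) | set(back)
+    others = [s for s in ducet_order if s not in listed]  # unlisted: DUCET order
+    return front + others + back
+
+
+order = ["Latn", "Grek", "Cyrl", "Hani"]  # toy stand-in for the DUCET order
+print(apply_reorder(order, ["Grek", "Hani"]))          # ['Grek', 'Hani', 'Latn', 'Cyrl']
+print(apply_reorder(order, ["Grek", "Zzzz", "Latn"]))  # ['Grek', 'Cyrl', 'Hani', 'Latn']
+print(apply_reorder(order, ["Zzzz"]))                  # ['Latn', 'Grek', 'Cyrl', 'Hani']
+```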
+
+### Advanced Syntactical Elements
+
+Several other syntactical elements are needed in more specific situations.
+
+#### Order before
+
+- Syntax: `[before 1|2|3]`
+- Example: `&[before 2]a<ā<á<ǎ<à`
+
+Enables users to order characters **before** a given character. In UCA 3.0, the
+example is equivalent to `& ㍡<ā<á<ǎ<à` (㍡ = \\u3361, ideographic telegraph symbol
+for hour nine) and makes accented 'a' letters sort before 'a'. Accents are often
+used to indicate the intonations in Pinyin. In this case, the non-accented
+letters sort after the accented letters.
+
+#### Expansion
+
+- Syntax: `/`
+- Example: `æ/e`
+
+Adds the collation element for 'e' to the collation element for æ.
+After a reset `&ae << æ` is equivalent to `&a << æ/e`. See the Expansion example
+below.
+
+#### Prefix processing
+
+- Syntax: `|`
+- Example: `a|b`
+
+If 'b' is encountered and it follows 'a',
+output the appropriate collation element. If 'b' follows any other letter,
+output the normal collation element for 'b'.
+The collation element for 'a' is not affected.
+
+This element is used to speed up sorting under JIS X 4061. See the
+Prefix example below.
+
+#### Reset to top
+
+- Syntax: `[top]`
+- Example: `&[top] < a < b < c …`
+
+**Deprecated, use indirect positioning instead**
+(`&[last regular]`, see section below)
+Reorders a set of characters 'above' the UCA. `[top]` is a virtual code point having the
+biggest primary weight value that will ever be assigned in the UCA. Above top,
+there is a large number of unassigned primary weights that can be used for a
+'large' tailoring, such as the reordering of the CJK characters according to a
+Far Eastern code page. The first difference after the top is always primary.
+
+### Indirect Positioning of Collation Elements
+
+Since ICU version 2.0, ICU allows for indirect positioning of collation elements
+(CE). Similar to the reset anchor `top`, these reset anchors allow for positioning of the
+tailoring relative to significant sections of the UCA table. You can use the
+`[before]` reset option to position before these sections.
+
+Name                      | Example CE value  | Note
+------------------------- | ----------------- | ------------
+first tertiary ignorable  | `[,,]`            | Start of the UCA table. This value will never change unless CEs are extended with higher level values.
+last tertiary ignorable   | `[,,]`            | This value will never change unless CEs are extended with higher level values.
+first secondary ignorable | `[,, 05]`         | Currently there are no secondary ignorables in the UCA table.
+last secondary ignorable  | `[,, 05]`         | Currently there are no secondary ignorables in the UCA table.
+first primary ignorable   | `[, 87, 05]`      | Mostly for non-spacing combining marks.
+last primary ignorable    | `[, E1 B1, 05]`   | Currently this value points to a non-existing code point, used to facilitate sorting of compatibility characters.
+first variable            | `[05 07, 05, 05]` | The lowest CE that is not primary-ignorable. (see below)
+last variable             | `[17 9B, 05, 05]` | End of variable section.
+first regular             | `[1A 20, 05, 05]` | This is the first regular CE (not primary ignorable and not variable). The majority of code points have regular CEs.
+last regular              | `[78 AA B2, 05, 05]` | Use `&[last regular]` instead of `&[top]`. (see below)
+first implicit            | `[E0 03 03, 05, 05]` | Section of implicitly generated collation elements. (see below)
+last implicit             | `[E3 DC 70 C0, 05, 05]` | End of implicit section. This is the CE of the last unassigned code point (U+10FFFD). (see below)
+first trailing            | `[E5, 05, 05]`    | Start of trailing section. (see below)
+last trailing             | `[FF FF, 05, 05]` | End of trailing collation elements section. This is the highest possible CE, and is the CE for U+FFFF. Not available for tailoring, see `[first trailing]`.
+
+"first variable": The current code point is TAB=U+0009. This is the start of the variable section. "Variable" characters will be ignored on primary/secondary/tertiary levels when the "shifted" option is on.
+
+Tailoring after "last regular" will effectively position characters
+between regular code points and "implicit" CEs (the next section).
+This should be used (only) for tailoring Han characters,
+which tends to affect thousands of characters.
+The script reordering implementation assumes that CEs in this section
+are for "Hani" script characters.
+
+"Implicit" means that the UCA default ordering table (DUCET)
+does not explicitly specify CEs for CJK ideographs and unassigned code points;
+instead, their CEs are computed at runtime.
+
+Beginning with ICU 53, tailoring to any unassigned code point,
+including "last implicit", is not supported any more.
+
+"trailing": Tailoring characters after `[first trailing]`
+makes them sort after all other non-tailored code points except for U+FFFD and U+FFFF.
+
+The "trailing" section is reserved for future use, such as for non-starting Jamos. See
+<http://www.unicode.org/reports/tr10/#Trailing_Weights>.
+CLDR 1.9/ICU 4.6 and later map U+FFFF to the very end of the trailing section.
+UCA 6.3/CLDR 24/ICU 52 and later map U+FFFD to just before U+FFFF.
+U+FFFD..U+FFFF are not tailorable, and nothing can tailor to them.
+<http://www.unicode.org/reports/tr35/tr35-collation.html#tailored_noncharacter_weights>
+
+Before ICU 4.6, U+FFFF mapped to a completely ignorable CE, and `[last trailing]`
+was the same as `[first trailing]`.
+
+Not all of the indirect-positioning anchors are useful. Most of the 'first'
+elements should be used with the `[before]` directive, in order to make sure
+that your tailoring will sort before an interesting section.
+
+### Complex Tailoring Examples
+
+The following are several fragments of real tailorings, illustrating some of the
+advanced syntactical elements:
+
+#### Expansion Example:
+
+**Swedish:**
+```
+&t<<<þ/h
+&T<<<Þ/H
+```
+
+The letter 'þ' (THORN) is normally treated by UCA/root collation as a separate
+letter that has primary-level sorting after 'z'. However, in Swedish and some
+other Scandinavian languages, 'þ' and 'Þ' should be treated as just a
+tertiary-level difference from the letters "th" and "TH" respectively. This is
+an example of an expansion.
+
+UCA | `&t<<<þ/h, &T<<<Þ/H`
+--- | --------------------
+az  | az
+Az  | Az
+tha | tha
+Tha | þa
+THa | Tha
+thz | THa
+za  | Þa
+Za  | thz
+zz  | þz
+þa  | za
+Þa  | Za
+þz  | zz
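+
+The fragment above can be tried directly with ICU4J. The following is a minimal sketch. It assumes ICU4J (`com.ibm.icu`) is on the classpath, and the class name `ThornExpansion` is hypothetical. It builds a `RuleBasedCollator` from the two rules and checks a few neighboring pairs from the tailored column of the table.
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+
+public class ThornExpansion {
+    // \u00FE is small thorn and \u00DE is capital thorn; escapes avoid source-encoding issues.
+    static RuleBasedCollator collator() {
+        try {
+            return new RuleBasedCollator("&t<<<\u00FE/h &T<<<\u00DE/H");
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules", e);
+        }
+    }
+
+    public static void main(String[] args) {
+        RuleBasedCollator coll = collator();
+        // Each pair is in table order, so each comparison is expected to be negative.
+        String[][] pairs = {
+            {"tha", "\u00FEa"}, {"\u00FEa", "Tha"}, {"THa", "\u00DEa"},
+            {"\u00DEa", "thz"}, {"\u00FEz", "za"}
+        };
+        for (String[] p : pairs) {
+            System.out.println(p[0] + " < " + p[1] + ": " + (coll.compare(p[0], p[1]) < 0));
+        }
+    }
+}
+```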
+
+#### Prefix Example:
+
+Prefixes are used in Japanese tailorings to reduce the number of contractions. A
+large number of contractions is a performance burden on the commonly used base
+characters, because processing contractions is much more complicated than
+processing regular elements.
+
+A prefix rule conditionally changes the CE of the character or string (e.g., ー)
+after the | symbol; unlike a contraction, it does not affect the CE of the
+preceding text (e.g., ァ). (By contrast, a contraction like ァー consumes both
+characters and can assign them a CE or expansion unrelated to ァ's CE.) A prefix
+rule is especially useful if the character or string (ー) after the | symbol
+occurs significantly less often than the first character of the prefix (ァ).
+
+```
+&[before 3]ァ <<< ァ|ー = ァ|ー = ぁ|ー
+```
+
+This could have been written as a series of contractions followed by expansion:
+
+```
+&[before 3]ァー <<< ァー = ァー = ぁー
+```
+
+However, in that case ァ, ァ and ぁ would start contractions. Since the prolonged
+sound mark (ー) occurs much less frequently than the other letters of Japanese
+Katakana and Hiragana, it is much more prudent to put the extra processing on it
+by using prefixes.
+
+#### Reset example:
+
+A "reset" always uses only the base character as the insertion point even if
+there is an expansion. So the following rule,
+
+```
+& J <<< K / B & K <<< M
+```
+
+is equivalent to
+
+```
+& J <<< K / B <<< M
+```
+
+This produces the following sort order:
+
+*   "JA"
+*   "MA"
+*   "KA"
+*   "KC"
+*   "JC"
+*   "MC"
+
+> :point_right: **Note**: Assuming the letters "J", "K" and "M" have equal primary weights, the second
+> letter contains the differences among these strings. However, the letter "K" is
+> treated as if it always has a letter "B" following it while the letters "J" and
+> "M" do not.
+
+The following is an example of collation elements for these strings resulting
+from the specified rules:
+
+Strings | Collation Elements | &nbsp;         | &nbsp;
+------- | ------------------ | -------------- | ------
+"JA"    | `[005C.00.01]`     | `[0052.00.01]` |
+"MA"    | `[005C.00.03]`     | `[0052.00.01]` |
+"KA"    | `[005C.00.02]`     | `[0053.00.01]` | `[0052.00.01]`
+"KC"    | `[005C.00.02]`     | `[0053.00.01]` | `[0054.00.01]`
+"JC"    | `[005C.00.01]`     | `[0054.00.01]` |
+"MC"    | `[005C.00.03]`     | `[0054.00.01]` |
+
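+The collation elements in this table produce the sort order listed above when they are compared level by level. All primary weights are compared first, then all secondary weights, then all tertiary weights. The following sketch models that comparison for this table only. It is a simplified illustration of the UCA comparison, not ICU's implementation. The class name `ResetExampleOrder` is hypothetical, and zero weights are skipped as ignorable.
+
+```Java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+public class ResetExampleOrder {
+    // Collation elements from the table above, as {primary, secondary, tertiary}.
+    // They are inserted in scrambled order so that sorting actually has work to do.
+    static final Map<String, int[][]> CES = new LinkedHashMap<>();
+    static {
+        CES.put("MC", new int[][] {{0x5C, 0, 3}, {0x54, 0, 1}});
+        CES.put("KA", new int[][] {{0x5C, 0, 2}, {0x53, 0, 1}, {0x52, 0, 1}});
+        CES.put("JA", new int[][] {{0x5C, 0, 1}, {0x52, 0, 1}});
+        CES.put("JC", new int[][] {{0x5C, 0, 1}, {0x54, 0, 1}});
+        CES.put("MA", new int[][] {{0x5C, 0, 3}, {0x52, 0, 1}});
+        CES.put("KC", new int[][] {{0x5C, 0, 2}, {0x53, 0, 1}, {0x54, 0, 1}});
+    }
+
+    // Collects the weights of one level in order, skipping zero (ignorable) weights.
+    static int[] level(int[][] ces, int lvl) {
+        return Arrays.stream(ces).mapToInt(ce -> ce[lvl]).filter(w -> w != 0).toArray();
+    }
+
+    // Compares all primaries first; only if they are equal, the secondaries; then tertiaries.
+    static int compare(int[][] a, int[][] b) {
+        for (int lvl = 0; lvl < 3; lvl++) {
+            int c = Arrays.compare(level(a, lvl), level(b, lvl));
+            if (c != 0) {
+                return c;
+            }
+        }
+        return 0;
+    }
+
+    static List<String> sorted() {
+        List<String> names = new ArrayList<>(CES.keySet());
+        names.sort((x, y) -> compare(CES.get(x), CES.get(y)));
+        return names;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(sorted()); // [JA, MA, KA, KC, JC, MC]
+    }
+}
+```
+
+"K" carries the extra "B" weight (0053), so "KA" and "KC" are separated from "JA"/"MA" and "JC"/"MC" at the primary level. Only "J" versus "M" is decided by the tertiary weights.
+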
+## Tailoring Issues
+
+ICU uses canonical closure. This means that for each code point in Unicode, if
+the canonically composed form of a tailored string produces different collation
+elements than the canonically decomposed form, then the canonically composed
+form is effectively added to the ordering. If 'a' is tailored, for example, all
+of the accented 'a' characters are also tailored. Canonical closure allows
+collators to process Unicode strings in the FCD form as well as in NFD. (Note:
+Most but not all NFC strings are also in FCD. See
+<http://www.unicode.org/notes/tn5/#FCD>)
+
+However, *compatibility* equivalents are NOT automatically added. If the rule
+"&b < a" is in tailoring, and the order of **ⓐ (circled a)** is important, it
+needs to be tailored **explicitly**.
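+
+The following minimal ICU4J sketch shows both effects. It assumes ICU4J is on the classpath, and the class name `ClosureExample` is hypothetical. After `&b < a`, the canonically equivalent precomposed "á" (U+00E1) is expected to move along with "a". The compatibility equivalent "ⓐ" (U+24D0) is expected to keep its root position before "b".
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+
+public class ClosureExample {
+    static RuleBasedCollator collator() {
+        try {
+            return new RuleBasedCollator("&b < a");
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules", e);
+        }
+    }
+
+    public static void main(String[] args) {
+        RuleBasedCollator coll = collator();
+        // 'a' itself is tailored after 'b'.
+        System.out.println(coll.compare("b", "a") < 0);
+        // Canonical closure: U+00E1 (a with acute) also moves after 'b'.
+        System.out.println(coll.compare("b", "\u00E1") < 0);
+        // Compatibility equivalent U+24D0 (circled a) is untouched and stays before 'b'.
+        System.out.println(coll.compare("\u24D0", "b") < 0);
+    }
+}
+```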
+
+Redundant tailoring rules are removed, with later rules "winning". The strengths
+around the removed rules are also fixed.
+
+### Example:
+
+The following table summarizes effects of different redundant rules.
+
+&nbsp; | Original                                                  | Equivalent
+------ | --------------------------------------------------------- | ----------
+1      | `& a < b < c < d` `& r < c`                               | `& a < b < d` `& r < c`
+2      | `& a < b < c < d` `& c < m`                               | `& a < b < c < m < d`
+3      | `& a < b < c < d` `& a < m`                               | `& a < m < b < c < d`
+4      | `& a <<< b << c < d` `& a < m`                            | `& a <<< b << c < m < d`
+5      | `& a < b < c < d` `& [before 1] c < m`                    | `& a < b < m < c < d`
+6      | `& a < b <<< c << d <<< e` `& [before 3] e <<< x`         | `& a < b <<< c << d <<< x <<< e`
+7      | `& a < b <<< c << d <<< e` `& [before 2] e <<< x`         | `& a < b <<< c <<< x << d <<< e`
+8      | `& a < b <<< c << d <<< e` `& [before 1] e <<< x`         | `& a <<< x < b <<< c << d <<< e`
+9      | `& a < b <<< c << d <<< e <<< f < g` `& [before 1] g < x` | `& a < b <<< c << d <<< e <<< f < x < g`
+
+If two different reset lists tailor the same character, then it is removed from the first
+one (see 1 in the table above).
+If the second list resets to a character tailored in the first list, then the second
+list is inserted in the first (see 2).
+If both lists reset to the same character, then the same thing
+happens (see 3). Whenever such an insertion occurs, the second strength
+"postpones" the position (see 4).
+
+If there is a `[before N]` on the reset, then the reset character is
+effectively replaced by the item that would be before it, either in a previous
+tailoring (if the letter occurs in one - see 5) or in the UCA. The N determines
+the 'distance' before, based on the strength of the difference (see 6-8).
+However, this is subject to postponement (see 9), so be careful!
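+
+Rows of the table can be checked with ICU4J. The following is a minimal sketch. It assumes ICU4J is on the classpath, and the class and helper names are hypothetical. It sorts the same words with the original and the equivalent rules of row 2, and with the rules of row 5. According to the table, the first two calls should both yield `[a, b, c, m, d]` and the third `[a, b, m, c, d]`.
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class RedundantRules {
+    // Sorts the words with a collator built from the given tailoring rules.
+    static List<String> sortWith(String rules, String... words) {
+        try {
+            RuleBasedCollator coll = new RuleBasedCollator(rules);
+            List<String> list = new ArrayList<>(Arrays.asList(words));
+            list.sort(coll);
+            return list;
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules: " + rules, e);
+        }
+    }
+
+    public static void main(String[] args) {
+        String[] words = {"d", "m", "c", "b", "a"};
+        // Row 2: the original and the equivalent rules should give the same order.
+        System.out.println(sortWith("& a < b < c < d & c < m", words));
+        System.out.println(sortWith("& a < b < c < m < d", words));
+        // Row 5: [before 1] c resets to the primary item before c, which is b.
+        System.out.println(sortWith("& a < b < c < d & [before 1] c < m", words));
+    }
+}
+```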
+
+### Reset semantics
+
+The reset semantics in ICU 1.8 and later differ from those of earlier ICU
+releases. Prior to version 1.8, the reset relation modifier applied only to the
+entry immediately following the reset entry. Starting with ICU 1.8, the relation
+modifier applies to all entries that occur until the next reset or primary relation.
+
+For example,
+
+```
+&xyz << e <<< f
+```
+
+was equivalent to
+
+```
+&x << e/yz <<< f
+```
+
+prior to ICU version 1.8.
+
+Starting with ICU version 1.8, the modifier is equivalent to
+
+```
+&x << e/yz <<< f/yz
+```
+
+The new semantics produce more intuitive results, especially when the character
+after the reset is decomposable. Since all rules are converted to NFD before
+they are interpreted, this can result in contractions that the rule-writer might
+not be aware of. Expansion propagates only until the next reset or primary
+relation occurs.
+
+For example, the following rule:
+
+```
+&ab = c <<< d << e <<< f < g <<< h
+```
+
+was equivalent to the following prior to ICU 1.8 and in Java:
+
+```
+&a = c/b <<< d << e <<< f < g <<< h
+```
+
+Starting with 1.8, it is equivalent to
+
+```
+&a = c / b <<< d / b << e / b <<< f / b < g <<< h
+```
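+
+The following minimal ICU4J sketch illustrates the current behavior. It assumes ICU4J is on the classpath, and the class name is hypothetical. It checks that "d" and "e" sort as variants of "ab" (between "ab" and "az") rather than as variants of "a". The other side of each comparison uses only untailored letters, because strings containing "c" through "h" would themselves pick up tailored weights.
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+
+public class ResetSemantics {
+    static RuleBasedCollator collator() {
+        try {
+            return new RuleBasedCollator("&ab = c <<< d << e <<< f < g <<< h");
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules", e);
+        }
+    }
+
+    public static void main(String[] args) {
+        RuleBasedCollator coll = collator();
+        // With the /b carried along, "d" sorts just after "ab" (tertiary difference)...
+        System.out.println(coll.compare("ab", "d") < 0);
+        // ...and before "az", because its primary weights are those of "ab".
+        System.out.println(coll.compare("d", "az") < 0);
+        // Likewise "e" (secondary difference) falls between "ab" and "az".
+        System.out.println(coll.compare("ab", "e") < 0 && coll.compare("e", "az") < 0);
+    }
+}
+```
+
+Under the pre-1.8 semantics, "d" would have had only the weights of "a" plus a tertiary difference, so it would have sorted before "ab".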
+
+## Known Limitations
+
+The following are known limitations of the ICU collation implementation. They
+are largely theoretical, since there are no known languages for which they are
+an issue. For completeness, however, they should be fixed in a future version
+after 1.8.1. The examples given are designed for simplicity in testing and do
+not match any real languages.
+
+### Expansion
+
+The goal of expansion is to sort as if the expansion text were inserted right
+after the character. For example, with the rule
+
+```
+&a <<< c / e
+```
+
+The text "...**c**..." should sort as if it were right after "...**ae**..." with
+a tertiary difference. There are a few cases where this is not currently true.
+
+#### Recursive Expansion
+
+Given the rules
+
+```
+&a <<< c / e
+&g <<< e / i
+```
+
+Expansion should sort the text "...**c**..." as if it were just after
+"...**ae**...", and that should also sort as if it were just after
+"...**agi**...". This requires that the compilation of expansions be recursive
+(and check for loops as well!). ICU currently does not do this.
+
+Rules         | Desired Order | Current Order
+------------- | ------------- | -------------
+`& a = b / c` | add           | b
+`& d = c / e` | b             | add
+&nbsp;        | adf           | adf
+
+#### Contractions Spanning Expansions
+
+ICU currently always pre-compiles the expansion into an internal format (a list
+of one or more collation elements) when the rule is compiled. If there is a
+contraction that spans the end of the expanded text and the start of the
+original text, however, that contraction will not match. A test case that
+illustrates this is:
+
+Rules           | Desired Order | Current Order
+--------------- | ------------- | -------------
+`& a <<< c / e` | ad            | ad
+`& g <<< eh`    | c             | c
+&nbsp;          | af            | ch
+&nbsp;          | g             | af
+&nbsp;          | ch            | g
+&nbsp;          | h             | h
+
+Since the pre-compiled expansions are a huge performance gain, we will probably
+keep the implementation the way it is, but in the future allow additional syntax
+to indicate those few expansions that need to behave as if the text were
+inserted because of the existence of another contraction. Note that such
+expansions need to be recursively expanded (as in #1), but rather than at
+pre-compile time, these need to be done at runtime.
+
+While it is possible to automatically detect these cases, it would be better to
+allow explicit control in case spanning is not desired. An example of such
+syntax might be something like:
+
+```
+&a <<< c // e
+```
+
+**Note:** ICU does handle the case where there is a contraction that is
+completely inside the expansion.
+
+Suppose that someone had the rules:
+
+```
+&a = c / e
+&x = ae
+```
+
+These do not cause **c** to sort as if it were **ae**, nor should they.
+
+### Normalization
+
+The Unicode Collation Algorithm specifies that all text sort as if it were first
+normalized into NFD. For performance reasons, ICU collation data is
+pre-processed so that there is no need to perform normalization on strings that
+are in [FCD](http://unicode.org/notes/tn5/#FCD) and do not contain any composite
+combining marks; the vast majority of strings are in this form. The composite
+combining marks are U+0344, U+0F73, U+0F75, and U+0F81
+([`[[:^lccc=0:]&[:toNFD=/../:]]`](http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3A%5Elccc%3D0%3A%5D%26%5B%3AtoNFD%3D%2F..%2F%3A%5D&abb=on&g=)).
+These characters must be decomposed for discontiguous contractions to work
+properly, and their use is discouraged by the Unicode Standard.
+
+#### Nulls in Contractions
+
+Nulls should not be used in contractions that could invoke normalization.
+
+Rules                | Desired Order | Current Order
+-------------------- | ------------- | -------------
+`& a <<< '\u0000'^`  | a             | '\\u0000'^
+&nbsp;               | '\\u0000'^    | a
+
+#### Contractions Spanning Normalization
+
+The following rule specifies that a grave accent followed by a **b** is a
+contraction, and sorts as if it were an **e**.
+
+```
+& e <<< ` b
+```
+
+On this basis, "...àb..." should sort as if it were just after "...ae...".
+Because of the preprocessing, however, the contraction will not match if this
+text is represented with the pre-composed character à, but **will** match if
+given the decomposed sequence **a + grave accent**. The same thing happens if
+the contraction spans the start of a normalized sequence.
+
+Rules        | Desired Order | Current Order
+------------ | ------------- | -------------
+& e <<< \` b | à             | à
+&nbsp;       | ad            | àb
+&nbsp;       | àb            | ad
+&nbsp;       | af            | af
+&nbsp;       | &nbsp;        |
+`& g <<< ca` | f             | cà
+&nbsp;       | ca            | f
+&nbsp;       | cà            | ca
+&nbsp;       | h             | h
+
+### Variable Top
+
+ICU lets you set the top of the variable range. This can be done, for example,
+to ignore only spaces but not punctuation.
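+
+In ICU 53 and later, a comparable effect is usually obtained by combining "shifted" alternate handling with the maxVariable setting, rather than by tailoring `[variable top]`. The following minimal ICU4J sketch assumes ICU4J 53 or later on the classpath, and the class name is hypothetical. Only spaces become ignorable, while the hyphen stays significant.
+
+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.RuleBasedCollator;
+import com.ibm.icu.util.ULocale;
+
+public class IgnoreSpacesOnly {
+    static RuleBasedCollator collator() {
+        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
+        // "Shifted": variable characters are ignored on the first three levels.
+        coll.setAlternateHandlingShifted(true);
+        // Only the space group is variable; punctuation stays significant.
+        coll.setMaxVariable(Collator.ReorderCodes.SPACE);
+        return coll;
+    }
+
+    public static void main(String[] args) {
+        RuleBasedCollator coll = collator();
+        System.out.println(coll.compare("a b", "ab") == 0); // space is ignored
+        System.out.println(coll.compare("a-b", "ab") != 0); // hyphen is not ignored
+    }
+}
+```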
+
+#### Variable Top Exclusion
+
+There is currently a limitation that causes variable top to (perhaps) exclude
+more characters than it should. This happens if you not only set variable top,
+but also tailor a number of characters around it with primary differences. The
+exact number that you can tailor depends on the internal "gaps" between the
+characters in the pre-compiled UCA table. Normally there is a gap of one. There
+are larger gaps between scripts (such as between Latin and Greek), and after
+certain other special characters. For example, if variable top is set to be at
+SPACE ('\\u0020'), then it works correctly with up to 70 characters also
+tailored after space. However, if variable top is set to be equal to HYPHEN
+('\\u2010'), only one other value can be accommodated.
+
+In the following, the goal is for x to be ignored and z not to be ignored.
+
+Rules              | Desired Order SHIFTED = ON | Current Order
+------------------ | -------------------------- | -------------
+`& \u2010`         | -                          | -
+`< x`              | z                          | z
+`< [variable top]` | zb                         | zb
+`< z`              | a                          | xb
+&nbsp;             | b                          | a
+&nbsp;             | -b                         | b
+&nbsp;             | xb                         | -b
+&nbsp;             | c                          | c
+
+> :point_right: **Note**: With ICU 1.8.1, the
+> user is advised not to tailor the variable top to customize more than two
+> primary relations (for example, `"& x < y < [variable top]"`). Starting in ICU
+> 2.0, setVariableTop() allows the user to set the variable top programmatically
+> to a legal single character or a valid contracting sequence. In addition, the
+> string that variable top is set to should not be treated as either inclusive or
+> exclusive in the rules.
+
+### Case Level/First/Second
+
+In ICU, it is possible to override the tertiary settings programmatically. This
+is used to change the default case behavior to be all upper first or all lower
+first. It can also be used for a separate case level, or to ignore all other
+tertiary differences (such as between circled and non-circled letters, or
+between half-width and full-width katakana). The case values are derived
+directly from the Unicode character properties, and not set by the rules.
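+
+The following minimal ICU4J sketch shows these programmatic settings. It assumes ICU4J is on the classpath, and the class and method names are hypothetical. It covers the default lowercase-first order, the upper-first override, and a case level added on top of primary strength.
+
+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.RuleBasedCollator;
+import com.ibm.icu.util.ULocale;
+
+public class CaseSettings {
+    static RuleBasedCollator root() {
+        return (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
+    }
+
+    // Upper case sorts before lower case.
+    static RuleBasedCollator upperFirst() {
+        RuleBasedCollator coll = root();
+        coll.setUpperCaseFirst(true);
+        return coll;
+    }
+
+    // Primary strength plus a separate case level: accents are ignored, case is not.
+    static RuleBasedCollator primaryWithCaseLevel() {
+        RuleBasedCollator coll = root();
+        coll.setStrength(Collator.PRIMARY);
+        coll.setCaseLevel(true);
+        return coll;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(root().compare("a", "A") < 0);                        // lower first by default
+        System.out.println(upperFirst().compare("A", "a") < 0);                  // reversed
+        System.out.println(primaryWithCaseLevel().compare("a", "\u00E1") == 0);  // accent ignored
+        System.out.println(primaryWithCaseLevel().compare("a", "A") != 0);       // case still counts
+    }
+}
+```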
+
+#### Mixed Case Contractions
+
+There is currently a limitation that all contractions of multiple characters can
+only have three special case values: upper, lower, and mixed. All mixed-case
+contractions are grouped together, and are not affected by the upper first vs.
+lower first flag.
+
+Rules      | Desired Order UPPER_FIRST | Current Order
+---------- | ------------------------- | -------------
+`& c < ch` | C                         | c
+`<<< cH`   | CH                        | CH
+`<<< Ch`   | Ch                        | cH
+`<<< CH`   | cH                        | Ch
+&nbsp;     | ch                        | ch
+
+## Building on Existing Locales
+
+All of the collation rules are additive; that is, they override what any
+previous rule expressed. That means that you can build on existing rules for
+given locales. Here is an example of this, which fetches the rules for a
+particular locale (Danish), then overrides some part (sorting '%' after 'm').
+The syntax is Java, but C/C++ has similar features.
+
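+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.RuleBasedCollator;
+import com.ibm.icu.util.ULocale;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.TreeSet;
+
+public class CustomizeDanish {
+    public static void main(String[] args) {
+        ULocale myLocale = new ULocale("da");
+        try {
+            // Fetch the existing Danish tailoring rules.
+            RuleBasedCollator col = (RuleBasedCollator) Collator.getInstance(myLocale);
+            String rules = col.getRules();
+            // Append a rule that sorts '%' after 'm'.
+            String myRules = "& m < '%'";
+            RuleBasedCollator col2 = new RuleBasedCollator(rules + myRules);
+
+            // Check the values: the list is already in the expected order
+            // (in Danish, "aa" sorts after "z").
+            List<String> expected = Arrays.asList("a;m;%;z;aa".split(";"));
+            TreeSet<String> sorted = new TreeSet<String>(col2);
+            sorted.addAll(expected);
+            ArrayList<String> actual = new ArrayList<String>(sorted);
+            if (!expected.equals(actual)) {
+                throw new AssertionError("Customized rules with %: expected "
+                    + expected + " but got " + actual);
+            }
+        } catch (Exception e) {
+            throw new IllegalArgumentException("Failed to create customized rules", e);
+        }
+    }
+}
+```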
+
+The root collator has an empty rules string (`getRules()` returns `""`): Any
+collator's tailoring rules string defines how a collator *differs* from the root
+collator, and the tailoring rules string was the input for building the
+tailoring collator. By contrast, the root collator itself is built from a file
+with explicit mappings (ICU4C source/data/unidata/FractionalUCA.txt)
+from characters/contractions to collation elements. This file represents the
+[DUCET](http://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table)
+as [modified by
+CLDR](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation).
+
+There are "extended" versions of `getRules()` which, when called with
+`delta=UCOL_FULL_RULES` (C/C++) or `fullrules=true` (Java), return "full rules"
+which are a concatenation of the "UCA rules" and the collator's tailoring. The
+"UCA rules" are published as UCA_Rules.txt in every [UCA
+release](http://www.unicode.org/Public/UCA/).
+
+*   "UCA rules" is a historical misnomer. The UCA specifies an Algorithm which
+    applies to all collators, and provides the DUCET as its Default table.
+*   ICU's root collator implements the CLDR-modified collation element table.
+    The "UCA rules" returned from ICU functions are equivalently modified rules
+    compared with those for the DUCET.
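+
+The following short ICU4J sketch shows the two kinds of rule strings. It assumes ICU4J, including its default "UCA rules" data, is on the classpath, and the class name is hypothetical. `getRules()` returns only the tailoring, and `getRules(true)` returns the "full rules".
+
+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.RuleBasedCollator;
+import com.ibm.icu.util.ULocale;
+
+public class RulesStrings {
+    static RuleBasedCollator get(ULocale locale) {
+        return (RuleBasedCollator) Collator.getInstance(locale);
+    }
+
+    public static void main(String[] args) {
+        RuleBasedCollator root = get(ULocale.ROOT);
+        RuleBasedCollator danish = get(new ULocale("da"));
+        // The root collator has no tailoring, so its rules string is empty.
+        System.out.println("root rules empty: " + root.getRules().isEmpty());
+        // A tailoring's rules only describe how it differs from root.
+        System.out.println("da tailoring rules: " + danish.getRules().length() + " chars");
+        // "Full rules" = "UCA rules" + tailoring, and therefore much longer.
+        System.out.println("da full rules: " + danish.getRules(true).length() + " chars");
+    }
+}
+```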
+
+The "UCA rules" are an *approximation* of the root collator's sort order, but
+there are some differences because not all of the details of the root collator
+mappings can be expressed in rule syntax. In particular, a collator built from
+ICU4C source/data/unidata/UCARules.txt
+has at least the following issues compared with the real root collator:
+
+*   inefficient (long) collation element weights
+*   CODAN (numeric collation) will not work (the 0 digit's primary weight is
+    hardcoded, or specified in FractionalUCA.txt)
+*   script reordering will not work
+*   alternate=shifted will not work
+*   the sort order has some differences from the regular root collator,
+    including additional tertiary differences
+
+The "full rules" are almost never used, or useful, at runtime. They are included
+in ICU for historical reasons and for UCA consistency tests. They might be
+usable for emulating the CLDR/ICU sort order with a collation implementation not
+based on CLDR/ICU.
+
+Collation rule strings in general are not commonly used but are a significant
+portion of the data size in ICU collation resource bundles, especially for CJK
+languages. The rule strings can be omitted from those resource bundles by adding
+the `--omitCollationRules` option to the relevant `genrb` invocations
+(for ICU 53..63, in icu4c/source/data/Makefile.in)
+or, since ICU 64, with a [data filter config file](../../icu_data/buildtool.md).
+(See for example the relevant
+[ICU integration test instructions](http://site.icu-project.org/processes/release/tasks/integration#TOC-Verify-that-ICU4C-tests-pass-without-collation-rule-strings).)
+
+If the tailoring rules are needed but the 150kB or so of "UCA rules" are not,
+then the line
+
+```
+UCARules:process(uca_rules){"../unidata/UCARules.txt"}
+```
+
+in
+[source/data/coll/root.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/coll/root.txt)
+can be commented out or deleted.
+
+## Cautions
+
+The following are not known rule limitations, but rather cautions.
+
+### Resets
+
+Since resets always work on the existing state, the user is required to make
+sure that the rule entries are in the proper order.
+
+Rules     | Order | Comment
+--------- | ----- | -------
+`& a < b` | a     | The rules mean: put **b** after **a**, then put **c** after **a** (inserting **before** the **b**).
+`& a < c` | c     |
+&nbsp;    | b     |
+
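+The following is a minimal ICU4J check of the table above. It assumes ICU4J is on the classpath, and the class and helper names are hypothetical.
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class ResetOrder {
+    // Sorts the words with a collator built from the given tailoring rules.
+    static List<String> sortWith(String rules, String... words) {
+        try {
+            RuleBasedCollator coll = new RuleBasedCollator(rules);
+            List<String> list = new ArrayList<>(Arrays.asList(words));
+            list.sort(coll);
+            return list;
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules: " + rules, e);
+        }
+    }
+
+    public static void main(String[] args) {
+        // The second reset inserts c directly after a, in front of b.
+        System.out.println(sortWith("& a < b & a < c", "b", "c", "a")); // [a, c, b]
+    }
+}
+```
+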
+### Postpone Insertion
+
+When using a reset to insert a value X with a certain strength difference after
+a value Y, it is actually inserted just before the next item of the same
+strength or higher following Y. Thus, the following are equivalent:
+
+```
+... m < a = c <<< d << e <<< f < g <<< h & a << x
+... m < a = c <<< d << x << e <<< f < g <<< h
+```
+
+> :point_right: **Note**: This is different from the Java semantics.
+> In Java, the value is inserted immediately after the reset character.
+
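+The following is a shortened ICU4J check of this rule. It assumes ICU4J is on the classpath. The rules are a hypothetical simplification of the example above: they leave out the elided prefix and most entries. The class and helper names are also hypothetical. Both rule strings are expected to sort the words as `[a, d, x, e]`.
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class PostponedInsertion {
+    // Sorts the words with a collator built from the given tailoring rules.
+    static List<String> sortWith(String rules, String... words) {
+        try {
+            RuleBasedCollator coll = new RuleBasedCollator(rules);
+            List<String> list = new ArrayList<>(Arrays.asList(words));
+            list.sort(coll);
+            return list;
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules: " + rules, e);
+        }
+    }
+
+    public static void main(String[] args) {
+        String[] words = {"e", "x", "d", "a"};
+        // x (secondary after a) is postponed past d, which is only tertiary after a.
+        System.out.println(sortWith("&a <<< d << e &a << x", words));
+        System.out.println(sortWith("&a <<< d << x << e", words));
+    }
+}
+```
+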
+### Jamo Tailoring
+
+If Jamo characters are tailored, that causes the code to go through a slow path,
+which will have a significant effect on performance.
+
+### Compatibility Decompositions
+
+When tailoring a letter, the customization affects all of its canonical
+equivalents. That is, if a tailoring rule sorts **'a'** after **'e'**, for
+example, then **"à", "á", ...** are also sorted after **'e'**. This is not true
+for compatibility equivalents. If the desired sorting order is for a
+**superscript-a** ("ª") to be after **"e"**, it is necessary to specify the rule
+for that.
+
+### Case Differences
+
+Similarly, when tailoring **"a"** to be sorted after **"e"**, a separate rule is
+required if **"A"** is also to be sorted after **"e"**.
+
+### Automatic Expansions
+
+ICU will automatically form expansions whenever a reset is to a multi-character
+value that is not a contraction. For example, `& ab <<< c` is equivalent to
+`& a <<< c / b`. The user may be unaware of this happening, since it may not be
+obvious that the reset is to a multi-character value. For example, `& à <<< d` is
+equivalent to `& a <<< d / \u0300`, because "à" decomposes canonically to "a"
+followed by U+0300 COMBINING GRAVE ACCENT.
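+
+The following minimal ICU4J check compares the two equivalent rule strings. It assumes ICU4J is on the classpath, and the class and helper names are hypothetical. In both cases "c" is expected to sort like "ab" plus a tertiary difference: after "ab", but before any longer string that starts with "ab".
+
+```Java
+import com.ibm.icu.text.RuleBasedCollator;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class AutomaticExpansion {
+    // Sorts the words with a collator built from the given tailoring rules.
+    static List<String> sortWith(String rules, String... words) {
+        try {
+            RuleBasedCollator coll = new RuleBasedCollator(rules);
+            List<String> list = new ArrayList<>(Arrays.asList(words));
+            list.sort(coll);
+            return list;
+        } catch (Exception e) {
+            throw new IllegalStateException("Invalid collation rules: " + rules, e);
+        }
+    }
+
+    public static void main(String[] args) {
+        String[] words = {"ad", "c", "abd", "aa", "ab"};
+        // Expected for both: [aa, ab, c, abd, ad]
+        System.out.println(sortWith("& ab <<< c", words));
+        System.out.println(sortWith("& a <<< c / b", words));
+    }
+}
+```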
diff --git a/docs/userguide/collation/examples.md b/docs/userguide/collation/examples.md
new file mode 100644
index 0000000..c688f0f
--- /dev/null
+++ b/docs/userguide/collation/examples.md
@@ -0,0 +1,317 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation Examples
+
+## Simple Collation Sample Customization
+
+The following program demonstrates how to compare and create sort keys with
+default locale.
+
+In **C:**
+
+```C
+#include <stdio.h>
+#include <string.h>
+#include "unicode/utypes.h"
+#include "unicode/ustring.h"
+#include "unicode/ucol.h"
+
+#define MAXBUFFERSIZE 100
+
+UBool collateWithLocaleInC(const char* locale, UErrorCode *status)
+{
+    UChar            source[MAXBUFFERSIZE];
+    UChar            target[MAXBUFFERSIZE];
+    UCollationResult result = UCOL_EQUAL;
+    uint8_t          sourceKeyArray[MAXBUFFERSIZE];
+    uint8_t          targetKeyArray[MAXBUFFERSIZE];
+    UCollator        *myCollator = NULL;
+    if (U_FAILURE(*status))
+    {
+        return FALSE;
+    }
+    u_uastrcpy(source, "This is a test.");
+    u_uastrcpy(target, "THIS IS A TEST.");
+    myCollator = ucol_open(locale, status);
+    if (U_FAILURE(*status)){
+        /* Report the error with the locale ID and the name of the error code. */
+        fprintf(stderr,
+        "Failed to create the collator for \"%s\": %s\n", locale, u_errorName(*status));
+        return FALSE;
+    }
+    result = ucol_strcoll(myCollator, source, u_strlen(source), target, u_strlen(target));
+    /* result is UCOL_LESS: the strings differ only at the tertiary (case) level,
+     * and lowercase sorts before uppercase by default. */
+    if (result != UCOL_LESS)
+    {
+        fprintf(stderr,
+        "Comparing two strings with only tertiary differences in C failed.\n");
+        ucol_close(myCollator);
+        return FALSE;
+    }
+    /* To compare them with just primary differences */
+    ucol_setStrength(myCollator, UCOL_PRIMARY);
+    result = ucol_strcoll(myCollator, source, u_strlen(source), target, u_strlen(target));
+    /* result is UCOL_EQUAL */
+    if (result != UCOL_EQUAL)
+    {
+        fprintf(stderr,
+        "Comparing two strings with no differences in C failed.\n");
+        ucol_close(myCollator);
+        return FALSE;
+    }
+
+    /* Now, do the same comparison with keys. Sort keys are NUL-terminated
+     * byte strings, so strcmp() can compare them. */
+    ucol_getSortKey(myCollator, source, -1, sourceKeyArray, MAXBUFFERSIZE);
+    ucol_getSortKey(myCollator, target, -1, targetKeyArray, MAXBUFFERSIZE);
+    if (strcmp((const char *)sourceKeyArray, (const char *)targetKeyArray) != 0)
+    {
+        fprintf(stderr,
+        "Comparing two strings with sort keys in C failed.\n");
+        ucol_close(myCollator);
+        return FALSE;
+    }
+    ucol_close(myCollator);
+    return TRUE;
+}
+```
+
+In **C++:**
+
+```C++
+#include <stdio.h>
+#include "unicode/utypes.h"
+#include "unicode/unistr.h"
+#include "unicode/locid.h"
+#include "unicode/coll.h"
+#include "unicode/sortkey.h"
+
+using namespace icu;
+
+UBool collateWithLocaleInCPP(const Locale& locale, UErrorCode& status)
+{
+    UnicodeString source("This is a test.");
+    UnicodeString target("THIS IS A TEST.");
+    UCollationResult result = UCOL_EQUAL;
+    CollationKey sourceKey;
+    CollationKey targetKey;
+    Collator *myCollator = 0;
+    if (U_FAILURE(status))
+    {
+        return FALSE;
+    }
+    myCollator = Collator::createInstance(locale, status);
+    if (U_FAILURE(status)){
+        /* Report the error with the locale ID and the name of the error code. */
+        fprintf(stderr,
+        "Failed to create the collator for \"%s\": %s\n", locale.getName(), u_errorName(status));
+        delete myCollator;
+        return FALSE;
+    }
+    result = myCollator->compare(source, target, status);
+    /* result is UCOL_LESS: the strings differ only at the tertiary (case) level */
+    if (result != UCOL_LESS)
+    {
+        fprintf(stderr,
+        "Comparing two strings with only tertiary differences in C++ failed.\n");
+        delete myCollator;
+        return FALSE;
+    }
+    /* To compare them with just primary differences */
+    myCollator->setStrength(Collator::PRIMARY);
+    result = myCollator->compare(source, target, status);
+    /* result is UCOL_EQUAL */
+    if (result != UCOL_EQUAL)
+    {
+        fprintf(stderr,
+        "Comparing two strings with no differences in C++ failed.\n");
+        delete myCollator;
+        return FALSE;
+    }
+    /* Now, do the same comparison with keys */
+    myCollator->getCollationKey(source, sourceKey, status);
+    myCollator->getCollationKey(target, targetKey, status);
+    result = sourceKey.compareTo(targetKey, status);
+    if (U_FAILURE(status) || result != UCOL_EQUAL)
+    {
+        fprintf(stderr,
+        "Comparing two strings with sort keys in C++ failed.\n");
+        delete myCollator;
+        return FALSE;
+    }
+    delete myCollator;
+    return TRUE;
+}
+```
+
+### Main Function
+
+```C++
+#include <stdio.h>
+#include "unicode/utypes.h"
+#include "unicode/locid.h"
+
+using namespace icu;
+
+/* Defined in the C and C++ samples above. */
+extern "C" UBool collateWithLocaleInC(const char* locale, UErrorCode *status);
+UBool collateWithLocaleInCPP(const Locale& locale, UErrorCode& status);
+
+int main()
+{
+   UErrorCode status = U_ZERO_ERROR;
+   fprintf(stdout, "\n");
+   if (!collateWithLocaleInCPP(Locale("en", "US"), status))
+   {
+        fprintf(stderr,
+        "Collate with locale in C++ failed.\n");
+   } else
+   {
+       fprintf(stdout, "Collate with Locale C++ example worked!!\n");
+   }
+   status = U_ZERO_ERROR;
+   fprintf(stdout, "\n");
+   if (!collateWithLocaleInC("en_US", &status))
+   {
+        fprintf(stderr,
+        "Collate with locale in C failed.\n");
+   } else
+   {
+       fprintf(stdout, "Collate with Locale C example worked!!\n");
+   }
+   return 0;
+}
+```
+
+In **Java:**
+
+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.CollationKey;
+import java.util.Locale;
+
+public class CollateExample
+{
+
+    public static void main(String arg[]) 
+    {
+        CollateExample example = new CollateExample();
+        try {
+            if (!example.collateWithLocale(Locale.US)) {
+                System.err.println("Collate with locale example failed.");
+            } 
+            else {
+                System.out.println("Collate with Locale example worked!!");
+            }
+        } catch (Exception e) {
+            System.err.println("Collating with locale failed");
+            e.printStackTrace();
+        }
+    }
+
+    public boolean collateWithLocale(Locale locale) throws Exception
+    {
+        String source = "This is a test.";
+        String target = "THIS IS A TEST.";
+        Collator myCollator = Collator.getInstance(locale);
+
+        int result = myCollator.compare(source, target);
+        // result is negative: the strings differ only at the tertiary (case) level
+        if (result >= 0) {
+            System.err.println(
+                "Comparing two strings with only tertiary differences failed.");
+            return false;
+        }
+        // To compare them with just primary differences
+        myCollator.setStrength(Collator.PRIMARY);
+        result = myCollator.compare(source, target);
+        // result is 0
+        if (result != 0) {
+            System.err.println(
+                           "Comparing two strings with no differences failed.");
+            return false;
+        }
+        // Now, do the same comparison with keys
+        CollationKey sourceKey = myCollator.getCollationKey(source);
+        CollationKey targetKey = myCollator.getCollationKey(target);
+        result = sourceKey.compareTo(targetKey);
+        if (result != 0) {
+            System.err.println("Comparing two strings with sort keys failed.");
+            return false;
+        }
+        return true;
+    }   
+}
+```
+
+## Language-sensitive searching
+
+String searching is a well-researched area, and there are algorithms that can
+optimize the searching process. Perhaps the best is the Boyer-Moore method. For a
+full description of this concept, see Laura Werner's text searching article
+(<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>).
+
+However, implementing collation-based search with the Boyer-Moore method
+while getting correct results is very tricky,
+and ICU no longer uses this method.
+
+Please see the [String Search Service](string-search.md) chapter.
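+
+The following is a minimal ICU4J sketch of collation-based search. It assumes ICU4J's `StringSearch` class is on the classpath, and the class and method names are hypothetical. The String Search Service chapter remains the authoritative description. At primary strength, "resume" is expected to match "Résumé" starting at index 3 with length 6, because case and accent differences are ignored.
+
+```Java
+import com.ibm.icu.text.Collator;
+import com.ibm.icu.text.RuleBasedCollator;
+import com.ibm.icu.text.SearchIterator;
+import com.ibm.icu.text.StringSearch;
+import com.ibm.icu.util.ULocale;
+import java.text.StringCharacterIterator;
+
+public class SearchExample {
+    // Returns {start, length} of the first match, or {SearchIterator.DONE, 0} if none.
+    static int[] findFirst(String pattern, String text) {
+        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(ULocale.ENGLISH);
+        // Set the strength before constructing the search: ignore accents and case.
+        coll.setStrength(Collator.PRIMARY);
+        StringSearch search = new StringSearch(pattern, new StringCharacterIterator(text), coll);
+        int start = search.first();
+        return new int[] {start, start == SearchIterator.DONE ? 0 : search.getMatchLength()};
+    }
+
+    public static void main(String[] args) {
+        int[] match = findFirst("resume", "Le R\u00E9sum\u00E9");
+        System.out.println("start=" + match[0] + ", length=" + match[1]);
+    }
+}
+```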
+
+## Using large buffers to manage sort keys
+
+A good solution for the problem of not knowing the sort key size in advance is
+to allocate a large buffer and store all the sort keys there, while keeping a
+list of indexes or pointers to that buffer.
+
+The following sample code takes an array of pointers to NUL-terminated UChar
+strings and an array that receives one offset per sort key. It allocates a
+buffer and fills it with the sort keys. It returns the total number of bytes
+used and reports the size of the largest sort key through `maxSize`. Once you
+have done this for your strings, you only need to allocate fields of that
+maximum size and copy the sort keys from the buffer into them.
+
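+```C++
+#include <stdlib.h>
+#include <unicode/ucol.h>
+
+/* Fills *buffer with the sort keys of sourceSize NUL-terminated strings.
+ * keys[i] receives the byte offset of the i-th key, *maxSize the largest key size.
+ * Returns the total number of bytes used. The caller must free(*buffer). */
+uint32_t
+fillBufferWithKeys(UCollator *coll, UChar **source, uint32_t *keys, uint32_t sourceSize,
+                   uint8_t **buffer, uint32_t *maxSize, UErrorCode *status)
+{
+  if(status == NULL || U_FAILURE(*status)) {
+    return 0;
+  }
+
+  uint32_t bufferSize = 16384;
+  const uint32_t increment = 16384;
+  uint32_t currentOffset = 0;
+  uint32_t keySize = 0;
+  uint32_t i = 0;
+  *maxSize = 0;
+
+  *buffer = (uint8_t *)malloc(bufferSize * sizeof(uint8_t));
+  if(*buffer == NULL) {
+    *status = U_MEMORY_ALLOCATION_ERROR;
+    return 0;
+  }
+
+  for(i = 0; i < sourceSize; i++) {
+    keys[i] = currentOffset;
+    /* ucol_getSortKey returns the full key length (including the terminating
+     * zero byte) even when the key does not fit into the remaining space. */
+    keySize = (uint32_t)ucol_getSortKey(coll, source[i], -1, *buffer+currentOffset,
+                                        (int32_t)(bufferSize-currentOffset));
+    if(keySize > bufferSize-currentOffset) {
+      /* grow by at least one increment, and by enough to hold this key */
+      uint32_t newSize = bufferSize + increment;
+      if(newSize < currentOffset + keySize) {
+        newSize = currentOffset + keySize;
+      }
+      uint8_t *newBuffer = (uint8_t *)realloc(*buffer, newSize);
+      if(newBuffer == NULL) {
+        free(*buffer);  /* a failed realloc leaves the old block allocated */
+        *buffer = NULL;
+        *status = U_MEMORY_ALLOCATION_ERROR;
+        return 0;
+      }
+      *buffer = newBuffer;
+      bufferSize = newSize;
+      keySize = (uint32_t)ucol_getSortKey(coll, source[i], -1, *buffer+currentOffset,
+                                          (int32_t)(bufferSize-currentOffset));
+    }
+    /* here you can hook code that does something interesting with the keySize -
+     * remembers the maximum or similar...
+     */
+    if(keySize > *maxSize) {
+      *maxSize = keySize;
+    }
+    currentOffset += keySize;
+  }
+
+  return currentOffset;
+}
+```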
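+
+After the buffer is filled, you can put the keys into collation order without calling the collator again. This works because ICU sort keys are zero-terminated byte arrays that compare correctly with `strcmp`. The C++ sketch below assumes the buffer and offset layout produced by `fillBufferWithKeys` above. The helper name `sortKeyOffsets` is hypothetical and is not part of ICU.
+
+```C++
+#include <algorithm>
+#include <cstdint>
+#include <cstring>
+#include <vector>
+
+// Hypothetical helper: returns the key offsets reordered so that the keys they
+// point to are in collation order. buffer holds zero-terminated sort keys back
+// to back, and keys[i] is the byte offset of the i-th key.
+std::vector<std::uint32_t> sortKeyOffsets(const std::uint8_t *buffer,
+                                          const std::uint32_t *keys,
+                                          std::uint32_t count) {
+    std::vector<std::uint32_t> order(keys, keys + count);
+    // strcmp compares bytes as unsigned char, which is the order sort keys need.
+    std::stable_sort(order.begin(), order.end(),
+                     [buffer](std::uint32_t a, std::uint32_t b) {
+                         return std::strcmp(reinterpret_cast<const char *>(buffer + a),
+                                            reinterpret_cast<const char *>(buffer + b)) < 0;
+                     });
+    return order;
+}
+```
+
+Sorting the offsets instead of the key bytes avoids moving variable-length keys around inside the buffer.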
diff --git a/docs/userguide/collation/faq.md b/docs/userguide/collation/faq.md
new file mode 100644
index 0000000..7a26da1
--- /dev/null
+++ b/docs/userguide/collation/faq.md
@@ -0,0 +1,55 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation FAQ
+
+## Q. Should I turn Full Normalization on all the time?
+
+**A.** You can if you want, but you don't typically need to. The key is that
+normalization for most characters is already built into ICU's collation by
+default. Everything that can be done without affecting performance is already
+there, and will work with most languages. So the normalization parameter in ICU
+really only changes whether full normalization is invoked.
+
+The exceptions are languages that use multiple accents (non-spacing marks) on
+the same base letter, such as Vietnamese or Arabic. In those cases, full
+normalization needs to be turned on. If you use the right locale (or language)
+when creating a collator in ICU, then full normalization will be turned on or
+off according to what the language typically requires.
+
+## Q. Are there any cases where I would want to override the Full Normalization setting?
+
+**A.** You only need to worry about that parameter in very unusual cases, such
+as sorting a list of names according to English conventions when the list
+also contains, for example, some Vietnamese names. One way to check for such a
+situation is to open a collator for each of the languages you expect to find,
+and see if any of them has full normalization turned on.
+
+## Q. How can collation rules mimic word sorting?
+
+Word sort is a way of sorting where certain punctuation characters are
+completely ignored, while others are considered. The following example of word
+sort ignores hyphens and apostrophes:
+
+Word Sort | String Sort
+--------- | -----------
+billet    | bill's
+bills     | billet
+bill's    | bills
+cannot    | can't
+cant      | cannot
+can't     | cant
+con       | co-op
+coop      | con
+co-op     | coop
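+
+The following standalone C++ sketch reproduces the difference between the two columns without using ICU. The word-sort comparator first compares the strings with hyphens and apostrophes removed. The string sort compares raw code units. The tie-break (the form without ignorable characters sorts first) is an assumption chosen to match the table, and the function names are hypothetical.
+
+```C++
+#include <algorithm>
+#include <string>
+#include <vector>
+
+// Illustration only: compares ASCII bytes, not ICU collation weights.
+std::string stripIgnorables(const std::string &s) {
+    std::string out;
+    for (char c : s) {
+        if (c != '-' && c != '\'') {  // hyphen and apostrophe are ignored
+            out += c;
+        }
+    }
+    return out;
+}
+
+bool wordSortLess(const std::string &a, const std::string &b) {
+    const std::string sa = stripIgnorables(a);
+    const std::string sb = stripIgnorables(b);
+    if (sa != sb) {
+        return sa < sb;  // first pass: punctuation does not count
+    }
+    // Hypothetical tie-break: fewer ignorable characters (shorter string) first.
+    if (a.size() != b.size()) {
+        return a.size() < b.size();
+    }
+    return a < b;
+}
+
+std::vector<std::string> wordSort(std::vector<std::string> words) {
+    std::sort(words.begin(), words.end(), wordSortLess);
+    return words;
+}
+
+std::vector<std::string> stringSort(std::vector<std::string> words) {
+    std::sort(words.begin(), words.end());  // plain code-unit order
+    return words;
+}
+```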
+
+This specific behavior can be mimicked using a tailoring that makes these
+characters completely ignorable. In this case, an appropriate rule would be
+`"&\\u0000 = '' = '-'"`.
+
+Please note that we don't think that such a solution is correct, since
+different languages have different word elements. Instead, one should use
+shifted mode for comparison.
diff --git a/docs/userguide/collation/index.md b/docs/userguide/collation/index.md
new file mode 100644
index 0000000..8963b85
--- /dev/null
+++ b/docs/userguide/collation/index.md
@@ -0,0 +1,142 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Collation
+
+## Overview
+
+Information is displayed in sorted order to enable users to easily find the
+items they are looking for. However, users of different languages might have
+very different expectations of what a "sorted" list should look like. Not only
+does the alphabetical order vary from one language to another, but it also can
+vary from document to document within the same language. For example, phonebook
+ordering might be different from dictionary ordering. String comparison is one
+of the basic functions most applications require, and yet implementations often
+do not match local conventions. The ICU Collation Service provides string
+comparison capability with support for appropriate sort orderings for each of
+the locales you need. If you have a very unusual requirement, ICU also provides
+facilities to customize orderings.
+
+Starting in release 1.8, the ICU Collation Service is compliant with the Unicode
+Collation Algorithm (UCA) ([Unicode Technical Standard
+#10](http://www.unicode.org/unicode/reports/tr10/)) and based on the Default
+Unicode Collation Element Table (DUCET) which defines the same sort order as ISO
+14651.
+
+The ICU Collation Service also contains several enhancements that are not
+available in UCA. These have been adopted into the [CLDR Collation
+Algorithm](http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm).
+For example:
+
+*   Additional case handling (as specified by CLDR): ICU allows case differences
+    to be ignored or flipped. Uppercase letters can be sorted before lowercase
+    letters, or vice-versa.
+*   Easy customization (as specified by CLDR): Services can be easily tailored
+    to address a wide range of collation requirements.
+*   The [default (root) sort
+    order](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation)
+    has been tailored slightly for improved functionality and performance.
+
+In other words, ICU implements the CLDR Collation Algorithm, which is an
+extension of the Unicode Collation Algorithm (UCA), which in turn is an
+extension of ISO 14651.
+
+There are several benefits to using the collation algorithms defined in these
+standards, including:
+
+*   The algorithms have been designed and reviewed by experts in multilingual
+    collation, and therefore are robust and comprehensive.
+
+*   Applications that share sorted data but do not agree on how the data should
+    be ordered fail to perform correctly. By conforming to the CLDR/UCA/14651
+    standards for collation and using CLDR language-specific collation data,
+    independently developed applications sort data identically and perform
+    properly.
+
+In addition, Unicode contains a large set of characters. This can make it
+difficult to implement collation that is fast without using significant memory
+or disk resources. The ICU collation implementation is designed to be fast, have
+a small memory footprint, and be highly customizable.
+
+There are many challenges when accommodating the world's languages and writing
+systems and the different orderings that are used. However, the ICU Collation
+Service provides an excellent means for comparing strings in a locale-sensitive
+fashion.
+
+For example, here are some of the ways languages vary in ordering strings:
+
+*   The letters A-Z can be sorted in a different order than in English. For
+    example, in Lithuanian, "y" is sorted between "i" and "k".
+
+*   Combinations of letters can be treated as if they were one letter. For
+    example, in traditional Spanish "ch" is treated as a single letter, and
+    sorted between "c" and "d".
+
+*   Accented letters can be treated as minor variants of the unaccented letter.
+    For example, "é" can be treated as equivalent to "e".
+
+*   Accented letters can be treated as distinct letters. For example, "Å" in
+    Danish is treated as a separate letter that sorts after "Z" (following "Æ"
+    and "Ø").
+
+*   Unaccented letters that are considered distinct in one language can be
+    indistinct in another. For example, the letters "v" and "w" are two
+    different letters according to English. However, "v" and "w" are
+    traditionally considered variant forms of the same letter in Swedish.
+
+*   A letter can be treated as if it were two letters. For example, in German
+    phonebook (or "lists of names") order "ä" is compared as if it were "ae".
+
+*   Thai requires that the order of certain letters be reversed.
+
+*   Some French dictionary ordering traditions sort accents in backwards order,
+    from the end of the string. For example, the word "côte" sorts before "coté"
+    because the acute accent on the final "e" is more significant than the
+    circumflex on the "o".
+
+*   Sometimes lowercase letters sort before uppercase letters. The reverse is
+    required in other situations. For example, lowercase letters are usually
+    sorted before uppercase letters in English, while Danish does the exact
+    opposite.
+
+*   Even in the same language, different applications might require different
+    sorting orders. For example, in German dictionaries, "of" would come before
+    "öf". In phone books the situation is the exact opposite, because "ö" is
+    compared as if it were "oe".
+
+*   Sorting orders can change over time due to government regulations or new
+    characters/scripts in Unicode.
+
+To accommodate the many languages and differing requirements, ICU collation
+supports customizing sort orderings, also known as **tailoring**. More details
+regarding tailoring are discussed in the
+[Customization](customization/index.md) chapter.
+
+The basic ICU Collation Service is provided by two main categories of APIs:
+
+*   String comparison - most commonly used: APIs return result of comparing two
+    strings (greater than, equal or less than). This is used as a comparator
+    when sorting lists, building tree maps, etc.
+
+*   Sort key generation - used when a very large set of strings is
+    compared/sorted repeatedly: APIs return a zero-terminated array of bytes per
+    string known as a sort key. The keys can be compared directly using strcmp
+    or memcmp standard library functions, saving repeated lookup and computation
+    of each string's collation properties. For example, database applications
+    use index tables of sort keys to index strings quickly. Note, however, that
+    this only improves performance for large numbers of strings because sorting
+    via the comparison functions is very fast. For more information, see
+    [Sortkeys vs Comparison](concepts.md#sortkeys-vs-comparison).
+
+ICU provides an AlphabeticIndex API for generating language-appropriate
+sorted-section labels like in dictionaries and phone books.
+
+ICU also provides a higher-level [string search](string-search.md)
+API which can be used, for example, for case-insensitive or accent-insensitive
+search in an editor or in a web page. ICU string search is based on the
+low-level [collation element iteration](architecture.md).
+
+## Programming Examples
+
+Here are some [API usage conventions](api.md) for the ICU Collation Service
+APIs.
diff --git a/docs/userguide/collation/string-search.md b/docs/userguide/collation/string-search.md
new file mode 100644
index 0000000..c2733da
--- /dev/null
+++ b/docs/userguide/collation/string-search.md
@@ -0,0 +1,318 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# String Search Service
+
+## Overview
+
+String searching, also known as string matching, is a very important subject in
+the wider domain of text processing and analysis. Many software applications
+rely on the basic string search functions provided by most operating systems.
+With the popularity of the Internet, the quantity of available data from
+different parts of the world has increased dramatically within a short time.
+Therefore, a language-aware string search algorithm has become more important.
+A bitwise match that uses the `u_strstr` (C), `UnicodeString::indexOf` (C++) or
+`String.indexOf` (Java) APIs will not yield results that meet a particular
+language's requirements, because all the issues that are important to
+language-sensitive collation also apply to text searching. The following issues
+are especially relevant to text searching:
+
+1.  Accented letters\
+    In English, accents are treated as minor variations of a letter. In French,
+    accented letters have much more significance as they can actually change the
+    meaning of a word. Very often, an accented letter is actually a distinct
+    letter. For example, letter 'å' (\\u00e5) may be just a letter 'a' with an
+    accent symbol to English speakers. However, it is actually a distinct letter
+    in Danish; in Danish searching for 'a' should generally not match 'å' and
+    vice versa. In some cases, such as in traditional German, an accented letter
+    is shorthand for something longer. In sorting, an 'ä' (\\u00e4) is treated
+    as 'ae'. Note that primary- and secondary-level distinctions for *searching*
+    may not be the same as those for sorting; in ICU, many languages provide a
+    special "search" collator with the appropriate level settings for search.
+
+2.  Conjoined letters\
+    Special handling is required when a single letter is treated equivalent to
+    two distinct letters and vice versa. For example, in German, the letter 'ß'
+    (\\u00df) is treated as 'ss' in sorting. Also, in most languages, 'æ'
+    (\\u00e6) is considered equivalent to the letter 'a' followed by the letter
+    'e'. Also, some letter combinations (contractions) are treated as distinct
+    letters by themselves. For example, 'ch' is treated as a distinct letter
+    between the letter 'c' and the letter 'd' in traditional Spanish.
+
+3.  Ignorable punctuation\
+    As in collation, it is important that the user is able to choose to ignore
+    punctuation symbols while the user searches for a pattern in the string. For
+    example, a user may search for "blackbird" and want to include entries such
+    as "black-bird".
+
+## ICU String Search Model
+
+The ICU string search service provides APIs similar to those of the other text
+iterating services, allowing users to specify the starting position and
+direction within the text string to be searched. For more information, please
+see the [Boundary Analysis](../boundaryanalysis/index.md) chapter. The user can
+locate one or all occurrences of a pattern in a string. For a given collator, a
+pattern match is located at the offsets <start, end> in a string if the
+collator finds that the sub-string between start and end is equal to the
+pattern.
+
+The string search service supports two different types of canonical match
+behavior.
+
+Let S' be the sub-string of a text string S between the offsets start and end
+<start, end>.
+A pattern string P matches a text string S at the offsets <start, end> if
+
+1.  Option 1: P matches some canonical equivalent string of S'. Suppose the
+    collator used for searching has tertiary strength, so all accents are
+    non-ignorable. If the pattern "a\\u0300" is searched in the target text
+    "a\\u0325\\u0300", a match will be found, since the target text is
+    canonically equivalent to "a\\u0300\\u0325".
+
+2.  Option 2: P matches S', and if P starts or ends with a combining mark,
+    there exists no non-ignorable combining mark before or after S' in S,
+    respectively. Following the example above, the pattern "a\\u0300" will not
+    find a match in "a\\u0325\\u0300", since there is a non-ignorable accent
+    '\\u0325' between 'a' and '\\u0300'. Even with a target text of
+    "a\\u0300\\u0325" a match will not be found because of the non-ignorable
+    trailing accent \\u0325.
+
+One restriction is to be noted for option 1. Currently there are no composite
+characters that consist of a character with combining class greater than 0
+before a character with combining class equal to 0. However, if such a
+character exists in the future, the string search service may not work correctly
+with option 1 when such characters are encountered.
+
+Furthermore, option 1 could generate more than one "encompassing" match. For
+example, in Danish, 'å' (\\u00e5) and 'aa' are considered equivalent. So the
+pattern "baad" will match "a--båd--man" (a--b\\u00e5d--man) with start offset
+3 and end offset 5. However, the start offset can also be 1 or 2 and the end
+offset can be 6 or 7, because "-" (hyphen) is ignorable for certain collators.
+The ICU implementation always returns the offsets of the shortest matching
+sub-string. To be more exact, the string search adds a "tightest" match
+condition. In other words, if the pattern matches at offsets <start, end> as
+well as offsets <start + 1, end>, the offsets <start, end> are not considered a
+match. Likewise, if the pattern matches at offsets <start, end> as well as
+offsets <start, end + 1>, the offsets <start, end + 1> are not considered a
+match. Therefore, when option 1 is chosen with a Danish collator, "baad" will
+match in the string "a--båd--man" (a--b\\u00e5d--man) ONLY at offsets <3, 5>.
+
+The default behavior is that described in option 2 above. To obtain the behavior
+described in option 1, you must set the normalization mode to ON in the collator
+used for search.
+
+> :point_right: **Note**: The "tightest match" behavior described above
+> is defined as "Minimal Match" in
+> [Section 8 Searching and Matching in UTS #10 Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Searching).
+> "Medial Match" and "Maximal Match" are not yet implemented by the ICU String Search service.
+
+The string search service also supports two varieties of “asymmetric search” as
+described in *[Section 8.2 Asymmetric Search in UTS #10 Unicode Collation
+Algorithm](http://www.unicode.org/reports/tr10/#Asymmetric_Search)*.
+With asymmetric search, for example, unaccented characters are treated as
+“wildcards” that may match any character with the same primary weight. This
+behavior can be applied just to characters in the search pattern, or to
+characters in both the search pattern and the searched text. With the former
+behavior, searching with French behavior for 'e' might match 'e', 'è', 'é', 'ê',
+and so on, while searching for 'é' would only match 'é'.
+
+Either a locale or a collator can be used to specify the language-sensitive
+rules for searches. When a locale is specified, a collator will be created
+internally, and the StringSearch instance that is created owns that collator.
+All the collation attributes will be considered during the string search
+operation. However, users can set the collator attributes only through the
+collator APIs. Normalization is usually done within collation and the process
+is outside the scope of the string search service.
+
+As in other iterator interfaces, the string search service provides APIs to
+find the first pattern occurrence, the next match, the previous match, and the
+last pattern occurrence. There are also options to allow for overlapping
+matches. For example, in English, if the string is "ababab" and the pattern is
+"abab", overlapping matching produces results at offsets <0, 3> and <2, 5>.
+Otherwise, mutually exclusive matching produces the result at offsets <0, 3>
+only. To find a whole word match, the user can provide a locale-specific
+`BreakIterator` object to a `StringSearch` instance to correctly locate the word
+boundaries. For example, by default a search for "c" in the string "abc"
+returns a match. However, this behavior can be overridden by supplying a word
+`BreakIterator`.
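+
+You can see the difference between overlapping and mutually exclusive matching without ICU. The C++ helper below, `findAll` (a hypothetical name), is a linear search that compares plain code units. It therefore ignores collation, canonical equivalence, and grapheme cluster boundaries, unlike ICU `StringSearch`. It reports inclusive <start, end> offsets, as in the example above.
+
+```C++
+#include <cstddef>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical helper: finds every occurrence of pattern in text by plain
+// code-unit comparison (no collation). Each match is returned as a
+// <start, end> pair with an inclusive end offset.
+std::vector<std::pair<std::size_t, std::size_t>> findAll(const std::string &text,
+                                                         const std::string &pattern,
+                                                         bool overlapping) {
+    std::vector<std::pair<std::size_t, std::size_t>> matches;
+    if (pattern.empty()) {
+        return matches;
+    }
+    std::size_t pos = 0;
+    while (pos + pattern.size() <= text.size()) {
+        if (text.compare(pos, pattern.size(), pattern) == 0) {
+            matches.emplace_back(pos, pos + pattern.size() - 1);
+            // Overlapping: resume one position later; otherwise skip past the match.
+            pos += overlapping ? 1 : pattern.size();
+        } else {
+            ++pos;  // linear search: shift the cursor by one position
+        }
+    }
+    return matches;
+}
+```
+
+With "ababab" and "abab", the overlapping mode yields <0, 3> and <2, 5>, and the exclusive mode yields only <0, 3>.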
+
+The minimum unit of match is aligned to an extended grapheme cluster in the ICU
+string search service implementation defined by [UAX #29 Unicode Text
+Segmentation](http://unicode.org/reports/tr29/). Therefore, all matches will
+begin and end on extended grapheme cluster boundaries. If the given input search
+pattern starts with a non-base character, no matches will be returned.
+When there are contractions in the collation sequence and the contraction
+happens to span across the boundary of a match, it is not considered a match.
+For example, in traditional Spanish where 'ch' is a contraction, the "har"
+pattern will not match in the string "uno charo". Boundaries that are
+discontiguous contractions will yield a match result similar to those described
+above, where the end of the match returned will be one character before the
+immediately following base letter. In addition, only the first match will be
+located if a pattern contains only combining marks and the search string
+contains more than one occurrence of the pattern consecutively. For example, if
+the user searches for the pattern "´" (\\u00b4) in the string "A´´B"
+(A\\u00b4\\u00b4B), the result will be offsets <1, 2>.
+
+### Example
+
+**In C:**
+
+```C
+    const char *tgtstr = "The quick brown fox jumps over the lazy dog.";
+    const char *patstr = "fox";
+    UChar target[64];
+    UChar pattern[16];
+    int pos = 0;
+    UErrorCode status = U_ZERO_ERROR;
+    UStringSearch *search = NULL;
+
+    /* convert the char strings to NUL-terminated UTF-16 strings */
+    u_uastrcpy(target, tgtstr);
+    u_uastrcpy(pattern, patstr);
+
+    /* -1 lengths: both UChar strings are NUL-terminated */
+    search = usearch_open(pattern, -1, target, -1, "en_US",
+                          NULL, &status);
+    if (U_FAILURE(status)) {
+        fprintf(stderr, "Could not create a UStringSearch.\n");
+        return;
+    }
+
+    for(pos = usearch_first(search, &status);
+        U_SUCCESS(status) && pos != USEARCH_DONE;
+        pos = usearch_next(search, &status))
+    {
+        fprintf(stdout, "Match found at position %d.\n", pos);
+    }
+
+    if (U_FAILURE(status)) {
+        fprintf(stderr, "Error searching for pattern.\n");
+    }
+
+    /* release the search object */
+    usearch_close(search);
+```
+
+**In C++:**
+
+```C++
+    UErrorCode status = U_ZERO_ERROR;
+    UnicodeString target("Jackdaws love my big sphinx of quartz.");
+    UnicodeString pattern("sphinx");
+    StringSearch search(pattern, target, Locale::getUS(), NULL, status);
+
+    if (U_FAILURE(status)) {
+        fprintf(stderr, "Could not create a StringSearch object.\n");
+        return;
+    }
+
+    for(int pos = search.first(status);
+        U_SUCCESS(status) && pos != USEARCH_DONE;
+        pos = search.next(status))
+    {
+        fprintf(stdout, "Match found at position %d.\n", pos);
+    }
+
+    if (U_FAILURE(status)) {
+        fprintf(stderr, "Error searching for pattern.\n");
+    }
+```
+
+**In Java:**
+
+```Java
+    StringCharacterIterator target = new StringCharacterIterator(
+                                         "Pack my box with five dozen liquor jugs.");
+    String pattern = "box";
+
+    try {
+        StringSearch search = new StringSearch(pattern, target, Locale.US);
+
+        for(int pos = search.first();
+            pos != StringSearch.DONE;
+            pos = search.next())
+        {
+            System.out.println("Match found for pattern at position " + pos); 
+        }
+    } catch (Exception e) {
+        System.err.println("StringSearch failure: " + e.toString());
+    }
+```
+
+## Performance and Other Implications
+
+The ICU string search service is built on top of the ICU collation
+service. Therefore, all the performance implications that apply to a collator
+are also applicable to the string search service. To obtain the best
+performance, use the default collator attributes described in the Performance
+and Storage Implications on Attributes section in the [Collation Service
+Architecture](architecture.md#-performance-and-storage-implications-on-attributes)
+chapter. In addition, users need to be aware of
+the following `StringSearch` specific considerations:
+
+### Search Algorithm
+
+ICU4C releases up to 3.8 used the Boyer-Moore search algorithm in the string
+search service. There were some known issues in these previous releases.
+(See ICU tickets [ICU-5024](https://unicode-org.atlassian.net/browse/ICU-5024),
+[ICU-5382](https://unicode-org.atlassian.net/browse/ICU-5382),
+[ICU-5420](https://unicode-org.atlassian.net/browse/ICU-5420))
+
+In ICU4C 4.0, the string search service switched to a simple linear search
+algorithm, which locates a match by shifting a cursor through the target text
+one position at a time, and these issues were fixed. In ICU4C 4.0.1, the
+Boyer-Moore search code was reintroduced as a separate API set as a technology
+preview. In a later release, this code was deleted.
+
+The Boyer-Moore search algorithm, which is based on automata or combinatorial
+properties of strings, preprocesses the pattern and is known to be much faster
+than linear search for longer patterns. According to a performance evaluation of
+these two implementations, the Boyer-Moore search is faster than the linear
+search when the pattern is longer than 3 or 4 characters. However, it is very
+tricky to get correct results with a collation-based Boyer-Moore search.
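+
+The following C++ sketch shows how the Boyer-Moore family preprocesses the pattern. It implements the simplified Boyer-Moore-Horspool variant on raw bytes. It is not the code ICU once shipped, and it performs no collation. That omission is what keeps it simple. With collation, one text character can map to zero, one, or several collation elements, so fixed shift distances like these no longer hold. The function name `horspoolFind` is hypothetical.
+
+```C++
+#include <array>
+#include <cstddef>
+#include <string>
+
+// Boyer-Moore-Horspool on raw bytes (no collation). Returns the start offset
+// of the first match, or std::string::npos if there is none.
+std::size_t horspoolFind(const std::string &text, const std::string &pattern) {
+    const std::size_t m = pattern.size();
+    const std::size_t n = text.size();
+    if (m == 0) {
+        return 0;
+    }
+    if (m > n) {
+        return std::string::npos;
+    }
+    // Preprocessing: bytes absent from pattern[0..m-2] allow a shift of m.
+    std::array<std::size_t, 256> shift;
+    shift.fill(m);
+    for (std::size_t i = 0; i + 1 < m; ++i) {
+        // distance from this byte's last occurrence to the end of the pattern
+        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
+    }
+    std::size_t pos = 0;
+    while (pos + m <= n) {
+        // Compare the window right to left.
+        std::size_t j = m;
+        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) {
+            --j;
+        }
+        if (j == 0) {
+            return pos;
+        }
+        // Shift by the table entry for the text byte under the last pattern byte.
+        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
+    }
+    return std::string::npos;
+}
+```
+
+For example, `horspoolFind("The quick brown fox jumps over the lazy dog.", "fox")` returns 16.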
+
+### Change Iterating Direction
+
+The ICU string search service provides a set of very dynamic APIs that allow
+users to change the iterating direction at any time. For example, users can search
+for a particular word going forward by calling the `usearch_next` (C),
+`StringSearch::next` (C++) or `StringSearch.next` (Java) APIs and then search
+backwards at any point of the search operation by calling the `usearch_previous`
+(C), `StringSearch::previous` (C++) or `StringSearch.previous` (Java) APIs. Another
+way to change the iterating direction is by calling the `usearch_reset` (C),
+`StringSearch::reset` (C++) or `StringSearch.reset` (Java) APIs. Though the
+direction change can occur without calling the reset APIs first, this operation
+comes with a reduction in speed.
+
+> :point_right: **Note**: Backward search was not available in the
+> ICU4C Boyer-Moore search technology preview introduced in ICU4C 4.0.1;
+> it is only available with the linear search implementation.
+
+### Thai and Lao Character Boundaries
+
+In collation, certain Thai and Lao vowels are swapped with the next character.
+For example, the text string "A ขเ" (A \\u0e02\\u0e40) is processed internally
+in collation as
+"A เข" (A \\u0e40\\u0e02). Therefore, if the user searches for the pattern "Aเ"
+(A\\u0e40) in "A ขเ" (A \\u0e02\\u0e40) the string search service will match
+starting at offset 0. Since this normalization process is internal to collation,
+there is no notification that the swapping has happened. The return result
+offsets in this example will be <0, 2> even though the range would encompass one
+extra character.
+
+### Case Level Search
+
+Case level string search is currently done with the strength set to tertiary.
+When searching with the strength set to primary and the case level attribute
+turned on, results given may not be correct. The case level attribute is
+different from tertiary strength in that accents are ignored but case
+differences are not. Suppose you wanted to search for “A” in the text
+“ABC\\u00C5a”. The match found should be at 0 and 3 if using the case level
+attribute. However, searching with the case level attribute turned on finds
+matches at 0, 3, and 4, which includes the lower case 'a'. To ensure that case
+level differences are not ignored, string search must be done with at least
+tertiary strength.
diff --git a/docs/userguide/conversion/compression.md b/docs/userguide/conversion/compression.md
new file mode 100644
index 0000000..9922baa
--- /dev/null
+++ b/docs/userguide/conversion/compression.md
@@ -0,0 +1,92 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Compression
+
+## Overview of SCSU
+
+Compressing Unicode text for transmission or storage reduces bandwidth usage and
+storage requirements. The compression scheme compresses Unicode text into a
+sequence of bytes by using characteristics of Unicode text. The compressed
+sequence can be used on its own or as further input to a general-purpose file or
+disk-block based compression scheme. Note that the combination of the Unicode
+compression algorithm plus disk-block based compression produces better results
+than either method alone.
+
+Strings in languages using small alphabets contain runs of characters that are
+coded close together in Unicode. These runs are typically interrupted only by
+punctuation characters, which are themselves coded in proximity to each other in
+Unicode (usually in the Basic Latin range).
+
+For additional detail about the compression algorithm, which has been approved
+by the Unicode Consortium, please refer to [Unicode Technical Report #6 (A
+Standard Compression Scheme for
+Unicode)](https://www.unicode.org/unicode/reports/tr6/).
+
+The Standard Compression Scheme for Unicode (SCSU) is used to:
+
+*   express all code points in Unicode
+
+*   approximate the storage size of traditional character sets
+
+*   facilitate the use of short strings
+
+*   provide transparency for characters between `U+0020`-`U+00FF`, as well as `CR`, `LF`
+    and `TAB`
+
+*   support very simple decoders
+
+*   support simple as well as sophisticated encoders
+
+It does not attempt to avoid the use of control bytes (including `NUL`) in the
+compressed stream.
+
+The compression scheme is mainly intended for use with short to medium length
+Unicode strings. The resulting compressed format is intended for storage or
+transmission in bandwidth-limited environments. It can be used stand-alone or as
+input to traditional general-purpose data compression schemes. It is not
+intended as a processing format or as a general-purpose interchange format.
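+
+The transparency property listed above can be illustrated with a deliberately minimal C++ sketch. Consider text that contains only TAB, LF, CR, and characters in U+0020..U+00FF. An SCSU encoder in its initial state writes such text as one byte per character. Bytes 0x20..0x7F pass through unchanged, and bytes 0x80..0xFF select the default dynamic window starting at U+0080. The function name `encodeSCSUTransparent` is hypothetical. The sketch rejects all other characters instead of emitting SCSU tag bytes, so use the ICU "SCSU" converter for real data.
+
+```C++
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Hypothetical sketch: handles only the transparent subset of SCSU single-byte
+// mode. Returns false (leaving out partially filled) for any other character.
+bool encodeSCSUTransparent(const std::u16string &text, std::vector<std::uint8_t> &out) {
+    out.clear();
+    for (char16_t c : text) {
+        const bool allowedControl = (c == 0x09 || c == 0x0A || c == 0x0D);
+        if (!allowedControl && (c < 0x20 || c > 0xFF)) {
+            return false;  // would need SCSU tags or window changes
+        }
+        // U+0020..U+007F map to themselves; U+0080..U+00FF map through
+        // dynamic window 0 (offset 0x0080), so the byte equals the code point.
+        out.push_back(static_cast<std::uint8_t>(c));
+    }
+    return true;
+}
+```
+
+For "Öl fließt" this produces the same nine bytes as the `germanSCSU` array in the usage example below.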
+
+## BOCU-1
+
+A MIME-compatible encoding called BOCU-1 is also available in ICU. Details about
+this encoding can be found in the [Unicode Technical Note
+#6](https://www.unicode.org/notes/tn6/). Both SCSU and BOCU-1 are IANA
+registered names.
+
+## Usage
+
+The compression service in ICU is part of the conversion framework, and follows
+the semantics of converters. For more information on how to use ICU's conversion
+service, please refer to the Usage Model section in the [Using
+Converters](converters.md) chapter.
+
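+```c++
+#include <assert.h>
+#include <string.h>
+#include <unicode/ucnv.h>
+
+int main() {
+    /* "Öl fließt" in UTF-16; the array is not NUL-terminated */
+    static const UChar germanUTF16[]={
+        0x00d6, 0x006c, 0x0020, 0x0066, 0x006c, 0x0069, 0x0065, 0x00df, 0x0074
+    };
+    /* the same text in SCSU: every character is in U+0020..U+00FF,
+       so each one is encoded as a single byte */
+    static const uint8_t germanSCSU[]={
+        0xd6, 0x6c, 0x20, 0x66, 0x6c, 0x69, 0x65, 0xdf, 0x74
+    };
+    const int32_t utf16Length = (int32_t)(sizeof(germanUTF16)/sizeof(germanUTF16[0]));
+    char target[100];
+    UChar uTarget[100];
+    UErrorCode status = U_ZERO_ERROR;
+    UConverter *conv;
+    int32_t len;
+
+    /* set up the SCSU converter */
+    conv = ucnv_open("SCSU", &status);
+    assert(U_SUCCESS(status));
+
+    /* compress the string using SCSU; pass explicit lengths instead of -1
+       because neither array is NUL-terminated */
+    len = ucnv_fromUChars(conv, target, (int32_t)sizeof(target),
+                          germanUTF16, utf16Length, &status);
+    assert(U_SUCCESS(status));
+
+    /* decompress the SCSU bytes back to UTF-16 */
+    len = ucnv_toUChars(conv, uTarget, (int32_t)(sizeof(uTarget)/sizeof(uTarget[0])),
+                        (const char *)germanSCSU, (int32_t)sizeof(germanSCSU), &status);
+    assert(U_SUCCESS(status));
+    assert(len == utf16Length && memcmp(uTarget, germanUTF16, sizeof(germanUTF16)) == 0);
+
+    /* close the converter */
+    ucnv_close(conv);
+    return 0;
+}
+```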
diff --git a/docs/userguide/conversion/converters.md b/docs/userguide/conversion/converters.md
new file mode 100644
index 0000000..10c5731
--- /dev/null
+++ b/docs/userguide/conversion/converters.md
@@ -0,0 +1,786 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Using Converters
+
+## Overview
+
+When designing applications around Unicode characters, it is sometimes necessary
+to convert between Unicode encodings or between Unicode and legacy text data.
+The vast majority of modern operating systems support Unicode to some degree,
+but legacy text data from older systems sometimes needs to be converted to and
+from Unicode. This conversion process can be done with an ICU converter.
+
+## ICU converters
+
+ICU provides comprehensive character set conversion services, mapping tables,
+and implementations for many encodings. Since ICU uses Unicode (UTF-16)
+internally, all converters convert between UTF-16 (with the endianness according
+to the current platform) and another encoding. This includes Unicode encodings.
+In other words, internal text is 16-bit Unicode, while "external text" used as
+source or target for a conversion is always treated as a byte stream.
+
+ICU converters are available for a wide range of encoding schemes. Most of them
+are based on mapping table data that is handled by a few generic implementations.
+Some encodings are implemented algorithmically in addition to (or instead of)
+using mapping tables, especially Unicode encodings. The partly or entirely
+table-based encoding schemes include:
+
+All ICU converters map only single Unicode character code points to and from
+single codepage character code points. ICU converters **do not** deal directly
+with combining characters, bidirectional reordering, or Arabic shaping, for
+example. Such processes, if required, must be handled separately. For example,
+while the text is in Unicode, the ICU BiDi APIs can be used for bidirectional
+reordering after a conversion to Unicode or before a conversion from Unicode.
+
+ICU converters are not designed to perform any encoding autodetection. This
+means that the converters do not autodetect "endianness", the 6 Unicode encoding
+signatures, or whether text is Shift-JIS or EUC-JP, etc. There are two
+exceptions: the UTF-16 and UTF-32 converters work according to Unicode's
+specification of their Character Encoding Schemes; that is, they read the BOM to
+figure out the actual "endianness".
+
+The ICU mapping tables mostly come from an [IBM® codepage
+repository](http://www.ibm.com/software/globalization/cdra). For non-IBM
+codepages, there is typically an equivalent codepage registered with this
+repository. However, the textual data format (.ucm files) is generic, and data
+for other codepage mapping tables can also be added.
+
+## Using the Default Codepage
+
+ICU has code to determine the default codepage of the system or process. This
+default codepage can be used to convert `char *` strings to and from Unicode.
+
+Depending on system design, setup and APIs, it may not always be possible to
+find a default codepage that fully works as expected. For example,
+
+1.  On Windows there are three encodings in use at the same time. Unicode
+    (UTF-16) is always used inside of Windows, while for `char *` encodings there
+    are two classes, called "ANSI" and "OEM" codepages. ICU will use the ANSI
+    codepage. Note that the OEM codepage is used by default for console window
+    output.
+
+2.  On some UNIX-type systems, non-standard names are used for encodings, or
+    non-standard encodings are used altogether. Although ICU supports over 200
+    encodings in its standard build and many more aliases for them, it will not
+    be able to recognize such non-standard names.
+
+3.  Some systems do not have a notion of a system or process codepage, and may
+    not have APIs for that.
+
+If you have a means of detecting a default codepage name that is more
+appropriate for your application, then set that name with `ucnv_setDefaultName()`
+as the first ICU function call. This ensures that the internally cached
+default converter is instantiated from your preferred name.
+
+Starting in ICU 2.0, when a converter for the default codepage cannot be opened,
+a fallback default codepage name and converter will be used. On most platforms,
+this will be US-ASCII. For z/OS (OS/390), `ibm-1047,swaplfnl` is the default
+fallback codepage. For AS/400 (iSeries), `ibm-37` is the default fallback
+codepage. This fallback is used when the operating system uses a non-standard
+name for the default codepage, or when the converter was not packaged with ICU.
+The feature allows ICU to run in unusual computing environments without
+completely failing.
+
+## Usage Model
+
+A "converter" refers to the C structure `UConverter`. Converters are cheap to
+create. Any data that is shared between converters of the same kind (such as the
+mappings, the name and the properties) is automatically cached and shared in
+memory.
+
+### Converter Names
+
+Codepages with encoding schemes have been given many names by various vendors
+and platforms over the years. Vendors have different ways to specify which codepage
+and encoding are being used. IBM uses a CCSID (Coded Character Set IDentifier).
+Windows uses a CPID (CodePage IDentifier). Macintosh has a TextEncoding. Many
+Unix vendors use [IANA](http://www.iana.org/assignments/character-sets)
+character set names. Many of these names are aliases to converters within ICU.
+
+In order to help identify which names are recognized by certain platforms, ICU
+provides several converter alias functions. The complete description of these
+functions can be found in the [ICU API
+Reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ucnv_8h.html).
+
+| Function Names | Short Description |
+| -------------- | ----------------- |
+| `ucnv_countAvailable`, `ucnv_getAvailableName` | Get a list of available converter names that can be opened. |
+| `ucnv_openAllNames` | Get a list of all known converter names. |
+| `ucnv_getName` | Get the name of an open converter. |
+| `ucnv_countAliases`, `ucnv_getAlias` | Get the list of aliases for the specified converter. |
+| `ucnv_countStandards`, `ucnv_getStandard` | Get the list of known standards. |
+| `ucnv_openStandardNames` | Get a filtered list of aliases for a converter that is known by the specified standard. |
+| `ucnv_getStandardName` | Get the preferred alias name specified by a given standard. |
+| `ucnv_getCanonicalName` | Get the converter name from the alias that is recognized by the specified standard. |
+| `ucnv_getDefaultName` | Get the default converter name that is currently used by ICU and the operating system. |
+| `ucnv_setDefaultName` | Use this function to override the default converter name. |
+
+Even though IANA specifies a list of aliases, it usually does not specify the
+mappings or the actual character set for the aliases. Sometimes vendors will map
+similar glyph variants to different Unicode code points or sometimes they will
+assign completely different glyphs for the same codepage code point. Because of
+these ambiguities, you can sometimes get U_AMBIGUOUS_ALIAS_WARNING for the
+returned UErrorCode when more than one converter uses the requested alias. This
+is only a warning, and the results can still be used. This UErrorCode value is
+just a reminder that you may not get what you expected. The above functions can
+help you to determine which converter you actually wanted.
+
+EBCDIC-based converters have the option to swap the newline (NL) and linefeed
+(LF) character mappings. This can be useful when transferring EBCDIC documents
+between z/OS (MVS, OS/390 and the rest of the zSeries family) and another EBCDIC
+machine like OS/400 on iSeries. The ",swaplfnl" option string
+(`UCNV_SWAP_LFNL_OPTION_STRING` from ucnv.h) can be appended to a converter
+alias in order to achieve this behavior. You can view other available options in ucnv.h.
+
+You can always skip many of these aliasing and mapping problems by just using
+Unicode.
+
+### Creating a Converter
+
+There are four ways to create a converter:
+
+1.  **By name**: Converters can be created using different types of names. No
+    distinction is made when the converter is created, as to which name is being
+    employed. There are many types of aliases possible. Among these are
+    [IANA](http://www.iana.org/assignments/character-sets) ("shift_jis",
+    "koi8-r", or "iso-8859-3") and host-specific names ("cp1252", which is the name
+    for a Microsoft® Windows™ or a similar IBM® codepage). Finally, ICU's own
+    internal canonical names for a converter can be used. These include "UTF-8"
+    or "ISO-8859-1" for built-in conversion types, and names such as
+    "ibm-949_P110-2000" (Shift-JIS with '\\' <-> '¥' mapping) or
+    "ibm-949_P11A-2000" (Shift-JIS with '\\' <-> '\\' mapping) for data-file
+    based conversions.
+    ```C
+    UConverter *conv = ucnv_open("shift_jis", &myError);
+    ```
+    As a convenience, converter names can be passed in as Unicode (for example,
+    if a user passed in the string from a Unicode-based user interface).
+    However, the actual names are restricted to an invariant ASCII/EBCDIC
+    subset.
+    ```C
+    const UChar *name = ...;
+    UConverter *conv = ucnv_openU(name, &myError);
+    ```
+    Converter names are case-insensitive. In addition, beginning with ICU 3.6,
+    leading zeroes are ignored in sequences of digits (if further digits
+    follow), and all non-alphanumeric characters are ignored. Thus the strings
+    "UTF-8", "utf_8", "u\*T@f08" and "Utf 8" are equivalent. (Before ICU 3.6,
+    leading zeroes were not ignored, and only spaces, dashes and underscores
+    were ignored.) The `ucnv_compareNames()` function provides such string
+    comparisons.
+    Unlike the names of resources or other types of ICU data, converter names
+    can **not** be qualified with a path that indicates the directory or common
+    data file containing the corresponding converter data. The requested
+    converter's data must be present either in the main ICU data library or as a
+    separate file located in the ICU data directory. However, you can always
+    create a package of converters with pkgdata and open a converter from the
+    package with `ucnv_openPackage()`:
+    ```C
+    UConverter *conv = ucnv_openPackage("./myPackage.dat", "customConverter", &myError);
+    ```
+
+2.  **By number**: ICU is designed to accommodate codepages provided by
+    different vendors. For example, the IBM CDRA (Character Data Representation
+    Architecture, an IBM architecture that defines a set of identifiers)
+    has an ID type called the CCSID (Coded Character Set Identifier). The ICU
+    API for opening a codepage by number must be given a vendor along with the
+    number. Currently, only IBM (`UCNV_IBM`) is supported. For example, the US
+    EBCDIC codepage (IBM #37) can be opened with the following code:
+    ```C
+    ucnv_openCCSID(37, UCNV_IBM, &myErr);
+    ```
+
+3.  **By iteration**: An application might not know ahead of time which codepage
+    to use, and thus might need to query ICU to determine the entire list of
+    installed converters. ICU returns a list of its canonical (internal)
+    names. From each name, the standard IANA name can be determined, and also a
+    list of aliases which point to that name can be determined. For example, ICU
+    might return "ibm-367" among the canonical names. That name itself may or
+    may not provide the application or its users with the information needed.
+    (367 is actually the decimal form of a number that is calculated by
+    appending certain hex digits together.) However, the IANA name can be
+    requested from this canonical name, which should return something like
+    "us-ascii". The alias list for ibm-367 can be iterated over as well, which
+    returns additional names like "ascii", "646", "ansi_x3.4-1968" etc. If this
+    is not sufficient information, once a converter is opened, it can be queried
+    for its type, min and max char size, etc. This information is not available
+    without actually opening the converter (a fairly lightweight process).
+    ```C
+    /* Returns count of the number of available names */
+    int32_t count = ucnv_countAvailable();
+    /* get the canonical name of the available converter at index 36 (0-based) */
+    const char *convName1 = ucnv_getAvailableName(36);
+    /* get the alias at index 3 (0-based) for a given codepage */
+    const char *asciiAlias = ucnv_getAlias("ibm-367", 3, &myError);
+    /* Get the IANA name of the converter */
+    const char *ascii = ucnv_getStandardName("ibm-367", "IANA", &myError);
+    /* Enumerate the IANA names of the converter */
+    UEnumeration *asciiEnum =
+        ucnv_openStandardNames("ibm-367", "IANA", &myError);
+    uenum_next(asciiEnum, NULL, &myError); /* skip preferred IANA alias */
+    /* get one of the non-preferred IANA aliases */
+    const char *ascii2 = uenum_next(asciiEnum, NULL, &myError);
+    uenum_close(asciiEnum);
+    ```
+
+4.  **By using the default converter**: The default converter can be opened by
+    passing a NULL as the name of the converter.
+    ```C
+    ucnv_open(NULL, &myErr);
+    ```
+
+> :point_right: **Note**: ICU chooses this converter based on the best information available to it.
+The purpose of this converter is to interface with the OS using a codepage (i.e. `char *`).
+Do not use it as a way of determining the best overall converter to use.
+Usually any Unicode encoding form is the best way to store and send text data,
+so that important data does not get lost in the conversion.\
+Also, if the OS supports Unicode-based APIs (such as Win32),
+it is better to use only those Unicode APIs.
+As an example, the new Windows 2000 locales (such as Hindi) do not
+define the default codepage to something that supports Hindi.
+The default converter is used in expressions such as `UnicodeString text("abc");`
+to convert "abc", and in C functions such as `u_uastrcpy()`.\
+Code operating at the [OS level](../design.md) MAY choose to
+change the default converter with `ucnv_setDefaultName()`.
+However, be aware that this change has inconsistent results if it is done after
+ICU components are initialized.
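+
+The name-matching rules described under **By name** (case-insensitive, all
+non-alphanumeric characters ignored, and leading zeroes in a digit run ignored
+when further digits follow) can be sketched in plain C as below. This is a
+hypothetical `names_equal()` helper written only to illustrate those rules; it is
+not ICU's implementation, it assumes ASCII names shorter than `NAME_CAP`, and real
+code should call `ucnv_compareNames()` instead.
+
+```C
+#include <assert.h>
+#include <ctype.h>
+#include <stdbool.h>
+#include <string.h>
+
+#define NAME_CAP 128 /* assumption: normalized names are shorter than this */
+
+/* Hypothetical helper: lowercases letters, drops everything that is not a
+ * letter or digit, and drops a '0' that starts a digit run when another
+ * digit follows. Longer names are silently truncated (sketch only). */
+static void normalize_name(const char *name, char out[NAME_CAP]) {
+    size_t len = 0;
+    bool afterDigit = false; /* true while inside a run of kept digits */
+    assert(name != NULL && out != NULL);
+    for (const char *p = name; *p != '\0' && len + 1 < NAME_CAP; ++p) {
+        unsigned char c = (unsigned char)*p;
+        if (!isalnum(c)) {
+            afterDigit = false; /* ignored characters end a digit run */
+            continue;
+        }
+        if (isdigit(c)) {
+            if (c == '0' && !afterDigit && isdigit((unsigned char)p[1])) {
+                continue; /* leading zero before another digit */
+            }
+            afterDigit = true;
+            out[len++] = (char)c;
+        } else {
+            afterDigit = false;
+            out[len++] = (char)tolower(c);
+        }
+    }
+    out[len] = '\0';
+}
+
+/* Hypothetical stand-in for ucnv_compareNames(a, b) == 0. */
+static bool names_equal(const char *a, const char *b) {
+    char na[NAME_CAP], nb[NAME_CAP];
+    normalize_name(a, na);
+    normalize_name(b, nb);
+    return strcmp(na, nb) == 0;
+}
+```
+
+With this sketch, "UTF-8", "utf_8", "u\*T@f08" and "Utf 8" all normalize to
+"utf8", while "ibm-1047" and "ibm-147" stay distinct because the zero in "1047"
+follows another digit.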
+
+### Closing a Converter
+
+Closing a converter frees memory occupied by that instance of the converter.
+However, it does not release the larger shared data tables the converter might
+use. OS-level code may call `ucnv_flushCache()` to explicitly free memory occupied
+by [unused tables](../design.md).
+
+```C
+ucnv_close(conv);
+```
+
+### Converter Life Cycle
+
+Note that a converter is created with a certain type (for instance, ISO-8859-3)
+which does not change over the life of that [object](../design.md). Converters
+should be allocated one per thread. They are cheap to create, as the shared data
+doesn't need to be reallocated.
+
+This is the typical life cycle of a converter, as shown step-by-step:
+
+1.  First, open up the converter with a specified name (or alias name).
+    ```C
+    UConverter *conv = ucnv_open("shift_jis", &status);
+    ```
+
+2.  Convert the text. Here `target` is the `char` array to write into, and
+    `targetSize` is its capacity in bytes. `source` is the UChar string being converted.
+    ```C
+    int32_t len = ucnv_fromUChars(conv, target, targetSize, source, u_strlen(source), &status);
+    ```
+
+3.  Clean up the converter.
+    ```C
+    ucnv_close(conv);
+    ```
+
+### Sharing Converters Between Threads
+
+A converter must not be used by more than one thread at the same time. However,
+if it is reset, it can be used for unrelated chunks of data. For example, use the
+same converter for converting data from Unicode to ISO-8859-3, and then reset it.
+Then use the same converter for converting data from ISO-8859-3 back into Unicode.
+
+### Converting Large Quantities of Data
+
+If it is necessary to convert a large quantity of data in smaller buffers, use
+the same converter to convert each buffer. This ensures that any state is
+preserved from one chunk to the next. This kind of conversion is known as
+streaming or buffering, and is described in the Buffered or Streamed section
+later in this chapter.
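+
+To see why each chunk must go through the same converter, consider UTF-16 input
+whose surrogate pair is split across two buffers. The plain-C sketch below is not
+ICU code: the hypothetical `Utf16Decoder` struct remembers a pending lead
+surrogate between calls, which is the kind of state a converter keeps internally.
+Mapping unpaired surrogates to U+FFFD is a simplifying assumption of the sketch.
+
+```C
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical streaming state: a lead surrogate left over from the previous chunk. */
+typedef struct {
+    uint16_t pendingLead; /* 0 when no lead surrogate is pending */
+} Utf16Decoder;
+
+static int is_lead(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
+static int is_trail(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }
+
+/* Decodes one chunk into code points and returns how many were written.
+ * out must have room for n + 1 code points. */
+static size_t decode_chunk(Utf16Decoder *d, const uint16_t *src, size_t n,
+                           uint32_t *out) {
+    size_t written = 0;
+    assert(d != NULL && out != NULL);
+    for (size_t i = 0; i < n; ++i) {
+        uint16_t u = src[i];
+        if (d->pendingLead != 0) {
+            uint32_t lead = d->pendingLead;
+            d->pendingLead = 0;
+            if (is_trail(u)) {
+                /* combine the lead from an earlier position (maybe an earlier chunk) */
+                out[written++] = 0x10000u + ((lead - 0xD800u) << 10) + ((uint32_t)u - 0xDC00u);
+                continue;
+            }
+            out[written++] = 0xFFFD; /* lead surrogate without a trail */
+        }
+        if (is_lead(u)) {
+            d->pendingLead = u; /* wait for the trail, possibly in the next chunk */
+        } else if (is_trail(u)) {
+            out[written++] = 0xFFFD; /* trail surrogate without a lead */
+        } else {
+            out[written++] = u;
+        }
+    }
+    return written;
+}
+```
+
+Feeding {U+0041, U+D83D} and then {U+DE00, U+0042} through one decoder yields
+U+0041, U+1F600 and U+0042; a fresh decoder for the second chunk would see only
+an orphaned trail surrogate.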
+
+### Cloning a Converter
+
+Cloning a converter returns a clone of the converter object along with any
+internal state that the converter might be storing. Cloning routines must be
+used with extreme care when using converters for stateful or multibyte
+encodings. If the converter object carries internal state and the
+newly created clone is used to convert a new chunk of text, the clone
+produces incorrect results. Also note that the caller owns the cloned object and
+has to call `ucnv_close()` to dispose of it. Calling `ucnv_reset()` before
+cloning resets the converter to its original state.
+
+```C
+UConverter* newCnv = ucnv_safeClone(oldCnv, NULL, &bufferSize, &err);
+```
+
+## Converter Behavior
+
+### Conversion
+
+1.  The converters always consume the source buffer as far as possible, and
+    advance the source pointer.
+
+2.  The converters write to the target all converted output as far as possible,
+    and then write any remaining output to the internal services buffer. When
+    the conversion routines are called again, the internal buffer is flushed out
+    and written to the target buffer before proceeding with any further
+    conversion.
+
+3.  In conversions to Unicode from multi-byte encodings, or conversions from
+    Unicode involving surrogates: if (a) only a partial byte sequence is
+    retrieved from the source buffer, (b) the "flush" parameter is set to "TRUE",
+    and (c) the end of the source is reached, then the callback is called with
+    U_TRUNCATED_CHAR_FOUND.
+
+### Reset
+
+Converters can be reset explicitly or implicitly. Explicit reset is done by
+calling:
+
+1.  `ucnv_reset()`: Resets the converter to its initial state in both directions.
+
+2.  `ucnv_resetToUnicode()`: Resets the converter to its initial state in the
+    to-Unicode direction.
+
+3.  `ucnv_resetFromUnicode()`: Resets the converter to its initial state in the
+    from-Unicode direction.
+
+The converters are reset implicitly when the conversion functions are called
+with the "flush" parameter set to "TRUE" and the source is consumed.
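+
+The truncation rule (item 3 under Conversion) and the implicit reset can be
+illustrated with a small hypothetical UTF-8 decoder in plain C. It is not ICU's
+implementation, and its simplified validation (no overlong or surrogate checks)
+is an assumption made to keep it short. Bytes of an incomplete sequence are kept
+between calls; only a call with `flush` set reports a truncated sequence, and
+that call always returns the decoder to its initial state.
+
+```C
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+typedef enum { DEC_OK, DEC_TRUNCATED, DEC_ILLEGAL } DecStatus; /* hypothetical */
+
+/* Hypothetical decoder state: the partial sequence carried between calls. */
+typedef struct {
+    uint32_t cp; /* bits accumulated so far */
+    int need;    /* continuation bytes still expected; 0 = initial state */
+} Utf8Decoder;
+
+static void dec_reset(Utf8Decoder *d) { d->cp = 0; d->need = 0; }
+
+/* Decodes one chunk; *outLen receives the number of code points written
+ * (out must hold n code points). Rejects bad lead/trail bytes only. */
+static DecStatus dec_chunk(Utf8Decoder *d, const uint8_t *src, size_t n,
+                           bool flush, uint32_t *out, size_t *outLen) {
+    size_t w = 0;
+    assert(d != NULL && out != NULL && outLen != NULL);
+    for (size_t i = 0; i < n; ++i) {
+        uint8_t b = src[i];
+        if (d->need > 0) {
+            if ((b & 0xC0) != 0x80) { dec_reset(d); *outLen = w; return DEC_ILLEGAL; }
+            d->cp = (d->cp << 6) | (b & 0x3Fu);
+            if (--d->need == 0) out[w++] = d->cp;
+        } else if (b < 0x80) {
+            out[w++] = b;
+        } else if ((b & 0xE0) == 0xC0) { d->cp = b & 0x1Fu; d->need = 1; }
+        else if ((b & 0xF0) == 0xE0) { d->cp = b & 0x0Fu; d->need = 2; }
+        else if ((b & 0xF8) == 0xF0) { d->cp = b & 0x07u; d->need = 3; }
+        else { dec_reset(d); *outLen = w; return DEC_ILLEGAL; }
+    }
+    *outLen = w;
+    if (flush) {
+        bool partial = d->need > 0;
+        dec_reset(d); /* implicit reset once the final chunk is consumed */
+        if (partial) return DEC_TRUNCATED;
+    }
+    return DEC_OK;
+}
+```
+
+For example, the bytes C3 A9 (U+00E9) split across two calls decode correctly,
+whereas E2 82 passed with `flush` set yields `DEC_TRUNCATED` and leaves the
+decoder ready for unrelated input.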
+
+### Error
+
+#### Conversion from Unicode
+
+Not all characters can be converted from Unicode to other codepages. In most
+cases, Unicode is a superset of the characters supported by any given codepage.
+
+The default behavior of ICU in this case is to substitute the illegal or
+unmappable sequence with the appropriate substitution sequence for that
+codepage. For example, ISO-8859-1, along with most ASCII-based codepages, has
+the character 0x1A (Control-Z) as the substitution sequence. When converting
+from Unicode to ISO-8859-1, any characters that cannot be converted are
+replaced by 0x1A.
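+
+As a minimal illustration of that default, the hypothetical plain-C function
+below maps code points up to U+00FF to the byte of the same value and replaces
+everything else with 0x1A. It is a sketch, not ICU's converter, and it takes
+UTF-32 input so that surrogate handling stays out of scope.
+
+```C
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#define LATIN1_SUBCHAR 0x1A /* ISO-8859-1 substitution byte, per the text above */
+
+/* Converts code points to ISO-8859-1, substituting unmappable ones.
+ * out must hold n bytes. Returns the number of substitutions made. */
+static size_t to_latin1(const uint32_t *src, size_t n, uint8_t *out) {
+    size_t substitutions = 0;
+    assert(n == 0 || (src != NULL && out != NULL));
+    for (size_t i = 0; i < n; ++i) {
+        if (src[i] <= 0xFF) {
+            out[i] = (uint8_t)src[i]; /* Latin-1 is the first 256 code points */
+        } else {
+            out[i] = LATIN1_SUBCHAR;  /* e.g. U+0398 GREEK CAPITAL LETTER THETA */
+            ++substitutions;
+        }
+    }
+    return substitutions;
+}
+```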
+
+SubChar1 is sometimes used as substitution character in MBCS conversions. For
+more information on SubChar1 please see the [Conversion Data](data.md) chapter.
+
+In stateful converters like ISO-2022-JP, if a substitution character has to be
+written to the target, then an escape/shift sequence to change the state to
+single byte mode followed by a substitution character is written to the target.
+
+The substitution character can be changed by calling the `ucnv_setSubstChars()`
+function with the desired codepage byte sequence. However, this has some
+limitations: It only allows setting a single character (although the character
+can consist of multiple bytes), and it may not work properly for some stateful
+converters (like HZ or ISO 2022 variants) when setting a multi-byte substitution
+character. (It will work for EBCDIC_STATEFUL ones.) Moreover, for setting a
+particular character, the caller needs to know the correct byte sequence for
+that character in the converter's codepage. (For example, a space (U+0020) is
+encoded as 0x20 in ASCII-based codepages, 0x40 in EBCDIC-based ones, 0x00 0x20
+or 0x20 0x00 in UTF-16 depending on the stream's endianness, etc.)
+
+The `ucnv_setSubstString()` function (new in ICU 3.6) lifts these limitations. It
+takes a Unicode string and verifies that it can be converted to the codepage
+without error and that it is not too long (32 bytes as of ICU 3.6). The string
+can contain zero, one or more characters. An empty string has the effect of
+using the skip callback (see Error Callbacks below). Stateful converters are
+fully supported. The same Unicode string will give equivalent results with all
+converters that support its conversion.
+
+Internally, `ucnv_setSubstString()` stores the byte sequence from the test
+conversion if the converter is stateless, or the Unicode string itself if the
+converter is stateful. If the Unicode string is stored, then it is converted on
+the fly during substitution, handling all state transitions.
+
+The function `ucnv_getSubstChars()` can be used to retrieve the substitution byte
+sequence if it is the default one, set by `ucnv_setSubstChars()`, or if
+`ucnv_setSubstString()` stored the byte sequence for a stateless converter. The
+Unicode string set for a stateful converter cannot be retrieved.
+
+#### Conversion to Unicode
+
+In conversion to Unicode, errors are normally due to ill-formed byte sequences:
+Unused byte values, or lead bytes not followed by trail bytes according to the
+encoding scheme. Well-formed but unmappable sequences are unusual but possible.
+
+The ICU default behavior is to emit a U+FFFD REPLACEMENT CHARACTER per
+offending sequence.
+
+If the conversion table .ucm file contains a `<subchar1>` entry (such as in the
+ibm-943 table), a U+001A C0 control ("SUB") is emitted for single-byte
+illegal/unmappable input rather than U+FFFD REPLACEMENT CHARACTER. For details
+on this behavior, look for "001A" in the [Conversion Data](data.md) chapter.
+
+*   This behavior originates from mainframes with dedicated
+    single-byte-to-single-byte and double-to-double conversions.
+*   Emitting U+001A for single-byte errors can be avoided by (a) removing the
+    `<subchar1>` mapping, (b) using a similar conversion table that does not
+    have this mapping (e.g., windows-932 instead of ibm-943), or (c) writing a
+    custom callback function.
+
+### Error Codes
+
+Here are some of the `UErrorCode`s which have significant meaning for conversion:
+
+#### U_INDEX_OUTOFBOUNDS_ERROR
+
+In `ucnv_getNextUChar()`, all source data
+has been consumed without producing a Unicode character.
+
+#### U_INVALID_CHAR_FOUND
+
+No mapping was found from the source to the target encoding. For example, U+0398
+(Capital Theta) has no mapping into ISO-8859-1, and so U_INVALID_CHAR_FOUND
+will result.
+
+#### U_TRUNCATED_CHAR_FOUND
+
+All of the source data was read, and a
+character sequence was incomplete. For example, only half of a double-byte
+sequence may have been encountered. When converting from Unicode, this error
+occurs when the source ends with a lead (high) surrogate such as U+D800 that has
+no following trail (low) surrogate.
+
+#### U_ILLEGAL_CHAR_FOUND
+
+A character sequence was found in the source which is disallowed in the source
+encoding scheme. For example, many MBCS encodings have only certain byte
+sequences which are allowed as lead bytes. When converting from Unicode, a lead
+(high) surrogate that is NOT immediately followed by a trail (low) surrogate, or
+a trail surrogate without a preceding lead surrogate, is an illegal sequence.
+Note: Most, but not all, converters forbid surrogate code points or unpaired
+surrogate code units (lead surrogate without trail, or trail without lead).
+Some converters permit surrogate code points/unpaired surrogates because their
+charset specification permits it, for example LMBCS, SCSU and BOCU-1.
+
+#### U_INVALID_TABLE_FORMAT
+
+An error occurred trying to read the backing data
+for the converter. The data could be corrupt, or the wrong
+version.
+
+#### U_BUFFER_OVERFLOW_ERROR
+
+More output (target) characters were produced
+than fit in the target buffer. In `ucnv_toUnicode()` or `ucnv_fromUnicode()`,
+process the target buffer and call the function again to retrieve the
+overflowed characters.
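+
+The drain-and-retry pattern for a full target buffer can be sketched in plain C.
+Everything here is hypothetical and not ICU API: `fill_target()` stands in for a
+buffer-mode conversion call that advances a source pointer and reports when the
+target filled up (the analog of U_BUFFER_OVERFLOW_ERROR), and it accepts ASCII
+input only. `convert_all()` processes each full target buffer and calls again
+until no overflow is reported.
+
+```C
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Hypothetical converter step: copies ASCII code points into a bounded target
+ * and advances *src past what it consumed. Returns true if it stopped because
+ * the target was full while source data remained. */
+static bool fill_target(const uint32_t **src, const uint32_t *srcLimit,
+                        char *target, size_t cap, size_t *written) {
+    size_t w = 0;
+    assert(src != NULL && target != NULL && written != NULL);
+    while (*src < srcLimit) {
+        if (w == cap) {
+            *written = w;
+            return true; /* overflow: drain the target and call again */
+        }
+        assert(**src < 0x80); /* sketch handles ASCII input only */
+        target[w++] = (char)**src;
+        ++*src;
+    }
+    *written = w;
+    return false;
+}
+
+/* Drives fill_target with a deliberately tiny target buffer, appending each
+ * filled buffer to result (which must hold n bytes). */
+static size_t convert_all(const uint32_t *src, size_t n, char *result, size_t *calls) {
+    const uint32_t *p = src;
+    char chunk[4];
+    size_t total = 0, w = 0;
+    bool overflow;
+    *calls = 0;
+    do {
+        overflow = fill_target(&p, src + n, chunk, sizeof chunk, &w);
+        memcpy(result + total, chunk, w); /* "process" the target buffer */
+        total += w;
+        ++*calls;
+    } while (overflow);
+    return total;
+}
+```
+
+Converting the 11 characters of "hello world" this way takes three calls: two
+that fill the 4-byte target and report overflow, and a final one that consumes
+the rest.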
+
+### Error Callbacks
+
+When a conversion error occurs, an "error callback function" is called at the
+point where the conversion failure occurred. The function can deal with the
+failed characters as it sees fit. Possible options at the callback's disposal
+include ignoring the bad sequence, converting it to a different sequence, and
+returning an error to the caller. The callback can also consume any data past
+where the error occurred, whether or not that data would have caused an error.
+Only one callback is installed at a time, per direction (to or from Unicode).
+
+A number of canned functions are provided by ICU, and an application can write
+new ones. The "callbacks" are either From Unicode (to codepage), or To Unicode
+(from codepage). Here is a list of the canned callbacks in ICU:
+
+1.  UCNV_**FROM_U**_CALLBACK_SUBSTITUTE: This callback is installed by default.
+    It will write the codepage's substitute sequence or a user-set substitute
+    sequence, or convert a user-set substitute UnicodeString to the codepage.
+    See "Error / Conversion from Unicode" above.
+
+2.  UCNV_**TO_U**_CALLBACK_SUBSTITUTE: This callback is installed by default. It
+    will write U+FFFD or sometimes U+001A. See "Error / Conversion to Unicode"
+    above.
+
+3.  UCNV_FROM_U_CALLBACK_SKIP, UCNV_TO_U_CALLBACK_SKIP: Simply ignores any
+    invalid characters in the input; no error is returned.
+
+4.  UCNV_FROM_U_CALLBACK_STOP, UCNV_TO_U_CALLBACK_STOP: Stops at the error and
+    returns it to the caller. (When using the 'BUFFER' mode of conversion,
+    the source and target pointers returned can be examined to determine where
+    the error occurred. `ucnv_getInvalidUChars()` and `ucnv_getInvalidChars()`
+    return the actual text which failed.)
+
+5.  UCNV_FROM_U_CALLBACK_ESCAPE, UCNV_TO_U_CALLBACK_ESCAPE: This callback is
+    especially useful for debugging. Missing codepage characters are replaced by
+    strings such as '%U094D' with the Unicode value, and missing Unicode chars
+    are replaced with text of the form '%X0A' where the codepage had the
+    unconvertible byte hex 0A.
+
+When a callback is set, a "context" pointer is also provided. How this
+pointer is created depends on the specific callback. There is usually a
+createContext() function for that specific callback, where the caller can
+set certain options for the callback. Consult the documentation for the
+specific callback you are using. For ICU's canned callbacks, this pointer
+may be set to NULL. The functions for setting a different callback also
+return the old callback and the old context pointer. These may be stored so
+that the old callback is re-installed when an operation is finished.
+
+Additionally, the following options can be passed as the context parameter to
+the UCNV_FROM_U_CALLBACK_ESCAPE callback function to produce different outputs.
+
+| Context option | Output format |
+| -------------- | ------------- |
+| `UCNV_ESCAPE_ICU` | `%U1234` (one `%UXXXX` per UTF-16 code unit) |
+| `UCNV_ESCAPE_JAVA` | `\u1234` (one `\uXXXX` per UTF-16 code unit) |
+| `UCNV_ESCAPE_C` | `\u1234` for BMP code points and `\U00012345` for supplementary code points |
+| `UCNV_ESCAPE_XML_DEC` | `&#4660;` (code point in decimal) |
+| `UCNV_ESCAPE_XML_HEX` | `&#x1234;` (code point in hexadecimal) |
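+
+To make the escape styles concrete, here is a hypothetical plain-C formatter,
+`format_escape()`, covering the Java and XML styles for a single code point. It
+is an illustration only, not ICU's callback code; the choices that the Java style
+escapes each UTF-16 code unit and that the XML styles use the whole code point,
+as well as the uppercase hex digits, are assumptions of this sketch.
+
+```C
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical subset of escape styles; not ICU's option strings. */
+typedef enum { ESC_JAVA, ESC_XML_DEC, ESC_XML_HEX } EscapeStyle;
+
+/* Formats code point cp into buf (NUL-terminated). Returns the snprintf
+ * result, i.e. the escape's length, or -1 for an unknown style. */
+static int format_escape(uint32_t cp, EscapeStyle style, char *buf, size_t cap) {
+    assert(buf != NULL && cap > 0 && cp <= 0x10FFFF);
+    switch (style) {
+    case ESC_JAVA:
+        if (cp >= 0x10000) {
+            /* one \uXXXX per UTF-16 code unit of the surrogate pair */
+            uint32_t v = cp - 0x10000;
+            return snprintf(buf, cap, "\\u%04X\\u%04X",
+                            (unsigned)(0xD800 + (v >> 10)),
+                            (unsigned)(0xDC00 + (v & 0x3FF)));
+        }
+        return snprintf(buf, cap, "\\u%04X", (unsigned)cp);
+    case ESC_XML_DEC:
+        return snprintf(buf, cap, "&#%u;", (unsigned)cp);  /* decimal code point */
+    case ESC_XML_HEX:
+        return snprintf(buf, cap, "&#x%X;", (unsigned)cp); /* hexadecimal code point */
+    }
+    return -1;
+}
+```
+
+For U+1234 this produces `\u1234`, `&#4660;` and `&#x1234;`; for U+1F600 the
+Java style produces `\uD83D\uDE00`.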
+
+Here is an example of installing a custom callback and later restoring the previous one.
+
+```C
+UConverter *u;
+void *oldContext, *newContext;
+UConverterFromUCallback oldAction, newAction;
+u = ucnv_open("shift_jis", &myError);
+... /* do some conversion with u from unicode.. */
+ucnv_setFromUCallBack(u, MY_FROMU_CALLBACK, newContext, &oldAction, &oldContext, &myError);
+... /* do some other conversion from unicode */
+/* Now, set the callback back */
+ucnv_setFromUCallBack(u, oldAction, oldContext, &newAction, &newContext, &myError);
+```
+
+### Custom Callbacks
+
+Writing a callback is somewhat involved, and will be covered more completely in
+a future version of this document. One might look at the source to the provided
+callbacks as a starting point, and address any further questions to the mailing
+list.
+
+Unlike other ICU functions, which expect to be called with U_ZERO_ERROR as the
+input, a callback is called in an exceptional error condition. The callback is
+a kind of "last ditch effort" to rectify the error that occurred before it is
+returned to the caller. This is why the implementation of STOP is very simple:
+
+```C
+void UCNV_FROM_U_CALLBACK_STOP(...) { }
+```
+
+The error code such as U_INVALID_CHAR_FOUND is returned to the user. If the
+callback determines that no error should be returned to the user, then the
+callback must set the error code to U_ZERO_ERROR. Note that this is a departure
+from most ICU functions, which are supposed to check the error code and return
+immediately if it is set.
+
+> :point_right: **Note**: See the functions `ucnv_cb_write...()` for
+functions which a callback may use to perform its task.
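+
+The error-code contract described above can be modeled without ICU. In this
+hypothetical plain-C sketch, `EncodeCallback`, `encode_ascii()`, the `cb_*`
+functions and the `Err` codes are invented for illustration and are not ICU API.
+The encoder sets an error before invoking the callback; a STOP-style callback
+leaves it set, so conversion ends and the caller sees the error, while SKIP- and
+SUBSTITUTE-style callbacks clear it so that conversion continues.
+
+```C
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+typedef enum { ERR_NONE = 0, ERR_INVALID_CHAR = 1 } Err; /* hypothetical codes */
+
+/* Callback: may write to out (advancing *w) and may clear *err to recover. */
+typedef void (*EncodeCallback)(uint32_t cp, char *out, size_t *w, Err *err);
+
+static void cb_stop(uint32_t cp, char *out, size_t *w, Err *err) {
+    (void)cp; (void)out; (void)w; (void)err; /* leave the error set */
+}
+static void cb_skip(uint32_t cp, char *out, size_t *w, Err *err) {
+    (void)cp; (void)out; (void)w;
+    *err = ERR_NONE; /* drop the character and continue */
+}
+static void cb_substitute(uint32_t cp, char *out, size_t *w, Err *err) {
+    (void)cp;
+    out[(*w)++] = '?'; /* write a substitute, then continue */
+    *err = ERR_NONE;
+}
+
+/* Encodes code points to ASCII; out must hold n bytes. Returns bytes written. */
+static size_t encode_ascii(const uint32_t *src, size_t n, char *out,
+                           EncodeCallback cb, Err *err) {
+    size_t w = 0;
+    assert(out != NULL && cb != NULL && err != NULL);
+    *err = ERR_NONE;
+    for (size_t i = 0; i < n; ++i) {
+        if (src[i] < 0x80) { out[w++] = (char)src[i]; continue; }
+        *err = ERR_INVALID_CHAR; /* unmappable: hand the error to the callback */
+        cb(src[i], out, &w, err);
+        if (*err != ERR_NONE) break; /* callback did not recover: stop here */
+    }
+    return w;
+}
+```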
+
+#### Ignore Default_Ignorable_Code_Point
+
+Unicode has a number of characters that are not by themselves meaningful but
+assist with line breaking (e.g., U+00AD Soft Hyphen & U+200B Zero Width Space),
+bi-directional text layout (U+200E Left-To-Right Mark), collation and other
+algorithms (U+034F Combining Grapheme Joiner), or indicate a preference for a
+particular glyph variant (U+FE0F Variation Selector 16). These characters are
+"invisible" by default, that is, they should normally not be shown with a glyph
+of their own, except in special circumstances. Examples include showing a hyphen
+for when a Soft Hyphen was used for a line break, or modifying the glyph of a
+character preceding a Variation Selector.
+
+Unicode has a character property to identify such characters, as well as
+currently-unassigned code points that are intended to be used for similar
+purposes: Default_Ignorable_Code_Point, or "DI" for short:
+<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:DI:]>
+
+Most charsets contain few or none of these characters.
+
+**ICU 54 and above by default skip default-ignorable code points if they are
+unmappable**. (Ticket #[10551](http://bugs.icu-project.org/trac/ticket/10551))
+
+**Older versions of ICU** replaced unmappable default-ignorable code points like
+any other unmappable code points, by a question mark or whatever substitution
+character is defined for the charset.
+
+For best results, a custom from-Unicode callback can be used to ignore
+Default_Ignorable_Code_Point characters that cannot be converted, so that they
+are removed from the charset output rather than replaced by a visible character.
+
+This is a code snippet for use in a custom from-Unicode callback:
+
+```C
+#include "unicode/uchar.h"
+...
+/* ... inside a custom from-Unicode callback function ... */
+    switch(reason) {
+    case UCNV_UNASSIGNED:
+        if(u_hasBinaryProperty(codePoint, UCHAR_DEFAULT_IGNORABLE_CODE_POINT)) {
+            // Ignore/drop default ignorable code points that cannot be converted,
+            // rather than treating them like errors/writing a substitution character etc.
+            // For example, U+200B Zero Width Space,
+            // U+200E Left-To-Right Mark, U+FE0F Variation Selector 16.
+            *pErrorCode = U_ZERO_ERROR;
+            return;
+        } else {
+            ...
+```
+
+## Modes of Conversion
+
+When a converter is instantiated, it can be used to convert both in the Unicode
+to Codepage direction, and also in the Codepage to Unicode direction. There are
+three ways to use the converters, as well as a convenience function which does
+not require the instantiation of a converter.
+
+1.  **Single-String**: Simplest type of conversion to or from Unicode. The data
+    is entirely contained within a single string.
+
+2.  **Character**: Converting from the codepage to a single Unicode codepoint,
+    one at a time.
+
+3.  **Buffer**: Convert data which may not fit entirely within a single buffer.
+    Usually the most efficient and flexible.
+
+4.  **Convenience**: Convert a single buffer from one codepage to another
+    through Unicode, without requiring the instantiation of a converter.
+
+### 1. Single-String
+
+Data must be contained entirely within a single string or buffer.
+
+```C
+UErrorCode status = U_ZERO_ERROR;
+UConverter *conv = ucnv_open("shift_jis", &status);
+/* Convert from Unicode (UChar source) to Shift JIS (char target) */
+len = ucnv_fromUChars(conv, target, targetCapacity, source, sourceLen, &status);
+ucnv_close(conv);
+conv = ucnv_open("iso-8859-3", &status);
+/* Convert from ISO-8859-3 (char source) to Unicode (UChar target) */
+len = ucnv_toUChars(conv, target, targetCapacity, source, sourceLen, &status);
+ucnv_close(conv);
+```
+
+### 2. Character
+
+In this type, the input data is in the specified codepage. Each function call
+converts only the next Unicode code point. Because the converter keeps its state
+between calls, this can be the most efficient way to scan for a particular
+character or to otherwise process one character at a time. This works even for
+multibyte charsets, and for stateful ones such as ISO-2022-JP.
+
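+```C
+UErrorCode status = U_ZERO_ERROR;
+UConverter *conv = ucnv_open("Big-5", &status);
+/* source and sourceLimit (const char *) delimit the Big5 input bytes */
+UChar32 target;
+while(U_SUCCESS(status) && source < sourceLimit) {
+    /* returns the next code point and advances source past its bytes */
+    target = ucnv_getNextUChar(conv, &source, sourceLimit, &status);
+    if(U_SUCCESS(status)) {
+        processChar(target); /* application-specific processing */
+    }
+}
+ucnv_close(conv);
+```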
+
+### 3. Buffered or Streamed
+
+This is used in situations where a large document may be read from disk and
+processed. Also, many codepages take multiple bytes to encode a character, or
+have state. These factors make it impossible to convert arbitrary chunks of data
+without maintaining state across chunks. Even conversion from Unicode may
+encounter a leading surrogate at the end of one buffer, which needs to be paired
+with the trailing surrogate in the next buffer.
+
+A basic API principle of the ICU to/from Unicode functions is that they will
+ALWAYS attempt to consume all of the input (source) data, unless the output
+buffer is full or some other error occurs. In other words, there is no need to
+ever test whether all of the source data has been consumed.
+
+The basic loop that is used with the ICU buffer conversion routines is the same
+in the to- and from-Unicode directions. In the following pseudocode, either
+'source' (for fromUnicode) or 'target' (for toUnicode) points to UTF-16 UChars.
+
+```C
+UErrorCode err = U_ZERO_ERROR;
+while (... /* input data available */ ) {
+    ... /* read input data into buffer */
+    source = ... /* beginning of read data */;
+    sourceLimit = source + readLength; /* end + 1 */
+    UBool flush = (no further input data available); /* i.e. feof() */
+    /* loop until all source has been processed */
+    do {
+        /* set up target pointers */
+        target = ... /* beginning of output buffer */;
+        targetLimit = target + sizeOfOutput;
+        err = U_ZERO_ERROR; /* so that the to/from does not fail */
+        ucnv_to/fromUnicode(converter, &target, targetLimit, &source, sourceLimit, NULL, flush, &err);
+        ... /* write (target-beginningOfOutputBuffer) items starting at beginning of output buffer */
+    } while (err == U_BUFFER_OVERFLOW_ERROR);
+    if(U_FAILURE(err)) {
+        ... /* process error */
+        break; /* out of the 'while' loop that reads source data */
+    }
+}
+/* end of the loop that reads input data */
+if(U_FAILURE(err)) {
+    ... /* process error further */
+}
+```
+
+The above code optimizes for processing entire chunks of input data. An
+efficient size for the output buffer, in bytes, can be calculated as follows:
+
+```C
+ucnv_getMinCharSize() * inputBufferSize * sizeof(UChar) /* toUnicode: UChar output */
+ucnv_getMaxCharSize() * inputBufferSize                 /* fromUnicode: byte output */
+```
+
+There are two loops used, an outer and an inner. The outer loop fetches input
+data to keep the source buffer full, and the inner loop 'writes' out data to
+keep the output buffer empty.
+
+Note that while this efficiently handles data on the input side, there are some
+cases where the size of the output buffer is fixed. For instance, in network
+applications it is sometimes desirable to fill every output packet completely
+(not including the last packet in the sequence). The above loop does not ensure
+that every output buffer is completely full. For example, if a 4 UChar input
+buffer was used, and a 3 byte output buffer with fromUnicode(), the loop would
+typically write 3 bytes, then 1, then 3, and so on. If, instead of efficient use
+of the input data, the goal is filling output buffers, a slightly different loop
+can be used.
+
+In such a scenario, the inner write does not occur unless a buffer overflow
+occurs OR 'flush' is true. So, the 'write' and resetting of the target and
+targetLimit pointers would only happen
+`if(err == U_BUFFER_OVERFLOW_ERROR || flush == TRUE)`
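+
+The difference between the two loops can be modeled without ICU. The following is a
+simulation only, under stated assumptions: `fakeConvert()` is a hypothetical stand-in
+for `ucnv_fromUnicode()` that produces exactly one output byte per input unit, and
+`WriteLog` just records how many bytes each "write" (for example, each packet) would send.
+
+```C
+#include <stddef.h>
+
+#define OUT_MAX 16  /* storage for the simulated output buffer */
+
+/* Hypothetical converter stand-in: copies one byte per input unit and
+ * returns 1 ("buffer overflow") if the target fills up first. */
+static int fakeConvert(const char **source, const char *sourceLimit,
+                       char **target, const char *targetLimit) {
+    while (*source < sourceLimit && *target < targetLimit) {
+        *(*target)++ = *(*source)++;
+    }
+    return *source < sourceLimit;
+}
+
+/* Records the size of each write. */
+typedef struct { int sizes[64]; int count; } WriteLog;
+
+static void logWrite(WriteLog *log, size_t n) {
+    if (n > 0 && log->count < 64) log->sizes[log->count++] = (int)n;
+}
+
+/* First loop: write after every conversion call. */
+static void writeAfterEveryCall(const char *input, size_t inputLen, size_t chunk,
+                                size_t outCap, WriteLog *log) {
+    char out[OUT_MAX];
+    if (outCap > OUT_MAX) outCap = OUT_MAX;
+    for (size_t pos = 0; pos < inputLen; pos += chunk) {
+        const char *source = input + pos;
+        const char *sourceLimit = input + (pos + chunk < inputLen ? pos + chunk : inputLen);
+        int overflow;
+        do {
+            char *target = out;
+            overflow = fakeConvert(&source, sourceLimit, &target, out + outCap);
+            logWrite(log, (size_t)(target - out));
+        } while (overflow);
+    }
+}
+
+/* Variant: write only on buffer overflow or when flushing the last chunk. */
+static void writeFullBuffers(const char *input, size_t inputLen, size_t chunk,
+                             size_t outCap, WriteLog *log) {
+    char out[OUT_MAX];
+    char *target = out;
+    if (outCap > OUT_MAX) outCap = OUT_MAX;
+    for (size_t pos = 0; pos < inputLen; pos += chunk) {
+        const char *source = input + pos;
+        const char *sourceLimit = input + (pos + chunk < inputLen ? pos + chunk : inputLen);
+        int flush = (sourceLimit == input + inputLen);
+        int overflow;
+        do {
+            overflow = fakeConvert(&source, sourceLimit, &target, out + outCap);
+            if (overflow || flush) {
+                logWrite(log, (size_t)(target - out));
+                target = out;  /* start a new output buffer */
+            }
+        } while (overflow);
+    }
+}
+```
+
+With 12 input units in chunks of 4 and a 3-byte output buffer, the first loop
+writes 3, 1, 3, 1, 3, 1 bytes, while the variant writes four full 3-byte buffers.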
+
+The flush parameter should be set to FALSE on each conversion call except the
+last one for the input data, and to TRUE on that last call. This is because the
+conversion is stateful: a chunk of input may end in the middle of a character,
+and the converter keeps the partial sequence until the next call. More details
+are in the API reference in
+[ucnv.h](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ucnv_8h.html).
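+
+To see why state is carried between calls and why the last call needs `flush`,
+here is a toy decoder written without ICU. It is a simplified, hypothetical model:
+bytes 0x81..0x9F are treated as lead bytes of two-byte characters (a subset of the
+Shift-JIS lead-byte range) and every other byte as a single-byte character. A lead
+byte at the end of one chunk is remembered for the next call; with `flush` set, a
+leftover lead byte is reported as a truncated character, a situation that a real
+converter treats as an error.
+
+```C
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Toy decoder state that survives between calls, like a converter object. */
+typedef struct {
+    bool hasPendingLead;  /* a lead byte ended the previous chunk */
+    size_t chars;         /* complete characters decoded so far */
+} ToyDecoder;
+
+/* Decodes one chunk. Returns false only if flush is true and the input
+ * ends in the middle of a two-byte character. */
+static bool toyDecode(ToyDecoder *d, const unsigned char *src, size_t len, bool flush) {
+    for (size_t i = 0; i < len; ++i) {
+        if (d->hasPendingLead) {
+            d->hasPendingLead = false;  /* trail byte completes the pair */
+            d->chars++;
+        } else if (src[i] >= 0x81 && src[i] <= 0x9F) {
+            d->hasPendingLead = true;   /* need one more byte, maybe from the next chunk */
+        } else {
+            d->chars++;                 /* single-byte character */
+        }
+    }
+    if (flush && d->hasPendingLead) {
+        d->hasPendingLead = false;      /* reset for any later input */
+        return false;                   /* truncated character at end of input */
+    }
+    return true;
+}
+```
+
+Splitting the bytes 0x41 0x81 | 0x8F 0x42 across two calls still yields three
+characters, because the pending lead byte 0x81 is paired with 0x8F on the second call.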
+
+### 4. Pre-flighting
+
+Preflighting is the process of asking the conversion API for the size of the
+target buffer required. (For a more general discussion, see the Preflighting section
+(§) in the [Strings](../strings/index.md) chapter.)
+
+This is accomplished by calling the `ucnv_fromUChars` and `ucnv_toUChars` functions
+with a NULL target and a target capacity of 0. They return the required length,
+setting `U_BUFFER_OVERFLOW_ERROR` when the output does not fit.
+
+```C
+UErrorCode err = U_ZERO_ERROR;
+UChar *uchar2;
+char input_char_buffer[] = "This is some text";
+int32_t inputLength = (int32_t)strlen(input_char_buffer);
+/* Preflight: NULL target, capacity 0; returns the required length in UChars */
+int32_t targetsize = ucnv_toUChars(myConverter, NULL, 0, input_char_buffer, inputLength, &err);
+if(err==U_BUFFER_OVERFLOW_ERROR) {
+    err=U_ZERO_ERROR;
+    /* +1 so that ucnv_toUChars() can NUL-terminate the output */
+    uchar2=(UChar*)malloc((targetsize + 1) * sizeof(UChar));
+    targetsize = ucnv_toUChars(myConverter, uchar2, targetsize + 1,
+                               input_char_buffer, inputLength, &err);
+    if(U_FAILURE(err)) {
+        printf("ucnv_toUChars() FAILED %s\n", u_errorName(err));
+    } else {
+        printf("ucnv_toUChars() o.k.\n");
+    }
+    free(uchar2);
+}
+```
+
+> :point_right: **Note**: This is inefficient because the conversion is performed twice: once to find
+the size of the target and once to write to the target.
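+
+The preflighting convention itself (report the full length even when the buffer is
+too small, accept a NULL target with capacity 0) can be shown with a toy converter
+that does not use ICU. `latin1ToUtf8()` and `latin1ToUtf8Alloc()` are hypothetical
+helpers for illustration; ICU's own functions use `int32_t` lengths and report the
+overflow through `UErrorCode` instead.
+
+```C
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Writes byte b at dest[index] only if it fits; the length is counted anyway. */
+static void putByte(char *dest, size_t destCapacity, size_t index, unsigned char b) {
+    if (index < destCapacity) dest[index] = (char)b;
+}
+
+/* Toy ISO-8859-1 to UTF-8 converter following the preflighting convention:
+ * returns the full output length even if destCapacity is too small,
+ * and dest may be NULL when destCapacity is 0. */
+static size_t latin1ToUtf8(char *dest, size_t destCapacity,
+                           const unsigned char *src, size_t srcLength) {
+    size_t length = 0;
+    for (size_t i = 0; i < srcLength; ++i) {
+        unsigned char c = src[i];
+        if (c < 0x80) {
+            putByte(dest, destCapacity, length++, c);
+        } else {  /* U+0080..U+00FF take two bytes in UTF-8 */
+            putByte(dest, destCapacity, length++, (unsigned char)(0xC0 | (c >> 6)));
+            putByte(dest, destCapacity, length++, (unsigned char)(0x80 | (c & 0x3F)));
+        }
+    }
+    return length;
+}
+
+/* Two passes: preflight for the size, allocate, then convert for real. */
+static char *latin1ToUtf8Alloc(const unsigned char *src, size_t srcLength, size_t *outLength) {
+    size_t needed = latin1ToUtf8(NULL, 0, src, srcLength);  /* pass 1: size only */
+    char *dest = malloc(needed + 1);                          /* +1 for a NUL terminator */
+    if (dest == NULL) return NULL;
+    latin1ToUtf8(dest, needed, src, srcLength);               /* pass 2: write */
+    dest[needed] = '\0';
+    *outLength = needed;
+    return dest;
+}
+```
+
+For the four bytes "caf" plus 0xE9, the preflight call reports 5 bytes, because
+U+00E9 needs two bytes (0xC3 0xA9) in UTF-8.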
+
+### 5. Convenience
+
+ICU provides some convenience functions for conversions:
+
+```C
+ucnv_toUChars(myConverter, target_uchars, targetsize, input_char_buffer, sizeof(input_char_buffer), &err);
+ucnv_fromUChars(cnv, cTarget, (cTargetLimit-cTarget), uSource, (uSourceLimit-uSource), &errorCode);
+char target[100];
+UnicodeString str("ABCDEF", "iso-8859-1");
+int32_t targetsize = str.extract(0, str.length(), target, sizeof(target), "SJIS");
+if(targetsize < (int32_t)sizeof(target)) {
+    target[targetsize] = 0; /* NUL termination, only if it fits */
+}
+```
+
+## Conversion Examples
+
+See the [ICU Conversion
+Examples](https://github.com/unicode-org/icu/blob/master/icu4c/source/samples/ucnv/convsamp.cpp)
+for more information.
diff --git a/docs/userguide/conversion/data.md b/docs/userguide/conversion/data.md
new file mode 100644
index 0000000..ed31a2c
--- /dev/null
+++ b/docs/userguide/conversion/data.md
@@ -0,0 +1,673 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Conversion Data
+
+## Introduction
+
+### Algorithmic vs. Data-based
+
+In a comprehensive conversion library, there are three kinds of codepage
+converter implementations: converters that use algorithms, converters that use
+mapping data, and converters that use both.
+
+1.  Most codepages have a simple and straightforward structure but have an
+    arbitrary relationship between input and output character codes. Mapping
+    tables are necessary to define the conversion. If the codepage characters
+    use more than one byte each, then the mapping table must also define the
+    structure of the codepage.
+
+2.  Algorithmic converters work by transforming the input stream with built-in
+    algorithms and possibly small, hard coded tables. The conversion can be
+    complex, but the actual mapping of a character code is done numerically if
+    the converter is purely algorithmic.
+
+3.  In some cases, a converter needs to be algorithmic for its basic operations
+    but also relies on mapping data.
+
+ICU provides converter implementations for all three groups of codepages. Since
+ICU always converts to or from Unicode, the purely algorithmic converters are
+the ones for Unicode encodings (such as UTF-8, UTF-16BE, UTF-16LE, UTF-32BE,
+UTF-32LE, SCSU, BOCU-1 and UTF-7). Since Unicode is based on US-ASCII and
+ISO-8859-1 ("ISO Latin-1"), these encodings also use algorithmic converters for
+performance reasons.
+
+Most other codepages use simple byte sequences but are not encodings of Unicode.
+They are converted with generic code using mapping data tables. ICU also
+supports a few encodings, like ISO-2022 and its variants, that employ an
+algorithmic structure to switch between a set of codepages. The converters for
+these encodings are algorithmic but use mapping tables for the embedded
+codepages.
+
+### Stateful vs. Stateless
+
+Character encodings are either stateful or stateless:
+
+1.  Stateless encodings define a byte sequence for each character. Complete
+    character byte sequences can be used in any order, and the same complete
+    character byte sequence always encodes the same character. It is
+    preferable to always encode one character using the same byte sequence.
+
+2.  Stateful encodings define byte sequences that change the state of the text
+    stream. Depending on the current state, the same byte sequence may encode a
+    different character and the same character may be encoded with different
+    byte sequences.
+
+This distinction between stateless and stateful encodings is important, because
+it determines which of the available ICU converter implementations is used. The
+following are some more important considerations related to stateless versus
+stateful encodings:
+
+1.  A runtime converter object is always stateful, even for "stateless"
+    encodings. They are always stateful because an input buffer may end with a
+    partial byte sequence that is to be continued in the next input buffer in
+    the following conversion call. The information about this is stored in the
+    converter object. Similarly, if the input is Unicode text, then an input
+    buffer may end with the first of a pair of surrogates. The converter object
+    also stores overflow bytes or code units if the result of a character
+    mapping did not fit entirely into the output buffer.
+
+2.  Stateless encodings are nevertheless handled statefully in our converter
+    implementation, because the converter must interpret "complete byte
+    sequences". Many encodings use the same byte value in different positions
+    of byte sequences for different characters; a specific byte value may be a
+    lead byte or a trail byte. For instance, the lead and trail byte values
+    overlap in codepages like Shift-JIS. If a program does not start reading at
+    a character boundary, it may instead interpret the byte sequences from two
+    or more separate characters as one character. Often, character boundaries
+    can be detected reliably only by reading the non-Unicode text linearly from
+    the beginning. This can be a problem for non-Unicode text processing, where
+    text insertion, deletion, and searching are common. The UTF-8/16/32
+    encodings do not have this problem because the single, lead, and trail units
+    have disjoint values, so character boundaries can be found easily.
+
+3.  Some stateful encodings only switch between two states: one with one byte
+    per character and one with two bytes per character. This type of encoding is
+    very common in mainframe systems based on Extended Binary Coded Decimal
+    Interchange Code (EBCDIC) and is actually handled in ICU with almost the
+    same code and type of mapping tables as stateless codepages.
+
+4.  The classifications of algorithmic vs. data-based converters and of
+    stateless vs. stateful encodings are independent of each other: UTF-8,
+    UTF-16, and UTF-32 encodings are algorithmic but stateless; UTF-7 and SCSU
+    encodings are algorithmic and stateful; Windows-1252 and Shift-JIS encodings
+    are data-based and stateless; ISO-2022-JP encoding is algorithmic,
+    data-based, and stateful.
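+
+The boundary property of UTF-8 mentioned above fits in a few lines of C. This
+sketch is not an ICU API: it backs up from an arbitrary byte offset to the start
+of the character containing it, which works because UTF-8 trail bytes
+(0x80..0xBF) never occur as lead or single bytes. Shift-JIS has no such shortcut,
+because a trail byte value can also be a lead byte or a single-byte character.
+
+```C
+#include <stddef.h>
+
+/* Returns the offset of the first byte of the UTF-8 character that contains
+ * the byte at offset i. Assumes well-formed UTF-8 and i < length. */
+static size_t utf8CharStart(const unsigned char *s, size_t i) {
+    while (i > 0 && (s[i] & 0xC0) == 0x80) {  /* 10xxxxxx = trail byte */
+        --i;
+    }
+    return i;
+}
+```
+
+For the bytes 61 C3 A9 62 E2 82 AC ("aéb€"), offset 2 maps back to 1 and
+offset 6 maps back to 4.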
+
+### Scope of this chapter
+
+The following sections in this chapter discuss the mapping data tables that are
+used in ICU. For related material, please see:
+
+1.  [ICU character set collection](http://icu-project.org/charts/charset/)
+
+2.  [Unicode Technical Report 22](http://www.unicode.org/unicode/reports/tr22/)
+
+3.  "Cross Mapping Tables" in [Unicode Online
+    Data](http://www.unicode.org/unicode/onlinedat/online.html)
+
+## ICU Mapping Table Data Files
+
+### Overview
+
+As stated above, most ICU converters rely on character mapping tables. ICU 1.8
+has one single data structure for all character mapping tables, which is used by
+a generic Multi-Byte Character Set (MBCS) converter implementation. The
+implementation is flexible enough to handle stateless encodings with the
+following parameters:
+
+1.  Support for variable-length, byte-based encodings with 1 to 4 bytes per
+    character.
+
+2.  Support for all Unicode characters (code points 0..0x10ffff). Since ICU 1.8
+    uses the UTF-16 encoding as its Unicode encoding form, surrogate pairs are
+    completely supported.
+
+3.  Efficient distinction between unassigned (unmappable) and illegal byte
+    sequences.
+
+4.  It is not possible to convert from Unicode to byte sequences with leading
+    zero bytes.
+
+5.  Simple stateful encodings are also handled using only Shift-In and Shift-Out
+    (SI/SO) codes and one single-byte and one double-byte state.
+
+> :point_right: **Note**: *In the context of conversion tables, "unassigned" code points or codepage byte
+sequences are valid but do not have a **mapping**. This is different from
+"unassigned" code points in a character set like Unicode or Shift-JIS which are
+codes that do not have assigned **characters**.*
+
+Prior to version 1.8, ICU used more specific, more limited converter
+implementations for Single Byte Character Set (SBCS), Double Byte Character Set
+(DBCS), and the stateful Extended Binary Coded Decimal Interchange Code (EBCDIC)
+codepages.
+
+Mapping table data is provided in text files. ICU comes with several
+dozen .ucm files (UniCode Mapping, in icu/source/data/mappings/) that are
+translated at build time by its makeconv tool (source code in
+icu/source/tools/makeconv). The makeconv tool writes one binary, memory-mappable
+.cnv file per .ucm file. The resulting .cnv files are included by default in the
+common data file for use at runtime.
+
+The format of the .ucm files is similar to the format of the UPMAP files as
+provided by IBM® in the codepage repository and as used in the uconvdef tool on
+AIX. UPMAP is a text file that specifies the mapping of a codepage character to
+and from Unicode.
+
+The format of the .cnv files is ICU-specific. The .cnv file format may change
+between ICU versions even for the same .ucm files. The .ucm file format may be
+extended to include more features.
+
+The following sections concentrate on the .ucm file format. The .cnv file format
+is described in the source file icu/source/common/ucnvmbcs.c and is updated
+together with the MBCS converter implementation.
+
+These conversion tables can have more than one name. ICU allows multiple names
+("aliases") for the same encoding. It matches a requested encoding name against
+a list of names in icu/source/data/mappings/convrtrs.txt and, when it finds a
+match, opens a converter with the name in the leftmost position of the
+matching line. The name matching is not case-sensitive, and ICU ignores spaces,
+dashes, and underscores. At build time, the gencnval tool (in the
+icu/source/tools/gencnval directory) generates a binary form of the convrtrs.txt
+file: the cnvalias.icu runtime data file ("Converter Aliases data file").
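+
+The name-matching rule stated above can be sketched as a comparison that skips
+spaces, dashes, and underscores and ignores case. This is a simplified,
+hypothetical helper that only illustrates the stated rule; ICU's actual alias
+lookup code may differ in details.
+
+```C
+#include <ctype.h>
+#include <stdbool.h>
+
+/* Returns the next character of *name that takes part in matching
+ * (lowercased), skipping spaces, dashes, and underscores; '\0' at the end. */
+static int nextSignificant(const char **name) {
+    while (**name == ' ' || **name == '-' || **name == '_') ++*name;
+    if (**name == '\0') return '\0';
+    return tolower((unsigned char)*(*name)++);
+}
+
+/* True if the two encoding names match under the rule described above. */
+static bool aliasNamesMatch(const char *a, const char *b) {
+    for (;;) {
+        int ca = nextSignificant(&a), cb = nextSignificant(&b);
+        if (ca != cb) return false;
+        if (ca == '\0') return true;
+    }
+}
+```
+
+Under this rule, "Shift_JIS" matches "shift-jis" and "ISO 8859-1" matches
+"iso88591", while "UTF-8" does not match "UTF-16".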
+
+### .ucm File Format
+
+.ucm files are line-oriented text files. Empty lines and comments starting with
+'#' are ignored.
+
+A .ucm file contains two sections:
+
+1.  a header with general specifications of the codepage
+
+2.  a mapping table section between the "CHARMAP" and "END CHARMAP" lines.
+
+For example:
+
+```
+<code_set_name> "IBM-943"
+<char_name_mask> "AXXXX"
+<mb_cur_min> 1
+<mb_cur_max> 2
+<uconv_class> "MBCS"
+<subchar> \xFC\xFC
+<subchar1> \x7F
+<icu:state> 0-7f, 81-9f:1, a0-df, e0-fc:1
+<icu:state> 40-7e, 80-fc
+#
+CHARMAP
+#
+#
+#ISO 10646 IBM-943
+#_________ _________
+<U0000> \x00 |0
+<U0001> \x01 |0
+<U0002> \x02 |0
+<U0003> \x03 |0
+...
+<UFFE4> \xFA\x55 |1
+<UFFE5> \x81\x8F |0
+<UFFFD> \xFC\xFC |2
+END CHARMAP
+```
+
+The header fields are:
+
+1.  code_set_name - The name of the codepage. The makeconv tool generates the
+    .cnv file name from the .ucm filename but uses this header field for the
+    converter name that it writes into the .cnv file for ucnv_getName. The
+    makeconv tool prints a warning message if this header field does not match
+    the file name. The file name is not case-sensitive.
+
+2.  char_name_mask - This is ignored by the makeconv tool. "AXXXX" specifies that
+    the POSIX-style character "name" consists of one letter (Alpha) followed by
+    4 hexadecimal digits. Since ICU only uses Unicode character "names" (that is,
+    code points), the format is fixed (see below).
+
+3.  mb_cur_min - The minimum number of bytes per character.
+
+4.  mb_cur_max - The maximum number of bytes per character.
+
+5.  uconv_class - This can be "SBCS", "DBCS", "MBCS", or "EBCDIC_STATEFUL".
+    The most general converter class/type/category is MBCS, which requires that
+    the codepage structure be specified with `<icu:state>` lines. The other types
+    of converters are subsets of MBCS. The makeconv tool uses predefined state
+    tables for these other converters when their structure is not explicitly
+    specified. The following describes how the converter types are interpreted:
+
+    a.  MBCS: Generic ICU converter type, requires a state table
+
+    b.  SBCS: Single-byte, 8-bit codepages
+
+    c.  DBCS: Double-byte EBCDIC codepages
+
+    d.  EBCDIC_STATEFUL: Mixed Single-Byte or Double-Byte EBCDIC codepages (stateful, using SI/SO)
+
+    The state tables implied for the non-MBCS types can be overridden; this may
+    be necessary in order to allow supplementary characters (U+10000 and up).
+
+6.  subchar - The substitution character byte sequence for this codepage. This sequence must be a valid byte sequence according to the codepage structure.
+
+7.  subchar1 - This is the single-byte substitution character when subchar is defined. Some IBM converter libraries use different substitution characters for "narrow" and "wide" characters (single-byte and double-byte). ICU uses only one substitution character per codepage because it is common industry practice.
+
+8.  icu:state - See the "State table syntax in .ucm files" section for a detailed description of how to specify a codepage structure.
+
+9.  icu:charsetFamily - This specifies whether the codepage is ASCII-based or EBCDIC-based.
+
+The subchar and subchar1 fields have been known to cause some confusion. The
+following conditions outline when each is used:
+
+1.  Conversion from Unicode to a codepage occurs and an unassigned code point is
+    found
+
+    a.  If a subchar1 byte is defined and a subchar1 mapping is defined for the code point (with a |2 precision indicator),
+        output the subchar1
+
+    b.  Otherwise output the regular subchar
+
+2.  Conversion from a codepage to Unicode occurs and an unassigned byte sequence is found
+
+    a.  If the input sequence is of length 1 and a subchar1 byte is specified for the codepage, output U+001A
+
+    b.  Otherwise output U+FFFD
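+
+The two sets of conditions can be written as two small functions. This is a
+hypothetical model of the rules listed above, not ICU code; `SubstitutionInfo`
+summarizes the header fields, and the usage below plugs in the values from the
+IBM-943 header example (subchar `\xFC\xFC`, subchar1 `\x7F`).
+
+```C
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical summary of the relevant .ucm header fields. */
+typedef struct {
+    unsigned char subchar[4];  /* <subchar>, for example \xFC\xFC */
+    size_t subcharLength;      /* 1..4 */
+    bool hasSubchar1;          /* whether <subchar1> is defined */
+    unsigned char subchar1;    /* <subchar1>, for example \x7F */
+} SubstitutionInfo;
+
+/* Condition 1, from Unicode: hasPipe2Mapping says whether the table has a |2
+ * (subchar1) mapping for the unassigned code point. Returns the byte count. */
+static size_t substituteFromUnicode(const SubstitutionInfo *cp, bool hasPipe2Mapping,
+                                    unsigned char out[4]) {
+    if (cp->hasSubchar1 && hasPipe2Mapping) {
+        out[0] = cp->subchar1;
+        return 1;
+    }
+    memcpy(out, cp->subchar, cp->subcharLength);
+    return cp->subcharLength;
+}
+
+/* Condition 2, to Unicode: the code point substituted for an unassigned
+ * byte sequence of the given length. */
+static unsigned long substituteToUnicode(const SubstitutionInfo *cp, size_t sequenceLength) {
+    return (sequenceLength == 1 && cp->hasSubchar1) ? 0x1A : 0xFFFD;
+}
+```
+
+With the IBM-943 values, a code point that has a |2 mapping becomes `\x7F`, any
+other unmappable code point becomes `\xFC\xFC`, an unassigned single byte becomes
+U+001A, and an unassigned double-byte sequence becomes U+FFFD.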
+
+In the CHARMAP section of a .ucm file, each line contains a Unicode code point
+(written as `<U` followed by 1 to 6 hexadecimal digits and `>`, as in `<U0000>`),
+a codepage character byte sequence (each byte written as `\x` followed by 2
+hexadecimal digits, as in `\x00`), and an optional "precision" or "fallback"
+indicator.
+
+The precision indicator either must be present in all mappings or in none of
+them. The indicator is a pipe symbol `|` followed by a 0, 1, 2, 3, or 4 that has
+the following meaning:
+
+*   |0 - A "normal", roundtrip mapping from a Unicode code point and back.
+*   |1 - A "fallback" mapping only from Unicode to the codepage, but not back.
+*   |2 - A subchar1 mapping. The code point is unmappable, and if a substitution
+    is performed, then the subchar1 should be used rather than the subchar.
+    Otherwise, such mappings are ignored.
+*   |3 - A "reverse fallback" mapping only from the codepage to Unicode, but not
+    back to the codepage.
+*   |4 - A "good one-way" mapping only from Unicode to the codepage, but not
+    back.
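+
+The mapping-line format described above is simple enough to parse by hand. The
+following is a hypothetical sketch (not makeconv's parser): it accepts one code
+point, one to four `\xhh` bytes (optionally joined by `+`), and an optional `|n`
+indicator; lines with several code points are rejected.
+
+```C
+#include <stdbool.h>
+
+typedef struct {
+    unsigned long codePoint;
+    unsigned char bytes[4];
+    int byteCount;
+    int precision;  /* 0..4, or -1 if the line has no |n indicator */
+} UcmMapping;
+
+static int hexValue(char c) {
+    if (c >= '0' && c <= '9') return c - '0';
+    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
+    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
+    return -1;
+}
+
+/* Parses a line such as "<UFFE4> \xFA\x55 |1"; returns false if malformed. */
+static bool parseUcmLine(const char *p, UcmMapping *m) {
+    int digits = 0, v;
+    m->codePoint = 0; m->byteCount = 0; m->precision = -1;
+    if (p[0] != '<' || p[1] != 'U') return false;
+    for (p += 2; (v = hexValue(*p)) >= 0; ++p, ++digits)
+        m->codePoint = m->codePoint * 16 + (unsigned long)v;
+    if (digits < 1 || digits > 6 || *p++ != '>') return false;
+    for (;;) {  /* one or more \xhh bytes */
+        while (*p == ' ' || *p == '\t' || *p == '+') ++p;
+        if (p[0] != '\\' || p[1] != 'x') break;
+        int hi = hexValue(p[2]), lo = hi < 0 ? -1 : hexValue(p[3]);
+        if (lo < 0 || m->byteCount == 4) return false;  /* need exactly 2 hex digits */
+        m->bytes[m->byteCount++] = (unsigned char)(hi * 16 + lo);
+        p += 4;
+    }
+    if (m->byteCount == 0) return false;
+    if (*p == '|') {
+        if (p[1] < '0' || p[1] > '4') return false;
+        m->precision = p[1] - '0';
+    }
+    return true;
+}
+```
+
+Applied to `<UFFE4> \xFA\x55 |1` from the IBM-943 example, this yields code
+point U+FFE4, the bytes FA 55, and precision 1 (a fallback).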
+
+Fallback mappings from Unicode typically do not map codes for the same
+character, but for "similar" ones. This mapping is sometimes done if a character
+exists in Unicode but not in the codepage. To replace it, ICU maps the Unicode
+code point to the codepage code of a similar-looking character, for
+human-readable output. This mapping feature is not useful for text data
+transmission, especially in markup languages where a Unicode code point can be
+escaped with its code point value. The ICU API function ucnv_setFallback()
+controls this fallback behavior.
+
+"Reverse fallbacks" are technically similar but work in the other direction,
+from the codepage to Unicode; they are used when the same Unicode character is
+encoded twice in the codepage. ICU always uses reverse fallbacks at runtime.
+
+A subset of the fallback mappings from Unicode is always used at runtime: Those
+that map private-use Unicode code points. Fallbacks from private-use code points
+are often introduced as replacements for previous roundtrip mappings for the
+same pair of codes. These replacements are used when a Unicode version assigns a
+new character that was previously mapped to that private-use code point. The
+mapping table is then changed to map the same codepage byte sequence to the new
+Unicode code point (as a new roundtrip) and the mapping from the old private-use
+code point to the same codepage code is preserved as a fallback.
+
+A "good one-way" mapping is like a fallback, but ICU always uses "good one-way"
+mappings at runtime, regardless of the fallback API flag.
+
+The idea is that fallbacks normally lose information, such as mapping from a
+compatibility variant of a letter to the ASCII version; however, fallbacks from
+PUA and reverse fallbacks are assumed to be for "the same character", just an
+older code for it.
+
+Something similar happens with from-Unicode Variation Selector sequences. It is
+possible to round-trip (|0) either the unadorned character or the sequence with
+a variation selector, and add a "good one-way" mapping (|4) from the other
+version. That "good one-way" mapping does not lose much information, and it is
+used even if the "use fallback" API flag is false. Alternatively, both mappings
+could be fallbacks (|1) that should be controlled by the "use fallback"
+attribute.
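+
+The runtime rules described above can be summarized in two predicates. This is
+a hypothetical helper, not ICU code. The private-use ranges it checks are the
+Unicode Private Use Area (U+E000..U+F8FF) and the supplementary private use
+planes (U+F0000..U+FFFFD, U+100000..U+10FFFD).
+
+```C
+#include <stdbool.h>
+
+static bool isPrivateUse(unsigned long c) {
+    return (c >= 0xE000 && c <= 0xF8FF) ||
+           (c >= 0xF0000 && c <= 0xFFFFD) ||
+           (c >= 0x100000 && c <= 0x10FFFD);
+}
+
+/* Is a mapping with this precision used when converting from Unicode? */
+static bool usedFromUnicode(int precision, unsigned long codePoint, bool useFallback) {
+    switch (precision) {
+    case 0:  /* roundtrip */
+    case 4:  /* "good one-way": used regardless of the fallback flag */
+        return true;
+    case 1:  /* fallback: only if enabled, except always from private-use code points */
+        return useFallback || isPrivateUse(codePoint);
+    default: /* |2 only selects subchar1; |3 is codepage-to-Unicode only */
+        return false;
+    }
+}
+
+/* Is a mapping with this precision used when converting to Unicode? */
+static bool usedToUnicode(int precision) {
+    return precision == 0 || precision == 3;  /* roundtrips and reverse fallbacks */
+}
+```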
+
+### State table syntax in .ucm files
+
+The conversion to Unicode uses a state machine to achieve the above capabilities
+with reasonable data file sizes. The state machine information itself is loaded
+with the conversion data and defines the structure of the codepage, including
+which byte sequences are valid, unassigned, and illegal. This data cannot be
+computed (at least not easily) from the pure mapping data. Instead, the .ucm
+files for MBCS encodings have additional entries that are specific to the ICU
+makeconv tool. The state tables for SBCS, DBCS, and EBCDIC_STATEFUL are implied,
+but they can be overridden (see the examples below). These state tables are
+specified with `<icu:state>` lines in the header section of the .ucm file. Each
+such line defines one state (one row) of the state machine. The state machine
+uses a table of as many rows as there are states (= as many as there are
+`<icu:state>` lines). Each row has 256 entries; one for each possible byte value.
+
+The state table lines in the .ucm header conform to the following Extended
+Backus-Naur Form (EBNF)-like grammar (whitespace is allowed between all tokens):
+
+```
+row=[[firstentry ','] entry (',' entry)*]
+firstentry="initial" | "surrogates"
+           (initial state (default for state 0), output is all surrogate pairs)
+```
+
+Each state table row description (that follows the <icu:state>) begins with an
+optional initial or surrogates keyword and is followed by one or more column
+entries. For the purpose of codepage state tables, the states=rows in the table
+are numbered beginning at 0 for the first line in the .ucm file header. The
+numbers are assigned implicitly by the makeconv tool in order of the <icu:state>
+lines.
+
+A row may be empty (nothing following the <icu:state>) — that is equivalent to
+"all illegal" or 0-ff.i and is useful for trail byte states for all-illegal byte
+sequences.
+
+```
+entry=range [':' nextstate] ['.' [action]]
+range = number ['-' number]
+nextstate = number (0..7f)
+action = 'u' | 's' | 'p' | 'i'
+             (unassigned, state change only, surrogate pair, illegal)
+number = (1- or 2-digit hexadecimal number)
+```
+
+Each column entry contains at least one hexadecimal byte value or value range
+and is separated by a comma. The column entry specifies how to interpret an
+input byte in the row's state. If neither a next state nor an action is
+explicitly specified (only the byte range is given) then the byte value
+terminates the byte sequence, results in a valid mapping to a Unicode BMP
+character, and resets the state number to 0. The first line with <icu:state> is
+called state 0.
+
+The next state can be explicitly specified with a separating colon ( : )
+followed by the number of the state (=number/index of the row, starting at 0).
+This specification is mostly used for intermediate byte values (such as bytes
+that are not the last ones in a sequence). The state machine needs to proceed to
+the next state and read another byte. In this case, no other action is
+specified.
+
+If the byte value(s) terminate(s) a byte sequence, then the byte sequence
+results in the following depending on the action that is announced with a period
+( . ) followed by a letter:
+
+| letter | meaning |
+|--|---------|
+| u | Unassigned. The byte sequence is valid but does not encode a character. |
+| none | (no letter) - Valid. If no action letter is specified, then the byte sequence is valid and encodes a Unicode character up to U+ffff |
+| p | Surrogate Pair. The byte sequence is valid and the result may map to a UTF-16 encoded surrogate pair |
+| i | Illegal. The byte sequence is illegal. This is the default for all byte values in a row that are not otherwise specified with column entries|
+| s | State change only. The byte sequence does not encode any character but may change the state number. This may be used with simple, stateful encodings (for example, SI/SO codes), but currently it is not used by ICU.|
+
+If an action is specified without a next state, then the next state number
+defaults to 0. In other words, a byte value (range) terminates a sequence if
+there is an action specified for it, or when there is neither an action nor a
+next state. In this case, the byte value defaults to "valid, next state is 0"
+(equivalent to :0.).
+
+If a byte value is not specified in any column entry of a row, then it is illegal in
+the current state. If a byte value is specified in more than one column entry of
+the same row, then ICU uses the last one. These specifications allow you to
+assign common properties for a wide byte value range followed by a few
+exceptions. This is easier than having to specify mutually exclusive ranges,
+especially if many of them have the same properties.
+
+The optional keyword at the beginning of a state line has the following effect:
+
+| keyword | effect |
+|---------|--------|
+| initial | The state machine can start reading byte sequences in this state. State 0 is always an initial state. Only initial states can be next states for final byte values. In an initial state, the Unicode mappings for all final bytes are also stored directly in the state table. |
+| surrogates | All Unicode mappings for final bytes in non-initial states are stored in a separate table of 16-bit Unicode (UTF-16) code units. Since most legacy codepages map only to Unicode code points up to U+ffff (the Basic Multilingual Plane, BMP), the default allocation per mapping result is one 16-bit unit. Individual byte values can be specified to map to surrogate pairs (= two 16-bit units) with action letter p. The surrogates keyword specifies the values for the entire state (row). Surrogate pair mapping entries can still hold single units depending on the actual mapping data, but single-unit mapping entries cannot hold a pair of units. Mapping to single-unit entries is the default because the mapping is faster, uses half as much memory in the code units table, and is sufficient for most legacy codepages.|
+
+When converting to Unicode, the state machine starts in state number 0. In each
+iteration, the state machine reads one input (codepage) byte and either proceeds
+to the next state as specified, or treats it as a final byte with the specified
+action and an optional non-0 next (initial) state. This means that a state table
+needs to have at least as many state rows as the maximum number of bytes per
+character, which is the maximum length of any byte sequence.
+
+Exception: For EBCDIC_STATEFUL codepages, double-byte sequences start in state
+1, with the SI/SO bytes switching from state 0 to state 1 or from state 1 to
+state 0. See the default state table below.
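+
+The row syntax and the reading procedure described above can be modeled in a
+short program. This is a hypothetical sketch, not makeconv or ICU code: it parses
+rows into an in-memory table (the `initial` and `surrogates` keywords are not
+supported) and measures the byte sequence at the start of a buffer, assuming each
+sequence starts in state 0. The usage below loads the two rows of the IBM-943
+example: `0-7f, 81-9f:1, a0-df, e0-fc:1` and `40-7e, 80-fc`.
+
+```C
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdlib.h>
+
+typedef enum { ILLEGAL, VALID, UNASSIGNED, SURROGATE_PAIR, STATE_CHANGE, TRANSITION } Action;
+typedef struct { unsigned char next; Action action; } Entry;
+
+#define MAX_STATES 8
+static Entry table[MAX_STATES][256];
+
+/* Parses one row such as "0-7f, 81-9f:1, a0-df, e0-fc:1" into table[state]. */
+static bool parseRow(int state, const char *s) {
+    char *end;
+    for (int b = 0; b < 256; ++b) table[state][b] = (Entry){0, ILLEGAL};  /* default */
+    for (;;) {
+        while (*s == ' ' || *s == ',') ++s;
+        if (*s == '\0') return true;
+        long lo = strtol(s, &end, 16), hi = lo, next = 0;
+        if (end == s) return false;
+        s = end;
+        if (*s == '-') { hi = strtol(s + 1, &end, 16); s = end; }
+        Entry e = {0, VALID};             /* range only: valid, next state 0 */
+        bool hasNext = false;
+        if (*s == ':') { next = strtol(s + 1, &end, 16); s = end; hasNext = true; }
+        if (next < 0 || next >= MAX_STATES) return false;
+        e.next = (unsigned char)next;
+        if (*s == '.') {                  /* explicit action; "." alone means valid */
+            switch (*++s) {
+            case 'u': e.action = UNASSIGNED;     ++s; break;
+            case 'p': e.action = SURROGATE_PAIR; ++s; break;
+            case 'i': e.action = ILLEGAL;        ++s; break;
+            case 's': e.action = STATE_CHANGE;   ++s; break;
+            default: break;
+            }
+        } else if (hasNext) {
+            e.action = TRANSITION;        /* next state, no action: read another byte */
+        }
+        if (lo < 0 || lo > hi || hi > 0xff) return false;
+        for (long b = lo; b <= hi; ++b) table[state][b] = e;  /* later entries win */
+    }
+}
+
+/* Length of the complete byte sequence at the start of src,
+ * or 0 if it is illegal or truncated. */
+static size_t sequenceLength(const unsigned char *src, size_t len) {
+    int state = 0;
+    for (size_t i = 0; i < len; ++i) {
+        Entry e = table[state][src[i]];
+        if (e.action != TRANSITION) return e.action == ILLEGAL ? 0 : i + 1;
+        state = e.next;
+    }
+    return 0;  /* input ended inside a multi-byte sequence */
+}
+```
+
+With these rows, `\x41` is a one-byte sequence, `\x81\x8F` (U+FFE5 in the
+example) is a two-byte sequence, and `\x80` or `\x81\x20` is illegal.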
+
+### Extension and delta tables
+
+ICU 2.8 adds an additional "extension" data structure to its conversion tables.
+The new data structure supports a number of new features. When any of the
+following features are used, then all mappings must use a precision indicator.
+
+#### Converting multiple characters as a unit
+
+Before ICU 2.8, only one Unicode code point could be converted to or from one
+complete codepage byte sequence. The new data structure supports the conversion
+between multiple Unicode code points and multiple complete codepage byte
+sequences. (A "complete codepage byte sequence" is a sequence of bytes which is
+valid according to the state table.)
+
+Syntax: Simply write more than one Unicode code point on a mapping line, and/or
+more than one complete codepage byte sequence. Plus signs (+) are optional
+between code points and between bytes. For example,
+ibm-1390_P110-2003.ucm contains
+
+    <U304B><U309A> \xEC\xB5 |0
+
+and test3.ucm contains
+
+    <U101234>+<U50005>+<U60006> \x07+\x00+\x01\x02\x0f+\x09 |0
+
+For more examples see the ICU conversion data and the
+icu/source/test/testdata/test*.ucm test data files.
+
+ICU 2.8 supports up to 19 UChars on the Unicode side of a mapping and up to 31
+bytes on the codepage side.
+
+The longest match possible is converted in order to properly handle tables where
+the source sides of some mappings are prefixes of the source sides of other
+mappings.
+
+As a side effect, if conversion offsets are written and a potential match
+crosses buffer boundaries, then some of the initial offsets for the following
+output may be unknown (-1) because their input was stored in the converter from
+a previous buffer while looking for a longer match.
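+
+The longest-match rule can be illustrated with a short search over the Unicode
+sides of some mappings. This is a hypothetical sketch, not ICU's extension-table
+lookup. It uses the `<U304B><U309A>` mapping from the ibm-1390 example and assumes,
+for illustration, that `<U304B>` alone also has a mapping.
+
+```C
+#include <stddef.h>
+#include <string.h>
+
+/* Unicode side of one mapping (hypothetical in-memory form). */
+typedef struct {
+    unsigned long codePoints[4];
+    size_t length;
+} UnicodeSide;
+
+/* Returns the index of the mapping whose Unicode side is the longest
+ * prefix of src, or -1 if none matches. */
+static int longestMatch(const UnicodeSide *sides, size_t count,
+                        const unsigned long *src, size_t srcLen) {
+    int best = -1;
+    size_t bestLength = 0;
+    for (size_t i = 0; i < count; ++i) {
+        size_t n = sides[i].length;
+        if (n > srcLen || n <= bestLength) continue;
+        if (memcmp(sides[i].codePoints, src, n * sizeof src[0]) == 0) {
+            best = (int)i;
+            bestLength = n;
+        }
+    }
+    return best;
+}
+```
+
+For U+304B U+309A the two-code-point mapping wins; for U+304B followed by another
+character, only the single mapping matches. If the input buffer ends right after
+U+304B, this sketch picks the single mapping, whereas a converter that is not yet
+flushing has to keep U+304B and wait for the next buffer, which is why the offsets
+described above can be unknown.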
+
+Conversion tables for SI/SO-stateful (usually EBCDIC_STATEFUL) codepages cannot
+include mappings with SI or SO bytes or where there are SBCS characters in a
+multi-character byte sequence. In other words, for these tables there must be
+exactly one byte in a mapping or else a sequence of one or more DBCS characters.
+
+#### Delta (extension-only) conversion table files
+
+Physically, a binary conversion table (.cnv) file automatically contains both a
+traditional "base table" data structure for the 1:1 mappings and a new
+"extension table" for the m:n mappings if any are encountered in the .ucm file.
+An extension table can also be requested manually by splitting the CHARMAP into
+two. The first CHARMAP section will be used for the base table, and the second
+only for the extension table. M:n mappings in the first CHARMAP will be moved to
+the extension table.
+
+In order to save space for very similar conversion tables, it is possible to
+create delta .cnv files that contain only an extension table and the name of
+another .cnv file with a base table. The base file must be split into two
+CHARMAPs such that the base file's base table does not contain any mappings that
+contradict any of the delta file's mappings.
+
+The delta (extension-only) file uses only a single CHARMAP section. In addition,
+it needs a line in the header that both causes building just a delta file and
+specifies the name of the base file. For example, windows-936-2000.ucm contains
+
+    <icu:base> "ibm-1386_P100-2002"
+
+makeconv ignores all mappings for the delta file that are also in the base
+file's base table. If the two conversion tables are sufficiently similar, then
+the delta file will contain only a relatively small set of mappings, which
+results in a small .cnv file. At runtime, both the delta file and its base file
+are loaded, and the base file's base table is used together with the delta
+file's extension table. The base file works as a standalone file, using its own
+extension table
+for its full set of mappings. The base file must be in the same ICU data package
+as the delta file.
+
+The hard part is to split the base file's mappings into base and extension
+CHARMAPs such that the base table does not overlap with any delta file, while
+all shared mappings should be in the base table. (The base table data structure
+is more compact than the extension table data structure.)
+
+ICU provides the ucmkbase tool in the
+[ucmtools](https://github.com/unicode-org/icu-data/tree/master/charset/source/ucmtools)
+collection to do this.
+
+For example, the following illustrates how to use ucmkbase to make a base .ucm
+file for three Shift-JIS conversion table variants. (ibm-943_P15A-2003.ucm
+becomes the base.)
+
+```
+C:\tmp\icu\ucm>ren ibm-943_P15A-2003.ucm ibm-943_P15A-2003.orig
+C:\tmp\icu\ucm>ucmkbase ibm-943_P15A-2003.orig ibm-943_P130-1999.ucm ibm-942_P12A-1999.ucm > ibm-943_P15A-2003.ucm
+```
+
+After this, the two delta .ucm files only need to get the following line added
+before the start of their CHARMAPs:
+
+```
+<icu:base> "ibm-943_P15A-2003"
+```
+
+The ICU tools and runtime code handle DBCS-only conversion tables specially,
+allowing them to be built into delta files with MBCS or EBCDIC_STATEFUL base
+files without using their single-byte mappings, and without ucmkbase moving the
+single-byte mappings of the base file into the base file's extension table. See
+for example ibm-16684_P110-2003.ucm and ibm-1390_P110-2003.ucm.
+
+#### Other enhancements
+
+ICU 2.8 adds support for the specification of which unassigned Unicode code
+points should be mapped to subchar1 rather than the default subchar. See the
+discussion of subchar1 above for more details.
+
+The extension table data structure also removes one minor limitation on ICU
+conversion tables: Fallback mappings to a single byte 00 are now allowed and
+handled properly. ICU versions before 2.8 could only handle roundtrips to/from
+00.
+
+### Examples for codepage state tables
+
+The following shows the exact implied state tables for non-MBCS types. A state
+table may need to be overridden in the .ucm file in order to allow supplementary
+characters (U+10000 and up).
+
+US-ASCII
+```
+0-7f
+```
+
+This single-row state table describes US-ASCII. Byte values from 0 to 0x7f are
+valid and map to Unicode characters up to U+ffff. Byte values from 0x80 to 0xff
+are illegal.
+
+Shift-JIS
+```
+0-7f, 81-9f:1, a0-df, e0-fc:1
+40-7e, 80-fc
+```
+
+This two-row state table describes the Shift-JIS structure, which encodes some
+characters with one byte each and others with two bytes each. Bytes 0 to 0x7f
+and 0xa0 to 0xdf are valid single-byte encodings. Bytes 0x81 to 0x9f and 0xe0 to
+0xfc are lead bytes; that is, they must be followed by one of the bytes that are
+specified as valid in state 1. A byte sequence of 0x85 0x61 is valid, while a
+single byte of 0x80 or 0xff is illegal. Similarly, a byte sequence of 0x85 0x31
+is illegal.
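+
+The two Shift-JIS rows above can be walked byte by byte with a small lookup
+table. The following C sketch is illustrative only. The `Entry` type, the
+entry kinds, and both functions are hypothetical, and makeconv's binary .cnv
+format packs far more information into each state table cell.
+
+```c
+#include <stddef.h>
+
+/* Entry kinds for this sketch (a simplification of the .ucm state syntax). */
+enum { ILLEGAL = 0, VALID_FINAL = 1, TRANSITION = 2 };
+
+typedef struct {
+    unsigned char kind;      /* ILLEGAL, VALID_FINAL, or TRANSITION */
+    unsigned char nextState; /* used only for TRANSITION */
+} Entry;
+
+/* Zero-initialized static storage: unlisted bytes stay ILLEGAL. */
+static Entry sjis[2][256];
+
+/* Marks bytes lo..hi of one state row with a kind and a next state. */
+static void setRange(Entry *row, int lo, int hi, unsigned char kind,
+                     unsigned char next) {
+    for (int b = lo; b <= hi; ++b) {
+        row[b].kind = kind;
+        row[b].nextState = next;
+    }
+}
+
+/* Builds the table:  0-7f, 81-9f:1, a0-df, e0-fc:1  /  40-7e, 80-fc */
+void buildShiftJisTable(void) {
+    setRange(sjis[0], 0x00, 0x7f, VALID_FINAL, 0);
+    setRange(sjis[0], 0x81, 0x9f, TRANSITION, 1);
+    setRange(sjis[0], 0xa0, 0xdf, VALID_FINAL, 0);
+    setRange(sjis[0], 0xe0, 0xfc, TRANSITION, 1);
+    setRange(sjis[1], 0x40, 0x7e, VALID_FINAL, 0);
+    setRange(sjis[1], 0x80, 0xfc, VALID_FINAL, 0);
+}
+
+/* Returns the length of the valid character at the start of s,
+ * or 0 if the sequence is illegal or truncated. */
+size_t validSequenceLength(const unsigned char *s, size_t length) {
+    int state = 0;
+    for (size_t i = 0; i < length; ++i) {
+        Entry e = sjis[state][s[i]];
+        if (e.kind == ILLEGAL) return 0;
+        if (e.kind == VALID_FINAL) return i + 1;
+        state = e.nextState;
+    }
+    return 0; /* ran out of bytes in the middle of a sequence */
+}
+```
+
+After `buildShiftJisTable()`, the sketch returns 2 for 0x85 0x61. It returns 0
+for 0x85 0x31, 0x80, and 0xff, which matches the examples in the text.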
+
+EUC-JP
+```
+0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1
+a1-fe
+a1-e4
+a1-fe:1, a1:4, a3-af:4, b6:4, d6:4, da-db:4, ed-f2:4
+a1-fe.u
+```
+
+This fairly complicated state table describes EUC-JP. Valid byte sequences are
+one, two, or three bytes long. Two-byte sequences have a lead byte of 0x8e and
+end in state 2, or have lead bytes 0xa1 to 0xfe and end in state 1. Three-byte
+sequences have a lead byte of 0x8f and continue in state 3. Some final byte
+value ranges are entirely unassigned. They therefore end in state 4, which has
+an action letter of u for "unassigned"; this saves significant memory for the
+code units table. Assigned three-byte sequences end in state 1 like most
+two-byte sequences.
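+
+In EUC-JP the lead byte alone determines how long a sequence is, so the initial
+state can be summarized by a small function. This is a hypothetical helper
+derived from state 0 of the table above, not an ICU API. It does not validate
+trail bytes.
+
+```c
+/*
+ * Hypothetical helper: returns the length of the EUC-JP byte sequence that
+ * starts with the given lead byte, following state 0 of the table
+ * (0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1), or 0 for an illegal lead byte.
+ */
+int eucJpSequenceLength(unsigned char lead) {
+    if (lead <= 0x8d || (lead >= 0x90 && lead <= 0x9f)) {
+        return 1; /* single-byte character */
+    }
+    if (lead == 0x8e || (lead >= 0xa1 && lead <= 0xfe)) {
+        return 2; /* one trail byte, in state 2 or state 1 */
+    }
+    if (lead == 0x8f) {
+        return 3; /* two more bytes: state 3, then state 1 or 4 */
+    }
+    return 0; /* 0xa0 and 0xff are not listed in state 0 */
+}
+```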
+
+SBCS default state table:
+```
+0-ff
+```
+SBCS by default implies the structure for single-byte, 8-bit codepages.
+
+DBCS default state table:
+```
+0-3f:3, 40:2, 41-fe:1, ff:3
+41-fe
+40
+
+```
+
+**Important**:
+These are four states — the fourth has an empty line (equivalent to 0-ff.i)!
+DBCS codepages, by default, are defined with the EBCDIC double-byte structure.
+Valid sequences are pairs of bytes from 0x41 to 0xfe and the one pair 0x40/0x40
+for the double-byte space. The structure is defined such that all illegal byte
+sequences are always two in length. Therefore, every byte in the initial state
+is a lead byte.
+
+EBCDIC_STATEFUL default state table:
+```
+0-ff, e:1.s, f:0.s
+initial, 0-3f:4, e:1.s, f:0.s, 40:3, 41-fe:2, ff:4
+0-40:1.i, 41-fe:1., ff:1.i
+0-ff:1.i, 40:1.
+0-ff:1.i
+```
+
+This is the structure of Mixed Single-byte and Double-byte EBCDIC codepages,
+which are stateful and use the Shift-In/Shift-Out (SI/SO) bytes 0x0f/0x0e. The
+initial state 0 is almost the same as for SBCS except for SI and SO. State 1 is
+also an initial state and is the basis for a state-shifted version of the DBCS
+structure above. All double-byte sequences return to state 1 and SI switches
+back to state 0. SI and SO are also allowed in their own states with no effect.
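+
+The following C sketch follows the default EBCDIC_STATEFUL structure to count
+the characters in a byte string. SO (0x0e) and SI (0x0f) switch modes, and
+double-byte characters follow the DBCS rules shown earlier. The function name
+and its -1 error convention are assumptions made for illustration; ICU
+converters do not expose such a function.
+
+```c
+#include <stddef.h>
+
+#define EBCDIC_SO 0x0e /* Shift-Out: switch to double-byte mode */
+#define EBCDIC_SI 0x0f /* Shift-In: switch back to single-byte mode */
+
+/*
+ * Counts characters in Mixed SBCS/DBCS EBCDIC text. SO and SI change modes
+ * (and have no effect in their own mode), but are not characters. In
+ * double-byte mode a character is a lead byte 0x41..0xfe followed by a trail
+ * byte 0x41..0xfe, or the double-byte space 0x40 0x40.
+ * Returns -1 for an illegal or truncated double-byte sequence.
+ */
+long countStatefulChars(const unsigned char *s, size_t length) {
+    int doubleByte = 0;
+    long count = 0;
+    size_t i = 0;
+    while (i < length) {
+        unsigned char b = s[i];
+        if (b == EBCDIC_SO) { doubleByte = 1; ++i; continue; }
+        if (b == EBCDIC_SI) { doubleByte = 0; ++i; continue; }
+        if (!doubleByte) { ++count; ++i; continue; }
+        if (i + 1 >= length) return -1; /* lead byte without a trail byte */
+        unsigned char t = s[i + 1];
+        int valid = (b == 0x40 && t == 0x40) ||
+                    (b >= 0x41 && b <= 0xfe && t >= 0x41 && t <= 0xfe);
+        if (!valid) return -1;
+        ++count;
+        i += 2;
+    }
+    return count;
+}
+```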
+
+> :point_right:  **Note**: *If a DBCS or EBCDIC_STATEFUL codepage maps supplementary (non-BMP) Unicode
+characters, then a modified state table needs to be specified in the .ucm file.
+The state table needs to use the surrogates designation for a table row or .p
+for some entries.<br/> The reuse of a final or intermediate state (shown for EUC-JP) is valid as
+long as there is no cycle in the state chain. The mappings will be unique
+because of the different path to the shared state (sharing a state saves some
+memory; each state table row occupies 1kB in the .cnv file). The EUC-JP table also
+shows the redefinition of byte value ranges within one state row (state number
+3) as shorthand. State 3 defines bytes a1-fe to go to state 1, but the following
+entries redefine and override certain bytes to go to state 4.*
+
+An initial state never needs a surrogates designation or .p because Unicode
+mapping results from initial states are stored directly in the state table,
+which provides enough room in each cell. The size of a generated .cnv mapping
+table file depends primarily on the number and distribution of the mappings and
+on the number of valid, multi-byte sequences that the state table allows. Each
+state table row takes up one kilobyte.
+
+For single-byte codepages, the state table cells contain all of the Unicode
+mappings. Code point results for multi-byte sequences are stored in an array
+with enough room for all valid byte sequences. For all byte sequences that end
+in a surrogates or .p state, two code units are allocated in that array.
+
+If possible, valid state table entries may be changed to .u to reduce the number
+of valid, assignable sequences and to make the .cnv file smaller. If additional
+states are necessary, then each additional state itself adds 1kB to the file
+size, diminishing the file size savings. See the EUC-JP example above.
+
+For codepages with up to two bytes per character, the makeconv tool
+automatically compacts the table, if possible, by introducing one more trail
+byte state. This state is a copy of the original trail state in which valid
+entries are replaced by unassigned entries. Each lead byte entry that has no
+mappings is then changed to use the new state.
+
+For codepages with up to three or four bytes per character, compaction must be
+done manually. However, if the verbose option is set on the command line, the
+makeconv tool will print useful information about unassigned byte sequences.
diff --git a/docs/userguide/conversion/detection.md b/docs/userguide/conversion/detection.md
new file mode 100644
index 0000000..5e2abf5
--- /dev/null
+++ b/docs/userguide/conversion/detection.md
@@ -0,0 +1,345 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Character Set Detection
+
+## Overview
+
+Character set detection is the process of determining the character set, or
+encoding, of character data in an unknown format. This is, at best, an imprecise
+operation using statistics and heuristics. Because of this, detection works best
+if you supply at least a few hundred bytes of character data that's mostly in a
+single language. In some cases, the language can be determined along with the
+encoding.
+
+Several different techniques are used for character set detection. For
+multi-byte encodings, the sequence of bytes is checked for legal patterns. The
+detected characters are also checked against a list of frequently used characters
+in that encoding. For single-byte encodings, the data is checked against a list
+of the most commonly occurring three-letter groups for each language that can be
+written using that encoding. The detection process can be configured to
+optionally ignore HTML or XML style markup, which can interfere with the
+detection process by changing the statistics.
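+
+The single-byte technique can be illustrated with a heavily simplified C sketch.
+It counts how many overlapping three-byte groups of the input appear in a list
+of common trigrams for one language. The function and the sample list are
+hypothetical. ICU's actual statistics and scoring are more elaborate and are
+not described here.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/*
+ * Hypothetical scoring helper: returns how many overlapping three-byte groups
+ * in text appear in a list of "common" trigrams (each at least 3 bytes long).
+ */
+size_t countCommonTrigrams(const char *text, const char *const *common,
+                           size_t commonCount) {
+    size_t hits = 0;
+    size_t length = strlen(text);
+    for (size_t i = 0; i + 3 <= length; ++i) {
+        for (size_t j = 0; j < commonCount; ++j) {
+            if (memcmp(text + i, common[j], 3) == 0) {
+                ++hits;
+                break;
+            }
+        }
+    }
+    return hits;
+}
+```
+
+For example, with the list `"the"`, `"he "`, `"ing"`, `"and"`, the text
+`"the king and the"` produces 5 hits.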
+
+The input data can either be a Java input stream, or an array of bytes. The
+output of the detection process is a list of possible character sets, with the
+most likely one first. For simplicity, you can also ask for a Java Reader that
+will read the data in the detected encoding.
+
+There is another character set detection C++ library, the [Compact Encoding
+Detector](https://github.com/google/compact_enc_det), that may have a lower
+error rate, particularly when working with short samples of text.
+
+## CharsetMatch
+
+The CharsetMatch class holds the result of comparing the input data to a
+particular encoding. You can use an instance of this class to get the name of
+the character set, the language, and how good the match is. You can also use
+this class to decode the input data.
+
+To find out how good the match is, you use the getConfidence() method to get a
+*confidence value*. This is an integer from 0 to 100. The higher the value, the
+more confidence there is in the match. For example:
+
+```Java
+CharsetMatch match = ...;
+int confidence;
+confidence = match.getConfidence();
+if (confidence < 50) {
+    // handle a poor match...
+} else {
+    // handle a good match...
+}
+```
+
+In C, you can use the
+`ucsdet_getConfidence(const UCharsetMatch *ucsm, UErrorCode *status)`
+function to get a confidence value:
+
+```C
+const UCharsetMatch *ucm = ...; // for example, the result of ucsdet_detect()
+UErrorCode status = U_ZERO_ERROR;
+int32_t confidence = ucsdet_getConfidence(ucm, &status);
+if (confidence < 50) {
+    // handle a poor match...
+} else {
+    // handle a good match...
+}
+```
+
+To get the name of the character set, which can be used as an encoding name in
+Java, you use the getName() method:
+
+```Java
+CharsetMatch match = ...;
+byte characterData[] = ...;
+String charsetName;
+String unicodeData;
+charsetName = match.getName();
+unicodeData = new String(characterData, charsetName);
+```
+
+To get the name of the character set in C, use `ucsdet_getName()`:
+
+```C
+const UCharsetMatch *ucm = ...; // for example, the result of ucsdet_detect()
+UErrorCode status = U_ZERO_ERROR;
+const char *name = ucsdet_getName(ucm, &status);
+```
+
+To get the ISO code for the detected language, you use the
+getLanguage() method. If the language could not be determined, getLanguage()
+will return null. Note that language detection does not work with all charsets,
+and includes only a very small set of possible languages. It should not be used
+if robust, reliable language detection is required.
+
+```Java
+CharsetMatch match = ...;
+String languageCode;
+languageCode = match.getLanguage();
+if (languageCode != null) {
+    // handle the language code...
+}
+```
+
+The `ucsdet_getLanguage(const UCharsetMatch *ucsm, UErrorCode *status)` function
+can be used in C to get the language code. If the language could not be
+determined, the function returns an empty string.
+
+```C
+const UCharsetMatch *ucm = ...; // for example, the result of ucsdet_detect()
+UErrorCode status = U_ZERO_ERROR;
+const char *language = ucsdet_getLanguage(ucm, &status);
+```
+
+If you want to get a Java String containing the converted data, you can use the
+getString() method:
+
+```Java
+CharsetMatch match = ...;
+String unicodeData;
+unicodeData = match.getString();
+```
+
+If you want to limit the number of characters in the string, pass the maximum
+number of characters you want to the getString() method:
+
+```Java
+CharsetMatch match = ...;
+String unicodeData;
+unicodeData = match.getString(1024);
+```
+
+To get a java.io.Reader to read the converted data, use the getReader() method:
+
+```Java
+CharsetMatch match = ...;
+Reader reader;
+StringBuffer sb = new StringBuffer();
+char[] buffer = new char[1024];
+int charsRead;
+reader = match.getReader();
+while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
+    sb.append(buffer, 0, charsRead);
+}
+reader.close();
+```
+
+## CharsetDetector
+
+The CharsetDetector class does the actual detection. It matches the input data
+against all character sets, and computes a list of CharsetMatch objects to hold
+the results. The input data can be supplied as an array of bytes, or as a
+java.io.InputStream.
+
+To use a CharsetDetector object, first you construct it, and then you set the
+input data, using the setText() method. Because setting the input data is
+separate from the construction, it is easy to reuse a CharsetDetector object:
+
+```Java
+CharsetDetector detector;
+byte[] byteData = ...;
+InputStream streamData = ...;
+detector = new CharsetDetector();
+detector.setText(byteData);
+// use detector with byte data...
+detector.setText(streamData);
+// use detector with stream data...
+```
+
+If you want to know which character set matches your input data with the highest
+confidence, you can use the detect() method, which will return a CharsetMatch
+object for the match with the highest confidence:
+
+```Java
+CharsetDetector detector;
+CharsetMatch match;
+byte[] byteData = ...;
+detector = new CharsetDetector();
+detector.setText(byteData);
+match = detector.detect();
+```
+
+If you want to know which character set matches your input data in C, you can
+use the `ucsdet_detect(UCharsetDetector *csd, UErrorCode *status)` function.
+
+```C
+UErrorCode status = U_ZERO_ERROR;
+UCharsetDetector *csd = ucsdet_open(&status);
+const UCharsetMatch *ucm;
+static char buffer[BUFFER_SIZE] = {....};
+int32_t inputLength = ...; // length of the input text
+ucsdet_setText(csd, buffer, inputLength, &status);
+ucm = ucsdet_detect(csd, &status);
+// ... use ucm, which is owned by csd ...
+ucsdet_close(csd);
+```
+
+If you want to know all of the character sets that could match your input data
+with a non-zero confidence, you can use the detectAll() method, which will
+return an array of CharsetMatch objects sorted by confidence, from highest to
+lowest:
+
+```Java
+CharsetDetector detector;
+CharsetMatch matches[];
+byte[] byteData = ...;
+detector = new CharsetDetector();
+detector.setText(byteData);
+matches = detector.detectAll();
+for (int m = 0; m < matches.length; m += 1) {
+    // process this match...
+}
+```
+
+> :point_right: **Note**: The
+`ucsdet_detectAll(UCharsetDetector *csd, int32_t *matchesFound, UErrorCode *status)`
+function can be used in C to detect all of the
+character sets, where matchesFound is a pointer to a variable that will be set to
+the number of charsets identified that are consistent with the input data.
+
+The CharsetDetector class also implements a crude *input filter* that can strip
+out HTML and XML style tags. If you want to enable the input filter, which is
+disabled when you construct a CharsetDetector, you use the enableInputFilter()
+method, which takes a boolean. Pass in true if you want to enable the input
+filter, and false if you want to disable it:
+
+```Java
+CharsetDetector detector;
+CharsetMatch match;
+byte[] byteDataWithTags = ...;
+detector = new CharsetDetector();
+detector.setText(byteDataWithTags);
+detector.enableInputFilter(true);
+match = detector.detect();
+```
+
+To enable an input filter in C, you can use the
+`ucsdet_enableInputFilter(UCharsetDetector *csd, UBool filter)` function.
+
+```C
+UErrorCode status = U_ZERO_ERROR;
+UCharsetDetector *csd = ucsdet_open(&status);
+const UCharsetMatch *ucm;
+static char buffer[BUFFER_SIZE] = {....};
+int32_t inputLength = ...; // length of the input text
+ucsdet_setText(csd, buffer, inputLength, &status);
+ucsdet_enableInputFilter(csd, TRUE);
+ucm = ucsdet_detect(csd, &status);
+// ... use ucm ...
+ucsdet_close(csd);
+```
+
+If you have more detailed knowledge about the structure of the input data, it is
+better to filter the data yourself before you pass it to CharsetDetector. For
+example, you might know that the data is from an HTML page that contains CSS
+styles, which will not be stripped by the input filter.
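+
+If you filter the data yourself, a crude tag stripper could serve as a starting
+point. The C sketch below is an illustration under stated assumptions and is
+not ICU's input filter. It drops everything from `<` through the next `>`, and
+you would still need to extend it to remove, for example, CSS style blocks.
+
+```c
+#include <stddef.h>
+
+/*
+ * Hypothetical pre-filter: copies src to dst, dropping everything from '<'
+ * through the next '>'. Returns the number of bytes written; dst must have
+ * room for srcLength bytes. Comments, CDATA, '>' inside attribute values,
+ * and CSS or script content are not handled.
+ */
+size_t stripTags(const unsigned char *src, size_t srcLength,
+                 unsigned char *dst) {
+    size_t out = 0;
+    int inTag = 0;
+    for (size_t i = 0; i < srcLength; ++i) {
+        if (inTag) {
+            if (src[i] == '>') inTag = 0;
+        } else if (src[i] == '<') {
+            inTag = 1;
+        } else {
+            dst[out++] = src[i];
+        }
+    }
+    return out;
+}
+```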
+
+You can use the inputFilterEnabled() method to see if the input filter is
+enabled:
+
+```Java
+CharsetDetector detector;
+detector = new CharsetDetector();
+// do a bunch of stuff with detector
+// which may or may not enable the input filter...
+if (detector.inputFilterEnabled()) {
+    // handle enabled input filter
+} else {
+    // handle disabled input filter
+}
+```
+
+> :point_right: **Note**: The ICU4C API provides the `ucsdet_isInputFilterEnabled(const UCharsetDetector *csd)`
+function to check whether the input filter is enabled.
+
+The CharsetDetector class also has two convenience methods that let you detect
+and convert the input data in one step: the getReader() and getString() methods:
+
+```Java
+CharsetDetector detector;
+byte[] byteData = ...;
+InputStream streamData = ...;
+String unicodeData;
+Reader unicodeReader;
+detector = new CharsetDetector();
+unicodeData = detector.getString(byteData, null);
+unicodeReader = detector.getReader(streamData, null);
+```
+
+> :point_right: **Note**: The second argument to the getReader() and getString() methods is a
+String called declaredEncoding, which is not currently used. There is also a
+setDeclaredEncoding() method, which is also not currently used.
+
+The following code is equivalent to using the convenience methods:
+
+```Java
+CharsetDetector detector;
+CharsetMatch match;
+byte[] byteData = ...;
+InputStream streamData = ...;
+String unicodeData;
+Reader unicodeReader;
+detector = new CharsetDetector();
+detector.setText(byteData);
+match = detector.detect();
+unicodeData = match.getString();
+detector.setText(streamData);
+match = detector.detect();
+unicodeReader = match.getReader();
+```
+
+## Detected Encodings
+
+The following table shows all the encodings that can be detected. You can get
+this list (without the languages) by calling the getAllDetectableCharsets()
+method:
+
+| **Character Set** | **Languages** |
+| ----------------- | ------------- |
+| UTF-8             | &nbsp;        |
+| UTF-16BE          | &nbsp;        |
+| UTF-16LE          | &nbsp;        |
+| UTF-32BE          | &nbsp;        |
+| UTF-32LE          | &nbsp;        |
+| Shift_JIS         | Japanese      |
+| ISO-2022-JP       | Japanese      |
+| ISO-2022-CN       | Simplified Chinese |
+| ISO-2022-KR       | Korean        |
+| GB18030           | Chinese       |
+| Big5              | Traditional Chinese |
+| EUC-JP            | Japanese      |
+| EUC-KR            | Korean        |
+| ISO-8859-1        | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish |
+| ISO-8859-2        | Czech, Hungarian, Polish, Romanian |
+| ISO-8859-5        | Russian       |
+| ISO-8859-6        | Arabic        |
+| ISO-8859-7        | Greek         |
+| ISO-8859-8        | Hebrew        |
+| ISO-8859-9        | Turkish       |
+| windows-1250      | Czech, Hungarian, Polish, Romanian |
+| windows-1251      | Russian       |
+| windows-1252      | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish |
+| windows-1253      | Greek         |
+| windows-1254      | Turkish       |
+| windows-1255      | Hebrew        |
+| windows-1256      | Arabic        |
+| KOI8-R            | Russian       |
+| IBM420            | Arabic        |
+| IBM424            | Hebrew        |
diff --git a/docs/userguide/conversion/index.md b/docs/userguide/conversion/index.md
new file mode 100644
index 0000000..06838c3
--- /dev/null
+++ b/docs/userguide/conversion/index.md
@@ -0,0 +1,141 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Conversion
+
+## Conversion Overview
+
+A converter is used to convert from one character encoding to another. In the
+case of ICU, the conversion is always between Unicode and another encoding, or
+vice-versa. A text encoding is a particular mapping from a given character set
+definition to the actual bits used to represent the data.
+
+Unicode provides a single character set that covers the major languages of the
+world, and a small number of machine-friendly encoding forms and schemes to fit
+the needs of existing applications and protocols. It is designed for best
+interoperability with both ASCII and ISO-8859-1 (the most widely used character
+sets) to make it easier for Unicode to be used in almost all applications and
+protocols.
+
+Hundreds of encodings have been developed over the years, each for small groups
+of languages and for special purposes. As a result, the interpretation of text,
+input, sorting, display, and storage depends on the knowledge of all the
+different types of character sets and their encodings. Programs have been
+written to handle either one single encoding at a time and switch between them,
+or to convert between external and internal encodings.
+
+There is no single, authoritative source of precise definitions of many of the
+encodings and their names. However,
+[IANA](http://www.iana.org/assignments/character-sets) is the best source for
+names, and our Character Set repository is a good source of encoding definitions
+for each platform.
+
+The transferring of text from one machine to another one often causes some loss
+of information. Some platforms have a different interpretation of the text than
+the other platforms. For example, Shift-JIS can be interpreted differently on
+Windows™ compared to UNIX®. Windows maps byte value 0x5C to the backslash
+symbol, while some UNIX machines map that byte value to the Yen symbol. Another
+problem arises when a character in the codepage looks like the Unicode Greek
+letter Mu or the Unicode micro symbol. Some platforms map this codepage byte
+sequence to one Unicode character, while another platform maps it to the other
+Unicode character. Fallbacks can partially fix this problem by mapping both
+Unicode characters to the same codepage byte sequence. Even though some
+character information is lost, the text is still readable.
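+
+The fallback idea can be modeled with a tiny table in which each entry is
+marked as either a roundtrip mapping or a one-way fallback. In the C sketch
+below, the codepage, its byte 0xB5, and the function are hypothetical. In ICU,
+fallback use is controlled per converter with `ucnv_setFallback()`.
+
+```c
+#include <stddef.h>
+
+/* One fromUnicode mapping in a hypothetical single-byte codepage. */
+typedef struct {
+    unsigned int codePoint;
+    unsigned char byte;
+    int isFallback; /* 1: one-way fallback, 0: roundtrip mapping */
+} FromUMapping;
+
+/* Hypothetical table: 0xB5 roundtrips with U+00B5 MICRO SIGN, and
+ * U+03BC GREEK SMALL LETTER MU falls back to the same byte. */
+static const FromUMapping kTable[] = {
+    {0x00B5, 0xB5, 0},
+    {0x03BC, 0xB5, 1},
+};
+
+/* Returns 1 and stores the byte if c is mappable under the given fallback
+ * setting; otherwise returns 0 (a converter would then call an error callback). */
+int fromUnicode(unsigned int c, int useFallback, unsigned char *out) {
+    for (size_t i = 0; i < sizeof kTable / sizeof kTable[0]; ++i) {
+        if (kTable[i].codePoint == c &&
+            (!kTable[i].isFallback || useFallback)) {
+            *out = kTable[i].byte;
+            return 1;
+        }
+    }
+    return 0;
+}
+```
+
+With fallbacks enabled, both U+00B5 and U+03BC convert to 0xB5. With fallbacks
+disabled, only the roundtrip mapping is used, and U+03BC is reported as
+unmappable.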
+
+ICU's converter API has the following main features:
+
+1.  Unicode surrogate support
+
+2.  Support for all major encodings
+
+3.  Consistent text conversion across all computer platforms
+
+4.  Text data can be streamed (buffered) through the API
+
+5.  Fast text conversion
+
+6.  Supports fallbacks to the codepage
+
+7.  Supports reverse fallbacks to Unicode
+
+8.  Allows callbacks for handling and substituting invalid or unmapped byte
+    sequences
+
+9.  Allows a user to add support for unsupported encodings
+
+This section deals with the processes of converting encodings to and from
+Unicode.
+
+## Recommendations
+
+1.  **Use Unicode encodings whenever possible.** Together with Unicode for
+    internal processing, it makes completely globalized systems possible and
+    avoids the many problems with non-algorithmic conversions. (For a discussion
+    of such problems, see for example ["Character Conversions and Mapping
+    Tables"](http://icu-project.org/docs/papers/conversions_and_mappings_iuc19.ppt)
+    on <http://icu-project.org/docs/> and the [XML Japanese
+    Profile](http://www.w3.org/TR/japanese-xml/).)
+
+    1.  Use UTF-8 and UTF-16.
+
+    2.  Use UTF-16BE, SCSU and BOCU-1 as appropriate.
+
+    3.  In special environments, other Unicode encodings may be used as well,
+        such as UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-7, UTF-EBCDIC, and
+        CESU-8. (For turning Unicode filenames into ASCII-only filename strings,
+        the IMAP-mailbox-name encoding can be used.)
+
+    4.  Do not exchange text with single/unpaired surrogates.
+
+2.  **Use legacy charsets only when absolutely necessary**. For best data
+    fidelity:
+
+    1.  ISO-8859-1 is relatively unproblematic — if its limited character
+        repertoire is sufficient — because it is converted trivially (1:1) to
+        Unicode, avoiding conversion table problems for its small set of
+        characters. (By contrast, proper conversion from US-ASCII requires a
+        check for illegal byte values 0x80..0xff, which is an unnecessary
+        complication for modern systems with 8-bit bytes. ISO-8859-1 is nearly
+        as ubiquitous for modern systems as US-ASCII was for 7-bit systems.)
+
+    2.  If you need to communicate with a certain platform, then use the same
+        conversion tables as that platform itself, or at least ones that are
+        very, very close.
+
+    3.  ICU's conversion table repository contains hundreds of Unicode
+        conversion tables from a number of common vendors and platforms as well
+        as comparisons between these conversion tables:
+        <http://icu-project.org/charts/charset/>.
+
+    4.  Do not trust codepage documentation that is not machine-readable, for
+        example nice-looking charts: They are usually incomplete and out of
+        date.
+
+    5.  ICU's default build includes about 200 conversion tables. See the [ICU
+        Data](../icudata.md) chapter for how to add or remove conversion tables
+        and other data.
+
+    6.  In ICU, you can (and should) also use APIs that map a charset name
+        together with a standard/platform name. This allows you to get different
+        converters for the same ambiguous charset name (like "Shift-JIS"),
+        depending on the standard or platform specified. See the
+        [convrtrs.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt)
+        alias table, the [Using Converters](converters.md) chapter and [API
+        references](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ucnv_8h.html).
+
+    7.  For data exchange (rather than pure display), turn off fallback
+        mappings: `ucnv_setFallback(cnv, FALSE);`
+
+    8.  For some text formats, especially XML and HTML, it is possible to set an
+        "escape callback" function that turns unmappable Unicode code points
+        into corresponding escape sequences, preventing data loss. See the API
+        references and the [ucnv sample
+        code](https://github.com/unicode-org/icu/tree/master/icu4c/source/samples/ucnv/).
+
+    9.  **Never modify a conversion table.** Instead, use existing ones that
+        match precisely those in systems with which you communicate. "Modifying"
+        a conversion table in reality just creates a new one, which makes the
+        whole situation even less manageable.
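+
+The ISO-8859-1 and escape-callback recommendations above can be combined in a
+small C sketch. ISO-8859-1 conversion needs no table, and an escape step keeps
+unmappable characters recoverable. The function below is hypothetical and does
+not call ICU. With ICU, you would instead set the `UCNV_FROM_U_CALLBACK_ESCAPE`
+callback with the `UCNV_ESCAPE_XML_HEX` option via `ucnv_setFromUCallBack()`.
+
+```c
+#include <stddef.h>
+#include <stdio.h>
+
+/*
+ * Hypothetical converter: UTF-32 code points -> ISO-8859-1. U+0000..U+00FF
+ * map 1:1 to the byte of the same value; any other code point is written as
+ * an XML hexadecimal character reference such as "&#x3B1;". Returns the
+ * number of bytes written (excluding the terminating NUL), or -1 if dst is
+ * too small.
+ */
+int toLatin1Escaped(const unsigned int *src, size_t srcLength,
+                    char *dst, size_t dstCapacity) {
+    size_t out = 0;
+    if (dstCapacity == 0) return -1;
+    for (size_t i = 0; i < srcLength; ++i) {
+        unsigned int c = src[i];
+        if (c <= 0xFF) {
+            if (out + 1 >= dstCapacity) return -1; /* keep room for NUL */
+            dst[out++] = (char)c;
+        } else {
+            char ref[16];
+            int n = snprintf(ref, sizeof ref, "&#x%X;", c);
+            if (n < 0 || out + (size_t)n >= dstCapacity) return -1;
+            for (int k = 0; k < n; ++k) dst[out++] = ref[k];
+        }
+    }
+    dst[out] = '\0';
+    return (int)out;
+}
+```
+
+For the input U+0061 U+03B1 U+0062, the sketch writes `a&#x3B1;b`.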
diff --git a/docs/userguide/datetime/calendar/examples.md b/docs/userguide/datetime/calendar/examples.md
new file mode 100644
index 0000000..f73e835
--- /dev/null
+++ b/docs/userguide/datetime/calendar/examples.md
@@ -0,0 +1,254 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Calendar Examples
+
+## Calendar for Default Time Zone
+
+These C++, C, and Java examples get a Calendar based on the default time zone
+and add days to a date.
+
+**C++**
+```C++
+UErrorCode status = U_ZERO_ERROR;
+GregorianCalendar* gc = new GregorianCalendar(status);
+if (U_FAILURE(status)) {
+    puts("Couldn't create GregorianCalendar");
+    return;
+}
+// set up the date
+gc->set(2000, Calendar::FEBRUARY, 26);
+gc->set(Calendar::HOUR_OF_DAY, 23);
+gc->set(Calendar::MINUTE, 0);
+gc->set(Calendar::SECOND, 0);
+gc->set(Calendar::MILLISECOND, 0);
+// Iterate through the days and print it out.
+for (int32_t i = 0; i < 30; i++) {
+    // print out the date.  
+    // You should use the DateFormat to properly format it
+    printf("year: %d, month: %d (%d in the implementation), day: %d\n",
+    gc->get(Calendar::YEAR, status),
+    gc->get(Calendar::MONTH, status) + 1,
+    gc->get(Calendar::MONTH, status),
+    gc->get(Calendar::DATE, status));
+    if (U_FAILURE(status)) {
+        puts("Calendar::get failed");
+        return;
+    }
+    // Add a day to the date
+    gc->add(Calendar::DATE, 1, status);
+    if (U_FAILURE(status)) {
+        puts("Calendar::add failed");
+        return;
+    }
+}
+delete gc;
+```
+
+**C**
+```C
+UErrorCode status = U_ZERO_ERROR;
+int32_t i;
+UCalendar* cal = ucal_open(NULL, -1, NULL, UCAL_GREGORIAN, &status);
+if (U_FAILURE(status)) {
+    puts("Couldn't create GregorianCalendar");
+    return;
+}
+// set up the date
+ucal_set(cal, UCAL_YEAR, 2000);
+ucal_set(cal, UCAL_MONTH, UCAL_FEBRUARY); /* FEBRUARY */
+ucal_set(cal, UCAL_DATE, 26);
+ucal_set(cal, UCAL_HOUR_OF_DAY, 23);
+ucal_set(cal, UCAL_MINUTE, 0);
+ucal_set(cal, UCAL_SECOND, 0);
+ucal_set(cal, UCAL_MILLISECOND, 0);
+// Iterate through the days and print it out.
+for (i = 0; i < 30; i++) {
+    // print out the date.
+    // You should use the udat_* API to properly format it
+    printf("year: %d, month: %d (%d in the implementation), day: %d\n",
+           ucal_get(cal, UCAL_YEAR, &status),
+           ucal_get(cal, UCAL_MONTH, &status) + 1,
+           ucal_get(cal, UCAL_MONTH, &status),
+           ucal_get(cal, UCAL_DATE, &status));
+    if (U_FAILURE(status)) {
+        puts("Calendar::get failed");
+        return;
+    }
+    // Add a day to the date
+    ucal_add(cal, UCAL_DATE, 1, &status);
+    if (U_FAILURE(status)) {
+        puts("Calendar::add failed");
+        return;
+    }
+}
+ucal_close(cal);
+```
+
+**Java**
+```Java
+Calendar cal = new GregorianCalendar();
+// set up the date
+cal.set(Calendar.YEAR, 2000);
+cal.set(Calendar.MONTH, Calendar.FEBRUARY); /* FEBRUARY */
+cal.set(Calendar.DATE, 26);
+cal.set(Calendar.HOUR_OF_DAY, 23);
+cal.set(Calendar.MINUTE, 0);
+cal.set(Calendar.SECOND, 0);
+cal.set(Calendar.MILLISECOND, 0);
+// Iterate through the days and print it out.
+for (int i = 0; i < 30; i++) {
+    // print out the date.
+    System.out.println(" year: " + cal.get(Calendar.YEAR) + 
+                       " month: " + (cal.get(Calendar.MONTH) + 1) +
+                       " day : " + cal.get(Calendar.DATE)
+    );
+    cal.add(Calendar.DATE, 1);
+}
+```
+
+These C++, C, and Java examples demonstrate converting dates from one calendar
+(Gregorian) to another calendar (Japanese).
+
+**C++**
+
+```C++
+UErrorCode status = U_ZERO_ERROR;
+UDate time;
+Calendar *cal1, *cal2;
+// Create a new Gregorian Calendar.
+cal1 = Calendar::createInstance("en_US@calendar=gregorian", status);
+if (U_FAILURE(status)) {
+    printf("Error creating Gregorian calendar.\n");
+    return;
+}
+// Set the Gregorian Calendar to a specific date for testing.
+cal1->set(1980, UCAL_SEPTEMBER, 3);
+// Display the date.
+printf("Gregorian Calendar:\t%d/%d/%d\n",
+        cal1->get(UCAL_MONTH, status) + 1,
+        cal1->get(UCAL_DATE, status),
+        cal1->get(UCAL_YEAR, status));
+if (U_FAILURE(status)) {
+    printf("Error getting Gregorian date.");
+    return;
+}
+// Create a Japanese Calendar.
+cal2 = Calendar::createInstance("ja_JP@calendar=japanese", status);
+if (U_FAILURE(status)) {
+    printf("Error creating Japanese calendar.\n");
+    return;
+}
+// Set the date.
+time = cal1->getTime(status);
+if (U_FAILURE(status)) {
+    printf("Error getting time.\n");
+    return;
+}
+cal2->setTime(time, status);
+if (U_FAILURE(status)) {
+    printf("Error setting the date for Japanese calendar.\n");
+    return;
+}
+// Set the timezone
+cal2->setTimeZone(cal1->getTimeZone());
+// Display the date.
+printf("Japanese Calendar:\t%d/%d/%d\n",
+        cal2->get(UCAL_MONTH, status) + 1,
+        cal2->get(UCAL_DATE, status),
+        cal2->get(UCAL_YEAR, status));
+if (U_FAILURE(status)) {
+    printf("Error getting Japanese date.");
+    return;
+}
+delete cal1;
+delete cal2;
+```
+
+**C**
+```C
+UErrorCode status = U_ZERO_ERROR;
+UDate time;
+UCalendar *cal1, *cal2;
+// Create a new Gregorian Calendar.
+cal1 = ucal_open(NULL, -1, "en_US@calendar=gregorian", UCAL_TRADITIONAL,
+                 &status);
+if (U_FAILURE(status)) {
+    printf("Couldn't create Gregorian Calendar.");
+    return;
+}
+// Set the Gregorian Calendar to a specific date for testing.
+ucal_setDate(cal1, 1980, UCAL_SEPTEMBER, 3, &status);
+if (U_FAILURE(status)) {
+    printf("Error setting date.");
+    return;
+}
+// Display the date.
+printf("Gregorian Calendar:\t%d/%d/%d\n",
+        ucal_get(cal1, UCAL_MONTH, &status) + 1,
+        ucal_get(cal1, UCAL_DATE, &status),
+        ucal_get(cal1, UCAL_YEAR, &status));
+if (U_FAILURE(status)) {
+    printf("Error getting Gregorian date.");
+    return;
+}
+// Create a Japanese Calendar.
+cal2 = ucal_open(NULL, -1, "ja_JP@calendar=japanese", UCAL_TRADITIONAL, &status);
+if (U_FAILURE(status)) {
+    printf("Couldn't create Japanese Calendar.");
+    return;
+}
+// Set the date.
+time = ucal_getMillis(cal1, &status);
+if (U_FAILURE(status)) {
+    printf("Error getting time.\n");
+    return;
+}
+ucal_setMillis(cal2, time, &status);
+if (U_FAILURE(status)) {
+    printf("Error setting time.\n");
+    return;
+}
+// Display the date.
+printf("Japanese Calendar:\t%d/%d/%d\n",
+        ucal_get(cal2, UCAL_MONTH, &status) + 1,
+        ucal_get(cal2, UCAL_DATE, &status),
+        ucal_get(cal2, UCAL_YEAR, &status));
+if (U_FAILURE(status)) {
+    printf("Error getting Japanese date.");
+    return;
+}
+ucal_close(cal1);
+ucal_close(cal2);
+```
+
+**Java**
+
+```Java
+Calendar cal1, cal2;
+// Create a new Gregorian Calendar.
+cal1 = new GregorianCalendar();
+// Set the Gregorian Calendar to a specific date for testing.
+cal1.set(1980, Calendar.SEPTEMBER, 3);
+// Display the date.
+System.out.println("Gregorian Calendar:\t" + (cal1.get(Calendar.MONTH) + 1) +
+                    "/" +
+                    cal1.get(Calendar.DATE) + "/" +
+                    cal1.get(Calendar.YEAR));
+// Create a Japanese Calendar.
+cal2 = new JapaneseCalendar();
+// Set the date and timezone
+cal2.setTime(cal1.getTime());
+cal2.setTimeZone(cal1.getTimeZone());
+// Display the date.
+System.out.println("Japanese Calendar:\t" + (cal2.get(Calendar.MONTH) + 1) +
+                    "/" +
+                    cal2.get(Calendar.DATE) + "/" +
+                    cal2.get(Calendar.YEAR));
+```
diff --git a/docs/userguide/datetime/calendar/index.md b/docs/userguide/datetime/calendar/index.md
new file mode 100644
index 0000000..7a071c2
--- /dev/null
+++ b/docs/userguide/datetime/calendar/index.md
@@ -0,0 +1,313 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Calendar Classes
+
+## Overview
+
+ICU has two main calendar classes used for parsing and formatting calendar
+information correctly:
+
+1.  **Calendar**:
+    An abstract base class that defines the calendar API. This API supports
+    UDate to fields conversion and field arithmetic.
+
+2.  **GregorianCalendar**:
+    A concrete subclass of Calendar that implements the standard calendar used
+    today internationally.
+
+In addition to these, ICU has other Calendar subclasses to support
+non-Gregorian calendars, including:
+
+*   Japanese
+
+*   Buddhist
+
+*   Chinese
+
+*   Persian
+
+*   Indian
+
+*   Islamic
+
+*   Hebrew
+
+*   Coptic
+
+*   Ethiopic
+
+The Calendar class is designed to support additional calendar systems in the
+future.
+
+> :point_right: **Note**: *Calendar classes are related to UDate, the TimeZone classes, and the DateFormat
+classes.*
+
+### Calendar locale and keyword handling
+
+When a calendar object is created, via either Calendar::createInstance(), or
+ucal_open(), or indirectly within a date formatter, ICU looks up the 'default'
+calendar type for that locale. At present, all locales default to a Gregorian
+calendar, except for the compatibility locales th_TH_TRADITIONAL and
+ja_JP_TRADITIONAL. If the "calendar" keyword is supplied, this value will
+override the default for that locale.
+
+For instance, Calendar::createInstance("fr_FR", status) will create a Gregorian
+calendar, but Calendar::createInstance("fr_FR@calendar=buddhist", status) will
+create a Buddhist calendar.
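+
+The keyword lookup described above can be sketched in plain C++. This is a
+simplified illustration, not ICU's implementation. ICU parses locale IDs with
+its Locale class, for example Locale::getKeywordValue(). The helper name
+calendarTypeFor() is hypothetical, and keyword matching here is case-sensitive.
+The sketch also assumes every locale defaults to "gregorian", so it ignores the
+compatibility locales mentioned above.
+
+```C++
+#include <string>
+
+// Returns the calendar type requested by a locale ID such as
+// "fr_FR@calendar=buddhist", or the default type when no calendar keyword is
+// present. Keywords follow '@' and are separated by ';'.
+std::string calendarTypeFor(const std::string& localeId) {
+    const std::string defaultType = "gregorian";  // assumed default for all locales
+    const std::string prefix = "calendar=";
+    std::string::size_type at = localeId.find('@');
+    if (at == std::string::npos) {
+        return defaultType;
+    }
+    std::string::size_type pos = at + 1;
+    while (pos < localeId.size()) {
+        std::string::size_type end = localeId.find(';', pos);
+        if (end == std::string::npos) {
+            end = localeId.size();
+        }
+        std::string keyword = localeId.substr(pos, end - pos);
+        if (keyword.compare(0, prefix.size(), prefix) == 0) {
+            return keyword.substr(prefix.size());  // the keyword overrides the default
+        }
+        pos = end + 1;
+    }
+    return defaultType;
+}
+```
+
+For example, calendarTypeFor("fr_FR") returns "gregorian", while
+calendarTypeFor("fr_FR@calendar=buddhist") returns "buddhist".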
+
+It is an error to use an invalid calendar type. It will produce a missing
+resource error.
+
+> :point_right: **Note**: *As of ICU 2.8, the above description applies to ICU4C only. ICU4J will have
+this behavior in 3.0.*
+
+## Usage
+
+This section discusses how to use the Calendar class and the GregorianCalendar
+subclass.
+
+### Calendar
+
+Calendar is an abstract base class. It defines common protocols for a hierarchy
+of classes. Concrete subclasses of Calendar, for example the GregorianCalendar
+class, define specific operations that correspond to a real-world calendar
+system. Calendar objects (instantiations of concrete subclasses of Calendar)
+embody state that represents a specific context. They correspond to a real-world
+locale. They also contain state that specifies a moment in time.
+
+The API defined by Calendar encompasses multiple functions:
+
+1.  Representation of a specific time as a UDate
+
+2.  Representation of a specific time as a set of integer fields, such as YEAR,
+    MONTH, HOUR, etc.
+
+3.  Conversion from UDate to fields
+
+4.  Conversion from fields to UDate
+
+5.  Field arithmetic, including adding, rolling, and field difference
+
+6.  Context management
+
+7.  Factory methods
+
+8.  Miscellaneous: field meta-information, time comparison
+
+#### Representation and Conversion
+
+The basic function of the Calendar class is to convert between a UDate value and
+a set of integer fields. A UDate value is stored as UTC time in milliseconds,
+which means it is calendar and time zone independent. UDate is the most compact
+and portable way to store and transmit a date and time. Integer field values, on
+the other hand, depend on the calendar system (that is, the concrete subclass of
+Calendar) and the calendar object's context state.
+
+> :point_right: **Note**: *Integer field values are needed when implementing a human interface that must
+display or input a date and/or time.*
+
+At any given time, a calendar object uses either its internal UDate or its
+integer fields (depending on which has been set most recently via setTime() or
+set()) to represent a specific date and time. Whatever the current internal
+representation, a requested UDate or integer field is computed if necessary, so
+the caller never needs to trigger the conversion explicitly. To perform a
+conversion, the caller sets either the UDate or the integer fields and then
+retrieves the desired data. This also applies when the caller has some integer
+fields and wants to obtain others.
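+
+The sketch below illustrates conversion in the UDate-to-fields direction. It is
+not ICU's code. It assumes UTC (no TimeZone) and the proleptic Gregorian
+calendar (no Julian cut-over), and it uses Howard Hinnant's civil_from_days
+algorithm. The names Fields and fieldsFromUDate() are hypothetical. month0 is
+zero-based to match ICU's MONTH field.
+
+```C++
+#include <cmath>
+#include <cstdint>
+
+// Calendar fields for a UTC instant; month0 is zero-based like ICU's MONTH.
+struct Fields {
+    int64_t year;
+    int month0;
+    int day;
+    int hour, minute, second, millis;
+};
+
+// Converts a UDate-style value (milliseconds since 1970-01-01 00:00 UTC,
+// stored as a double) to proleptic Gregorian fields in UTC.
+Fields fieldsFromUDate(double udate) {
+    const double kMillisPerDay = 86400000.0;
+    // Floor, so negative values (before the epoch) fall on the earlier day.
+    double dayFloor = std::floor(udate / kMillisPerDay);
+    int64_t days = static_cast<int64_t>(dayFloor);
+    int64_t msInDay = static_cast<int64_t>(udate - dayFloor * kMillisPerDay);
+
+    // Days since the epoch -> year/month/day (civil_from_days).
+    int64_t z = days + 719468;  // shift the epoch to 0000-03-01
+    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
+    int64_t doe = z - era * 146097;                                       // [0, 146096]
+    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
+    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
+    int64_t mp = (5 * doy + 2) / 153;                                     // March = 0
+    Fields f;
+    f.day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
+    int month1 = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);             // 1..12
+    f.year = yoe + era * 400 + (month1 <= 2 ? 1 : 0);
+    f.month0 = month1 - 1;
+
+    // Milliseconds within the day -> time-of-day fields.
+    f.hour = static_cast<int>(msInDay / 3600000);
+    f.minute = static_cast<int>(msInDay / 60000 % 60);
+    f.second = static_cast<int>(msInDay / 1000 % 60);
+    f.millis = static_cast<int>(msInDay % 1000);
+    return f;
+}
+```
+
+For example, fieldsFromUDate(-1.0) yields December 31, 1969, 23:59:59.999. This
+shows how a negative UDate maps to a time before the epoch.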
+
+#### Field Arithmetic
+
+Arithmetic with UDate values is straightforward. Since the values are
+millisecond scalar values, direct addition and subtraction is all that is
+required. Arithmetic with integer fields is more complicated. For example, what
+is the date June 4, 1999 plus 300 days? Calendar defines three basic methods (in
+several variants) that perform field arithmetic: add(), roll(), and
+fieldDifference().
+
+The add() method adds positive or negative values to a specified field. For
+example, calling add(Calendar::MONTH, 2) on a GregorianCalendar object set to
+March 15, 1999 sets the calendar to May 15, 1999. The roll() method is similar,
+but does not modify fields that are larger. For example, calling
+roll(Calendar::HOUR, n) changes the hour that a calendar is set to without
+changing the day. Calling roll(Calendar::MONTH, n) changes the month without
+changing the year.
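+
+The difference between add() and roll() for the MONTH field can be sketched
+without ICU. This is a simplified, hypothetical model. It uses proleptic
+Gregorian dates with no time of day. When the target month is shorter, it pins
+the day of the month to that month's last day, which matches how ICU handles
+month arithmetic (for example, January 31 plus one month gives the last day of
+February).
+
+```C++
+#include <cassert>
+
+// A date with a zero-based month (0 = January), like ICU's MONTH field.
+struct YMD {
+    int year;
+    int month0;
+    int day;
+};
+
+bool isGregorianLeapYear(int year) {
+    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+}
+
+int daysInMonth(int year, int month0) {
+    static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
+    assert(month0 >= 0 && month0 < 12);
+    return (month0 == 1 && isGregorianLeapYear(year)) ? 29 : kDays[month0];
+}
+
+// Like add(MONTH, n): the year changes when the month overflows.
+YMD addMonths(YMD d, int n) {
+    int total = d.year * 12 + d.month0 + n;
+    int year = total >= 0 ? total / 12 : (total - 11) / 12;  // floor division
+    int month0 = total - year * 12;
+    int last = daysInMonth(year, month0);
+    return {year, month0, d.day > last ? last : d.day};  // pin to the last day
+}
+
+// Like roll(MONTH, n): the month wraps around within the same year.
+YMD rollMonths(YMD d, int n) {
+    int month0 = ((d.month0 + n) % 12 + 12) % 12;
+    int last = daysInMonth(d.year, month0);
+    return {d.year, month0, d.day > last ? last : d.day};  // pin to the last day
+}
+```
+
+For example, adding 3 months to November 15, 1999 gives February 15, 2000.
+Rolling the same date by 3 months gives February 15, 1999.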
+
+The fieldDifference() method is the inverse of the add() method. It computes the
+difference between a calendar's currently set time and a specified UDate in
+terms of a specified field. As a side effect, fieldDifference(when, field)
+advances the calendar toward the target time by the returned amount n, just as
+if add(field, n) had been called. Repeated calls with successively smaller
+fields therefore compute the difference between two UDates in terms of whatever
+fields the caller specifies (for example, years, months, days, and hours). Each
+call moves the calendar toward the target time field by field.
+
+This is demonstrated in the following example:
+
+```Java
+Calendar cal = Calendar.getInstance();
+cal.clear(); // start at midnight so the time of day matches 'date' below
+cal.set(2000, Calendar.MARCH, 15);
+Date date = new Date(2000-1900, Calendar.JULY, 4); // midnight, July 4, 2000
+int yearDiff = cal.fieldDifference(date, Calendar.YEAR); // yearDiff == 0
+int monthDiff = cal.fieldDifference(date, Calendar.MONTH); // monthDiff == 3
+// At this point cal has been advanced 3 months to June 15, 2000.
+int dayDiff = cal.fieldDifference(date, Calendar.DAY_OF_MONTH); // dayDiff == 19
+// At this point cal has been advanced 19 days to July 4, 2000.
+```
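+
+The side effect of fieldDifference() can be modeled with plain dates. The
+sketch below is a hypothetical, simplified model, not ICU's implementation. It
+works on year/month/day values only, with no time of day or time zone. It
+handles forward differences only and finds each difference with a simple
+linear search. It reproduces the results of the example above.
+
+```C++
+#include <cassert>
+
+// A date with a zero-based month (0 = January), like ICU's MONTH field.
+struct YMD {
+    int year;
+    int month0;
+    int day;
+};
+
+bool isAfter(YMD a, YMD b) {
+    if (a.year != b.year) return a.year > b.year;
+    if (a.month0 != b.month0) return a.month0 > b.month0;
+    return a.day > b.day;
+}
+
+int daysInMonth(int year, int month0) {
+    static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
+    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+    return (month0 == 1 && leap) ? 29 : kDays[month0];
+}
+
+// Adds n >= 0 months, pinning the day to the end of a shorter month.
+YMD addMonths(YMD d, int n) {
+    int total = d.year * 12 + d.month0 + n;
+    YMD r = {total / 12, total % 12, d.day};
+    int last = daysInMonth(r.year, r.month0);
+    if (r.day > last) r.day = last;
+    return r;
+}
+
+// Adds n >= 0 days, one day at a time.
+YMD addDays(YMD d, int n) {
+    for (int i = 0; i < n; ++i) {
+        if (++d.day > daysInMonth(d.year, d.month0)) {
+            d.day = 1;
+            if (++d.month0 == 12) {
+                d.month0 = 0;
+                ++d.year;
+            }
+        }
+    }
+    return d;
+}
+
+// Largest n >= 0 such that step(cal, n) does not pass 'when'. As a side
+// effect, cal is advanced by n steps, like fieldDifference(). Forward only.
+template <typename Step>
+int difference(YMD& cal, YMD when, Step step) {
+    assert(!isAfter(cal, when));
+    int n = 0;
+    while (!isAfter(step(cal, n + 1), when)) {
+        ++n;
+    }
+    cal = step(cal, n);
+    return n;
+}
+
+int yearDifference(YMD& cal, YMD when) {
+    return difference(cal, when, [](YMD d, int n) { return addMonths(d, 12 * n); });
+}
+int monthDifference(YMD& cal, YMD when) { return difference(cal, when, addMonths); }
+int dayDifference(YMD& cal, YMD when) { return difference(cal, when, addDays); }
+```
+
+With cal set to March 15, 2000 and the target set to July 4, 2000, the three
+calls return 0, 3, and 19 in turn, and cal ends on July 4, 2000.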
+
+#### Context Management
+
+A calendar object performs its computations within a specific context. The
+context affects the results of conversions and arithmetic computations. When a
+calendar object is created, it establishes its context using either default
+values or values specified by the caller:
+
+1.  Locale-specific week data, including the first day of the week and the
+    minimal days in the first week. Initially, this is retrieved from the locale
+    resource data for the specified locale, or if none is specified, for the
+    default locale.
+
+2.  A TimeZone object. Initially, this is set to the specified zone object, or
+    if none is specified, the default TimeZone.
+
+The context of a calendar object can be queried after the calendar is created
+using calls such as getMinimalDaysInFirstWeek(), getFirstDayOfWeek(), and
+getTimeZone(). The context can be changed using calls such as
+setMinimalDaysInFirstWeek(), setFirstDayOfWeek(), and setTimeZone().
+
+#### Factory Methods
+
+Like other format classes, the best way to create a calendar object is by using
+one of the factory methods. These are static methods on the Calendar class that
+create and return an instance of a concrete subclass. Factory methods should be
+used to enable the code to obtain the correct calendar for a locale without
+having to know specific details. The factory methods on Calendar are named
+createInstance().
+
+***MONTH field***
+> :point_right: **Note**: *Calendar numbers months starting from zero, so calling cal.set(1998, 3, 15)
+sets cal to April 15, 1998, not March 15, 1998. This follows the Java
+convention. To avoid mistakes, use the constants defined in the Calendar class
+for the months and days of the week. For example, cal.set(1998, Calendar::APRIL,
+15).*
+
+#### Ambiguous Wall Clock Time Resolution
+
+When the time offset from UTC changes, it produces an ambiguous time slot
+around the transition. For example, many US locations observe daylight saving
+time. On the date of the transition to daylight saving time in the US, wall
+clock time jumps from 1:59 AM (standard) to 3:00 AM (daylight). Therefore, wall
+clock times from 2:00 AM to 2:59 AM do not exist on that date. When the input
+wall time falls into this missing time slot, the ICU Calendar resolves the time
+using the UTC offset before the transition by default. In this example, 2:30 AM
+is interpreted as 2:30 AM standard time (which does not exist), so the final
+result will be 3:30 AM daylight time.
+
+On the date of the transition back to standard time, wall clock time is moved
+back one hour at 2:00 AM. So wall clock times from 1:00 AM to 1:59 AM occur
+twice. In this case, the ICU Calendar resolves the time using the UTC offset
+after the transition by default. For example, 1:30 AM on that date is resolved
+as 1:30 AM standard time.
+
+Ambiguous wall clock time resolution behaviors can be customized by the Calendar
+APIs setRepeatedWallTimeOption() and setSkippedWallTimeOption(). These APIs are
+available in ICU 49 and later.
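+
+The default resolution rules can be expressed as a small function. This is an
+illustrative sketch, not ICU's implementation. It assumes a single known
+transition and whole-minute offsets. The function name and parameters are
+hypothetical. The test values assume US Eastern time as an example: UTC-5:00
+standard, UTC-4:00 daylight, and transitions at 2:00 AM local time.
+
+```C++
+// Resolves a local wall time to UTC around one UTC-offset transition, using
+// the defaults described above. All values are in minutes:
+//   wallMinutes:   local wall time, minutes after local midnight
+//   transitionUtc: the transition instant, minutes after UTC midnight
+//   offsetBefore / offsetAfter: UTC offsets (local - UTC) around it
+// Returns the resolved instant in minutes after UTC midnight.
+int resolveWallTime(int wallMinutes, int transitionUtc, int offsetBefore, int offsetAfter) {
+    int utcIfBefore = wallMinutes - offsetBefore;  // read with the old offset
+    int utcIfAfter = wallMinutes - offsetAfter;    // read with the new offset
+    bool validBefore = utcIfBefore < transitionUtc;
+    bool validAfter = utcIfAfter >= transitionUtc;
+    if (validBefore && validAfter) {
+        return utcIfAfter;   // repeated time: use the offset after the transition
+    }
+    if (validBefore) {
+        return utcIfBefore;  // ordinary time before the transition
+    }
+    if (validAfter) {
+        return utcIfAfter;   // ordinary time after the transition
+    }
+    return utcIfBefore;      // skipped time: use the offset before the transition
+}
+```
+
+With these assumed offsets, a wall time in the skipped hour resolves using the
+pre-transition offset. A wall time in the repeated hour resolves to its second
+occurrence, which uses the post-transition offset.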
+
+### Gregorian Calendar
+
+The GregorianCalendar class implements two calendar systems, the Gregorian
+calendar and the Julian calendar. These calendar systems are closely related,
+differing mainly in their definition of the leap year. The Julian calendar has
+leap years every four years; the Gregorian calendar refines this by excluding
+century years that are not divisible by 400. GregorianCalendar defines two eras,
+BC (B.C.E.) and AD (C.E.).
+
+Historically, most western countries used the Julian calendar until the 16th to
+20th century, depending on the country. They then switched to the Gregorian
+calendar. The GregorianCalendar class mirrors this behavior by defining a
+cut-over date. Before this date, the Julian calendar algorithms are used. After
+it, the Gregorian calendar algorithms are used. By default, the cut-over is set
+so that October 4, 1582 C.E. (Julian) is followed by October 15, 1582 C.E.
+(Gregorian), which reflects the time when countries first began
+adopting the Gregorian calendar. The GregorianCalendar class does not attempt
+historical accuracy beyond this behavior, and does not vary its cut-over date by
+locale. However, users can modify the cut-over date by using the
+setGregorianChange() method.
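+
+The leap-year rules and the cut-over can be sketched as follows. This mirrors
+the rules described above rather than ICU's source. The function names and the
+cutoverYear parameter are hypothetical. Note that a real cut-over is a specific
+date set with setGregorianChange(), not just a year.
+
+```C++
+// Julian rule: every fourth year is a leap year.
+bool isJulianLeapYear(int year) {
+    return year % 4 == 0;
+}
+
+// Gregorian rule: century years are leap years only if divisible by 400.
+bool isGregorianLeapYear(int year) {
+    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
+}
+
+// Hybrid rule: Julian before the cut-over year, Gregorian from it onward.
+// The partial cut-over year is simplified here (1582 is not a leap year
+// under either rule).
+bool isLeapYear(int year, int cutoverYear = 1582) {
+    return year >= cutoverYear ? isGregorianLeapYear(year) : isJulianLeapYear(year);
+}
+```
+
+For example, 1700 is not a leap year with the default cut-over. With a later
+cut-over such as 1752, 1700 is still handled by the Julian rule and is a leap
+year.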
+
+Code that is written correctly instantiates calendar objects using the Calendar
+factory methods, and therefore holds a Calendar* pointer. Such code cannot
+directly access the GregorianCalendar-specific methods not present in Calendar.
+The correct way to handle this is to cast the pointer after testing the type of
+the object using getDynamicClassID(). For example:
+
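+```C++
+#include "unicode/gregocal.h"
+using namespace icu;
+
+// Changes the Julian/Gregorian cut-over date if cal is a GregorianCalendar.
+void setCutover(Calendar *cal, UDate myCutover, UErrorCode &status) {
+    if (cal->getDynamicClassID() == GregorianCalendar::getStaticClassID()) {
+        GregorianCalendar *gc = static_cast<GregorianCalendar*>(cal);
+        gc->setGregorianChange(myCutover, status);
+    }
+}
+```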
+
+> :point_right: **Note**: *This is a general technique that should be used throughout ICU in conjunction
+with the factory methods.*
+
+### Disambiguation
+
+When computing a UDate from fields, some special circumstances can arise. There
+might be insufficient information to compute the UDate (such as only year and
+month but no day in the month), there might be inconsistent information (such as
+"Tuesday, July 15, 1996", when July 15, 1996, was actually a Monday), or the input
+time might be ambiguous because of a time zone transition.
+
+1.  **Insufficient Information**
+    ICU Calendar uses the default field values to specify missing fields. The
+    default for a field is the same as that of the start of the epoch (that is,
+    YEAR = 1970, MONTH = JANUARY, DAY_OF_MONTH = 1).
+
+2.  **Inconsistent Information**
+    If fields conflict, the calendar gives preference to fields set more
+    recently. For example, when determining the day, the calendar looks for one
+    of the following combinations of fields:
+
+    *   MONTH + DAY_OF_MONTH
+    *   MONTH + WEEK_OF_MONTH + DAY_OF_WEEK
+    *   MONTH + DAY_OF_WEEK_IN_MONTH + DAY_OF_WEEK
+    *   DAY_OF_YEAR
+    *   DAY_OF_WEEK + WEEK_OF_YEAR
+
+    For the time of day, the calendar looks for one of the following
+    combinations of fields:
+
+    *   HOUR_OF_DAY
+    *   AM_PM + HOUR
+
+3.  **Ambiguous Wall Clock Time**
+    When the time offset from UTC changes, it produces an ambiguous time slot
+    around the transition. For example, many US locations observe daylight
+    saving time. On the date of the switch to daylight saving time in the US,
+    wall clock time jumps from 1:59 AM (standard) to 3:00 AM (daylight).
+    Therefore, wall clock times from 2:00 AM to 2:59 AM do not exist on that
+    date. When the input wall time falls into this missing time slot, the ICU
+    Calendar resolves the time using the UTC offset before the transition by
+    default. In this example, 2:30 AM is interpreted as 2:30 AM standard time
+    (which does not exist), so the final result will be 3:30 AM daylight time.
+
+    On the date of the switch back to standard time, wall clock time is moved
+    back one hour at 2:00 AM. So wall clock times from 1:00 AM to 1:59 AM occur
+    twice. In this case, the ICU Calendar resolves the time using the UTC offset
+    after the transition by default. For example, 1:30 AM on that date is
+    resolved as 1:30 AM standard time.
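+
+The first two rules can be illustrated with a small sketch. A day-of-week
+function (Tomohiko Sakamoto's method) confirms that July 15, 1996 fell on a
+Monday. A hypothetical field store then resolves a conflict between MONTH +
+DAY_OF_MONTH and DAY_OF_YEAR by preferring whichever was set more recently.
+This is a simplified model, not ICU's resolution code, which considers all the
+combinations listed above. For readability, months are one-based here, unlike
+ICU's MONTH field.
+
+```C++
+#include <cassert>
+
+// 0 = Sunday ... 6 = Saturday, for a proleptic Gregorian date.
+int dayOfWeek(int year, int month1, int day) {
+    static const int kOffsets[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
+    assert(year > 0 && month1 >= 1 && month1 <= 12);
+    if (month1 < 3) {
+        year -= 1;
+    }
+    return (year + year / 4 - year / 100 + year / 400 + kOffsets[month1 - 1] + day) % 7;
+}
+
+// Hypothetical field store: each setter records a stamp so that the most
+// recently set combination wins. Unset fields keep epoch defaults (January 1).
+struct DayFields {
+    int year = 1970;
+    int month1 = 1, dayOfMonth = 1;
+    int dayOfYear = 1;
+    int monthDayStamp = 0;   // when MONTH + DAY_OF_MONTH were last set
+    int dayOfYearStamp = 0;  // when DAY_OF_YEAR was last set
+    int nextStamp = 1;
+
+    void setMonthDay(int m, int d) { month1 = m; dayOfMonth = d; monthDayStamp = nextStamp++; }
+    void setDayOfYear(int doy) { dayOfYear = doy; dayOfYearStamp = nextStamp++; }
+
+    // One-based day of the year, taken from the newer combination.
+    int resolvedDayOfYear() const {
+        if (dayOfYearStamp > monthDayStamp) {
+            return dayOfYear;
+        }
+        static const int kCumulative[12] = {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
+        bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+        return kCumulative[month1 - 1] + dayOfMonth + (leap && month1 > 2 ? 1 : 0);
+    }
+};
+```
+
+For example, if DAY_OF_YEAR is set to 100 in 1996 and then July 15 is set, the
+sketch resolves to day 197 (July 15). Setting DAY_OF_YEAR again makes it win.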
+
+***Options for Ambiguous Time Resolution***
+> :point_right: **Note**: *Ambiguous wall clock time resolution behaviors can be customized by the Calendar APIs setRepeatedWallTimeOption() and setSkippedWallTimeOption(). These methods are available in ICU 49 and later.*
+
+***WEEK_OF_YEAR field***
+> :point_right: **Note**: *Values calculated for the WEEK_OF_YEAR field range from 1 to 53. Week 1 for a year is the first week that contains at least getMinimalDaysInFirstWeek() days from that year. It depends on the values of getMinimalDaysInFirstWeek(), getFirstDayOfWeek(), and the day of the week of January 1. Weeks between week 1 of one year and week 1 of the following year are numbered sequentially from 2 to 52 or 53 (if needed).
+For example, January 1, 1998 was a Thursday. If getFirstDayOfWeek() is MONDAY
+and getMinimalDaysInFirstWeek() is 4 (these are the values reflecting ISO 8601
+and many national standards), then week 1 of 1998 starts on December 29, 1997,
+and ends on January 4, 1998. However, if getFirstDayOfWeek() is SUNDAY, then
+week 1 of 1998 starts on January 4, 1998, and ends on January 10, 1998. The
+first three days of 1998 are then part of week 53 of 1997.*
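+
+The WEEK_OF_YEAR rule can be checked with a small sketch that reproduces the
+1998 example above. It is a simplified, proleptic Gregorian model, not ICU's
+implementation. The function names are hypothetical. Days of the week are
+numbered 0 (Sunday) to 6 (Saturday), and days of the year are zero-based
+(0 = January 1).
+
+```C++
+#include <cassert>
+
+bool isLeapYear(int year) {
+    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+}
+
+// 0 = Sunday ... 6 = Saturday (Sakamoto's method, one-based month).
+int dayOfWeek(int year, int month1, int day) {
+    static const int kOffsets[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
+    if (month1 < 3) {
+        year -= 1;
+    }
+    return (year + year / 4 - year / 100 + year / 400 + kOffsets[month1 - 1] + day) % 7;
+}
+
+// Offset of the first day of week 1 from January 1 (negative values fall in
+// late December of the previous year).
+int week1StartOffset(int year, int firstDayOfWeek, int minimalDays) {
+    assert(firstDayOfWeek >= 0 && firstDayOfWeek <= 6);
+    assert(minimalDays >= 1 && minimalDays <= 7);
+    // Days of January 1's week that fall before January 1.
+    int lead = (dayOfWeek(year, 1, 1) - firstDayOfWeek + 7) % 7;
+    // That week is week 1 only if it has at least minimalDays days of this year.
+    return (7 - lead >= minimalDays) ? -lead : 7 - lead;
+}
+
+// WEEK_OF_YEAR for a zero-based day of the year.
+int weekOfYear(int year, int dayOfYear0, int firstDayOfWeek, int minimalDays) {
+    int start = week1StartOffset(year, firstDayOfWeek, minimalDays);
+    if (dayOfYear0 < start) {
+        // Before week 1: the day belongs to the last week of the previous year.
+        int prevLength = isLeapYear(year - 1) ? 366 : 365;
+        return weekOfYear(year - 1, dayOfYear0 + prevLength, firstDayOfWeek, minimalDays);
+    }
+    int length = isLeapYear(year) ? 366 : 365;
+    int nextStart = length + week1StartOffset(year + 1, firstDayOfWeek, minimalDays);
+    if (dayOfYear0 >= nextStart) {
+        return 1;  // already in week 1 of the following year
+    }
+    return (dayOfYear0 - start) / 7 + 1;
+}
+```
+
+With Monday first and a minimum of 4 days, week 1 of 1998 starts 3 days before
+January 1 (December 29, 1997). With Sunday first, it starts on January 4, and
+January 1, 1998 falls in week 53.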
+
+## Programming Examples
+
+See the calendar [examples in C++, C, and Java](examples.md).
diff --git a/docs/userguide/datetime/index.md b/docs/userguide/datetime/index.md
new file mode 100644
index 0000000..682e5b3
--- /dev/null
+++ b/docs/userguide/datetime/index.md
@@ -0,0 +1,137 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Date/Time Services
+
+## Overview of ICU System Time Zones
+
+A time zone represents an offset applied to Greenwich Mean Time (GMT) to obtain
+local time. The offset might vary throughout the year, if daylight savings time
+(DST) is used, or might be the same all year long. Typically, regions closer to
+the equator do not use DST. If DST is in use, then specific rules define the
+point at which the offset changes and the amount by which it changes. Thus, a
+time zone is described by the following information:
+
+*   An identifying string, or ID. This consists only of invariant characters
+    (see the file utypes.h). It typically has the format continent / city. The
+    city chosen is not the only city in which the zone applies, but rather a
+    representative city for the region. Some IDs consist of three or four
+    uppercase letters; these are legacy zone names that are aliases to standard
+    zone names.
+
+*   An offset from GMT, either positive or negative. Offsets range from
+    approximately minus half a day to plus half a day.
+
+If DST is observed, then three additional pieces of information are needed:
+
+1.  The precise date and time during the year when DST begins. This falls in the
+    first half of the year in the northern hemisphere and in the second half of
+    the year in the southern hemisphere.
+
+2.  The precise date and time during the year when DST ends. This falls in the
+    first half of the year in the southern hemisphere and in the second half of
+    the year in the northern hemisphere.
+
+3.  The amount by which the GMT offset changes when DST is in effect. This is
+    almost always one hour.
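+
+The information listed above maps naturally onto a small data structure. The
+sketch below is hypothetical and much simpler than ICU's SimpleTimeZone. It
+assumes DST boundaries can be given as whole days of the year and ignores the
+time of day at which transitions occur. The day numbers in the usage below are
+illustrative, not real rules. The sketch shows why a start day later than the
+end day indicates a southern hemisphere zone, where DST spans the new year.
+
+```C++
+#include <cstdint>
+#include <string>
+
+// Hypothetical zone description mirroring the list above.
+struct SimpleZoneSketch {
+    std::string id;        // e.g. "America/Los_Angeles"
+    int32_t rawOffsetMs;   // standard offset from GMT
+    bool observesDst;
+    int dstStartDay;       // first (zero-based) day of the year with DST
+    int dstEndDay;         // first day of the year back on standard time
+    int32_t dstSavingsMs;  // almost always one hour
+};
+
+// Total GMT offset in effect on a given day of the year.
+int32_t offsetOnDay(const SimpleZoneSketch& z, int dayOfYear) {
+    if (!z.observesDst) {
+        return z.rawOffsetMs;
+    }
+    bool inDst;
+    if (z.dstStartDay < z.dstEndDay) {
+        // Northern hemisphere: DST is a span in the middle of the year.
+        inDst = dayOfYear >= z.dstStartDay && dayOfYear < z.dstEndDay;
+    } else {
+        // Southern hemisphere: DST wraps around the new year.
+        inDst = dayOfYear >= z.dstStartDay || dayOfYear < z.dstEndDay;
+    }
+    return z.rawOffsetMs + (inDst ? z.dstSavingsMs : 0);
+}
+```
+
+For example, a zone with a raw offset of -8 hours and DST from day 70 to day
+308 reports -7 hours on day 180 and -8 hours on day 10.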
+
+### System and User Time Zones
+
+ICU supports local time zones through the classes TimeZone and SimpleTimeZone in
+the C++ API. In the C API, time zones are designated by their ID strings.
+
+Users can construct their own time zone objects by specifying the above
+information to the C++ API. However, it is more typical for users to use a
+pre-existing system time zone since these represent all current international
+time zones in use. This document lists the system time zones, both in order of
+GMT offset and in alphabetical order of ID.
+
+Since this list changes one or more times a year, *this document only represents
+a snapshot*. For the most current list of ICU system zones, use the method
+TimeZone::getAvailableIDs().
+
+*The zones are listed in binary sort order (that is, 'A' through 'Z' come before
+'a' through 'z'). This is the same order in which the zones are stored
+internally, and the same order in which they are returned by
+TimeZone::getAvailableIDs(). The reason for this is that ICU locates zones using
+a binary search, and the binary search relies on this sort order.*
+*You might notice that zones such as Etc/GMT+1 appear to have the wrong sign for
+their GMT offset. In fact, their sign is inverted since the Etc zones follow
+the POSIX sign conventions. This is the way the original Olson data is set up,
+and ICU reproduces the Olson data faithfully. See the Olson files for more
+details.*
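+
+The sign inversion can be made concrete with a hypothetical helper that reads
+the offset out of an ID of the form "Etc/GMT+N" or "Etc/GMT-N". It only covers
+this simple ID form and is not an ICU function. In ICU itself, call
+getRawOffset() on the TimeZone object instead.
+
+```C++
+#include <cassert>
+#include <string>
+
+// Returns the UTC offset in hours for "Etc/GMT", "Etc/GMT+N", or "Etc/GMT-N".
+// The POSIX-style sign is inverted: "Etc/GMT+1" is one hour behind GMT, so
+// the result is the negation of the number in the ID.
+int etcGmtOffsetHours(const std::string& id) {
+    const std::string prefix = "Etc/GMT";
+    assert(id.compare(0, prefix.size(), prefix) == 0);
+    if (id.size() == prefix.size()) {
+        return 0;  // "Etc/GMT" itself
+    }
+    return -std::stoi(id.substr(prefix.size()));  // stoi accepts a leading '+' or '-'
+}
+```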
+
+### References
+
+The ICU system time zones are derived from the tz database (also known as the
+“Olson” database) at [https://www.iana.org/time-zones](https://www.iana.org/time-zones).
+This is the data used across much of the industry, including by UNIX systems,
+and is usually updated several times each year. ICU (since version 2.8) and base
+Java (since Java 1.4) contain code and tz data supporting both current and
+historic time zone usage.
+
+## How ICU Represents Dates/Times
+
+ICU represents dates and times using UDates. A UDate is a scalar value that
+indicates a specific point in time, independent of calendar system and local
+time zone. It is stored as the number of milliseconds from a reference point
+known as the epoch. The epoch is midnight Coordinated Universal Time (UTC)
+January 1, 1970 A.D. Negative UDate values indicate times before the epoch.
+
+*These classes have the same architecture as the Java classes.*
+
+Most people only need to use the DateFormat classes for parsing and formatting
+dates and times. However, for those who need to convert dates and times or
+perform numeric calculations, the services described in this section can be very
+useful.
+
+To translate a UDate to a useful form, a calendar system and local time zone
+must be specified. These are specified in the form of objects of the Calendar
+and TimeZone classes. Once these two objects are specified, they can be used to
+convert the UDate to and from its corresponding calendar fields. The different
+fields are defined in the Calendar class and include the year, month, day, hour,
+minute, second, and so on.
+
+Specific Calendar objects correspond to calendar systems (such as Gregorian) and
+conventions (such as the first day of the week) in use in different parts of the
+world. To obtain a Calendar object for France, for example, call
+Calendar::createInstance(Locale::getFrance(), status).
+
+The TimeZone class defines the conversion between Coordinated Universal Time
+(UTC) and local time, according to real-world rules. Different TimeZone
+objects correspond to different real-world time zones. For example, call
+TimeZone::createTimeZone("America/Los_Angeles") to obtain an object that
+implements the U.S. Pacific time zone, both Pacific Standard Time (PST) and
+Pacific Daylight Time (PDT).
+
+As previously mentioned, the Calendar and TimeZone objects must be specified
+correctly together. One way of doing so is to create each independently, then
+use the Calendar::setTimeZone() method to associate the time zone with the
+calendar. Another is to use the Calendar::createInstance() method that takes a
+TimeZone object. For example, call Calendar::createInstance(
+TimeZone::createTimeZone("America/Los_Angeles"), Locale::getUS(), status) to
+obtain a Calendar appropriate for use in the U.S. Pacific time zone.
+
+ICU has four classes pertaining to calendars and timezones:
+
+*   [Calendar](calendar/index.md)
+    Calendar is an abstract base class that represents a calendar system.
+    Calendar objects map UDate values to and from the individual fields used in
+    a particular calendar system. Calendar also performs field computations such
+    as advancing a date by two months.
+
+*   [Gregorian Calendar](calendar/index.md) (§)
+    GregorianCalendar is a concrete subclass of Calendar that implements the
+    rules of the Julian calendar and the Gregorian calendar, which is the common
+    calendar in use internationally today.
+
+*   [TimeZone](timezone/index.md)
+    TimeZone is an abstract base class that represents a time zone. TimeZone
+    objects map between Coordinated Universal Time (UTC) and local time.
+
+*   [SimpleTimeZone](timezone/index.md) (§)
+    SimpleTimeZone is a concrete subclass of TimeZone that implements standard
+    time and daylight savings time according to real-world rules. Individual
+    SimpleTimeZone objects correspond to real-world time zones.
diff --git a/docs/userguide/datetime/timezone/examples.md b/docs/userguide/datetime/timezone/examples.md
new file mode 100644
index 0000000..3f236cb
--- /dev/null
+++ b/docs/userguide/datetime/timezone/examples.md
@@ -0,0 +1,76 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Date and Time Zone Examples
+
+## C++ TimeZone example code
+
+This example code illustrates some time zone operations.
+
+```C++
+ UErrorCode success = U_ZERO_ERROR;
+ UnicodeString dateReturned, curTZNameEn, curTZNameFr;
+ UDate curDate;
+ int32_t stdOffset,dstOffset;
+
+ // Create a Time Zone with America/Los_Angeles
+ TimeZone *tzWest = TimeZone::createTimeZone("America/Los_Angeles");
+
+
+ // Print out the Time Zone Name, GMT offset etc.
+ curTZNameEn = tzWest->getDisplayName(Locale::getEnglish(),curTZNameEn);
+ u_printf("%s\n","Current Time Zone Name in English:");
+ u_printf("%S\n", curTZNameEn.getTerminatedBuffer());
+
+ curTZNameFr = tzWest->getDisplayName(Locale::getCanadaFrench(),curTZNameFr);
+ u_printf("%s\n","Current Time Zone Name in French:");
+ u_printf("%S\n", curTZNameFr.getTerminatedBuffer());
+
+
+ // Create a Calendar in the same time zone to get the current date
+ Calendar* calendar = Calendar::createInstance(*tzWest, success);
+ curDate = calendar->getNow();
+
+
+ // Print out the current date/time in the given time zone
+ DateFormat *dt = DateFormat::createDateTimeInstance();
+ dt->setTimeZone(*tzWest);
+ dateReturned = dt->format(curDate,dateReturned,success);
+ u_printf("%s\n", "Current Time:");
+ u_printf("%S\n", dateReturned.getTerminatedBuffer());
+
+
+ // Use getOffset to get the stdOffset and dstOffset for the given time;
+ // 'false' means curDate is a UTC time, not a local wall time
+ tzWest->getOffset(curDate,false,stdOffset,dstOffset,success);
+ u_printf("%s\n%d\n","Current Time Zone STD offset:",stdOffset/(1000*60*60));
+ u_printf("%s\n%d\n","Current Time Zone DST offset:",dstOffset/(1000*60*60));
+ u_printf("%s\n", "Current date/time is in daylight savings time?");
+ u_printf("%s\n", (calendar->inDaylightTime(success))?"Yes":"No");
+
+
+ // Use createTimeZoneIDEnumeration to get the specific Time Zone IDs
+ // in United States with -5 hour standard offset from GMT
+ stdOffset = (-5)*U_MILLIS_PER_HOUR; // U_MILLIS_PER_HOUR = 60*60*1000;
+ StringEnumeration *ids = TimeZone::createTimeZoneIDEnumeration(UCAL_ZONE_TYPE_CANONICAL_LOCATION,"US",&stdOffset,success);
+ for (int i=0; i<ids->count(success);i++) {
+   u_printf("%s\n",ids->next(NULL,success));
+ }
+
+
+ // Use Calendar to get the hour of the day for different time zones
+ int32_t hour1,hour2;
+ TimeZone *tzEast = TimeZone::createTimeZone("America/New_York");
+ Calendar * cal1 = Calendar::createInstance(tzWest,success);
+ Calendar * cal2 = Calendar::createInstance(tzEast,success);
+ hour1 = cal1->get(UCAL_HOUR_OF_DAY,success);
+ hour2 = cal2->get(UCAL_HOUR_OF_DAY,success);
+ u_printf("%s\n%d\n","Current hour of the day in North American West: ", hour1);
+ u_printf("%s\n%d\n","Current hour of the day in North American East: ", hour2);
+
+ delete cal1;
+ delete cal2;
+ delete ids;
+ delete calendar;
+ delete dt;
+ ```
diff --git a/docs/userguide/datetime/timezone/index.md b/docs/userguide/datetime/timezone/index.md
new file mode 100644
index 0000000..503fd53
--- /dev/null
+++ b/docs/userguide/datetime/timezone/index.md
@@ -0,0 +1,242 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# ICU TimeZone Classes
+
+## Overview
+
+A time zone is a system that is used for relating local times in different
+geographical areas to one another. For example, in the United States, Pacific
+Time is three hours earlier than Eastern Time; when it's 6 P.M. in San
+Francisco, it's 9 P.M. in Brooklyn. To make things simple, instead of relating
+time zones to one another, all time zones are related to a common reference
+point.
+
+For historical reasons, the reference point is Greenwich, England. Local time in
+Greenwich is referred to as Greenwich Mean Time, or GMT. (This is similar, but
+not precisely identical, to Coordinated Universal Time, or UTC. We use the two
+terms interchangeably in ICU since ICU does not concern itself with either leap
+seconds or historical behavior.) Using this system, Pacific Time is expressed as
+GMT-8:00, or GMT-7:00 in the summer. The offset -8:00 indicates that Pacific
+Time is obtained from GMT by adding -8:00, that is, by subtracting 8 hours.
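+
+The offset arithmetic above (local time = GMT + offset) is easy to sketch. The
+helpers below are hypothetical, not ICU APIs. The first formats a millisecond
+offset in the "GMT-8:00" style used in this section. The second applies a
+whole-hour offset to a GMT hour of the day.
+
+```C++
+#include <cstdio>
+#include <string>
+
+// Formats an offset in milliseconds as "GMT-8:00", "GMT+5:30", and so on.
+std::string formatGmtOffset(int offsetMillis) {
+    char sign = offsetMillis < 0 ? '-' : '+';
+    int totalMinutes = (offsetMillis < 0 ? -offsetMillis : offsetMillis) / 60000;
+    char buf[16];
+    std::snprintf(buf, sizeof buf, "GMT%c%d:%02d", sign, totalMinutes / 60, totalMinutes % 60);
+    return buf;
+}
+
+// Local hour of the day obtained by adding a whole-hour offset to a GMT hour.
+int localHour(int gmtHour, int offsetMillis) {
+    int hour = gmtHour + offsetMillis / 3600000;
+    return ((hour % 24) + 24) % 24;  // wrap into 0..23
+}
+```
+
+For example, at 02:00 GMT in winter, localHour(2, -8 * 3600000) gives 18
+(6 P.M. in San Francisco) and localHour(2, -5 * 3600000) gives 21 (9 P.M. in
+Brooklyn), matching the example at the start of this section.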
+
+The offset differs in the summer because of daylight savings time, or DST. At
+this point it is useful to define three different flavors of local time:
+
+* **Standard Time**:
+  Standard Time is local time without a daylight savings time offset. For
+  example, in California, standard time is GMT-8:00; that is, 8 hours before
+  GMT.
+* **Daylight Savings Time**:
+  Daylight savings time is local time with a daylight savings time offset.
+  This offset is typically one hour, but is sometimes less. In California,
+  daylight savings time is GMT-7:00. Daylight savings time is observed in most
+  non-equatorial areas.
+* **Wall Time**:
+  Wall time is what a local clock on the wall reads. In areas that observe
+  daylight savings time for part of the year, wall time is either standard
+  time or daylight savings time, depending on the date. In areas that do not
+  observe daylight savings time, wall time is equivalent to standard time.
+
+## Time Zones in ICU
+
+ICU supports time zones through two classes:
+
+* **TimeZone**:
+  `TimeZone` is an abstract base class that defines the time zone API. This API
+  supports conversion between GMT and local time.
+* **SimpleTimeZone**:
+  `SimpleTimeZone` is a concrete subclass of TimeZone that implements the
+  standard time zones used today internationally.
+
+Timezone classes are related to `UDate`, the `Calendar` classes, and the
+`DateFormat` classes.
+
+### Timezone Class in ICU
+
+`TimeZone` is an abstract base class. It defines common protocol for a hierarchy
+of classes. This protocol includes:
+
+* A programmatic ID, for example, "America/Los_Angeles". This ID is used to
+  call up a specific real-world time zone. It corresponds to the IDs defined
+  in the [IANA Time Zone Database](https://www.iana.org/time-zones) used by UNIX
+  and other systems, and has the format continent/city or ocean/city.
+* A raw offset. This is the difference, in milliseconds, between a time zone's
+  standard time and GMT. Positive raw offsets are east of Greenwich.
+* Factory methods and methods for handling the default time zone.
+* Display name methods.
+* An API to compute the difference between local wall time and GMT.
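+
+Because the raw offset is an integer number of milliseconds, positive east of
+Greenwich, it maps directly onto the "GMT-8:00" notation used earlier in this
+chapter. A minimal sketch follows. `format_gmt_offset()` is a hypothetical
+helper, not an ICU API, and is not how ICU's date format classes produce
+localized GMT strings.
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Hypothetical helper: writes a raw offset (ms, positive east of Greenwich)
+   as "GMT+H:MM" or "GMT-H:MM". A 16-byte buffer is enough. */
+static void format_gmt_offset(int32_t raw_offset_ms, char *buf, size_t len) {
+    char sign = raw_offset_ms < 0 ? '-' : '+';
+    long minutes = labs((long)raw_offset_ms) / (60L * 1000L);
+    snprintf(buf, len, "GMT%c%ld:%02ld", sign, minutes / 60, minutes % 60);
+}
+```
+
+For example, a raw offset of `-28800000` yields `GMT-8:00`, and `19800000`
+(5.5 hours) yields `GMT+5:30`.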
+
+#### Factory Methods and the Default Timezone
+
+The `TimeZone` factory method `createTimeZone()` creates and returns a `TimeZone`
+object given a programmatic ID. The caller does not know the concrete class of
+the returned object, only that it is a subclass of `TimeZone`.
+
+The `createAvailableIDs()` methods return lists of the programmatic IDs of all
+zones known to the system. These IDs may then be passed to `createTimeZone()` to
+create the actual time zone objects. ICU maintains a comprehensive list of
+current international time zones, derived from the IANA Time Zone Database
+(also known as the Olson database).
+
+`TimeZone` maintains a static time zone object known as the *default time zone*.
+This is the time zone that is used implicitly when the user does not specify
+one. ICU attempts to match this to the host OS time zone. The user may obtain a
+clone of the default time zone by calling `createDefault()` and may change the
+default time zone by calling `setDefault()` or `adoptDefault()`.
+
+#### Display Name
+
+When displaying the name of a time zone to the user, use the display name, not
+the programmatic ID. The display name is returned by the `getDisplayName()`
+method. A time zone may have three display names:
+
+* Generic name, such as "Pacific Time".
+* Standard name, such as "Pacific Standard Time".
+* Daylight savings name, such as "Pacific Daylight Time".
+
+Furthermore, each of these names may be LONG or SHORT. The SHORT form is
+typically an abbreviation, e.g., "PST", "PDT".
+
+In addition to being available directly from the `TimeZone` API, the display name
+is used by the date format classes to format and parse time zones.
+
+#### getOffset() API
+
+`TimeZone` defines the method `getOffset()` by which the caller can determine the
+difference between local time and GMT. It is a pure virtual method, so it is
+implemented in the concrete subclasses of `TimeZone`.
+
+## Updating the Time Zone Data
+
+Time zone data changes often in response to governments around the world
+changing their local rules and the areas where they apply. ICU derives its tz
+data from the [IANA Time Zone Database](http://www.iana.org/time-zones).
+
+The ICU project publishes updated timezone resource data in response to IANA
+updates, and these can be used to patch existing ICU installations. Several
+update strategies are possible, depending on the ICU version and configuration.
+
+* ICU4J: Use the time zone update utility.
+* ICU4C 54 and newer: Drop in the binary update files.
+* ICU4C 36 and newer: the best update strategy will depend on how ICU data
+  loading is configured for the specific ICU installation.
+  * Data is loaded from a .dat package file: replace the time zone resources
+    in the .dat file using the icupkg tool.
+  * Data is loaded from a .dll or .so shared library: obtain the updated
+    sources for the tz resources and rebuild the data library.
+  * Data is loaded from individual files: drop in the updated binary .res
+    files.
+
+The [ICU Data](../../icudata.md) section of this user guide gives more
+information on how ICU loads resources.
+
+The ICU resource files required for time zone data updates are posted at
+<https://github.com/unicode-org/icu-data/tree/master/tzdata/icunew>. The
+required resource files for ICU version 4.4 and newer are:
+
+* zoneinfo64.res
+* windowsZones.res
+* timezoneTypes.res
+* metaZones.res
+
+### ICU4C TZ update of a .dat Package File
+
+For ICU configurations that load data from a .dat package file, replace the time
+zone resources in that file.
+
+1. Download the new .res files from
+   `https://github.com/unicode-org/icu-data/tree/master/tzdata/icunew/<IANA tz version>/44/<platform directory>`.
+   * `<IANA tz version>` is a combination of year and letter, such as "2019c".
+   * `44` is the directory for updates to ICU version 4.4 and newer.
+   * `<platform directory>` is "le" for little endian processors, including
+     all Intel processors.
+   * `<platform directory>` is "be" for big endian processors, including IBM
+     Power and SPARC.
+   * `<platform directory>` is "ee" for IBM mainframes using EBCDIC character
+     sets.
+2. Check that the `icupkg` tool is available. If it is not already on your
+   system, you can get it by [downloading](https://github.com/unicode-org/icu/releases)
+   and building ICU, following the instructions in the ReadMe file included in
+   the download. Alternatively, on many Linux systems, `apt-get install icu-devtools`
+   will install the tool.
+3. Locate the .dat file to be updated, and do the update. The commands below
+   are for a .dat file named `icudt55l.dat`.
+
+   ```shell
+   icupkg -a zoneinfo64.res icudt55l.dat
+   icupkg -a windowsZones.res icudt55l.dat
+   icupkg -a timezoneTypes.res icudt55l.dat
+   icupkg -a metaZones.res icudt55l.dat
+   ```
+
+In ICU versions older than 4.4, some of the time zone resources have slightly
+different names. The update procedure is the same, but substitute the names
+found in the corresponding download directory: 42, 40, 38, or 36.
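+
+The platform directory choice in step 1 can also be made programmatically. The
+C sketch below is a hypothetical helper, not part of ICU. It detects EBCDIC by
+testing whether `'A'` has the EBCDIC code 0xC1, detects byte order by inspecting
+the first byte of a 16-bit integer, and assembles the step 1 download directory
+for a given IANA tz version. It reports the platform it is compiled for, so build
+and run it for the machine whose ICU data you are updating.
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Platform directory from step 1: "ee" for EBCDIC, else "le" or "be". */
+static const char *platform_directory(void) {
+    if ('A' == 0xC1) {           /* 'A' is 0xC1 only in EBCDIC */
+        return "ee";
+    }
+    const uint16_t probe = 1;
+    unsigned char first_byte;
+    memcpy(&first_byte, &probe, 1);  /* lowest-addressed byte of probe */
+    return first_byte == 1 ? "le" : "be";
+}
+
+/* Writes the step 1 download directory for an IANA tz version such as "2019c". */
+static void tz_download_dir(const char *tz_version, char *buf, size_t len) {
+    snprintf(buf, len,
+             "https://github.com/unicode-org/icu-data/tree/master/tzdata/icunew/%s/44/%s",
+             tz_version, platform_directory());
+}
+```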
+
+### ICU4C TZ Update with Drop-in .res files (ICU 54 and newer)
+
+With this approach, the four individual .res files are dropped in any convenient
+location in the file system, and ICU is given an absolute path to the directory
+containing them. For the time zone resources only, ICU will check this directory
+first when loading data. This approach will work even when all other ICU data
+loading is from a shared library or .dat file.
+
+There are two ways to specify the directory:
+
+* At ICU build time, by defining the C preprocessor macro
+  `U_TIMEZONE_FILES_DIR` as the run-time path to the directory containing the
+  .res files.
+* At run time, by setting the environment variable `ICU_TIMEZONE_FILES_DIR` to
+  the absolute path of the directory containing the .res files.
+
+If both are defined, the environment variable `ICU_TIMEZONE_FILES_DIR` takes
+precedence. If either is defined, the time zone directory will be checked first,
+meaning that time zone resource files placed there will override time zone
+resources that may exist in other ICU data locations.
+
+To do the update, download the .res files appropriate for the platform, as
+described for the .dat file update above, and copy them into the time zone res
+file directory.
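+
+When checking a deployment, a small C sketch of the lookup described above can
+help. It is not ICU's code. `effective_tz_dir()` and `count_missing_tz_files()`
+are hypothetical helpers that apply the stated precedence (environment variable
+over build-time setting) and check whether the four resource files for ICU 4.4
+and newer can be opened in the chosen directory. The sketch assumes `/` as the
+path separator and treats any non-NULL environment value as set.
+
+```c
+#include <stddef.h>
+#include <stdio.h>
+
+/* The four resource files listed above for ICU 4.4 and newer. */
+static const char *const TZ_RES_FILES[] = {
+    "zoneinfo64.res", "windowsZones.res", "timezoneTypes.res", "metaZones.res"
+};
+
+/* Precedence as described above: the run-time value of ICU_TIMEZONE_FILES_DIR
+   wins over the build-time U_TIMEZONE_FILES_DIR. Either may be NULL. */
+static const char *effective_tz_dir(const char *env_value, const char *build_value) {
+    return env_value != NULL ? env_value : build_value;
+}
+
+/* Counts how many of the four files cannot be opened in dir. */
+static int count_missing_tz_files(const char *dir) {
+    int missing = 0;
+    for (size_t i = 0; i < sizeof TZ_RES_FILES / sizeof TZ_RES_FILES[0]; i++) {
+        char path[4096];
+        snprintf(path, sizeof path, "%s/%s", dir, TZ_RES_FILES[i]);
+        FILE *f = fopen(path, "rb");
+        if (f == NULL) {
+            missing++;
+        } else {
+            fclose(f);
+        }
+    }
+    return missing;
+}
+```
+
+A typical call passes `getenv("ICU_TIMEZONE_FILES_DIR")` (declared in
+`<stdlib.h>`) as the first argument of `effective_tz_dir()`, and checks that the
+result is not `NULL` before passing it to `count_missing_tz_files()`.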
+
+### ICU4C TZ update when ICU is configured for individual files
+
+If the ICU-using application sets an ICU data path (or can be changed to set
+one), then the time zone .res files can be placed there. Download the files as
+described above and copy them to the specified directory. See the
+[ICU Data](../../icudata.md) page of the user guide for more information about
+the ICU data path.
+
+### ICU4C TZ update when ICU data is built into a shared library
+
+1. Set up the environment necessary to rebuild your specific configuration of
+   ICU.
+2. Download the .txt file sources for the updated resources from
+   `https://github.com/unicode-org/icu-data/tree/master/tzdata/icunew/<IANA tz version>/44`.
+3. Copy the downloaded .txt files into the ICU sources for your installation,
+   in the subdirectory `source/data/misc/`.
+4. Rebuild ICU.
+5. Copy the freshly built ICU data shared library to the desired destination.
+
+> :point_right: **Note**: The standard ICU download package contains pre-built
+ICU data. To rebuild ICU data from .txt files, you will need to replace the
+contents of `icu4c/source/data` with the contents of ICU4C data.zip. See
+[ICU Data Build Tool](../../icu_data/buildtool.md) for more details.
+
+There are too many possible platform variations to be more specific about how to
+rebuild ICU4C in these instructions. See the ReadMe file included with the ICU
+sources for general information on building ICU.
+
+### Update the time zone data for ICU4J
+
+The [ICU4J Time Zone Update
+Utility](http://site.icu-project.org/download/icutzu) automates the process of
+updating ICU4J jar files with the latest time zone data. See the
+[instructions for use](https://htmlpreview.github.io/?https://github.com/unicode-org/icu-data/blob/master/tzdata/tzu/readme.html).
+
+The updater will work with ICU version 3.4.2 and newer.
+
+## Sample Code
+
+See the [Date and Time Zone Examples](examples.md) subpage.
diff --git a/docs/userguide/datetime/universaltimescale.md b/docs/userguide/datetime/universaltimescale.md
new file mode 100644
index 0000000..0d7c6d5
--- /dev/null
+++ b/docs/userguide/datetime/universaltimescale.md
@@ -0,0 +1,256 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Universal Time Scale
+
+## Overview
+
+There are quite a few different conventions for binary datetime, depending on
+the platform or protocol. Some of these have severe drawbacks. For example,
+people using Unix time (seconds since Jan 1, 1970, usually in a 32-bit integer)
+think that they are safe until near the year 2038. But cases can and do arise
+where arithmetic manipulations cause serious problems. Consider the computation
+of the average of two datetimes, for example: if one calculates it with
+`averageTime = (time1 + time2)/2`, the sum overflows a signed 32-bit integer even
+for dates beginning in 2004. Moreover, even if these problems don't occur, there is the
+issue of conversion back and forth between different systems.
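+
+To make the overflow concrete: a signed 32-bit `time_t` tops out at 2147483647
+(`INT32_MAX`), so `time1 + time2` overflows whenever both values are at least
+1073741824, which is 2004-01-10 13:37:04 UTC. The C sketch below uses
+hypothetical helper names. It detects the overflow by forming the sum in 64 bits,
+and shows a midpoint calculation that never forms the sum at all.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Would time1 + time2 overflow a signed 32-bit time_t?
+   The sum is formed in 64 bits, so the check itself cannot overflow. */
+static bool sum_overflows_int32(int32_t time1, int32_t time2) {
+    int64_t sum = (int64_t)time1 + time2;
+    return sum > INT32_MAX || sum < INT32_MIN;
+}
+
+/* Midpoint without forming time1 + time2. The difference is taken in 64 bits
+   because time2 - time1 can itself overflow 32 bits. */
+static int32_t safe_average(int32_t time1, int32_t time2) {
+    return (int32_t)(time1 + ((int64_t)time2 - time1) / 2);
+}
+```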
+
+Binary datetimes differ in a number of ways: the data type, the unit, and the
+epoch (origin). We'll refer to these as time scales. The following table gives
+examples, sorted by epoch and unit in descending order. (In Java, `int64_t`
+corresponds to `long` and `int32_t` to `int`.)
+
+| Source                                     | Data Type                                                                        | Epoch       | Unit                                                    |
+| ------------------------------------------ | -------------------------------------------------------------------------------- | ----------- | ------------------------------------------------------- |
+| MacOS X (CFDate/NSDate)                    | double (1.0=1s but fractional seconds are used as well; imprecise for 0.1s etc.) | 2001-Jan-01 | seconds (and fractions thereof)                         |
+| Unix time_t                                | int32_t or int64_t (signed int32_t limited to 1970..2038)                        | 1970-Jan-01 | seconds                                                 |
+| Java Date                                  | int64_t                                                                          | 1970-Jan-01 | milliseconds                                            |
+| Joda DateTime                              | int64_t                                                                          | 1970-Jan-01 | milliseconds                                            |
+| ICU4C UDate                                | double (does not use fractional milliseconds)                                    | 1970-Jan-01 | milliseconds                                            |
+| JavaScript Date                            | double (does not use fractional milliseconds; JavaScript Number stores a double) | 1970-Jan-01 | milliseconds                                            |
+| Unix struct timeval (as in gettimeofday)   | struct: time_t (seconds); suseconds_t (microseconds)                             | 1970-Jan-01 | microseconds                                            |
+| Gnome g_get_real_time()                    | gint64                                                                           | 1970-Jan-01 | microseconds                                            |
+| Unix struct timespec (as in clock_gettime) | struct: time_t (seconds); long (nanoseconds)                                     | 1970-Jan-01 | nanoseconds                                             |
+| MacOS (old)                                | uint32_t (1904..2040)                                                            | 1904-Jan-01 | seconds                                                 |
+| Excel                                      | ?                                                                                | 1899-Dec-31 | days                                                    |
+| DB2                                        | ?                                                                                | 1899-Dec-31 | days                                                    |
+| Windows FILETIME                           | int64_t                                                                          | 1601-Jan-01 | ticks (100 nanoseconds; finest granularity in industry) |
+| .NET DateTime                              | uint62 (only 0001-9999; only 62 bits; also 2-bit field for UTC/local)            | 0001-Jan-01 | ticks (100 nanoseconds; finest granularity in industry) |
+| ICU Universal Time Scale                   | int64_t                                                                          | 0001-Jan-01 | same as .NET but allows 29000BC..29000AD                |
+
+All of the epochs start at 00:00 (the earliest possible time on the day in
+question), and are usually assumed to be UTC.
+
+The ranges, in years, for different data types are given in the following table.
+The range for integer types includes the entire range expressible with positive
+and negative values of the data type. The range for double is the range that
+would be allowed without losing precision to the corresponding unit.
+
+| Units                  | 64-bit integer          | Double         | 32-bit integer |
+| ---------------------- | ----------------------- | -------------- | -------------- |
+| 1 second               | 5.84542x10<sup>11</sup> | 285,420,920.94 | 136.10         |
+| 1 millisecond          | 584,542,046.09          | 285,420.92     | 0.14           |
+| 1 microsecond          | 584,542.05              | 285.42         | 0.00           |
+| 100 nanoseconds (tick) | 58,454.20               | 28.54          | 0.00           |
+| 1 nanosecond           | 584.5420461             | 0.2854         | 0.00           |
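+
+The figures in this table can be reproduced from the bit widths: 2<sup>64</sup>
+values for a 64-bit integer, 2<sup>53</sup> exactly representable integers for a
+double, and 2<sup>32</sup> values for a 32-bit integer, divided by the number of
+units in a year. The numbers match when a year is taken as 365.25 days; that year
+length is inferred from the table, not stated in it. A short C sketch with a
+hypothetical function name:
+
+```c
+/* A 365.25-day year, in seconds (inferred from the table's figures). */
+#define SECONDS_PER_YEAR (365.25 * 86400.0)
+
+/* Years spanned by 2^bits consecutive values of a unit with
+   units_per_second values per second (1e3 for ms, 1e7 for ticks, ...). */
+static double range_years(int bits, double units_per_second) {
+    double count = 1.0;
+    for (int i = 0; i < bits; i++) {
+        count *= 2.0;  /* powers of two are exact in a double */
+    }
+    return count / units_per_second / SECONDS_PER_YEAR;
+}
+```
+
+For example, `range_years(64, 1e7)` gives about 58,454 years in total, or roughly
+29,227 years on either side of the epoch, consistent with the 29000BC..29000AD
+range of the ICU Universal Time Scale.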
+
+ICU implements a universal time scale that is similar to the 
+[.NET framework's System.DateTime](https://docs.microsoft.com/dotnet/api/system.datetime?view=netframework-4.8).
+The universal time scale is a 64-bit integer that holds ticks since midnight,
+January 1<sup>st</sup>, 0001. Negative values are supported. This has enough
+range to guarantee that calculations involving dates around the present are safe.
+
+The universal time scale always measures time according to the proleptic
+Gregorian calendar. That is, the Gregorian calendar's leap year rules are used
+for all times, even before 1582 when it was introduced. (This is different from
+the default ICU calendar which switches from the Julian to the Gregorian
+calendar in 1582. See `GregorianCalendar::setGregorianChange()` and
+`ucal_setGregorianChange()`.)
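+
+The proleptic Gregorian rule is easy to state in code. The sketch below uses
+hypothetical function names. It applies the Gregorian leap-year rule to every
+year and counts the days from 0001-01-01 to January 1 of a given year. Under this
+rule, 1500 is not a leap year, although it is one in the Julian calendar. The
+count also yields 719,162 days between 0001-01-01 and 1970-01-01, which is
+62,135,596,800 seconds: the distance from the universal time scale's epoch to the
+Unix epoch.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Gregorian leap-year rule, applied to every year (proleptic). */
+static bool is_gregorian_leap(int64_t year) {
+    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+}
+
+/* Days from 0001-01-01 to January 1 of year (year >= 1), proleptic Gregorian:
+   365 per year plus one per leap year among years 1 .. year-1. */
+static int64_t days_before_year(int64_t year) {
+    int64_t y = year - 1;
+    return 365 * y + y / 4 - y / 100 + y / 400;
+}
+```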
+
+ICU provides conversion functions to and from all other major time scales,
+allowing datetimes in any time scale to be converted to the universal time
+scale, safely manipulated, and converted back to any other datetime time scale.
+
+## Background
+
+So how did we decide what to use for the universal time scale? Java time has
+plenty of range, but cannot represent a .NET `System.DateTime` value without
+severe loss of precision. ICU4C time addresses this by using a double that is
+otherwise equivalent to the Java time. However, doubles have disadvantages too.
+They degrade much more gracefully than integers in arithmetic operations, but
+they have only 53 bits of precision, which means that they will lose precision
+when converting back and forth to ticks. Ideally, one would use a long double (80
+bits, with a 64-bit significand), but that is not supported on most systems.
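+
+The precision problem is easy to demonstrate. A double represents every integer
+up to 2<sup>53</sup> (about 9.0x10<sup>15</sup>) exactly, but present-day tick
+values are above 6x10<sup>17</sup>, where adjacent doubles are 128 ticks apart.
+This C sketch, with a hypothetical helper name, checks whether a tick count
+survives a round trip through `double`:
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* True if a tick count comes back unchanged after conversion to double. */
+static bool survives_double_round_trip(int64_t ticks) {
+    /* volatile forces a real 64-bit double, even on x87 hardware whose
+       registers hold wider intermediates. */
+    volatile double d = (double)ticks;
+    return (int64_t)d == ticks;
+}
+```
+
+The tick value of the Unix epoch, 621355968000000000, happens to be a multiple of
+128 and survives; adding a single tick to it does not.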
+
+The Unix extended time uses a structure with two components: time in seconds and
+a fractional field (microseconds). However, this is clumsy, slow, and prone to
+error (you always have to keep track of overflow and underflow in the fractional
+field). `BigDecimal` would allow for arbitrary precision and arbitrary range, but
+we did not want to use this as the normal type, because it is slow and does not
+have a fixed size.
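+
+The bookkeeping mentioned above looks like this in practice. The `sec_usec` type
+is a hypothetical stand-in for `struct timeval`. Every addition or subtraction
+has to carry or borrow between the two fields, whereas a single 64-bit tick
+count needs only `+` and `-`.
+
+```c
+#include <stdint.h>
+
+/* Hypothetical stand-in for struct timeval: usec kept in 0..999999. */
+typedef struct {
+    int64_t sec;
+    int32_t usec;
+} sec_usec;
+
+/* Addition must carry from microseconds into seconds. */
+static sec_usec sec_usec_add(sec_usec a, sec_usec b) {
+    sec_usec r = { a.sec + b.sec, a.usec + b.usec };
+    if (r.usec >= 1000000) {
+        r.usec -= 1000000;
+        r.sec += 1;
+    }
+    return r;
+}
+
+/* Subtraction needs the mirror-image borrow. */
+static sec_usec sec_usec_sub(sec_usec a, sec_usec b) {
+    sec_usec r = { a.sec - b.sec, a.usec - b.usec };
+    if (r.usec < 0) {
+        r.usec += 1000000;
+        r.sec -= 1;
+    }
+    return r;
+}
+```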
+
+Because of these issues, we concluded that the .NET `System.DateTime` is the best
+timescale to use. However, we use the full range allowed by the data type,
+allowing for datetimes back to 29,000 BC and up to 29,000 AD. (`System.DateTime`
+uses only 62 bits and only supports dates from 0001 AD to 9999 AD.) This time
+scale is very fine grained, does not lose precision, and covers a range that
+will meet almost all requirements. It will not handle the range that Java times
+do, but frankly, being able to handle dates before 29,000 BC or after 29,000 AD
+is of very limited interest.
+
+## Constants
+
+ICU provides routines to convert from other timescales to the universal time
+scale, to convert from the universal time scale to other timescales, and to get
+information about a particular timescale. In all of these routines, the
+timescales are referenced using an integer constant, according to the following
+table:
+
+| Source                 | ICU4C                       | ICU4J                  |
+| ---------------------- | --------------------------- | ---------------------- |
+| Java                   | UDTS_JAVA_TIME              | JAVA_TIME              |
+| Unix                   | UDTS_UNIX_TIME              | UNIX_TIME              |
+| ICU4C                  | UDTS_ICU4C_TIME             | ICU4C_TIME             |
+| Windows FILETIME       | UDTS_WINDOWS_FILE_TIME      | WINDOWS_FILE_TIME      |
+| .NET DateTime          | UDTS_DOTNET_DATE_TIME       | DOTNET_DATE_TIME       |
+| Macintosh (old)        | UDTS_MAC_OLD_TIME           | MAC_OLD_TIME           |
+| Macintosh              | UDTS_MAC_TIME               | MAC_TIME               |
+| Excel                  | UDTS_EXCEL_TIME             | EXCEL_TIME             |
+| DB2                    | UDTS_DB2_TIME               | DB2_TIME               |
+| Unix with microseconds | UDTS_UNIX_MICROSECONDS_TIME | UNIX_MICROSECONDS_TIME |
+
+The routine that gets a particular piece of information about a timescale takes
+an integer constant that identifies the particular piece of information,
+according to the following table:
+
+| Value                | ICU4C                   | ICU4J              |
+| -------------------- | ----------------------- | ------------------ |
+| Precision            | UTSV_UNITS_VALUE        | UNITS_VALUE        |
+| Epoch offset         | UTSV_EPOCH_OFFSET_VALUE | EPOCH_OFFSET_VALUE |
+| Minimum "from" value | UTSV_FROM_MIN_VALUE     | FROM_MIN_VALUE     |
+| Maximum "from" value | UTSV_FROM_MAX_VALUE     | FROM_MAX_VALUE     |
+| Minimum "to" value   | UTSV_TO_MIN_VALUE       | TO_MIN_VALUE       |
+| Maximum "to" value   | UTSV_TO_MAX_VALUE       | TO_MAX_VALUE       |
+
+Here is what the values mean:
+
+* Precision -- the precision of the timescale, in ticks.
+* Epoch offset -- the distance from the universal timescale's epoch to the timescale's epoch, in the timescale's precision.
+* Minimum "from" value -- the minimum timescale value that can safely be converted to the universal timescale.
+* Maximum "from" value -- the maximum timescale value that can safely be converted to the universal timescale.
+* Minimum "to" value -- the minimum universal timescale value that can safely be converted to the timescale.
+* Maximum "to" value -- the maximum universal timescale value that can safely be converted to the timescale.
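+
+Taken together, the precision and the epoch offset determine a conversion. The C
+sketch below shows the arithmetic they imply. It is not ICU's implementation:
+the names are hypothetical, the constants are computed from the 719,162 days
+between 0001-01-01 and 1970-01-01 rather than read from ICU, and ICU's exact
+rounding may differ. Converting *from* a time scale adds the epoch offset and
+multiplies by the units; the "from" limits are the values for which that product
+still fits in 64 bits. Converting *to* a time scale divides by the units (here
+rounding half up) and subtracts the epoch offset.
+
+```c
+#include <stdint.h>
+
+/* Hypothetical description of a time scale, modeled on the Precision and
+   Epoch offset values above. */
+typedef struct {
+    int64_t units;        /* ticks per unit of this time scale */
+    int64_t epoch_offset; /* 0001-01-01 to this scale's epoch, in its units */
+} time_scale;
+
+/* 719,162 days * 86,400 s = 62,135,596,800 s from 0001-01-01 to 1970-01-01. */
+static const time_scale UNIX_SECONDS = { 10000000, INT64_C(62135596800) };
+static const time_scale JAVA_MILLIS  = { 10000, INT64_C(62135596800000) };
+
+/* "from": value in ts -> universal ticks. Sets *ok to 0 (and returns 0) when
+   value is outside the "from" limits, i.e. the result would overflow. */
+static int64_t from_scale(int64_t value, time_scale ts, int *ok) {
+    int64_t from_min = INT64_MIN / ts.units - ts.epoch_offset;
+    int64_t from_max = INT64_MAX / ts.units - ts.epoch_offset;
+    if (value < from_min || value > from_max) {
+        *ok = 0;
+        return 0;
+    }
+    *ok = 1;
+    return (value + ts.epoch_offset) * ts.units;
+}
+
+/* Floor division for a positive divisor. */
+static int64_t floor_div(int64_t a, int64_t b) {
+    int64_t q = a / b;
+    return (a % b < 0) ? q - 1 : q;
+}
+
+/* "to": universal ticks -> value in ts, rounding half up to a whole unit.
+   No range check: callers should first compare against the "to" limits. */
+static int64_t to_scale(int64_t universal, time_scale ts) {
+    return floor_div(universal + ts.units / 2, ts.units) - ts.epoch_offset;
+}
+```
+
+For example, Unix time 0 becomes 621355968000000000 ticks, and 1234 Java
+milliseconds convert to 1 Unix second. In real code, use the ICU functions
+described in the next section.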
+
+## Converting
+
+You can convert from other timescale values to the universal timescale using the
+"from" methods. In ICU4C, you use `utmscale_fromInt64`:
+
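+```c
+#include "unicode/utmscale.h"
+
+UErrorCode err = U_ZERO_ERROR;
+int64_t unixTime = ...;  /* seconds since 1970-01-01 */
+int64_t universalTime;
+
+universalTime = utmscale_fromInt64(unixTime, UDTS_UNIX_TIME, &err);
+if (U_FAILURE(err)) {
+    /* unixTime is outside the range that can be converted */
+}
+```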
+
+In ICU4J, you use `UniversalTimeScale.from`:
+
+```java
+import com.ibm.icu.util.UniversalTimeScale;
+
+long javaTime = ...;  // milliseconds since 1970-01-01
+long universalTime;
+
+universalTime = UniversalTimeScale.from(javaTime, UniversalTimeScale.JAVA_TIME);
+```
+
+You can convert values in the universal timescale to other timescales using the
+"to" methods. In ICU4C, you use `utmscale_toInt64`:
+
+```c
+UErrorCode err = U_ZERO_ERROR;
+int64_t universalTime = ...;
+int64_t unixTime;
+
+unixTime = utmscale_toInt64(universalTime, UDTS_UNIX_TIME, &err);
+```
+
+In ICU4J, you use `UniversalTimeScale.to`:
+
+```java
+long universalTime = ...;
+long javaTime;
+
+javaTime = UniversalTimeScale.to(universalTime, UniversalTimeScale.JAVA_TIME);
+```
+
+That's all there is to it!
+
+If the conversion is out of range, the ICU4C routines
+will set the error code to `U_ILLEGAL_ARGUMENT_ERROR`, and the ICU4J methods will
+throw `IllegalArgumentException`. In ICU4J, you can avoid out of range conversions
+by using the `BigDecimal` methods:
+
+```java
+import java.math.BigDecimal;
+import com.ibm.icu.util.UniversalTimeScale;
+
+long fileTime = ...;
+double icu4cTime = ...;
+BigDecimal utICU4C, utFile, utUnix, unixTime, macTime;
+
+// Other time scales -> universal time scale
+utFile   = UniversalTimeScale.bigDecimalFrom(fileTime, UniversalTimeScale.WINDOWS_FILE_TIME);
+utICU4C  = UniversalTimeScale.bigDecimalFrom(icu4cTime, UniversalTimeScale.ICU4C_TIME);
+
+// Universal time scale -> other time scales
+unixTime = UniversalTimeScale.toBigDecimal(utFile, UniversalTimeScale.UNIX_TIME);
+macTime  = UniversalTimeScale.toBigDecimal(utICU4C, UniversalTimeScale.MAC_TIME);
+
+// Back to the universal time scale, starting from a BigDecimal value
+utUnix   = UniversalTimeScale.bigDecimalFrom(unixTime, UniversalTimeScale.UNIX_TIME);
+```
+
+> :point_right: **Note**: Because the Universal Time Scale has a finer resolution
+> than some other time scales, time values that can be represented exactly in the 
+> Universal Time Scale will be rounded when converting to these time scales, and
+> resolution will be lost. If you convert these values back to the Universal Time
+> Scale, you will not get the same time value that you started with. If the time
+> scale to which you are converting uses a double to represent the time value, you
+> may lose precision even though the double supports a range that is larger than
+> the range supported by the Universal Time Scale.
+
+## Formatting and Parsing
+
+Currently, ICU does not support direct formatting or parsing of Universal Time
+Scale values. If you want to format a Universal Time Scale value, you will need
+to convert it to an ICU time scale value first. Use `UDTS_ICU4C_TIME` with ICU4C,
+and `UniversalTimeScale.JAVA_TIME` with ICU4J.
+
+When you parse a datetime string, the result will be an ICU time scale value.
+You can convert this value to a Universal Time Scale value using `UDTS_ICU4C_TIME`
+with ICU4C, and `UniversalTimeScale.JAVA_TIME` with ICU4J.
+
+See the previous section, *Converting*, for details of how to do the conversion.
+
+## Getting Timescale Information
+
+To get information about a particular timescale in ICU4C, use
+`utmscale_getTimeScaleValue`:
+
+```c
+UErrorCode err = U_ZERO_ERROR;
+int64_t unixEpochOffset = utmscale_getTimeScaleValue(
+    UDTS_UNIX_TIME,
+    UTSV_EPOCH_OFFSET_VALUE,
+    &err);
+```
+
+In ICU4J, use `UniversalTimeScale.getTimeScaleValue`:
+
+```java
+long javaEpochOffset = UniversalTimeScale.getTimeScaleValue(
+    UniversalTimeScale.JAVA_TIME,
+    UniversalTimeScale.EPOCH_OFFSET_VALUE);
+```
+
+If the integer constants for selecting the timescale or the timescale value are
+out of range, the ICU4C routines will set the error code to
+`U_ILLEGAL_ARGUMENT_ERROR`, and the ICU4J methods will throw
+`IllegalArgumentException`.
diff --git a/docs/userguide/design.md b/docs/userguide/design.md
new file mode 100644
index 0000000..536b12c
--- /dev/null
+++ b/docs/userguide/design.md
@@ -0,0 +1,899 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# ICU Architectural Design
+
+This chapter discusses the ICU design structure, the ICU versioning support, and
+the introduction of namespaces in C++.
+
+## Java and ICU Basic Design Structure
+
+The JDK internationalization components and ICU components share the same
+basic architecture with regard to the following:
+
+1. locales
+2. data-driven services
+3. ICU threading models and the open and close model
+4. cloning customization
+5. error handling
+6. extensibility
+7. resource bundle inheritance model
+
+There are design features in ICU4C that are not in the Java Development Kit
+(JDK) due
+to programming language restrictions. These features include the following:
+
+### Locales
+
+Locale IDs are composed of language, country, and variant information. The
+following links provide additional information on the relevant ISO standards:
+[ISO-639](http://lcweb.loc.gov/standards/iso639-2/englangn.html) for language
+codes and
+[ISO-3166](http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html)
+for country codes. For example, Italian, Italy, and Euro are designated as
+`it_IT_EURO`.
+
+### Data-driven Services
+
+Data-driven services often use resource bundles for locale data. These services
+map a key to data. The resources are designed not only to manage system locale
+information but also to manage application-specific or general services data.
+ICU supports string, numeric, and binary data types, which can be structured
+into nested arrays and tables.
+
+This results in the following:
+
+1. Data used by the services can be built at compile time or run time.
+2. For efficient loading, system data is pre-compiled to .dll files or files
+   that can be mapped into memory.
+3. Data for services can be added and modified without source code changes.
+
+### ICU Threading Model and Open and Close Model
+
+The "open and close" model supports multi-threading. It enables ICU users to use
+the same kind of service for different locales, either in the same thread or in
+different threads.
+
+For example, a thread can open many collators for different languages, and
+different threads can use different collators for the same locale
+simultaneously. Constant data can be shared so that only the current state is
+allocated for each service instance.
+
+The ICU threading model is designed to avoid contention for resources, and
+enable you to use the services for multiple locales simultaneously within the
+same thread. The ICU threading model, like the rest of the ICU architecture, is
+the same model used for the international services in Java™.
+
+When you use a service such as collation, the client opens the service using an
+ID, typically a locale. This service allocates a small chunk of memory used for
+the state of the service, with pointers to shared, read-only data in support of
+that service. (In Java, you call `getInstance()` to create an object; in C++,
+`createInstance()`. ICU uses the open and close metaphor in C because it is more
+familiar to C programmers.)
+
+If no locale is supplied when a service is opened, ICU uses the default locale.
+Once a service is open, changing the default locale has no effect on it. Thus,
+no thread synchronization is needed between changes to the default locale and
+open services.
+
+When you open a second service for the same locale, another small chunk of
+memory is used for the state of the service, with pointers to the same shared,
+read-only data. Thus, the majority of the memory usage is shared. When any
+service is closed, then the chunk of memory is deallocated. Other connections
+that point to the same shared data stay valid.
+
+Any number of services, for the same locale or different locales, can be open
+within the same thread or in different threads.
+
+#### Thread-safe const APIs
+
+In recent ICU releases, we have worked to make any service object *thread-safe*
+(usable concurrently) *as long as all of the threads are using only const APIs*:
+APIs that are declared const in C++, take a const this-like service pointer in
+C, or are "logically const" in Java. This is an enhancement over the original
+Java/ICU threading model. (Originally, concurrent use of even only const APIs
+was not thread-safe.)
+
+However, you cannot use a reference to an open service object in two threads at
+the same time *if either of them calls any non-const API*. An individual open
+service object is not thread-safe for concurrent "writes". Rather, for non-const
+use, you must use the clone function to create a copy of the service you want
+and then pass this copy to the second thread. This procedure allows you to use
+the same service in different threads, but avoids any thread synchronization or
+deadlock problems.
+
+#### Freezable
+
+Some classes also implement the `Freezable` interface (or a similar pattern in
+C++), for example `UnicodeSet` or `Collator`: An object that typically starts
+out mutable can be set up and then "frozen", which makes it immutable and thus
+usable concurrently because all non-const APIs are disabled. A frozen object can
+never be "thawed". For example, a `Collator` can be created, various attributes
+set, then frozen and then used from many threads for comparing strings and
+getting sort keys.
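+
+The freeze pattern can be sketched in plain C. The `settings` type and its
+functions are hypothetical; ICU's own freezable classes have their own APIs. The
+essential points are that every mutating call checks the frozen flag, and that
+nothing ever clears it.
+
+```c
+#include <stdbool.h>
+
+/* Hypothetical object following the freeze pattern. */
+typedef struct {
+    int strength;
+    bool frozen;   /* once true, never cleared: there is no "thaw" */
+} settings;
+
+/* Non-const API: refuses to modify a frozen object. */
+static bool settings_set_strength(settings *s, int strength) {
+    if (s->frozen) {
+        return false;
+    }
+    s->strength = strength;
+    return true;
+}
+
+static void settings_freeze(settings *s) { s->frozen = true; }
+
+/* Const API: after freezing, it only ever reads, so concurrent calls
+   from several threads do not race with any writer. */
+static int settings_get_strength(const settings *s) { return s->strength; }
+```
+
+Other threads should receive the frozen object through a synchronizing hand-off
+(for example, a mutex or thread creation) so that they observe its final state.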
+
+#### Clone vs. open
+
+Clone operations are designed to be much faster than reopening the service with
+initial parameters and copying the source's state. (With objects in C++ and
+Java, the clone function is also much safer than trying to recreate a service,
+since you get the proper subclass.) Once a service is cloned, changes will not
+affect the original source service, or vice-versa.
+
+Thus, the normal mode of operation is to:
+
+1. Open a service with a given locale.
+2. Use the service as long as needed. However, do not keep opening and closing
+   a service within a tight loop.
+3. Clone a service if it needs to be used in parallel in another thread.
+4. Close any clones that you open as well as any instances of the services that
+   you own.
+
+> :point_right: **Note**: These service instances may be closed in any sequence.
+The preceding steps are given as an example.
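+
+The open, clone, and close model can be sketched in C to show the memory layout.
+Everything here is hypothetical (`service_open()`, `service_clone()`,
+`service_close()`, and the data types). Each instance is a small block of mutable
+state that points to read-only data shared by all instances. A clone copies the
+state but not the shared data, and closing an instance frees only its own state.
+
+```c
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical read-only data, shared by every instance. */
+typedef struct {
+    const char *locale;
+    const int *table;   /* stands in for large, immutable service data */
+} shared_data;
+
+static const int ROOT_TABLE[] = { 1, 2, 3 };
+static const shared_data ROOT_DATA = { "root", ROOT_TABLE };
+
+/* Per-instance state: small, mutable, owned by one client. */
+typedef struct {
+    const shared_data *shared;
+    int strength;
+} service;
+
+/* "open": allocate only the state block. A real service would look up the
+   shared data by locale ID; this sketch always uses ROOT_DATA. */
+static service *service_open(void) {
+    service *s = malloc(sizeof *s);
+    if (s != NULL) {
+        s->shared = &ROOT_DATA;
+        s->strength = 0;
+    }
+    return s;
+}
+
+/* "clone": copy the state; the shared pointer is copied, not the data. */
+static service *service_clone(const service *src) {
+    service *s = malloc(sizeof *s);
+    if (s != NULL) {
+        memcpy(s, src, sizeof *s);
+    }
+    return s;
+}
+
+/* "close": free only this instance's state; shared data stays valid. */
+static void service_close(service *s) { free(s); }
+```
+
+A clone made this way can be handed to a second thread, which may then call
+non-const functions on it without affecting the original.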
+
+#### Cloning Customization
+
+Typically, the services supplied with ICU cover the vast majority of usages.
+However, there are circumstances where the service needs to be customized for a
+new locale. ICU (and Java) enable you to create customized services. For
+example, you can create a `RuleBasedCollator` by merging the rules for French and
+Arabic to get a custom French-Arabic collation sequence. With merged rules, the
+pointer does not point to a read-only table that is shared between threads.
+Instead, the pointer refers to a table that is specific to your particular open
+service. If you clone the open service, the table is copied. When you close the
+service, the table is destroyed.
+
+For some services, ICU supplies registration. You can register a customized open
+service under an ID; ICU keeps a copy of that service even after you close the
+original. A client in that thread or in other threads can recreate a copy of the
+service by opening with that ID.
+
+ICU may cache service instances. Therefore, registration should be done during
+startup, before opening services by locale ID.
+
+These registrations are not persistent; once your program finishes, ICU flushes
+all the registrations. While you still might have multiple copies of data
+tables, it is faster to create a service from a registered ID than it is to
+create a service from rules.
+
+> :point_right: **Note**: To work around the lack of persistent registration,
+query the service for the parameters used to create it and then store those
+parameters in a file on disk.
+
+For services whose IDs are locales, such as collation, the registered IDs must
+also be locales. For services that are cross-locale (like transliteration or
+time zones), the IDs can be any string.
+
+Prospective future enhancements for this model are:
+
+1. Having custom services share data tables, by making those tables reference
+   counted. This will reduce memory consumption and speed clone operations (a
+   performance enhancement chiefly useful for multiple threads using the same
+   customized service).
+2. Expanding registration for all the international services.
+3. Allowing persistent registration of services.
+
+#### Per-client Locale ID vs Per-thread Locale ID
+
+Some application environments operate by setting a per-thread (or per-process)
+locale ID, and then not passing the locale ID as a parameter during processing.
+If this usage model were used with ICU in a multi-threaded server, it might
+result in ICU being requested to constantly open, use, and then close service
+objects. Instead, we recommend storing the locale ID of each client with other
+per-client data, along with any service objects (such as collators or
+formatters) that the client might use. If operations involving a single client
+are short-lived, it might be more efficient to keep a pool of service objects,
+organized by locale. Then, if a particular locale's formatter is in high demand,
+that formatter can be borrowed, used, and then returned to the pool.
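+
+Below is a minimal, single-threaded sketch of such a pool. `Formatter` is a
+hypothetical placeholder for an expensive ICU service object, such as a
+collator or a number formatter. The fixed capacity and the linear search are
+simplifying assumptions. A multi-threaded server would also need a mutex around
+`pool_acquire()` and `pool_release()`.
+
+```c
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical placeholder for an expensive service object. */
+typedef struct {
+    char locale[16];
+    int uses;                 /* how many times this instance was borrowed */
+} Formatter;
+
+#define POOL_CAPACITY 8
+
+/* Idle formatters, possibly for several different locales. */
+typedef struct {
+    Formatter *idle[POOL_CAPACITY];
+    int idle_count;
+} FormatterPool;
+
+/* Borrows an idle formatter for `locale`, creating one only if none is idle. */
+Formatter *pool_acquire(FormatterPool *pool, const char *locale) {
+    for (int i = 0; i < pool->idle_count; i++) {
+        if (strcmp(pool->idle[i]->locale, locale) == 0) {
+            Formatter *f = pool->idle[i];
+            /* Remove it from the idle list by moving the last entry here. */
+            pool->idle[i] = pool->idle[--pool->idle_count];
+            f->uses++;
+            return f;
+        }
+    }
+    Formatter *f = calloc(1, sizeof *f);
+    if (f == NULL) return NULL;
+    strncpy(f->locale, locale, sizeof f->locale - 1);
+    f->uses = 1;
+    return f;
+}
+
+/* Returns a formatter to the pool, or frees it if the pool is full. */
+void pool_release(FormatterPool *pool, Formatter *f) {
+    if (pool->idle_count < POOL_CAPACITY)
+        pool->idle[pool->idle_count++] = f;
+    else
+        free(f);
+}
+
+/* Frees every idle formatter (borrowed ones must be released first). */
+void pool_destroy(FormatterPool *pool) {
+    for (int i = 0; i < pool->idle_count; i++) free(pool->idle[i]);
+    pool->idle_count = 0;
+}
+```
+
+Only one client uses a borrowed formatter at a time, so the object itself
+never has to be shared between threads.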
+
+### ICU Memory Usage
+
+ICU4C APIs are designed to allow separate heaps for its libraries vs. the
+application. This is achieved by providing functions to allocate and release
+objects owned by ICU4C using only ICU4C library functions. For more details see
+the Memory Usage section in the [Coding Guidelines](dev/codingguidelines.md).
+
+### ICU Initialization and Termination
+
+The ICU library does not normally require any explicit initialization prior to
+use. An application begins use simply by calling any ICU API in the usual way.
+(There is one exception to this, described below.)
+
+In C++ programs, ICU objects and APIs may safely be used during static
+initialization of other application-defined classes or objects. There are no
+order-of-initialization problems between ICU and static objects from other
+libraries because ICU does not rely on C++ static object initialization for its
+normal operation.
+
+When an application is terminating, it may optionally call the function
+`u_cleanup(void)`, which will free any heap storage that has been allocated and
+held by the ICU library. The main benefit of `u_cleanup()` occurs when using
+memory leak checking tools while debugging or testing an application. Without
+`u_cleanup()`, memory being held by the ICU library will be reported as leaks.
+
+(On some platforms, the configure option `--enable-auto-cleanup`, or defining
+`UCLN_NO_AUTO_CLEANUP` as 0, adds code that automatically cleans up ICU when its
+shared library is unloaded. See the comments in `ucln_imp.h`.)
+
+#### Initializing ICU in Multithreaded Environments
+
+There is one specialized case where extra care is needed to safely initialize
+ICU. This situation will arise only when ALL of the following conditions occur:
+
+1. The application main program is written in plain C, not C++.
+2. The application is multithreaded, with the first use of ICU within the
+   process possibly occurring simultaneously in more than one thread.
+3. The application will be run on a platform that does not handle C++ static
+   constructors from libraries when the main program is not in C++. Platforms
+   known to exhibit this behavior are Mac OS X and HP-UX. Platforms that handle
+   C++ libraries correctly include Windows, Linux and Solaris.
+
+To safely initialize the ICU library when all of the above conditions apply, the
+application must explicitly arrange for a first use of ICU from a single thread
+before multi-threaded use of ICU begins (see the steps below). A convenient ICU
+operation for this purpose is `uloc_getDefault()`, declared in the header file
+`unicode/uloc.h`.
+
+#### Steps in Safely Initializing ICU in Single and Multi-threaded Environments
+
+1. If needed, certain data loading functions, such as `u_setCommonData()`,
+   `u_setAppData()`, and `u_setDataDirectory()`, must be called before any other
+   ICU function. In addition, some heap, mutex, and trace functions, such as
+   `u_setMemoryFunctions()` and `u_setMutexFunctions()`, must also be called
+   while ICU is still in its initial, unused state.
+2. Next, `u_init()` can be called to ensure proper loading and initialization of
+   data that are required internally by various ICU functions. In a
+   multi-threaded application, the main thread needs to call this function
+   explicitly. Subsequent threads do not need to call `u_init()` again after the
+   main thread has successfully executed it. In a single-threaded program,
+   calling this function is not required but is recommended.
+3. After ICU has been successfully initialized, normal use of ICU, whether from
+   multiple threads or from a single one, is permitted.
+4. When the application is done using ICU, all threads other than the main
+   thread must stop using ICU services.
+5. After all but the main thread have released ICU, `u_cleanup()` can be called.
+   The other threads must release ICU first because `u_cleanup()` is not
+   thread-safe. In addition, all ICU items, including collators, resource
+   bundles, and converters, must be closed before calling this function.
+   `u_cleanup()` frees all memory owned by the ICU libraries, returning them to
+   their original load state. Generally, this function should be called only
+   once, just before an application exits. However, applications that need to
+   dynamically load and unload the ICU libraries can call this function just
+   before the library unloads. `u_cleanup()` also clears any ICU heap, mutex, or
+   trace functions that may have been set for the process. If ICU is to be
+   reinitialized after calling `u_cleanup()`, these runtime override functions
+   need to be set up again if they are still required. Use `u_cleanup()` with
+   great care, and only if you fully understand its effect on the rest of the
+   process. If the application does not exit and needs ICU again after
+   correctly calling `u_cleanup()`, go back to step 1.
+
+### Error Handling
+
+To maximize portability, this version of ICU uses only the subset of the C++
+language that compiles correctly on older C++ compilers and that provides a
+usable C interface. Thus, there is no use of the C++ exception mechanism in the
+code or Application Programming Interface (API).
+
+To communicate errors reliably and support multi-threading, this version uses an
+error code parameter mechanism. Every function that can fail takes an error-code
+parameter by reference. This parameter is always the last parameter listed for
+the function.
+
+The `UErrorCode` parameter is defined as an enumerated type. Zero represents no
+error, positive values represent errors, and negative values represent non-error
+status codes. Macros (`U_SUCCESS` and `U_FAILURE`) are provided to check the
+error code.
+
+The `UErrorCode` parameter is an input-output parameter. Every function tests the
+error code before performing any other task and immediately exits if it already
+contains a failure code. If the function fails later on, it sets the error code
+appropriately and exits without performing any other work, except for any
+cleanup it needs to do. If the function encounters a non-error condition that it
+wants to signal, such as "encountered an unmapped character" in conversion, the
+function sets the error code appropriately and continues. Otherwise, the
+function leaves the error code unchanged.
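+
+The protocol can be illustrated with a hypothetical function written in the
+same style. `Status`, `IS_SUCCESS` and `IS_FAILURE` are local stand-ins for
+ICU's `UErrorCode`, `U_SUCCESS` and `U_FAILURE`, defined here so that the sketch
+compiles without ICU. `copy_ascii()` itself is invented for illustration.
+
+```c
+#include <stddef.h>
+
+/* Local stand-ins that follow the UErrorCode sign convention:
+   zero = no error, positive = error, negative = non-error status. */
+typedef enum {
+    STATUS_USED_SUBSTITUTE = -1,   /* non-error condition worth reporting */
+    STATUS_OK = 0,
+    STATUS_ILLEGAL_ARGUMENT = 1,
+    STATUS_BUFFER_OVERFLOW = 2
+} Status;
+
+#define IS_SUCCESS(s) ((s) <= STATUS_OK)
+#define IS_FAILURE(s) ((s) > STATUS_OK)
+
+/* Hypothetical function: copies `src` to `dest`, replacing non-ASCII bytes
+   with '?'. It follows the input-output protocol described above. */
+size_t copy_ascii(const char *src, char *dest, size_t capacity, Status *status) {
+    if (IS_FAILURE(*status)) return 0;        /* incoming failure: do nothing */
+    if (src == NULL || dest == NULL || capacity == 0) {
+        *status = STATUS_ILLEGAL_ARGUMENT;     /* failure: set code and stop */
+        return 0;
+    }
+    size_t n = 0;
+    for (; src[n] != '\0'; n++) {
+        if (n + 1 >= capacity) {               /* no room for this byte and NUL */
+            dest[n] = '\0';                    /* cleanup before exiting */
+            *status = STATUS_BUFFER_OVERFLOW;
+            return n;
+        }
+        unsigned char c = (unsigned char)src[n];
+        if (c >= 0x80) {
+            dest[n] = '?';
+            *status = STATUS_USED_SUBSTITUTE;  /* non-error signal: continue */
+        } else {
+            dest[n] = (char)c;
+        }
+    }
+    dest[n] = '\0';
+    return n;   /* status is unchanged if nothing needed reporting */
+}
+```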
+
+Generally, only functions that do not take a `UErrorCode` parameter, but call
+functions that do, must declare a `UErrorCode` variable. Almost all functions
+that take a `UErrorCode` parameter, and also call other functions that do,
+merely have to pass the error code that they received on to the functions they
+call. Functions that declare a new `UErrorCode` variable must initialize it to
+`U_ZERO_ERROR` before calling any other functions.
+
+ICU enables you to call several functions (that take error codes) successively
+without having to check the error code after each function. This works because
+each function checks the error code before doing any other processing and stops
+immediately if it receives a failure code. Propagating the error-code parameter
+down the call chain saves the programmer from having to declare the parameter in
+every instance and also mimics the C++ exception protocol more closely.
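+
+The sketch below shows the chaining pattern with three hypothetical steps.
+`Status` and `IS_FAILURE` are local stand-ins for `UErrorCode` and `U_FAILURE`,
+and all function names are invented for this example.
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Same sign convention as UErrorCode; local stand-ins, not ICU names. */
+typedef enum { STATUS_OK = 0, STATUS_PARSE_ERROR = 1, STATUS_RANGE_ERROR = 2 } Status;
+#define IS_FAILURE(s) ((s) > STATUS_OK)
+
+/* Each hypothetical step returns at once if an earlier step failed. */
+static long parse_number(const char *text, Status *status) {
+    if (IS_FAILURE(*status)) return 0;
+    char *end;
+    long value = strtol(text, &end, 10);
+    if (end == text || *end != '\0') *status = STATUS_PARSE_ERROR;
+    return value;
+}
+
+static long check_percent(long value, Status *status) {
+    if (IS_FAILURE(*status)) return 0;
+    if (value < 0 || value > 100) *status = STATUS_RANGE_ERROR;
+    return value;
+}
+
+static void format_percent(long value, char *out, size_t cap, Status *status) {
+    if (IS_FAILURE(*status)) return;
+    snprintf(out, cap, "%ld%%", value);
+}
+
+/* This function takes no Status parameter, so it declares one, initializes
+   it to STATUS_OK, runs the whole chain without intermediate checks, and
+   returns the final code for a single check by its caller. */
+Status render_percent(const char *text, char *out, size_t cap) {
+    Status status = STATUS_OK;
+    long value = parse_number(text, &status);
+    value = check_percent(value, &status);
+    format_percent(value, out, cap, &status);
+    return status;
+}
+```
+
+If parsing fails, the range check and the formatting step both return
+immediately, so the final code reports the first error that occurred.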
+
+### Extensibility
+
+There are three major extensibility elements in ICU:
+
+1. **Data Extensibility**:
+   The user installs new locales or conversion data to enhance the existing ICU
+   support. For more details, refer to the package tool chapter
+   (:construction: **TODO**: need link).
+2. **Code Extensibility**:
+   The classes, data, and design are fully extensible. Examples of this
+   extensibility include the `BreakIterator`, `RuleBasedBreakIterator` and
+   `DictionaryBasedBreakIterator` classes.
+3. **Error Handling Extensibility**:
+   There are mechanisms available to enhance the built-in error handling when
+   it is necessary. For example, you can design and create your own conversion
+   callback functions that are called when an error occurs. Refer to the
+   callback section of the [Conversion](conversion/index.md) chapter for more
+   information.
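+
+To sketch the callback idea, the converter below passes every byte it cannot
+map to a caller-supplied function. The callback type, the functions and the
+ASCII-only "conversion" are all hypothetical simplifications. ICU's real
+conversion callbacks have richer signatures, which are described in the
+Conversion chapter.
+
+```c
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical callback type (not ICU's real converter callback signature):
+   writes a replacement for an unmappable byte into `out`, which has room
+   for `room` bytes, and returns how many bytes it wrote. */
+typedef size_t (*UnmappableCallback)(void *context, unsigned char byte,
+                                     char *out, size_t room);
+
+/* Default behavior: substitute '?'. */
+size_t substitute_callback(void *context, unsigned char byte, char *out, size_t room) {
+    (void)context;
+    (void)byte;
+    if (room < 1) return 0;
+    out[0] = '?';
+    return 1;
+}
+
+/* Application-defined behavior: escape as \xHH and count occurrences. */
+size_t escape_callback(void *context, unsigned char byte, char *out, size_t room) {
+    int *count = context;
+    char escaped[5];                               /* "\xHH" plus NUL */
+    ++*count;
+    snprintf(escaped, sizeof escaped, "\\x%02X", (unsigned)byte);
+    if (room < 4) return 0;                        /* drop it if it won't fit */
+    memcpy(out, escaped, 4);
+    return 4;
+}
+
+/* Copies ASCII bytes to `dest` (capacity `cap`, including the NUL) and lets
+   the callback decide what replaces every other byte. */
+size_t convert_to_ascii(const char *src, char *dest, size_t cap,
+                        UnmappableCallback callback, void *context) {
+    if (cap == 0) return 0;
+    size_t n = 0;                                  /* invariant: n < cap */
+    for (; *src != '\0'; src++) {
+        size_t room = cap - 1 - n;                 /* space left before the NUL */
+        unsigned char c = (unsigned char)*src;
+        if (c < 0x80) {
+            if (room == 0) break;
+            dest[n++] = (char)c;
+        } else {
+            n += callback(context, c, dest + n, room);
+        }
+    }
+    dest[n] = '\0';
+    return n;
+}
+```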
+
+### Resource Bundle Inheritance Model
+
+A resource bundle is a set of \<key,value> pairs that provide a mapping from key
+to value. A given program can have different sets of resource bundles; one set
+for error messages, one for menus, and so on. However, the program may be
+organized to combine all of its resource bundles into a single related set.
+
+The set is organized into a tree with "root" at the top, the language at the
+first level, the country at the second level, and additional variants below
+these levels. The set must contain a root that has all keys that can be used by
+the program accessing the resource bundles.
+
+Except for the root, each resource bundle has an immediate parent. For example,
+if there is a resource bundle `X_Y_Z`, then there must be the resource bundles:
+`X_Y`, and `X`. Each child resource bundle can omit any \<key,value> pair that is
+identical to its parent's pair. (Such omission is strongly encouraged as it
+reduces data size and maintenance effort). It must override any \<key,value> pair
+that is different from its parent's pair. If you have a resource bundle for the
+locale ID `language_country_variant`, you must also have
+a bundle for the ID `language_country` and one for the ID `language`.
+
+If a key is not found in a child resource bundle, the child is assumed to have
+the same value for that key as its parent. The default locale has no effect on
+this. The language used for the root is commonly English, but it depends on the
+developer's preference. Ideally, the root should contain values that minimize
+the need for its children to override them.
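+
+The inheritance rule can be sketched as a walk up the parent chain. The
+`Bundle` structure and `bundle_get()` are hypothetical in-memory stand-ins for
+illustration only. ICU's real resource bundles are compiled data that you
+access through the `ures_*` functions or `ResourceBundle`.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical in-memory resource bundle: key/value pairs plus a parent. */
+typedef struct {
+    const char *key;
+    const char *value;
+} Entry;
+
+typedef struct Bundle {
+    const char *name;
+    const struct Bundle *parent;   /* NULL only for root */
+    const Entry *entries;
+    size_t count;
+} Bundle;
+
+/* Looks `key` up in `bundle`, then in each ancestor up to root. */
+const char *bundle_get(const Bundle *bundle, const char *key) {
+    for (const Bundle *b = bundle; b != NULL; b = b->parent) {
+        for (size_t i = 0; i < b->count; i++) {
+            if (strcmp(b->entries[i].key, key) == 0) return b->entries[i].value;
+        }
+    }
+    return NULL;   /* root should define every key, so this means a bad key */
+}
+
+/* Example data: root defines every key, en overrides nothing, and en_GB
+   overrides only the entry that differs from its parent. */
+static const Entry root_entries[] = { {"greeting", "Hello"}, {"color", "color"} };
+static const Entry en_GB_entries[] = { {"color", "colour"} };
+static const Bundle root = { "root", NULL, root_entries, 2 };
+static const Bundle en = { "en", &root, NULL, 0 };
+static const Bundle en_GB = { "en_GB", &en, en_GB_entries, 1 };
+```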
+
+The default locale is used only when there is no resource bundle for a given
+language. For example, there may not be an Italian resource bundle. (This is
+very different from the case where there is an Italian resource bundle that is
+missing a particular key.) When a resource bundle is missing, ICU uses the
+parent unless that parent is the root. The root is an exception because the root
+language may be completely different from its children. In this case, ICU uses
+a modified lookup that includes the default locale. The two lookup chains are:
+
+**Lookup chain**: Searching for a resource bundle.
+
+1. `en_US_<some-variant>`
+2. `en_US`
+3. `en`
+4. `<defaultLang>_<defaultCountry>`
+5. `<defaultLang>`
+6. `root`
+
+**Lookup chain**: Searching for a \<key, value> pair after
+`en_US_<some-variant>` has been loaded. ICU does not use the default locale in
+this case.
+
+1. `en_US_<some-variant>`
+2. `en_US`
+3. `en`
+4. `root`
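+
+Both lookup orders can be generated mechanically by repeatedly dropping the
+last `_`-separated field. This sketch only reproduces the two orders listed
+above. It ignores script subtags, aliases, duplicate entries when the requested
+and default locales share a prefix, and the other refinements in ICU's actual
+fallback logic. The function names are hypothetical.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+#define MAX_CHAIN 8
+#define MAX_ID 32
+
+/* Appends `id` and each shorter form of it (en_US_X, en_US, en) to `chain`,
+   starting at index `n`; returns the new length. IDs longer than
+   MAX_ID - 1 characters are truncated. */
+static size_t append_truncations(const char *id, char chain[][MAX_ID], size_t n) {
+    char buf[MAX_ID];
+    strncpy(buf, id, MAX_ID - 1);
+    buf[MAX_ID - 1] = '\0';
+    while (n < MAX_CHAIN) {
+        strcpy(chain[n++], buf);
+        char *underscore = strrchr(buf, '_');
+        if (underscore == NULL) break;
+        *underscore = '\0';
+    }
+    return n;
+}
+
+/* Key lookup after a bundle is loaded: the default locale is not involved. */
+size_t key_lookup_chain(const char *requested, char chain[][MAX_ID]) {
+    size_t n = append_truncations(requested, chain, 0);
+    if (n < MAX_CHAIN) strcpy(chain[n++], "root");
+    return n;
+}
+
+/* Bundle search: requested locale, then the default locale, then root. */
+size_t bundle_search_chain(const char *requested, const char *default_locale,
+                           char chain[][MAX_ID]) {
+    size_t n = append_truncations(requested, chain, 0);
+    n = append_truncations(default_locale, chain, n);
+    if (n < MAX_CHAIN) strcpy(chain[n++], "root");
+    return n;
+}
+```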
+
+## Other ICU Design Principles
+
+ICU uses version numbers extensively for both code and data, and places its C++
+APIs in a namespace.
+
+### Version Numbers in ICU
+
+Version numbers show clients when parts of ICU change. ICU as a whole, its
+components (such as Collator), each resource bundle (including all the locale
+data resource bundles), and individual tagged items within a resource bundle
+each have their own version numbers. Version numbers increase numerically and
+lexically as changes are made.
+
+All version numbers are used in Application Programming Interfaces (APIs) with a
+`UVersionInfo` structure. The `UVersionInfo` structure is an array of four
+unsigned bytes. These bytes are:
+
+1. Major version number
+2. Minor version number
+3. Milli version number
+4. Micro version number
+
+Two `UVersionInfo` structures may be compared using binary comparison (`memcmp`)
+to see which is larger or newer. Version numbers may be different for different
+services. For instance, do not compare the ICU library version number to the ICU
+collator version number.
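+
+Each field is a single unsigned byte, and the most significant field comes
+first, so a plain `memcmp` orders versions correctly. The sketch below uses a
+local `Version` typedef with the same four-byte layout so that it compiles
+without ICU. With ICU, you would pass `UVersionInfo` values to `memcmp`
+directly.
+
+```c
+#include <stdint.h>
+#include <string.h>
+
+/* Local stand-in with the same layout as UVersionInfo:
+   {major, minor, milli, micro}, one unsigned byte each. */
+typedef uint8_t Version[4];
+
+/* Returns <0, 0 or >0. memcmp compares bytes as unsigned char from left to
+   right, so the most significant differing field decides the order. */
+int version_compare(const Version a, const Version b) {
+    return memcmp(a, b, sizeof(Version));
+}
+```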
+
+`UVersionInfo` structures can be converted to and from string representations as
+dotted integers (such as "1.4.5.0") using the `u_versionToString()` and
+`u_versionFromString()` functions. String representations may omit trailing zeros.
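+
+The string conversion can be sketched as follows. These functions are
+hypothetical re-implementations for illustration, not ICU's
+`u_versionToString()` and `u_versionFromString()`. In particular, keeping at
+least two fields on output and rejecting more than four fields on input are
+assumptions of this sketch.
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+typedef uint8_t Version[4];   /* same layout as UVersionInfo */
+
+/* Writes dotted integers, omitting trailing zero fields but keeping at
+   least "major.minor". `out` needs 16 bytes ("255.255.255.255" + NUL). */
+void version_to_string(const Version v, char out[16]) {
+    int count = 4;
+    while (count > 2 && v[count - 1] == 0) count--;
+    int len = 0;
+    for (int i = 0; i < count; i++) {
+        len += snprintf(out + len, (size_t)(16 - len), i == 0 ? "%u" : ".%u",
+                        (unsigned)v[i]);
+    }
+}
+
+/* Parses one to four dot-separated fields; missing fields become 0.
+   Returns 1 on success, 0 on malformed input or a field above 255. */
+int version_from_string(const char *s, Version v) {
+    memset(v, 0, sizeof(Version));
+    for (int i = 0; i < 4; i++) {
+        char *end;
+        unsigned long field = strtoul(s, &end, 10);
+        if (end == s || field > 255) return 0;
+        v[i] = (uint8_t)field;
+        if (*end == '\0') return 1;
+        if (*end != '.') return 0;
+        s = end + 1;
+    }
+    return 0;   /* more than four fields */
+}
+```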
+
+The interpretation of version numbers depends on what is being described.
+
+#### ICU Release Version Number (ICU 49 and later)
+
+The first version number field contains the ICU release version number, for
+example 49. Each new version might contain new features, new locale data, and
+modified behavior. (See below for more information on
+[ICU Binary Compatibility](#icu-binary-compatibility).)
+
+The second field is 1 for the initial release (e.g., 49.1). The second and
+sometimes third fields are incremented for binary compatible maintenance
+releases.
+
+* For maintenance releases of only C or only J, the third field is
+  incremented (e.g., ICU4C 49.1.1).
+* For shared updates for C & J, the second field is incremented to 2 and
+  higher (e.g., ICU4C & ICU4J 49.2).
+
+(The second field is 0 during development, with milestone numbers in the third
+field during that time. For example, 49.0.1 for 49 milestone 1.)
+
+#### ICU Release Version Number (ICU 1.4 to ICU 4.8)
+
+In earlier releases, the first two version fields together indicated the ICU
+release, for example 4.8. The third field was 0 for the initial release, and 1
+and higher for binary compatible (bug fixes only) maintenance releases (e.g.,
+4.8.1). The fourth field was used for updates specific to only one of Java, C++,
+or ICU-in-Eclipse.
+
+The second version field was *even* for formal releases ("reference releases")
+(e.g., 1.6 or 4.8) and *odd* during their development (unreleased unstable
+snapshot versions; e.g., 4.7). During development, the third field contained the
+milestone number (e.g., 4.7.1 for 4.8 milestone 1). For very old ICU code, we
+published semi-formal “enhancement” releases with odd second-field numbers
+(e.g., 1.7).
+
+Library filenames and some other internal uses already used a concatenation of
+the first two fields ("48" for 4.8).
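+
+The two numbering schemes can be summarized in code. The helpers below are
+hypothetical and only restate the rules described above. The library-suffix
+helper assumes that from ICU 49 on, the suffix is simply the first field (for
+example "49"). One caveat: the very old, odd-numbered "enhancement" releases
+such as 1.7 would be reported as development versions.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+
+typedef uint8_t Version[4];   /* same layout as UVersionInfo */
+
+/* True for an unreleased development version under the two schemes. */
+bool is_development_version(const Version v) {
+    if (v[0] >= 49)
+        return v[1] == 0;        /* ICU 49+: 49.0.x is milestone x of 49 */
+    return v[1] % 2 == 1;        /* ICU 1.4 to 4.8: odd second field */
+}
+
+/* Version part of library file names: "48" for 4.8, "49" for 49.x. */
+void library_suffix(const Version v, char out[8]) {
+    if (v[0] >= 49)
+        snprintf(out, 8, "%u", (unsigned)v[0]);
+    else
+        snprintf(out, 8, "%u%u", (unsigned)v[0], (unsigned)v[1]);
+}
+```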
+
+#### Resource Bundles and Elements
+
+The data stored in resource bundles is tagged with version numbers. A resource
+bundle can contain a tagged string named "Version" that declares the version
+number in dotted-integer format. For example,
+
+```Text
+en {
+    Version { "1.0.3.5" }
+    ...
+}
+```
+
+A resource bundle may omit the "Version" element, in which case it inherits a
+version along the usual chain. For example, if the resource bundle **en_US**
+contained no "Version" element, it would inherit "1.0.3.5" from the parent
+bundle **en**. If inheritance passes all the way to the root resource bundle and
+it contains no "Version" resource, then the resource bundle receives the default
+version number 0.
+
+Elements within a resource bundle may also contain version numbers. For example:
+
+```Text
+be {
+    CollationElements {
+        Version { "1.0.0.0" }
+        ...
+    }
+}
+```
+
+In this example, the CollationElements data is version 1.0.0.0. This element
+version is not related to the version of the bundle.
+
+#### Internal version numbers
+
+Internally, data files carry format and other version numbers. These version
+numbers ensure that ICU can use the data file. The interpretation depends
+entirely on the data file type. Often, the major number in the format version
+stays the same for backwards-compatible changes to a data file format. The minor
+format version number is incremented for additions that do not violate the
+backwards compatibility of the data file.
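+
+To illustrate how such format versions are typically checked, here is a
+hypothetical compatibility test. The function and its acceptance rule are
+assumptions of this sketch. As noted above, the real interpretation depends on
+the data file type.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Hypothetical loader check. A reader written for format version
+   required_major.required_minor accepts a data file whose major version
+   matches and whose minor version is at least as new, because minor
+   increments only add backwards-compatible content. */
+bool format_version_usable(const uint8_t file_format[4],
+                           uint8_t required_major, uint8_t required_minor) {
+    return file_format[0] == required_major && file_format[1] >= required_minor;
+}
+```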
+
+#### Component Version Numbers
+
+ICU component version numbers may be found using:
+
+1. `u_getVersion()` returns the version number of ICU as a whole.
+2. `ures_getVersion()` and `ResourceBundle::getVersion()` return the version
+   number of a ResourceBundle. This is a data version number for the bundle as a
+   whole and subject to inheritance.
+3. `u_getUnicodeVersion()` returns the version
+   number of the Unicode character data that underlies ICU. This version
+   reflects the numbering of the Unicode releases. See
+   <http://www.unicode.org/> for more information.
+4. `Collator::getVersion()` in C++ and `ucol_getVersion()` in C return the version
+   number of the Collator. This is a code version number for the collation code
+   and algorithm. It is a combination of version numbers for the collation
+   implementation, the Unicode Collation Algorithm data (which is the data that
+   is used for characters that are not mentioned in a locale's specific
+   collation elements), and the collation elements.
+
+#### Configuration and Management
+
+A major new feature in ICU 2.0 is the ability to link to different versions of
+ICU with the same program. Using this new feature, a program can keep using ICU
+1.8 collation, for example, while using ICU 2.0 for other services. ICU now can
+also be unloaded if needed, to free up resources, and then reloaded when it is
+needed.
+
+### Namespace in C++
+
+ICU 2.0 introduced the use of a C++ namespace to avoid naming collisions between
+ICU exported symbols and other libraries. All the public ICU C++ classes are
+defined in the "icu_VersionNumber::" namespace, which is also aliased as
+namespace "icu". Starting with ICU 2.0, including any public ICU C++ header by
+default includes a "using namespace icu_VersionNumber" statement. This is for
+backward compatibility, and should be turned off in favor of explicitly using
+`icu::UnicodeString` etc. (see [How To Use ICU](howtouseicu.md)). (If entry point
+renaming is turned off, then only the unversioned "icu" namespace is used.)
+
+Starting with ICU 49, ICU4C requires namespace support.
+
+### Library Dependencies (C++)
+
+It is sometimes useful to see a dependency chart between the public ICU APIs and
+ICU libraries. This chart can be useful to people that are new to ICU or to
+people that want only certain ICU libraries.
+
+> :construction: **TODO**: The dependency chart is currently not available.
+
+Note the following about the chart:
+
+1. It gives a general overview of the ICU library dependencies.
+2. Internal dependencies, like the mutex API, are left out for clarity.
+3. Similar APIs were lumped together for clarity (e.g. Formatting). Some of
+   these dependency details can be viewed from the ICU API reference.
+4. The descriptions of each API can be found in our [ICU API
+   reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/).
+
+### Code Dependencies (C++)
+
+Starting with ICU 49, the dependencies of code files (.o files compiled from
+.c/.cpp) are documented in
+[source/test/depstest/dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt).
+Adjacent Python code is used to parse this file and to
+[verify](http://site.icu-project.org/processes/release/tasks/healthy-code#TOC-Check-library-dependencies)
+that it matches the actual dependencies of the code files.
+
+The dependency list can be used to build subset libraries. In addition, by
+reducing intra-library dependencies, the code size of statically linked ICU code
+has been reduced.
+
+### ICU API categories
+
+ICU APIs, as defined in header and class files, are either "external" or
+"internal". External APIs are meant to be used by applications, while internal
+APIs should be used only within ICU. APIs are marked to indicate whether they
+are external or internal, as follows. Every external API has a lifecycle label,
+see below.
+
+#### External ICU4C APIs
+
+External ICU4C APIs are those that
+
+1. are declared in header files in unicode folders and exported at build/install
+   time to an `include/unicode` folder,
+2. are `public` or `protected`, if they are C++ class members, and
+3. do not have an `@internal` label.
+
+Exception: Layout engine header files are not in a unicode folder, although the
+public ones are still copied to the `include/unicode` folder at build/install
+time. External layout engine APIs are the ones that have lifecycle labels and
+not an `@internal` label.
+
+#### External ICU4J APIs
+
+External ICU4J APIs are those that
+
+1. are declared in one of the ICU4J core packages (`com.ibm.icu.lang`,
+   `com.ibm.icu.math`, `com.ibm.icu.text`, or `com.ibm.icu.util`),
+2. are `public` or `protected` class members or contained classes, and
+3. do not have an `@internal` label.
+
+#### "System" APIs
+
+"System" APIs are external APIs that are intended only for special uses for
+system-level code, for example `u_cleanup()`. Normal users should not use them,
+although they are public and supported. System APIs have a `@system` label
+in addition to the lifecycle label that all external APIs have (see below).
+
+#### Internal APIs
+
+All APIs that do not fit any of the descriptions above are internal, which means
+that they are for ICU internal use only and may change at any time without
+notice. Some of them are member functions of public C++ or Java classes, and are
+"technically public but logistically internal" for implementation reasons;
+typically because programming languages don't provide sufficiently fine-grained
+access control (without clumsy mechanisms). In this case, such APIs have an
+`@internal` label.
+
+### ICU API compatibility
+
+As ICU develops, it adds external APIs - functions, classes, constants, and so
+on. Occasionally it is also necessary to remove or change external APIs. In
+order to make this work, we use the following process:
+
+For all API changes (and for significant/controversial/difficult implementation
+changes), we use proposals to announce and discuss them. A proposal is simply an
+email to the icu-design mailing list that details what is proposed to be
+changed, with an expiration date of typically a week. This gives all mailing
+list members a chance to review upcoming changes, and to discuss them. A
+proposal often changes significantly as a result of discussion. Most proposals
+will eventually find consensus among list members; otherwise, the ICU-TC decides
+what to do. If the addition or change of APIs would affect you, please subscribe
+to the main [icu-design mailing list](http://icu-project.org/contacts.html).
+
+When a **new API** is added to ICU, it is marked as draft with a `@draft ICU x.y`
+label in the API documentation, **where x.y is the ICU version when the API
+*signature* was introduced or last changed**. A draft API is not guaranteed to be
+stable! Although we will not make gratuitous changes, sometimes a draft API
+turns out to be unsatisfactory in actual practice and may need to be changed or
+even removed. Changes of "draft" API are subject to the proposal process
+described above.
+
+**When a `@draft ICU x.y` API is changed, it must remain `@draft` and its version
+number must be updated.**
+
+In ICU4J 3.4.2 and earlier, `@draft` APIs were also marked with Java's `@deprecated`
+tag, so that uses of draft APIs in client code would be flagged by the compiler.
+These uses of the `@deprecated` tag were indicated with the comment “This is a
+draft API and might change in a future release of ICU.” Many clients found this
+confusing and/or undesirable, so ICU4J 3.4.3 no longer marks draft APIs with
+the `@deprecated` tag by default. For clients who prefer the earlier behavior,
+ICU4J provides an Ant build target, `restoreDeprecated`, which will update the
+source files to use the `@deprecated` tag. Then clients can just rebuild the ICU4J
+jar as usual.
+
+When an API is judged to be stable and has not been changed for at least one ICU
+release, it is relabeled as stable with a `@stable ICU x.y` label in the API
+documentation. A stable API is expected to be available in this form for a long
+time. The ICU version **x.y** indicates the last time the API *signature* was
+introduced or changed. **The promotion from `@draft ICU x.y` to `@stable ICU x.y`
+must not change the x.y version number.**
+
+We occasionally make an exception and allow adding new APIs marked as
+`@stable ICU x.y` in the x.y release itself if we believe that they have to
+be stable. We might do this for enum constants that reflect 1:1 Unicode property
+aliases and property value aliases, for a Unicode upgrade in the x.y release.
+
+We sometimes **"broaden" a `@stable`** API function by changing its signature
+in a compatible way. For example, in Java, we might change an input parameter
+from a `String` to a `CharSequence`. In this case we keep the `@stable` but
+update the ICU version number indicating the function signature change.
+
+Even a stable API may eventually need to become deprecated or obsolete. Such
+APIs are strongly discouraged from use. Typically, an improved API is introduced
+at the time of deprecation/obsolescence of the old one.
+
+1. Use of deprecated APIs is strongly discouraged, but they are retained for
+   backward compatibility. These are marked with labels like
+   `@deprecated ICU x.y Use u_abc() instead.`. **The ICU version x.y shows the
+   ICU release in which the API was first declared "deprecated".**
+2. In ICU4J, starting with release 57, a custom Javadoc tag `@discouraged`
+   was added. While similar to `@deprecated`, it is used either when ICU wants
+   to discourage a particular API from use but the JDK hasn't deprecated it, or
+   when ICU needs to keep it for compatibility reasons. These are marked with
+   labels like `@discouraged ICU x.y. Use u_abc() instead.`.
+3. Obsolete APIs are those whose continued retention will cause severe
+   conflicts or user error, or whose continued support would be a very
+   significant maintenance burden. We make every effort to keep these to a
+   minimum. Obsolete APIs are marked with labels like `@obsolete ICU x.y. Use
+   u_abc() instead since this API will be removed in that release.`.
+   **The x.y indicates that we plan to remove it in ICU version x.y.**
+
+Stable C or Java APIs will not be obsoleted because doing so would break
+forward binary compatibility of the ICU library. Stable APIs may be
+deprecated, but they will be retained in the library.
+
+An "obsolete" API will remain unchanged until it is removed in the indicated
+ICU release, which will be usually one year after the API was declared
+obsolete. Sometimes we still keep it available for some time via a
+compile-time switch but stop maintaining it. On rare occasions, an API must
+be replaced right away because of naming conflicts or severe defects; in
+such cases we provide compile-time switches (`#ifdef` or other mechanisms) to
+select the old API.
+
+For example, here is how an API might be tagged in various versions:
+
+* **In ICU 0.2**: The API is newly introduced as a draft in this release.
+
+  ```Text
+  @draft ICU 0.2
+  f(x)
+  ```
+
+* **In ICU 0.4**: The draft version number is updated, because the signature
+  changed.
+
+  ```Text
+  @draft ICU 0.4
+  f(x, y)
+  ```
+
+* **In ICU 0.6**: The API is promoted from draft to stable, but the version
+  number does not change, as the signature is the same.
+
+  ```Text
+  @stable ICU 0.4
+  f(x, y)
+  ```
+
+* **In ICU 1.0**: The API is "broadened" in a compatible way. For example,
+  changing an input parameter from char to int or from some class to a base
+  class. The signature is changed (so we update the ICU version number), but old
+  calling code continues to work unchanged (so we retain `@stable` if that's what
+  it was).
+
+  ```Text
+  @stable ICU 1.0
+  f(xbase, y)
+  ```
+
+* **In ICU 1.2**: The API is demoted to deprecated (or obsolete) status.
+
+  ```Text
+  @deprecated ICU 1.2 Use g(x,y,z) instead.
+  f(xbase, y)
+  ```
+
+  or, when this API is planned to be removed in ICU 1.4:
+
+  ```Text
+  @obsolete ICU 1.4. Use g(x,y,z) instead.
+  f(xbase, y)
+  ```
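+
+The tagging rules in this walkthrough can be captured in a small
+state-transition function. The types and `next_label()` are hypothetical and
+only restate the rules above. The sketch does not check whether an event is
+allowed. For example, it would accept an incompatible change to a stable API,
+which the policy forbids.
+
+```c
+/* Hypothetical model of the labeling rules above; not an ICU API. */
+typedef enum { LABEL_DRAFT, LABEL_STABLE, LABEL_DEPRECATED, LABEL_OBSOLETE } LabelStatus;
+typedef struct { int major, minor; } IcuVersion;
+typedef struct { LabelStatus status; IcuVersion version; } ApiLabel;
+
+typedef enum {
+    EVENT_SIGNATURE_CHANGED,   /* draft API changed incompatibly */
+    EVENT_PROMOTED,            /* draft becomes stable */
+    EVENT_BROADENED,           /* signature changed in a compatible way */
+    EVENT_DEPRECATED,
+    EVENT_MADE_OBSOLETE
+} ApiEvent;
+
+/* `release` is the ICU version making the change; `removal` is only used
+   for EVENT_MADE_OBSOLETE and names the release that will remove the API. */
+ApiLabel next_label(ApiLabel current, ApiEvent event,
+                    IcuVersion release, IcuVersion removal) {
+    ApiLabel next = current;
+    switch (event) {
+    case EVENT_SIGNATURE_CHANGED:   /* stays @draft, version updated */
+    case EVENT_BROADENED:           /* keeps its status, version updated */
+        next.version = release;
+        break;
+    case EVENT_PROMOTED:            /* x.y must not change */
+        next.status = LABEL_STABLE;
+        break;
+    case EVENT_DEPRECATED:          /* x.y = release that deprecates it */
+        next.status = LABEL_DEPRECATED;
+        next.version = release;
+        break;
+    case EVENT_MADE_OBSOLETE:       /* x.y = planned removal release */
+        next.status = LABEL_OBSOLETE;
+        next.version = removal;
+        break;
+    }
+    return next;
+}
+```
+
+Feeding it the sequence above (a signature change in 0.4, promotion in 0.6,
+broadening in 1.0, and deprecation in 1.2) reproduces the labels shown for
+each release.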
+
+### ICU Binary Compatibility
+
+ICU4C may be configured for use as a system library in an environment where
+applications that are built with one version of ICU must continue to run without
+change with later versions of the ICU shared library.
+
+Here are the requirements for enabling binary compatibility for ICU4C:
+
+1. Applications must use only APIs that are marked as stable.
+2. Applications must use only plain C APIs, never C++.
+3. ICU must be built with function renaming disabled.
+4. Applications must be built using an ICU that was configured for binary
+   compatibility.
+5. Use ICU version 3.0 or later.
+
+**Stable APIs Only.** APIs in the ICU library that are tagged as being stable
+will be maintained in future versions of the library. Stable functions will
+continue to exist with the same signature and the same meaning, allowing
+applications to continue to work without change.
+
+Stable APIs do not guarantee that the results from every function will always be
+completely identical between ICU versions (see the
+[Version Numbers in ICU](#version-numbers-in-icu) section above). Bugs may be
+fixed. The Unicode character data may change with new versions of the Unicode
+standard. Locale data may be updated or changed, yielding different results for
+operations like formatting or collation. Applications that require exact
+bit-for-bit, bug-for-bug compatibility of ICU results should not rely on ICU
+release-to-release binary compatibility, but should instead link against a
+specific version of ICU.
+
+To verify that an application uses only stable APIs, build it with the C
+preprocessor symbols `U_HIDE_DRAFT_API` and `U_HIDE_DEPRECATED_API` defined. This
+will produce build errors if any draft, deprecated or obsolete APIs are used. An
+operating system level installation of ICU may set this option permanently.
+
+**C APIs only.** Only plain C APIs remain compatible across ICU releases. C++
+binary compatibility is not supported primarily because the design of the C++
+language and runtime environments presents extreme technical difficulties to
+doing so. Stable C++ APIs are *source* compatible, but applications using them
+must be recompiled when moving between ICU releases.
+
+**Function renaming disabled.** Function renaming is an ICU feature that allows
+an application to explicitly link against a specific version of the ICU library,
+and to continue to use that version even when other ICU versions exist in the
+runtime environment. This is the exact opposite of release-to-release binary
+compatibility – instead of being able to transparently change ICU versions, an
+application is explicitly tied to one specific version.
+
+Function renaming is enabled by default, and must be disabled at ICU build time
+to enable release-to-release binary compatibility. To disable renaming, use the
+configure option:
+
+```Shell
+configure --disable-renaming [other configure options]
+```
+
+(Configure options may also be passed to the runConfigureICU script.)
+
+To enable release-to-release binary compatibility, ICU must be built with
+`--disable-renaming`, *and* applications must be built using the headers and
+libraries that resulted from the `--disable-renaming` ICU build.
+
+**ICU Version 3.0 or Later.** Binary compatibility of ICU releases is supported
+beginning with ICU version 3.0. Older versions of ICU (2.8 and earlier) do not
+provide for binary compatibility between versions.
+
+#### Linking against multiple versions of ICU4C
+
+This section is intended to aid software developers who are implementing or
+integrating solutions based on ICU, that may need to consider having multiple
+versions of ICU running within the same executable (address space) at once.
+Typically, users of ICU are encouraged to update to the latest stable version.
+Under certain circumstances, however, the behavior of an earlier version is
+desired, or an application links together code that was already built against
+a different version of ICU.
+
+The major and minor numbers are the first and second numbers in a version
+number, separated by a period. For example, in the version numbers 3.4.2.1,
+3.4.2, or 3.4, "3" is the major, and "4" is the minor. Normally, ICU employs
+"symbol renaming", such that the C function names and C++ object names are
+`#defined` to contain the major and minor numbers. So, for example, if your
+application calls the function `ucnv_open()`, it will link against
+`ucnv_open_3_4` if compiled against ICU 3.4, 3.4.2, or even 3.4.2.1. However, if
+compiled against ICU 3.8, the same code will link against `ucnv_open_3_8`.
+Similarly, `UnicodeString` is renamed to `UnicodeString_3_4`, etc. This is normally
+transparent to the user; however, if you inspect the symbols of the library or
+your code, you will see the modified symbols.
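+
+The renaming idea can be sketched with a few preprocessor macros. This is only
+an illustration under stated assumptions: `MYLIB_VERSION_SUFFIX`,
+`MYLIB_RENAME`, and `mylib_open` are hypothetical names, not ICU's real
+renaming macros, which are generated for each release.
+
+```C++
+// Hypothetical renaming macros (not ICU's): paste a version suffix onto a name.
+#define MYLIB_CONCAT_(a, b) a##b
+#define MYLIB_CONCAT(a, b) MYLIB_CONCAT_(a, b)  // expands arguments before pasting
+#define MYLIB_VERSION_SUFFIX _3_4
+#define MYLIB_RENAME(name) MYLIB_CONCAT(name, MYLIB_VERSION_SUFFIX)
+
+// A public header maps the unversioned name onto the versioned symbol.
+#define mylib_open MYLIB_RENAME(mylib_open)
+
+// The library source uses the plain name, but after preprocessing the
+// function is actually defined (and exported) as mylib_open_3_4.
+const char *mylib_open() {
+    return __func__;  // on common compilers: the name after macro expansion
+}
+```
+
+Application code that calls `mylib_open()` is compiled into a call to
+`mylib_open_3_4`, so it links only against the library version whose headers
+it was built with.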
+
+If there are multiple versions of ICU being linked against in one application,
+it will need to link against all relevant libraries for each version, for
+example, common, i18n, and data. ICU uses standard library renaming, where, for
+example, `libicuuc.so` on one platform will actually be a symbolic link to
+`libicuuc.so.3.4`. When multiple ICU versions are used, the application may need
+to explicitly link against the exact versions of ICU being used.
+
+To disable renaming, build ICU with `--disable-renaming` passed to configure.
+Or, set the equivalent `#define U_DISABLE_RENAMING 1`. Renaming must be disabled
+both in the ICU build, and in the calling application.
+
+### ICU Data Compatibility
+
+Starting with ICU 3.8, the data library that comes with ICU is binary
+compatible and structurally compatible with other versions of ICU that have the
+same major and minor version numbers, that is, with its maintenance releases.
+This allows multiple maintenance releases of ICU to share the same data, but
+generally the latest maintenance release of the data should be used.
+
+The binary compatibility of the data refers to the binary formats of the resource
+bundles that contain the locale data, the charset conversion tables, and the other
+file formats supported by ICU. These binary formats are readable by many versions of
+ICU. For example, resource bundles written with ICU 3.6 are readable by ICU 3.8.
+
+The structural compatibility of the data refers to the structural contents of
+the ICU data. The structure of the locale data may change between reference
+releases, but the keys to reference specific types of data will be the same
+between maintenance releases. This means that resource keys to access data
+within resource bundles will work between maintenance releases of a specific
+reference release. For example, an ICU 3.8 calendar will be able to use ICU
+3.8.1 data, and vice versa; however, ICU 3.6 may not be able to read ICU 3.8
+locale data. Generally, these keys are not accessible by ICU users because only
+the ICU implementation uses these resource keys.
+
+The contents of the data library may change between ICU maintenance releases and
+give you different results due to important updates and bug fixes. For example,
+a time zone rule update is needed when a country changes the dates on which
+daylight saving time starts or ends. So the results may differ between
+maintenance releases.
+
+### ICU4J Serialization Compatibility
+
+Starting in ICU4J 3.6, ICU4J stable API classes (marked as `@stable`) that implement
+`java.io.Serializable` support deserialization of their serialized objects by
+ICU4J 3.6 or any newer version of ICU4J. Some classes perform only shallow
+serialization; therefore, a deserialized object is not guaranteed to behave
+exactly the same as the original object across ICU4J versions. Also, when it is
+difficult to maintain serialization compatibility in a certain class across
+different ICU4J versions for technical or other reasons, the ICU project
+committee may approve the breakage. In such an event, a note explaining the
+compatibility issue will be posted to the ICU public mailing lists and also
+documented in the release notes of the new ICU4J version introducing the
+incompatibility.
diff --git a/docs/userguide/dev/codingguidelines.md b/docs/userguide/dev/codingguidelines.md
new file mode 100644
index 0000000..4d6e686
--- /dev/null
+++ b/docs/userguide/dev/codingguidelines.md
@@ -0,0 +1,2069 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Coding Guidelines
+
+## Overview
+
+This section provides the guidelines for developing C and C++ code, based on the
+coding conventions used by ICU programmers in the creation of the ICU library.
+
+## Details about ICU Error Codes
+
+When calling an ICU API function, a `UErrorCode` variable is often passed in
+through an error code pointer (C) or reference (C++). This variable is allocated
+by the caller and must pass the `U_SUCCESS()` test before the function call;
+otherwise, the function returns without doing any work. Normally, an error code
+variable is initialized to `U_ZERO_ERROR`.
+
+`UErrorCode` is passed around and used this way, instead of using C++ exceptions
+for the following reasons:
+
+* It is useful in the same form for C also
+* Some C++ compilers do not support exceptions
+
+> :point_right: **Note**: This error code mechanism, in fact, works similarly to
+exceptions. If users call several ICU functions in a sequence, then as soon as one
+sets a failure code, the following functions in the sequence do nothing. This
+procedure prevents the API functions from processing data that is not valid in
+the sequence of function calls and relieves the caller from checking the error
+code after each call. It is somewhat similar to how an exception terminates a
+function block or try block early.
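+
+The caller-side effect can be sketched without ICU. In the following example,
+`MyErrorCode`, `stepOne()`, and `stepTwo()` are hypothetical stand-ins that only
+mimic the convention: `0` means success (like `U_ZERO_ERROR`), positive values
+are failures, and every function returns immediately if the incoming code
+already indicates a failure.
+
+```C++
+// Hypothetical stand-in for UErrorCode: 0 is success, positive values are failures.
+enum MyErrorCode {
+    MY_ZERO_ERROR = 0,
+    MY_ILLEGAL_ARGUMENT_ERROR = 1
+};
+
+inline bool myFailure(MyErrorCode code) {
+    return code > MY_ZERO_ERROR;
+}
+
+int stepCalls = 0;  // counts how many steps actually did some work
+
+int stepOne(int input, MyErrorCode &status) {
+    if (myFailure(status)) {
+        return 0;  // an earlier call failed: do nothing
+    }
+    ++stepCalls;
+    if (input < 0) {
+        status = MY_ILLEGAL_ARGUMENT_ERROR;
+        return 0;
+    }
+    return input * 2;
+}
+
+int stepTwo(int input, MyErrorCode &status) {
+    if (myFailure(status)) {
+        return 0;
+    }
+    ++stepCalls;
+    return input + 1;
+}
+
+// The caller chains the calls and checks the status once, at the end.
+int runPipeline(int input, MyErrorCode &status) {
+    int doubled = stepOne(input, status);
+    return stepTwo(doubled, status);  // does no work if stepOne() failed
+}
+```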
+
+The following code shows the inside of an ICU function implementation:
+
+```C++
+U_CAPI const UBiDiLevel * U_EXPORT2
+ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
+    int32_t start, length;
+
+    if(U_FAILURE(*pErrorCode)) {
+        return NULL;
+    } else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
+        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
+        return NULL;
+    }
+
+    ...
+    return result;
+}
+```
+
+Note: We have decided that we do not want to test for `pErrorCode==NULL`. Some
+existing code does this, but new code should not.
+
+Note: *Callers* (as opposed to implementers) of ICU APIs can simplify their code
+by defining and using a subclass of `icu::ErrorCode`. ICU implementers can use the
+`IcuTestErrorCode` class in intltest code.
+
+It is not necessary to check for `U_FAILURE()` immediately before calling a
+function that takes a `UErrorCode` parameter, because that function is supposed to
+check for failure. Exception: If the failure comes from object allocation or
+creation, then you probably have a `NULL` object pointer and must not call any
+method on that object, not even one with a `UErrorCode` parameter.
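+
+A short sketch of this exception follows. `Widget`, `createWidget()`, and
+`MyErrorCode` are hypothetical stand-ins rather than ICU APIs, and the
+`simulateOutOfMemory` flag exists only to exercise the failure path.
+
+```C++
+#include <new>
+
+// Hypothetical stand-in for UErrorCode.
+enum MyErrorCode {
+    MY_ZERO_ERROR = 0,
+    MY_MEMORY_ALLOCATION_ERROR = 1
+};
+
+class Widget {
+public:
+    int size(MyErrorCode &status) const {
+        if (status > MY_ZERO_ERROR) {
+            return 0;
+        }
+        return 42;
+    }
+};
+
+// Returns NULL and sets a failure code if the object cannot be allocated.
+Widget *createWidget(MyErrorCode &status, bool simulateOutOfMemory) {
+    if (status > MY_ZERO_ERROR) {
+        return nullptr;
+    }
+    Widget *widget = simulateOutOfMemory ? nullptr : new (std::nothrow) Widget();
+    if (widget == nullptr) {
+        status = MY_MEMORY_ALLOCATION_ERROR;
+    }
+    return widget;
+}
+
+int widgetSize(MyErrorCode &status, bool simulateOutOfMemory) {
+    Widget *widget = createWidget(status, simulateOutOfMemory);
+    // size() would return early on failure, but calling any method through a
+    // NULL pointer is undefined behavior, so check the pointer first.
+    if (widget == nullptr) {
+        return -1;
+    }
+    int result = widget->size(status);
+    delete widget;
+    return result;
+}
+```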
+
+### Sample Function with Error Checking
+
+```C++
+    U_CAPI int32_t U_EXPORT2
+    uplrules_select(const UPluralRules *uplrules,   // Do not check
+                                                    // "this"/uplrules vs. NULL.
+                    double number,
+                    UChar *keyword, int32_t capacity,
+                    UErrorCode *status)             // Do not check status!=NULL.
+    {
+        if (U_FAILURE(*status)) {                   // Do check for U_FAILURE()
+                                                    // before setting *status
+            return 0;                               // or calling UErrorCode-less
+                                                    // select(number).
+        }
+        if (keyword == NULL ? capacity != 0 : capacity < 0) {
+                                                    // Standard destination buffer
+                                                    // checks.
+            *status = U_ILLEGAL_ARGUMENT_ERROR;
+            return 0;
+        }
+        UnicodeString result = ((PluralRules*)uplrules)->select(number);
+        return result.extract(keyword, capacity, *status);
+    }
+```
+
+### New API Functions
+
+If the API function is non-const, then it should have a `UErrorCode` parameter.
+(Not the other way around: Some const functions may need a `UErrorCode` as well.)
+
+Default C++ assignment operators and copy constructors should not be used (they
+should be declared private and not implemented). Instead, define an
+`assign(const Class &other, UErrorCode &errorCode)` function. Normal constructors
+are fine, and should have a `UErrorCode` parameter.
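+
+A minimal sketch of the `assign()` pattern, assuming a hypothetical `Buffer`
+class whose copying must allocate memory and can therefore fail. `MyErrorCode`
+stands in for `UErrorCode`. With C++11 and later, declaring the copy members
+`= delete` has the same effect, but the sketch follows the wording of the
+guideline.
+
+```C++
+#include <cstring>
+#include <new>
+
+// Hypothetical stand-in for UErrorCode.
+enum MyErrorCode {
+    MY_ZERO_ERROR = 0,
+    MY_MEMORY_ALLOCATION_ERROR = 1
+};
+
+// Hypothetical class whose copying needs memory allocation, which can fail.
+class Buffer {
+public:
+    // A normal constructor, with a UErrorCode-style parameter.
+    Buffer(const char *initialText, MyErrorCode &errorCode) : data(nullptr) {
+        setText(initialText, errorCode);
+    }
+    ~Buffer() {
+        delete[] data;
+    }
+
+    // Replaces the copy assignment operator and can report allocation failure.
+    void assign(const Buffer &other, MyErrorCode &errorCode) {
+        if (this != &other) {
+            setText(other.text(), errorCode);
+        }
+    }
+
+    const char *text() const {
+        return data != nullptr ? data : "";
+    }
+
+private:
+    // Declared private and not implemented, so the defaults cannot be used.
+    Buffer(const Buffer &other);
+    Buffer &operator=(const Buffer &other);
+
+    void setText(const char *newText, MyErrorCode &errorCode) {
+        if (errorCode > MY_ZERO_ERROR) {
+            return;
+        }
+        std::size_t length = std::strlen(newText);
+        char *copy = new (std::nothrow) char[length + 1];
+        if (copy == nullptr) {
+            errorCode = MY_MEMORY_ALLOCATION_ERROR;
+            return;
+        }
+        std::memcpy(copy, newText, length + 1);  // includes the terminating NUL
+        delete[] data;
+        data = copy;
+    }
+
+    char *data;
+};
+```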
+
+### Warning Codes
+
+Some `UErrorCode` values do not indicate a failure but an additional informational
+return value. Their enum constants have the `_WARNING` suffix and they pass the
+`U_SUCCESS()` test.
+
+However, experience has shown that they are problematic: They can get lost
+easily because subsequent function calls may set their own "warning" codes or
+may reset a `UErrorCode` to `U_ZERO_ERROR`.
+
+The source of the problem is that the `UErrorCode` mechanism is designed to mimic
+C++/Java exceptions. It prevents ICU function execution after a failure code is
+set, but like exceptions it does not work well for non-failure information
+passing.
+
+Therefore, we recommend using warning codes very carefully:
+
+* Try not to rely on any warning codes.
+* Use real APIs to get the same information if possible.
+  For example, when a string is completely written but cannot be
+  NUL-terminated, then `U_STRING_NOT_TERMINATED_WARNING` indicates this, but so
+  does the returned destination string length (which will have the same value
+  as the destination capacity in this case). Checking the string length is
+  safer than checking the warning code. (It is even safer to not rely on
+  NUL-terminated strings but to use the length.)
+* If warning codes must be used, then it is best to set the `UErrorCode` to
+  `U_ZERO_ERROR` immediately before calling the function in question, and to
+  check for the expected warning code immediately after the function returns.
+
+Future versions of ICU will not introduce new warning codes, and will provide
+real API replacements for all existing warning codes.
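+
+The recommended pattern can be sketched with a hypothetical ICU-style
+extraction function. `extractText()`, `isUnterminated()`, and the `MyErrorCode`
+values are assumptions made for this illustration; they only mimic the
+convention that warnings are negative (and still count as success) while
+failures are positive.
+
+```C++
+#include <cstdint>
+#include <cstring>
+
+// Hypothetical stand-in for UErrorCode: warnings are negative and still count
+// as success; failures are positive.
+enum MyErrorCode {
+    MY_STRING_NOT_TERMINATED_WARNING = -1,
+    MY_ZERO_ERROR = 0,
+    MY_BUFFER_OVERFLOW_ERROR = 1
+};
+
+// Hypothetical extraction into a caller-provided buffer.
+// Always returns the full source length.
+int32_t extractText(const char *src, char *dest, int32_t capacity,
+                    MyErrorCode &status) {
+    if (status > MY_ZERO_ERROR) {
+        return 0;
+    }
+    int32_t length = static_cast<int32_t>(std::strlen(src));
+    if (length > capacity) {
+        status = MY_BUFFER_OVERFLOW_ERROR;
+        return length;
+    }
+    std::memcpy(dest, src, static_cast<std::size_t>(length));
+    if (length < capacity) {
+        dest[length] = 0;  // there is room for the terminator
+    } else {
+        status = MY_STRING_NOT_TERMINATED_WARNING;  // buffer is exactly full
+    }
+    return length;
+}
+
+// Preferred: detect the unterminated case from the returned length.
+bool isUnterminated(const char *src, char *dest, int32_t capacity) {
+    MyErrorCode status = MY_ZERO_ERROR;  // fresh code right before the call
+    int32_t length = extractText(src, dest, capacity, status);
+    return status <= MY_ZERO_ERROR && length == capacity;
+}
+```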
+
+### Bogus Objects
+
+Some objects, for example `UnicodeString` and `UnicodeSet`, can become "bogus". This
+is used when methods that create or modify the object fail (mostly due to an
+out-of-memory condition) but do not take a `UErrorCode` parameter and therefore
+cannot otherwise report the failure.
+
+* A bogus object appears as empty.
+* A bogus object cannot be modified except with assignment-like functions.
+* The bogus state of one object does not transfer to another. For example,
+  adding a bogus `UnicodeString` to a `UnicodeSet` does not make the set bogus.
+  (It would be hard to make propagation consistent and test it well. Also,
+  propagation among bogus states and error codes would be messy.)
+* If a bogus object is passed into a function that does have a `UErrorCode`
+  parameter, then the function should set the `U_ILLEGAL_ARGUMENT_ERROR` code.
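+
+A minimal sketch of these rules, using a hypothetical `MyString` class. Only
+the method names `isBogus()` and `setToBogus()` are borrowed from
+`UnicodeString`; the class itself is not ICU code, and a real object would
+become bogus after a failed allocation rather than through an explicit call.
+
+```C++
+#include <cstddef>
+#include <string>
+
+// Hypothetical stand-in for UErrorCode.
+enum MyErrorCode {
+    MY_ZERO_ERROR = 0,
+    MY_ILLEGAL_ARGUMENT_ERROR = 1
+};
+
+class MyString {
+public:
+    MyString() : bogus(false) {}
+
+    bool isBogus() const {
+        return bogus;
+    }
+    // Used by an operation that failed and cannot report an error code.
+    void setToBogus() {
+        text.clear();
+        bogus = true;
+    }
+    // A bogus object appears as empty.
+    std::size_t length() const {
+        return text.size();
+    }
+    // Ordinary modifications are ignored while the object is bogus.
+    void append(const std::string &s) {
+        if (!bogus) {
+            text += s;
+        }
+    }
+    // An assignment-like function makes the object valid again.
+    void setTo(const std::string &s) {
+        text = s;
+        bogus = false;
+    }
+
+private:
+    std::string text;
+    bool bogus;
+};
+
+// A function with an error code parameter rejects a bogus argument.
+std::size_t countUnits(const MyString &s, MyErrorCode &errorCode) {
+    if (errorCode > MY_ZERO_ERROR) {
+        return 0;
+    }
+    if (s.isBogus()) {
+        errorCode = MY_ILLEGAL_ARGUMENT_ERROR;
+        return 0;
+    }
+    return s.length();
+}
+```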
+
+## API Documentation
+
+"API" means any public class, function, or constant.
+
+### API status tag
+
+Aside from documenting an API's functionality, parameters, return values, etc., we
+also mark every API with whether it is `@draft`, `@stable`, `@deprecated` or
+`@internal`. (`@internal` is used when something is not actually supported
+API but needs to be physically public anyway.) A new API is usually marked with
+"`@draft ICU 4.8`". For details of how we mark APIs see the "ICU API
+compatibility" section of the [ICU Architectural Design](../design.md) page. In
+Java, also see existing `@draft` APIs for complete examples.
+
+Functions that override a base class or interface definition take the API status
+of the base class function. For C++, use the `@copydoc base::function()` tag to
+copy both the description and the API status from the base function definition.
+For Java methods the status tags must be added by hand; use the `{@inheritDoc}`
+JavaDoc tag to pick up the rest of the base function documentation.
+Documentation should not be manually replicated in overriding functions; it is
+too hard to keep multiple copies synchronized.
+
+The policy for the treatment of status tags in overriding functions was
+introduced with ICU 64 for C++, and with ICU 59 for Java. Earlier code may
+deviate.
+
+### Coding Example
+
+Coding examples help users to understand the usage of each API. Whenever
+possible, authors are encouraged to embed a code snippet illustrating the usage
+of an API along with the functional specification.
+
+#### Embedding Coding Examples in ICU4J - JCite
+
+Since ICU4J 49M2, the ICU4J ant build target "doc" uses an external tool
+called [JCite](https://arrenbrecht.ch/jcite/). The tool allows us to cite a
+fragment of existing source code in a JavaDoc comment using a tag. For example,
+`{@.jcite com.ibm.icu.samples.util.timezone.BasicTimeZoneExample:---getNextTransitionExample}`
+will be replaced by the fragment of code marked by the comment lines
+`// ---getNextTransitionExample` in `BasicTimeZoneExample.java` in package
+`com.ibm.icu.samples.util.timezone`. When embedding a code snippet using JCite,
+we recommend following these guidelines:
+
+* Sample code should be placed in the `<icu4j_root>/samples/src` directory,
+  although you can cite any source fragment from source files in
+  `<icu4j_root>/demos/src`, `<icu4j_root>/main/core/*/src`, and
+  `<icu4j_root>/main/test/*/src`.
+* Sample code should use the package name
+  `com.ibm.icu.samples.<subpackage>.<facility>`. `<subpackage>` corresponds
+  to the target ICU API class's package, that is, one of lang/math/text/util.
+  `<facility>` is the name of the facility, which is usually the base class of
+  the service. For example, use package `com.ibm.icu.samples.text.dateformat`
+  for samples related to ICU's date format service, and
+  `com.ibm.icu.samples.util.timezone` for samples related to the time zone
+  service.
+* Sample code should be as self-contained as possible (use only JDK and
+  ICU public APIs if possible). This allows readers to cut & paste a code
+  snippet to try it out easily.
+* The citing comment should start with three consecutive hyphens followed by a
+  lower camel case token, for example, "`// ---compareToExample`".
+* Keep in mind that the JCite tag `{@.jcite ...}` is not resolved without JCite.
+  Avoid placing a code snippet within a sentence; instead, place a code
+  snippet cited with JCite in its own paragraph.
+
+#### Embedding Coding Examples in ICU4C
+
+Also since ICU4C 49M2, ICU4C docs (using the [\\snippet command](http://www.doxygen.nl/manual/commands.html#cmdsnippet)
+which is new in Doxygen 1.7.5) can cite a fragment of existing sample or test code.
+
+Example in `ucnv.h`:
+
+```C++
+     /**
+      * \snippet samples/ucnv/convsamp.cpp ucnv_open
+      */
+     ucnv_open( ... ) ...
+```
+
+This cites code in `icu4c/source/samples/ucnv/convsamp.cpp` as follows:
+
+```C++
+      //! [ucnv_open]
+      conv = ucnv_open("koi8-r", &status);
+      //! [ucnv_open]
+```
+
+Notice the tag "`ucnv_open`" which must be the same in all three places (in
+the header file, and twice in the cited file).
+
+## C and C++ Coding Conventions Overview
+
+The ICU group uses the following coding guidelines to create software using the
+ICU C++ classes and methods as well as the ICU C methods.
+
+### C/C++ Hiding Un-@stable APIs
+
+In C/C++, we enclose `@draft` and such APIs with `#ifndef U_HIDE_DRAFT_API` or
+similar as appropriate. When a draft API becomes stable, we need to remove the
+surrounding #ifndef.
+
+Note: The `@system` tag is *in addition to* the
+`@draft`/`@stable`/`@deprecated`/`@obsolete` status tag.
+
+Copy/paste the appropriate #ifndef..#endif pair from the following:
+
+```C++
+#ifndef U_HIDE_DRAFT_API
+#endif  // U_HIDE_DRAFT_API
+
+#ifndef U_HIDE_DEPRECATED_API
+#endif  // U_HIDE_DEPRECATED_API
+
+#ifndef U_HIDE_OBSOLETE_API
+#endif  // U_HIDE_OBSOLETE_API
+
+#ifndef U_HIDE_SYSTEM_API
+#endif  // U_HIDE_SYSTEM_API
+
+#ifndef U_HIDE_INTERNAL_API
+#endif  // U_HIDE_INTERNAL_API
+```
+
+We `#ifndef` `@draft`/`@deprecated`/... APIs as much as possible, including C
+functions, many C++ class methods (see exceptions below), enum constants (see
+exceptions below), whole enums, whole classes, etc.
+
+We do not #ifndef APIs where that would be problematic:
+
+* struct/class members where that would modify the object layout (non-static
+  struct/class fields, virtual methods)
+* enum constants where that would modify the numeric values of following
+  constants
+* C++ class boilerplate (e.g., default/copy constructors) because otherwise
+  the compiler would create public ones
+* private class members
+* definitions in internal/test/tools header files (that would be pointless;
+  they should probably not have API tags in the first place)
+* forward or friend declarations
+* definitions that are needed for other definitions that would not be
+  `#ifndef`'ed (e.g., for public macros or private methods)
+* platform macros (mostly in `platform.h`/`umachine.h` & similar) and
+  user-configurable settings (mostly in `uconfig.h`)
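+
+The enum-constant rule can be illustrated with a hypothetical enum; its type
+and constant names are invented for this sketch. Appending the `@draft`
+constant at the end lets it be hidden without changing any other value.
+
+```C++
+// Hypothetical public enum. The @draft constant comes last, so hiding it
+// with U_HIDE_DRAFT_API does not change the value of any other constant.
+enum UFruit {
+    UFRUIT_APPLE,   // @stable, value 0
+    UFRUIT_BANANA,  // @stable, value 1
+#ifndef U_HIDE_DRAFT_API
+    UFRUIT_CHERRY,  // @draft, value 2 (absent if U_HIDE_DRAFT_API is defined)
+#endif  // U_HIDE_DRAFT_API
+};
+
+// Do not do this instead: if UFRUIT_CHERRY were guarded between APPLE and
+// BANANA, then BANANA would be 2 in normal builds but 1 in builds that define
+// U_HIDE_DRAFT_API, and the two builds would disagree about its value.
+```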
+
+More handy copy-paste text:
+
+```C++
+    // Do not enclose the protected default constructor with #ifndef U_HIDE_INTERNAL_API
+    // or else the compiler will create a public default constructor.
+
+    // Do not enclose protected default/copy constructors with #ifndef U_HIDE_INTERNAL_API
+    // or else the compiler will create public ones.
+```
+
+### C and C++ Type and Format Convention Guidelines
+
+The following C and C++ type and format conventions are used to maximize
+portability across platforms and to provide consistency in the code:
+
+#### Constants (#define, enum items, const)
+
+Use uppercase letters for constants. For example, use `UBREAKITERATOR_DONE`,
+`UBIDI_DEFAULT_LTR`, `ULESS`.
+
+For new enum types (as opposed to new values added to existing types), do not
+define enum types in C++ style. Instead, define C-style enums with U... type
+prefix and `U_`/`UMODULE_` constants. Define such enum types outside the ICU
+namespace and outside any C++ class. Define them in C header files if there are
+appropriate ones.
+
+#### Variables and Functions
+
+Use mixed-case letters that start with a lowercase letter for variables and
+functions. For example, use `getLength()`.
+
+#### Types (class, struct, enum, union)
+
+Use mixed case starting with an uppercase letter for types. For example, use
+class `DateFormatSymbols`.
+
+#### Function Style
+
+Use the `getProperty()` and `setProperty()` style for functions where a lowercase
+letter begins the first word and the second word is capitalized without a space
+between it and the first word. For example,
+`UnicodeString getSymbol(ENumberFormatSymbol symbol)`,
+`void setSymbol(ENumberFormatSymbol symbol, UnicodeString value)`,
+`getLength()`, and `getSomethingAt(index/offset)`.
+
+#### Common Parameter Names
+
+In order to keep function parameter names consistent, the following are
+recommendations for names or suffixes (usual "Camel case" applies):
+
+* "start": the index (of the first of several code units) in a string or array
+* "limit": the index (of the **first code unit after** a specified range) in a
+  string or array (the number of units is (limit-start))
+* name the length (for the number of code units in a (range of a) string or
+  array) either "length" or "somePrefixLength"
+* name the capacity (for the number of code units available in an output
+  buffer) either "capacity" or "somePrefixCapacity"
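+
+A sketch of a function that uses these parameter names follows. `copyRange()`
+is hypothetical and only illustrates how `start`, `limit`, `length`, and
+`destCapacity` relate to each other.
+
+```C++
+#include <cstdint>
+
+// Hypothetical helper. Copies the code units in [start, limit) of src into
+// dest, which has room for destCapacity units. Returns the length of the
+// range (limit - start) even when it does not fit, so a caller can allocate a
+// large enough buffer and try again. (Argument validation is omitted.)
+int32_t copyRange(const char16_t *src, int32_t start, int32_t limit,
+                  char16_t *dest, int32_t destCapacity) {
+    int32_t length = limit - start;
+    if (length <= destCapacity) {
+        for (int32_t i = 0; i < length; ++i) {
+            dest[i] = src[start + i];  // src[limit] itself is not copied
+        }
+    }
+    return length;
+}
+```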
+
+#### Order of Source/Destination Arguments
+
+Many ICU function signatures list source arguments before destination arguments,
+as is common in C++ and Java APIs. This is the preferred order for new APIs.
+(Example: `ucol_getSortKey(const UCollator *coll, const UChar *source,
+int32_t sourceLength, uint8_t *result, int32_t resultLength)`)
+
+Some ICU function signatures list destination arguments before source arguments,
+as is common in C standard library functions. This should be limited to
+functions that closely resemble such C standard library functions or closely
+related ICU functions. (Example: `u_strcpy(UChar *dst, const UChar *src)`)
+
+#### Order of Include File Includes
+
+Include system header files (like `<stdio.h>`) before ICU headers followed by
+application-specific ones. This assures that ICU headers can use existing
+definitions from system headers if both happen to define the same symbols. In
+ICU files, all used headers should be explicitly included, even if some of them
+already include others.
+
+Within a group of headers, place them in alphabetical order.
+
+#### Style for ICU Includes
+
+All ICU headers should be included using ""-style includes (like
+`"unicode/utypes.h"` or `"cmemory.h"`) in source files for the ICU library, tools,
+and tests.
+
+#### Pointer Conversions
+
+Do not cast pointers to integers or integers to pointers. Also, do not cast
+between data pointers and function pointers. This will not work on some
+compilers, especially with different sizes of such types. Exceptions are only
+possible in platform-specific code where the behavior is known.
+
+Please use C++-style casts, at least for pointers, for example `const_cast`.
+
+* For conversion between related types, for example from a base class to a
+  subclass (when you *know* that the object is of that type), use
+  `static_cast`. (When you are not sure if the object has the subclass type,
+  then use a `dynamic_cast`; see a later section about that.)
+* Also use `static_cast`, not `reinterpret_cast`, for conversion from `void *`
+  to a specific pointer type. (This is accepted and recommended because there
+  is an implicit conversion available for the opposite conversion.) See
+  [ICU-9434](https://unicode-org.atlassian.net/browse/ICU-9434) for details.
+* For conversion between unrelated types, for example between `char *` and
+  `uint8_t *`, or between `Collator *` and `UCollator *`, use a
+  `reinterpret_cast`.
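+
+The three cases can be sketched as follows, with hypothetical `Base` and
+`Derived` types standing in for a real class hierarchy.
+
+```C++
+#include <cstdint>
+#include <cstdlib>
+
+// Hypothetical types for illustration.
+struct Base {
+    virtual ~Base() {}
+};
+struct Derived : Base {
+    int extra = 7;
+};
+
+int castExamples() {
+    Derived object;
+    Base *base = &object;
+
+    // Related types, and the object is known to be a Derived: static_cast.
+    Derived *derived = static_cast<Derived *>(base);
+
+    // From void * to a specific pointer type: static_cast, not reinterpret_cast.
+    void *memory = std::malloc(4);
+    if (memory == nullptr) {
+        return -1;
+    }
+    char *chars = static_cast<char *>(memory);
+
+    // Unrelated types such as char * and uint8_t *: reinterpret_cast.
+    std::uint8_t *bytes = reinterpret_cast<std::uint8_t *>(chars);
+    bytes[0] = 0x41;
+
+    int result = derived->extra + chars[0];  // 7 + 0x41 = 72
+    std::free(memory);
+    return result;
+}
+```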
+
+#### Returning a Number of Items
+
+To return a number of items, use `countItems()`, **not** `getItemCount()`, even if
+there is no need to actually count using that member function.
+
+#### Ranges of Indexes
+
+Specify a range of indexes by having start and limit parameters with names or
+suffix conventions that represent the index. A range should contain indexes from
+start to limit-1, that is, a left-closed, right-open interval. Using
+mathematical notation, this is represented as: \[start..limit\[.
+
+#### Functions with Buffers
+
+For functions that take a buffer (pointer) and a length argument with a default
+value, set the default value to -1 so that the function determines the length
+of the input itself (for text, by calling `u_strlen()`). Any other negative or
+undefined value constitutes an error.
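+
+A minimal sketch of the length convention, assuming a hypothetical
+`resolveLength()` helper and a `myStrlen()` stand-in for `u_strlen()`. As in
+many ICU functions, a NULL pointer is accepted here only together with a
+length of 0.
+
+```C++
+#include <cstdint>
+
+// Hypothetical stand-in for UErrorCode.
+enum MyErrorCode {
+    MY_ZERO_ERROR = 0,
+    MY_ILLEGAL_ARGUMENT_ERROR = 1
+};
+
+// Hypothetical stand-in for u_strlen(): counts code units up to the NUL.
+int32_t myStrlen(const char16_t *s) {
+    int32_t n = 0;
+    while (s[n] != 0) {
+        ++n;
+    }
+    return n;
+}
+
+// length == -1: the string is NUL-terminated, so compute its length.
+// Any other negative length is an error.
+int32_t resolveLength(const char16_t *s, int32_t length, MyErrorCode &errorCode) {
+    if (errorCode > MY_ZERO_ERROR) {
+        return 0;
+    }
+    if (length < -1 || (s == nullptr && length != 0)) {
+        errorCode = MY_ILLEGAL_ARGUMENT_ERROR;
+        return 0;
+    }
+    return length == -1 ? myStrlen(s) : length;
+}
+```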
+
+#### Primitive Types
+
+Primitive types are defined by the `unicode/utypes.h` file or by header files
+that it includes. The most common types are `uint8_t`, `uint16_t`,
+`uint32_t`, `int8_t`, `int16_t`, `int32_t`, `char16_t`,
+`UChar` (same as `char16_t`), `UChar32` (signed, 32-bit), and `UErrorCode`.
+
+The built-in language type `bool` and the constants `true` and `false` may be
+used internally, for local variables and parameters of internal functions. The
+ICU type `UBool` must be used in public APIs and in the definition of any
+persistent data structures. `UBool` is guaranteed to be one byte in size and
+signed; `bool` is not.
+
+#### File Names (.h, .c, .cpp, data files if possible, etc.)
+
+Limit file names to 31 lowercase ASCII characters. (Older versions of MacOS have
+that length limit.)
+
+Exception: The layout engine uses mixed-case file names.
+
+(We have abandoned the 8.3 naming standard although we do not change the names
+of old header files.)
+
+#### Language Extensions and Standards
+
+Proprietary features, language extensions, or library functions must not be
+used because they will not work on all C or C++ compilers.
+In Microsoft Visual C++, go to Project Settings (Alt+F7) > All Configurations >
+C/C++ > Customize and check Disable Language Extensions.
+
+Exception: some Microsoft headers will not compile without language extensions
+being enabled, which in turn requires some ICU files to be built with language
+extensions.
+
+#### Tabs and Indentation
+
+Save files with spaces instead of tab characters (\\x09). The indentation size
+is 4.
+
+#### Documentation
+
+Use Javadoc-style in-file documentation processed by
+[Doxygen](http://www.doxygen.org/).
+
+#### Multiple Statements
+
+Place multiple statements on separate lines. if() or loop heads must not be
+followed by their bodies on the same line.
+
+#### Placements of {} Curly Braces
+
+Place curly braces {} in reasonable and consistent locations. Each of us
+subscribes to different philosophies. It is recommended to use the style of a
+file, instead of mixing different styles. It is requested, however, to not have
+if() and loop bodies without curly braces.
+
+#### if() {...} and Loop Bodies
+
+Use curly braces for if() and else as well as loop bodies, etc., even if there
+is only one statement.
+
+#### Function Declarations
+
+Put the import, extern, and export declarations together with the return type
+on one line, and place the function name and signature at the beginning of the
+next line.
+
+Function declarations need to be in the form `U_CAPI` return-type `U_EXPORT2`
+to satisfy all the compilers' requirements.
+
+For example, use the following convention:
+
+```C++
+U_CAPI int32_t U_EXPORT2
+u_formatMessage(...);
+```
+
+> :point_right: **Note**: The `U_CAPI`/`U_DRAFT`/... and `U_EXPORT2` qualifiers
+are required for both the declaration and the definition of *exported C and
+static C++ functions*. Use `U_CAPI` (or `U_DRAFT` etc.) before and `U_EXPORT2`
+after the return type of *exported C and static C++ functions*. Internal
+functions that are visible outside a compilation unit need a `U_CFUNC` before
+the return type. *Non-static C++ class member functions* do *not* get
+`U_CAPI`/`U_EXPORT2` because they are exported and declared together with their
+class exports.
+
+#### Use Anonymous Namespaces or Static For File Scope
+
+Use anonymous namespaces or `static` for variables, functions, and constants that
+are not exported explicitly by a header file. Some platforms are confused if
+non-static symbols are not explicitly declared extern. These platforms will not
+be able to build ICU nor link to it.
+
+#### Using C Callbacks From C++ Code
+
+z/OS and Windows COM wrappers around ICU need `__cdecl` for callback functions.
+The reason is that C++ can have a different function calling convention from C.
+These callback functions also usually need to be private. So the following code
+
+```C++
+UBool
+isAcceptable(void * /* context */,
+             const char * /* type */, const char * /* name */,
+             const UDataInfo *pInfo)
+{
+    // Do something here.
+}
+```
+
+should be changed to look like the following by adding `U_CDECL_BEGIN`, `static`,
+`U_CALLCONV` and `U_CDECL_END`.
+
+```C++
+U_CDECL_BEGIN
+static UBool U_CALLCONV
+isAcceptable(void * /* context */,
+             const char * /* type */, const char * /* name */,
+             const UDataInfo *pInfo)
+{
+    // Do something here.
+}
+U_CDECL_END
+```
+
+#### Same Module and Functionality in C and in C++
+
+Determine if two headers are needed. If the same functionality is provided with
+both a C and a C++ API, then there can be two headers, one for each language,
+even if one uses the other. For example, there can be `umsg.h` for C and
+`msgfmt.h` for C++.
+
+Not all functionality has or needs both kinds of API. More and more
+functionality is available only via C APIs to avoid duplication of API,
+documentation, and maintenance. C APIs are perfectly usable from C++ code,
+especially with `UnicodeString` methods that alias or expose C-style string
+buffers.
+
+#### Platform Dependencies
+
+Use the platform dependencies that are within the header files that `utypes.h`
+includes. They are `platform.h` (which is generated by the configuration
+script from `platform.h.in`) and its more specific cousins like `pwin32.h` for
+Windows, which define basic types, and `putil.h`, which defines platform
+utilities.
+**Important:** Outside of these files, and a small number of implementation
+files that depend on platform differences (like `umutex.c`), **no** ICU source
+code may have **any** `#ifdef` **OperatingSystemName** instructions.
+
+#### Short, Unnested Mutex Blocks
+
+Do not call functions within mutual-exclusion (mutex) blocks. This can prevent
+deadlocks from occurring later. There should be as little code inside a mutex
+block as possible to minimize the performance degradation from blocked
+threads.
+Also, mutex blocks are not guaranteed to be re-entrant; therefore, they
+must not be nested.
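+
+ICU has its own mutex implementation; the sketch below uses the standard
+`std::mutex` only as a stand-in to show the shape of a short, unnested critical
+section. The `Cache` class is hypothetical.
+
+```C++
+#include <memory>
+#include <mutex>
+#include <string>
+
+// Hypothetical cache. The expensive work happens outside the lock; the locked
+// region only swaps two shared pointers, and it never locks the mutex again.
+class Cache {
+public:
+    void refresh(const std::string &input) {
+        // Build the new value before taking the lock.
+        std::shared_ptr<const std::string> fresh =
+            std::make_shared<std::string>(input + input);
+        {
+            std::lock_guard<std::mutex> lock(mutex_);
+            value_.swap(fresh);  // short, unnested critical section
+        }
+        // The old value, now held by 'fresh', is released after unlocking.
+    }
+
+    std::shared_ptr<const std::string> get() const {
+        std::lock_guard<std::mutex> lock(mutex_);
+        return value_;
+    }
+
+private:
+    mutable std::mutex mutex_;
+    std::shared_ptr<const std::string> value_;
+};
+```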
+
+#### Names of Internal Functions
+
+Internal functions that are not declared static (regardless of inlining) must
+follow the naming conventions for exported functions because many compilers and
+linkers do not distinguish between library exports and intra-library visible
+functions.
+
+#### Which Language for the Implementation
+
+Write implementation code in C++. Use objects very carefully, as always:
+Implicit constructors, assignments etc. can make simple-looking code
+surprisingly slow.
+
+For every C API, make sure that there is at least one call from a pure C file in
+the cintltst test suite.
+
+Background: We used to prefer C or C-style C++ for implementation code because
+we used to have users ask for pure C. However, there was never a large, usable
+subset of ICU that was usable without any C++ dependencies, and C++ can(!) make
+for much shorter, simpler, less error-prone and easier-to-maintain code, for
+example via use of "smart pointers" (`unicode/localpointer.h` and `cmemory.h`).
+
+We still try to expose most functionality via *C APIs* because of the
+difficulties of binary compatible C++ APIs exported from DLLs/shared libraries.
+
+#### No Compiler Warnings
+
+ICU must compile without compiler warnings unless such warnings are verified to
+be harmless or bogus. Often a warning on one compiler indicates a breaking
+error on another.
+
+#### Enum Values
+
+When casting an integer value to an enum type, the enum type *should* have a
+constant with this integer value, or at least it *must* have a constant whose
+value is at least as large as the integer value being cast, with the same
+signedness. For example, do not cast a -1 to an enum type that only has
+non-negative constants. Some compilers choose the internal representation very
+tightly for the defined enum constants, which may result in the equivalent of a
+`uint8_t` representation for an enum type with only small, non-negative constants.
+Casting a -1 to such a type may result in an actual value of 255. (This has
+happened!)
+
+When casting an enum value to an integer type, make sure that the enum value's
+numeric value is within range of the integer type.
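+
+The following standalone sketch shows one way to follow this rule when an
+integer that may come from data or from a caller is converted to an enum. The
+value is range-checked before the cast, so nothing like -1 is ever cast.
+`MyStyle` and `toMyStyle()` are hypothetical names used only for illustration.
+
+```C++
+#include <cstdint>
+
+// Hypothetical enum with only small, non-negative constants; some compilers
+// may give such an enum a one-byte, unsigned representation.
+enum MyStyle {
+    MY_STYLE_SHORT = 0,
+    MY_STYLE_MEDIUM = 1,
+    MY_STYLE_LONG = 2,
+    MY_STYLE_COUNT = 3  // one more than the largest valid style
+};
+
+// Casts value to MyStyle only if it equals one of the defined constants;
+// anything else (such as -1) yields the caller-supplied fallback instead.
+static inline MyStyle toMyStyle(int32_t value, MyStyle fallback) {
+    if (value >= 0 && value < static_cast<int32_t>(MY_STYLE_COUNT)) {
+        return static_cast<MyStyle>(value);
+    }
+    return fallback;
+}
+```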
+
+#### Do not check for this!=NULL, do not check for NULL references
+
+In public APIs, assume this!=0 and assume that references are not 0. In C code,
+"this" is the "service object" pointer, such as `set` in
+`uset_add(USet* set, UChar32 c)` — don't check for set!=NULL.
+
+We do usually check all other (non-this) pointers for NULL, in those cases when
+NULL is not valid. (Many functions allow a NULL string or buffer pointer if the
+length or capacity is 0.)
+
+Rationale: "this" is not really an argument, and checking it costs a little bit
+of code size and runtime. Other libraries also commonly do not check for valid
+"this", and resulting failures are fairly obvious.
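+
+Here is a minimal sketch of these conventions. It uses a hypothetical C-style
+function `myservice_getName()` and a stand-in error-code enum in place of ICU's
+real `UErrorCode`. The service object pointer is used without a NULL check.
+The destination buffer is checked, and it may be NULL only when the capacity
+is 0 (preflighting).
+
+```C++
+#include <cstdint>
+#include <cstring>
+
+// Stand-in for UErrorCode in this sketch.
+enum MyErrorCode { MY_ZERO_ERROR = 0, MY_ILLEGAL_ARGUMENT_ERROR = 1 };
+
+// Hypothetical "service object" type.
+struct MyService {
+    const char *name;  // NUL-terminated
+};
+
+// Returns the length of the service name (without NUL) and copies the name,
+// including its NUL, into dest if it fits. dest may be NULL if capacity is 0,
+// which lets callers preflight the required size.
+static int32_t myservice_getName(const MyService *service,
+                                 char *dest, int32_t capacity,
+                                 MyErrorCode *pErrorCode) {
+    if (*pErrorCode != MY_ZERO_ERROR) {
+        return 0;
+    }
+    // Check the non-"this" pointer: NULL is invalid when capacity > 0.
+    if (capacity < 0 || (dest == NULL && capacity > 0)) {
+        *pErrorCode = MY_ILLEGAL_ARGUMENT_ERROR;
+        return 0;
+    }
+    // No NULL check for service: it plays the role of "this".
+    int32_t length = static_cast<int32_t>(std::strlen(service->name));
+    if (length < capacity) {
+        std::memcpy(dest, service->name, static_cast<std::size_t>(length) + 1);
+    }
+    return length;
+}
+```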
+
+### Memory Usage
+
+#### Dynamically Allocated Memory
+
+ICU4C APIs are designed to allow separate heaps for its libraries vs. the
+application. This is achieved by providing factory methods and matching
+destructors for all allocated objects. The C++ API uses a common base class with
+overridden `new`/`delete` operators and/or forms an equivalent pair with `createXyz()`
+factory methods and the `delete` operator. The C API provides pairs of `open`/`close`
+functions for each service. See the C++ and C guideline sections below for
+details.
+
+Exception: Most C++ API functions that return a `StringEnumeration` (by pointer
+which the caller must delete) are named `getXyz()` rather than `createXyz()`
+because "get" is much more natural. (These are not factory methods in the sense
+of `NumberFormat::createScientificInstance()`.) For example,
+`static StringEnumeration *Collator::getKeywords(UErrorCode &)`. We should document
+clearly in the API comments that the caller must delete the returned
+`StringEnumeration`.
+
+#### Declaring Static Data
+
+All unmodifiable data should be declared `const`. This includes the pointers and
+the data itself. Also if you do not need a pointer to a string, declare the
+string as an array. This reduces the time to load the library and all its
+pointers. This should be done so that the same library data can be shared across
+processes automatically. Here is an example:
+
+```C++
+#define MY_MACRO_DEFINED_STR "macro string"
+const char *myCString = "myCString";
+int16_t myNumbers[] = {1, 2, 3};
+```
+
+This should be changed to the following:
+
+```C++
+static const char MY_MACRO_DEFINED_STR[] = "macro string";
+static const char myCString[] = "myCString";
+static const int16_t myNumbers[] = {1, 2, 3};
+```
+
+#### No Static Initialization
+
+The most common reason to have static initialization is to declare a
+`static const UnicodeString`, for example (see `utypes.h` about invariant characters):
+
+```C++
+static const UnicodeString myStr("myStr", "");
+```
+
+The most portable and most efficient way to declare ASCII text as a Unicode
+string is to do the following instead:
+
+```C++
+static const UChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0}; /* "myStr" */
+```
+
+We do not use character literals
+for Unicode characters and strings because the execution character set of C/C++
+compilers is almost never Unicode and may not be ASCII-compatible (especially on
+EBCDIC platforms). Depending on the API where the string is to be used, a
+terminating NUL (0) may or may not be required. The length of the string (number
+of `UChar`s in the array) can be determined with `sizeof(myStr)/U_SIZEOF_UCHAR`
+(subtract 1 for the NUL if present). Always put a comment at the end of the
+declaration that shows what the Unicode string says.
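+
+Here is a standalone sketch of the length computation. It uses `char16_t` as a
+stand-in for `UChar`, which is 2 bytes wide, matching `U_SIZEOF_UCHAR`.
+
+```C++
+// char16_t stands in for ICU's UChar in this standalone sketch.
+static const char16_t myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0 }; /* "myStr" */
+
+// Number of code units without the terminating NUL. This mirrors
+// sizeof(myStr)/U_SIZEOF_UCHAR - 1, since U_SIZEOF_UCHAR is 2.
+static const int myStrLength = static_cast<int>(sizeof(myStr) / sizeof(myStr[0])) - 1;
+```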
+
+Static initialization of C++ objects **must not be used** in ICU libraries
+because of the following reasons:
+
+1. It leads to intractable order-of-initialization dependencies.
+2. It makes it difficult or impossible to release all of the library's
+   resources. See `u_cleanup()`.
+3. It takes time to initialize the library.
+4. Dependency checking is not completely done in C or C++. For instance, if an
+   ICU user creates an ICU object or calls an ICU function statically that
+   depends on static data, it is not guaranteed that the statically declared
+   data is initialized.
+5. Certain users like to manage their own memory. They cannot manage ICU's
+   memory properly because of item #2.
+6. It is easier to debug code that does not use static initialization.
+7. Memory allocated at static initialization time is not guaranteed to be
+   deallocated with a C++ destructor when the library is unloaded. This is a
+   problem when ICU is unloaded and reloaded into memory and when you are using
+   a heap debugging tool. It would also not work with the `u_cleanup()` function.
+8. Some platforms cannot handle static initialization or static destruction
+   properly. Several compilers have this random bug (even in the year 2001).
+
+ICU users can use the `U_STRING_DECL` and `U_STRING_INIT` macros for C strings. Note
+that on some platforms this will incur a small initialization cost (simple
+conversion). Also, ICU users need to make sure that they properly and
+consistently declare the strings with both macros. See `ustring.h` for details.
+
+### C++ Coding Guidelines
+
+This section describes the C++ specific guidelines or conventions to use.
+
+#### Portable Subset of C++
+
+ICU uses only a portable subset of C++ for maximum portability. Also, it does
+not use features of C++ that are not implemented well in all compilers or are
+cumbersome. In particular, ICU does not use exceptions or the Standard Template
+Library (STL).
+
+We have started to use templates in ICU 4.2 (e.g., `StringByteSink`) and ICU 4.4
+(`LocalPointer` and some internal uses). We try to limit templates to where they
+provide a lot of benefit (robust code, avoid duplication) without much or any
+code bloat.
+
+We continue to not use the Standard Template Library (STL) in ICU library code
+because its design causes a lot of code bloat. More importantly:
+
+* Exceptions: STL classes and algorithms throw exceptions. ICU does not throw
+  exceptions, and ICU code is not exception-safe.
+* Memory management: STL uses default new/delete, or Allocator parameters
+  which create different types; they throw out-of-memory exceptions. ICU
+  memory allocation is customizable and must not throw exceptions.
+* Non-polymorphic: For APIs, STL classes are also problematic because
+  different template specializations create different types. For example, some
+  systems use custom string classes (different allocators, different
+  strategies for buffer sharing vs. copying), and ICU should be able to
+  interface with most of them.
+
+We have started to use compiler-provided Run-Time Type Information (RTTI) in ICU
+4.6. It is now required for building ICU, and encouraged for using ICU where
+RTTI is needed. For example, use `dynamic_cast<DecimalFormat*>` on a
+`NumberFormat` pointer that is usually but not always a `DecimalFormat` instance.
+Do not use `dynamic_cast<>` on a reference, because that throws a `bad_cast`
+exception on failure.
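+
+Here is a standalone sketch of the pointer form of `dynamic_cast`. The
+hypothetical classes `MyFormat` and `MyDecimalFormat` stand in for
+`NumberFormat` and `DecimalFormat`.
+
+```C++
+// Hypothetical base class with a virtual destructor (enables RTTI on it).
+class MyFormat {
+public:
+    virtual ~MyFormat();
+};
+
+// Hypothetical subclass with an extra, subclass-only method.
+class MyDecimalFormat : public MyFormat {
+public:
+    ~MyDecimalFormat() override;
+    int getGroupingSize() const { return 3; }
+};
+
+MyFormat::~MyFormat() {}
+MyDecimalFormat::~MyDecimalFormat() {}
+
+// Returns the grouping size if fmt is a MyDecimalFormat, else -1.
+// The pointer form of dynamic_cast yields nullptr on failure; it never throws.
+static int groupingSizeOrMinusOne(const MyFormat *fmt) {
+    const MyDecimalFormat *df = dynamic_cast<const MyDecimalFormat *>(fmt);
+    if (df == nullptr) {
+        return -1;
+    }
+    return df->getGroupingSize();
+}
+```
+
+Casting a reference instead, as in `dynamic_cast<const MyDecimalFormat &>(*fmt)`,
+would throw `std::bad_cast` for a plain `MyFormat`. That is why we avoid that form.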
+
+ICU uses a limited form of multiple inheritance equivalent to Java's interface
+mechanism: All base classes except one must be interface/mixin classes, i.e., they
+must contain only pure virtual member functions. For details see the
+'boilerplate' discussion below. This restriction to at most one base class with
+non-virtual members eliminates problems with the use and implementation of
+multiple inheritance in C++. ICU does not use virtual base classes.
+
+> :point_right: **Note**: Every additional base class, *even an interface/mixin
+class*, adds another vtable pointer to each subclass object, that is, it
+*increases the object/instance size by 8 bytes* on most platforms.
+
+#### Classes and Members
+
+C++ classes and their members do not need a 'U' or any other prefix.
+
+#### Global Operators
+
+Global operators (operators that are not class members) can be problematic for
+library entry point versioning, may confuse users and cannot be easily ported to
+Java (ICU4J). They should be avoided if possible.
+
+~~The issue with library entry point versioning is that on platforms that do not
+support namespaces, users must rename all classes and global functions via
+urename.h. This renaming process is not possible with operators.~~ Starting with
+ICU 49, we require C++ namespace support. However, a global operator can be used
+in ICU4C (when necessary) if its function signature contains an ICU C++ class
+that is versioned. This will result in a mangled linker name that does contain
+the ICU version number via the versioned name of the class parameter. For
+example, ICU4C 2.8 added an operator + for `UnicodeString`, with two `UnicodeString`
+reference parameters.
+
+#### Virtual Destructors
+
+In classes with virtual methods, destructors must be explicitly declared, and
+must be defined (implemented) outside the class definition in a .cpp file.
+
+More precisely:
+
+1. All classes with any virtual members or any bases with any virtual members
+   should have an explicitly declared virtual destructor.
+2. Constructors and destructors should be declared and/or defined prior to
+   *any* other methods, public or private, within the class definition.
+3. All virtual destructors should be defined out-of-line, and in a .cpp file
+   rather than a header file.
+
+This is so that the destructors serve as "key functions" so that the compiler
+emits the vtable in only and exactly the desired files. It can help make
+binaries smaller that use statically-linked ICU libraries, because the compiler
+and linker can prove more easily that some code is not used.
+
+The Itanium C++ ABI (which is used on all x86 Linux) says: "The virtual table
+for a class is emitted in the same object containing the definition of its key
+function, i.e. the first non-pure virtual function that is not inline at the
+point of class definition. If there is no key function, it is emitted everywhere
+used."
+
+(This was first done in ICU 49; see [ticket #8454](http://bugs.icu-project.org/trac/ticket/8454).)
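+
+The following single-file sketch illustrates these rules with a hypothetical
+`MyFormatter` class. In a real project, the class definition would live in a
+header and the out-of-line definitions in a .cpp file, as the comments indicate.
+
+```C++
+// --- myformatter.h (hypothetical header) ---
+class MyFormatter {
+public:
+    MyFormatter();           // constructors and destructor come first
+    virtual ~MyFormatter();  // declared here, defined out of line below
+    virtual int width() const;
+private:
+    int fWidth;
+};
+
+// --- myformatter.cpp (hypothetical source file) ---
+// The destructor is the first non-pure virtual function that is not inline
+// at the point of class definition, so it is the key function: the vtable
+// is emitted only in the object file compiled from this .cpp file.
+MyFormatter::MyFormatter() : fWidth(8) {}
+MyFormatter::~MyFormatter() {}
+int MyFormatter::width() const { return fWidth; }
+```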
+
+#### Namespaces
+
+Beginning with ICU version 2.0, ICU uses namespaces. The actual namespace is
+`icu_M_N` with M being the major ICU release number and N being the minor ICU
+release number. For convenience, the namespace `icu` is an alias to the current
+release-specific one. (The actual namespace name is `icu` itself if renaming is
+turned off.)
+
+Starting with ICU 49, we require C++ namespace support.
+
+Class declarations, even forward declarations, must be scoped to the ICU
+namespace. For example:
+
+```C++
+U_NAMESPACE_BEGIN
+
+class Locale;
+
+U_NAMESPACE_END
+
+// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
+extern void fn(icu::UnicodeString&);
+
+// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
+// U_NAMESPACE_USE is done automatically by utypes.h,
+// but we recommend turning that off and writing it explicitly
+U_NAMESPACE_USE
+Locale loc("fi");
+```
+
+`U_NAMESPACE_USE` (expands to `using namespace icu_M_N;` when available) is
+automatically done when `utypes.h` is included, so that all ICU classes are
+immediately usable. However, we recommend that you turn this off via
+`CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"`.
+
+#### Declare Class APIs
+
+Class APIs need to be declared like either of the following:
+
+#### Inline-Implemented Member Functions
+
+Class member functions are usually declared but not inline-implemented in the
+class declaration. A long function implementation in the class declaration makes
+it hard to read the class declaration.
+
+It is ok to inline-implement *trivial* functions in the class declaration.
+Pretty much everyone agrees that inline implementations are ok if they fit on
+the same line as the function signature, even if that means bending the
+single-statement-per-line rule slightly:
+
+```C++
+T *orphan() { T *p=ptr; ptr=NULL; return p; }
+```
+
+Most people also agree that very short multi-line implementations are ok inline
+in the class declaration. Something like the following is probably the maximum:
+
+```C++
+Value *getValue(int index) {
+    if(index>=0 && index<fLimit) {
+        return fArray[index];
+    }
+    return NULL;
+}
+```
+
+If the inline implementation is longer than that, then just declare the function
+inline and put the actual inline implementations after the class declaration in
+the same file. (See `unicode/unistr.h` for many examples.)
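+
+Here is a sketch of this layout with a hypothetical `MyBuffer` class. The
+`append()` function is declared `inline` in the class declaration and
+implemented after the class in the same header. Trivial one-line functions
+stay inside the class.
+
+```C++
+class MyBuffer {
+public:
+    MyBuffer() : fLength(0) {}
+    inline bool append(char c);  // implemented below the class declaration
+    int length() const { return fLength; }
+    char charAt(int i) const { return fChars[i]; }
+private:
+    char fChars[16];
+    int fLength;
+};
+
+// Inline implementation after the class declaration, in the same header.
+inline bool
+MyBuffer::append(char c) {
+    if (fLength >= static_cast<int>(sizeof(fChars))) {
+        return false;  // buffer is full
+    }
+    fChars[fLength++] = c;
+    return true;
+}
+```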
+
+If it's significantly longer than that, then it's probably not a good candidate
+for inlining anyway.
+
+#### C++ class layout and 'boilerplate'
+
+There are different sets of requirements for different kinds of C++ classes. In
+general, all instantiable classes (i.e., all classes except for interface/mixin
+classes and ones with only static member functions) inherit the `UMemory` base
+class. `UMemory` provides `new`/`delete` operators, which make it possible to keep
+the ICU heap separate from the application heap, or to customize ICU's memory
+allocation consistently.
+
+> :point_right: **Note**: Public ICU APIs must return or orphan only C++ objects
+that are to be released with `delete`. They must not return allocated simple
+types (including pointers, and arrays of simple types or pointers) that would
+have to be released with a `free()` function call using the ICU library's heap.
+Simple types and pointers must be returned using fill-in parameters (instead of
+allocation), or cached and owned by the returning API.
+
+**Public ICU C++ classes** must inherit either the `UMemory` or the `UObject`
+base class for proper memory management, and implement the following common set
+of 'boilerplate' functions:
+
+* default constructor
+* copy constructor
+* assignment operator
+* operator==
+* operator!=
+
+> :point_right: **Note**: Each of the above must either be implemented, be
+verified to work correctly with the default implementation according to the
+C++ standard (typically not the case if any pointers are used), or be declared
+private without implementation.
+
+* If public subclassing is intended, then the public class must inherit
+  `UObject` and should implement
+  * `clone()`
+* **RTTI:**
+  * If a class is a subclass of a parent (e.g., `Format`) with ICU's "poor
+    man's RTTI" (Run-Time Type Information) mechanism (via
+    `getDynamicClassID()` and `getStaticClassID()`) then add that to the new
+    subclass as well (copy implementations from existing C++ APIs).
+  * If a class is a new, immediate subclass of `UObject` (e.g.,
+    `Normalizer2`), creating a whole new class hierarchy, then declare a
+    *private* `getDynamicClassID()` and define it to return NULL (to
+    override the pure virtual version in `UObject`); copy the relevant lines
+    from `normalizer2.h` and `normalizer2.cpp`
+    (`UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION(className)`). Do not add any
+    "poor man's RTTI" at all to subclasses of this class.
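+
+Here is a sketch of the boilerplate functions listed above for a hypothetical
+class `MyName` that owns a heap buffer. This is exactly the case where the
+compiler-generated copy constructor and assignment operator would be wrong
+(shallow copy, double delete). For brevity, the sketch uses plain `new[]`,
+which throws on failure, and it does not inherit `UMemory` or `UObject`. Real
+ICU code must inherit one of those classes and must not throw on allocation
+failure.
+
+```C++
+#include <cstring>
+
+class MyName {
+public:
+    MyName() : fChars(copyOf("")) {}                              // default constructor
+    explicit MyName(const char *s) : fChars(copyOf(s)) {}
+    MyName(const MyName &other) : fChars(copyOf(other.fChars)) {}  // deep copy
+    ~MyName() { delete[] fChars; }
+
+    MyName &operator=(const MyName &other) {
+        if (this != &other) {
+            char *copy = copyOf(other.fChars);  // copy first, then release the old buffer
+            delete[] fChars;
+            fChars = copy;
+        }
+        return *this;
+    }
+    bool operator==(const MyName &other) const {
+        return std::strcmp(fChars, other.fChars) == 0;
+    }
+    bool operator!=(const MyName &other) const { return !operator==(other); }
+
+private:
+    static char *copyOf(const char *s) {
+        std::size_t length = std::strlen(s);
+        char *p = new char[length + 1];
+        std::memcpy(p, s, length + 1);
+        return p;
+    }
+    char *fChars;  // owned, NUL-terminated
+};
+```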
+
+**Interface/mixin classes** are equivalent to Java interfaces. They are as much
+multiple inheritance as ICU uses — they do not decrease performance, and they do
+not cause problems associated with multiple base classes having data members.
+Interface/mixin classes contain only pure virtual member functions, and must
+contain an empty virtual destructor. See for example the `UnicodeMatcher` class.
+Interface/mixin classes must not inherit any non-interface/mixin class,
+especially not `UMemory` or `UObject`. Instead, implementation classes must inherit
+one of these two (or a subclass of them) in addition to the interface/mixin
+classes they implement. See for example the `UnicodeSet` class.
+
+**Static classes** contain only static member functions and are therefore never
+instantiated. They must not inherit `UMemory` or `UObject`. Instead, they must
+declare a private default constructor (without any implementation) to prevent
+instantiation. See for example the `LESwaps` layout engine class.
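+
+Here is a sketch of an interface/mixin class, an implementation class, and a
+static class. All names are hypothetical, and `MyObject` stands in for
+`UObject`.
+
+```C++
+// Interface/mixin: only pure virtual functions plus an empty virtual destructor;
+// it does not inherit UMemory/UObject (or the stand-in MyObject).
+class MyMatcher {
+public:
+    virtual ~MyMatcher();
+    virtual bool matches(char c) const = 0;
+};
+MyMatcher::~MyMatcher() {}
+
+// Stand-in for ICU's UObject.
+class MyObject {
+public:
+    virtual ~MyObject();
+};
+MyObject::~MyObject() {}
+
+// Implementation class: one "real" base class plus the mixin it implements.
+class MyDigitMatcher : public MyObject, public MyMatcher {
+public:
+    ~MyDigitMatcher() override;
+    bool matches(char c) const override { return c >= '0' && c <= '9'; }
+};
+MyDigitMatcher::~MyDigitMatcher() {}
+
+// Static class: only static member functions, plus a private default
+// constructor that is declared but never defined, so it cannot be instantiated.
+class MyAsciiUtil {
+public:
+    static bool isUpper(char c) { return c >= 'A' && c <= 'Z'; }
+private:
+    MyAsciiUtil();
+};
+```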
+
+**C++ classes internal to ICU** need not (but may) implement the boilerplate
+functions as mentioned above. They must inherit at least `UMemory` if they are
+instantiable.
+
+#### Make Sure The Compiler Uses C++
+
+The compiler defines the standard `__cplusplus` macro if and only if it compiles
+the code as C++, so test this macro to find out whether the compiler uses C++.
+Starting with ICU 49, we use this standard predefined macro.
+
+Up until ICU 4.8 we used to define and use `XP_CPLUSPLUS` but that was redundant
+and did not add any value because it was defined if-and-only-if `__cplusplus` was
+defined.
+
+#### Adoption of Objects
+
+Some constructors and factory functions take pointers to objects that they
+adopt. The newly created object contains a pointer to the adoptee and takes over
+ownership and lifecycle control. If an error occurs while creating the new
+object (and thus in the code that adopts an object), then the semantics used
+within ICU must be *adopt-on-call* (as opposed to, for example,
+adopt-on-success):
+
+* **General**: A constructor or factory function that adopts an object does so
+  in all cases, even if an error occurs and a `UErrorCode` is set. This means
+  that either the adoptee is deleted immediately or its pointer is stored in
+  the new object. The former case is most common when the constructor or
+  factory function is called and the `UErrorCode` already indicates a failure.
+  In the latter case, the new object must take care of deleting the adoptee
+  once it is deleted itself regardless of whether or not the constructor was
+  successful.
+
+* **Constructors**: The code that creates the object with the new operator
+  must check the resulting pointer returned by new and delete any adoptees if
+  it is 0 because the constructor was not called. (Typically, a `UErrorCode`
+  must be set to `U_MEMORY_ALLOCATION_ERROR`.)
+
+  **Pitfall**: If you allocate/construct via "`ClassName *p = new
+  ClassName(adoptee);`" and the memory allocation failed (p==NULL), then the
+  constructor has not been called, the adoptee has not been adopted, and you
+  are still responsible for deleting it!
+
+* **Factory functions (createInstance())**: The factory function must set a
+  `U_MEMORY_ALLOCATION_ERROR` and delete any adoptees if it cannot allocate the
+  new object. If the construction of the object fails otherwise, then the
+  factory function must delete it and the factory function must delete its
+  adoptees. As a result, a factory function always returns either a valid
+  object and a successful `UErrorCode`, or a 0 pointer and a failure `UErrorCode`.
+  A factory function returns a pointer to an object that must be deleted by
+  the user/owner.
+
+Example: (This is a best-practice example. It does not reflect current `Calendar`
+code.)
+
+```C++
+Calendar*
+Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
+    LocalPointer<TimeZone> adoptedZone(zone);
+    if(U_FAILURE(errorCode)) {
+        // The adoptedZone destructor deletes the zone.
+        return NULL;
+    }
+    // since the Locale isn't specified, use the default locale
+    LocalPointer<Calendar> c(new GregorianCalendar(adoptedZone.getAlias(), Locale::getDefault(), errorCode));
+    if(c.isNull()) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        // The adoptedZone destructor deletes the zone.
+        return NULL;
+    }
+    // c adopted the zone.
+    adoptedZone.orphan();
+    if(U_FAILURE(errorCode)) {
+        // The c destructor deletes the Calendar (and with it, the zone).
+        return NULL;
+    }
+    return c.orphan();
+}
+```
+
+#### Memory Allocation
+
+All ICU C++ class objects directly or indirectly inherit `UMemory` (see
+'boilerplate' discussion above) which provides `new`/`delete` operators, which in
+turn call the internal functions in `cmemory.c`. Creating and releasing ICU C++
+objects with `new`/`delete` automatically uses the ICU allocation functions.
+
+> :point_right: **Note**: Remember that (in absence of explicit :: scoping) C++
+determines which `new`/`delete` operator to use from which type is allocated or
+deleted, not from the context of where the statement is. Since non-class data
+types (like `int`) cannot define their own `new`/`delete` operators, C++ always
+uses the global ones for them by default.
+
+When global `new`/`delete` operators are to be used in the application (never inside
+ICU!), then they should be properly scoped as e.g. `::new`, and the application
+must ensure that matching `new`/`delete` operators are used. In some cases where
+such scoping is missing in non-ICU code, it may be simpler to compile ICU
+without its own `new`/`delete` operators. See `source/common/unicode/uobject.h` for
+details.
+
+In ICU library code, allocation of non-class data types — simple integer types
+**as well as pointers** — must use the functions in `cmemory.h`/`.c` (`uprv_malloc()`,
+`uprv_free()`, `uprv_realloc()`). Such memory objects must be released inside ICU,
+never by the user; this is achieved either by providing a "close" function for a
+service or by not passing ownership of these objects to the user (and
+instead filling user-provided buffers or returning constant pointers without
+passing ownership).
+
+The `cmemory.h`/`.c` functions can be overridden at ICU compile time for custom
+memory management. By default, `UMemory`'s `new`/`delete` operators are
+implemented by calling these common functions. Overriding the `cmemory.h`/`.c`
+functions changes the memory management for both C and C++.
+
+C++ objects that were either allocated with new or returned from a `createXYZ()`
+factory method must be deleted by the user/owner.
+
+#### Memory Allocation Failures
+
+All memory allocations and object creations should be checked for success. In
+the event of a failure (a NULL returned), a `U_MEMORY_ALLOCATION_ERROR` status
+should be returned by the ICU function in question. If the allocation failure
+leaves the ICU service in an invalid state, such that subsequent ICU operations
+could also fail, the situation should be flagged so that the subsequent
+operations will fail cleanly. Under no circumstances should a memory allocation
+failure result in a crash in ICU code, or cause incorrect results rather than a
+clean error return from an ICU function.
+
+Some functions, such as the C++ assignment operator, are unable to return an ICU
+error status to their caller. In the event of an allocation failure, these
+functions should mark the object as being in an invalid or bogus state so that
+subsequent attempts to use the object will fail. Deletion of an invalid object
+should always succeed.
+
+#### Memory Management
+
+C++ memory management is error-prone, and memory leaks are hard to avoid, but
+the following helps a lot.
+
+First, if you can stack-allocate an object (for example, a `UnicodeString` or
+`UnicodeSet`), do so. It is the easiest way to manage object lifetime.
+
+Inside functions, avoid raw pointers to owned objects. Instead, use
+[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)`<UnicodeString>`
+or `LocalUResourceBundlePointer` etc., which are ICU's "smart pointer"
+implementations. This is the "[Resource Acquisition Is Initialization (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
+idiom. The "smart pointer" auto-deletes the object when it goes out of scope,
+which means that you can just return from the function when an error occurs and
+all auto-managed objects are deleted. You do not need to remember to write an
+increasing number of "`delete xyz;`" at every function exit point.
+
+*In fact, you should almost never need to write "delete" in any function.*
+
+* Except in a destructor where you delete all of the objects which the class
+  instance owns.
+* Also, in static "cleanup" functions you still need to delete cached objects.
+
+When you pass on ownership of an object, for example to return the pointer of a
+newly built object, or when you call a function which adopts your object, use
+`LocalPointer`'s `.orphan()`.
+
+* Careful: When you return an object or pass it into an adopting factory
+  method, you can use `.orphan()` directly.
+* However, when you pass it into an adopting constructor, you need to pass in
+  the `.getAlias()`, and only if the *allocation* of the new owner succeeded
+  (you got a non-NULL pointer for that) do you `.orphan()` your `LocalPointer`.
+* See the `Calendar::createInstance()` example above.
+* See the `AlphabeticIndex` implementation for live examples. Search for other
+  uses of `LocalPointer`/`LocalArray`.
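+
+The following standalone sketch shows the difference between passing
+`.getAlias()` and calling `.orphan()` with an adopting constructor.
+`MyLocalPointer` is a deliberately minimal, hypothetical stand-in for
+`LocalPointer`; use the real class from `unicode/localpointer.h` in ICU code.
+`MyZone` and `MyCalendar` are hypothetical classes modeled loosely on the
+`Calendar` example above.
+
+```C++
+#include <new>
+
+// Minimal hypothetical stand-in for icu::LocalPointer (not the real class).
+template<typename T>
+class MyLocalPointer {
+public:
+    explicit MyLocalPointer(T *p = nullptr) : ptr(p) {}
+    ~MyLocalPointer() { delete ptr; }
+    T *getAlias() const { return ptr; }
+    T *orphan() { T *p = ptr; ptr = nullptr; return p; }
+    bool isNull() const { return ptr == nullptr; }
+private:
+    MyLocalPointer(const MyLocalPointer &);             // not copyable
+    MyLocalPointer &operator=(const MyLocalPointer &);
+    T *ptr;
+};
+
+struct MyZone { int offset; };
+
+// Hypothetical class whose constructor adopts a MyZone (adopt-on-call).
+class MyCalendar {
+public:
+    explicit MyCalendar(MyZone *adoptedZone) : fZone(adoptedZone) {}
+    ~MyCalendar() { delete fZone; }
+    int offset() const { return fZone->offset; }
+private:
+    MyZone *fZone;
+};
+
+// Returns a new MyCalendar owned by the caller, or nullptr if allocation failed.
+static MyCalendar *createCalendar(int offset) {
+    MyLocalPointer<MyZone> zone(new (std::nothrow) MyZone{offset});
+    if (zone.isNull()) {
+        return nullptr;
+    }
+    // Pass the alias, not orphan(): if this allocation fails, the constructor
+    // never runs and the zone is still ours to delete.
+    MyLocalPointer<MyCalendar> cal(new (std::nothrow) MyCalendar(zone.getAlias()));
+    if (cal.isNull()) {
+        return nullptr;  // zone's destructor deletes the MyZone
+    }
+    zone.orphan();        // cal now owns the zone
+    return cal.orphan();  // the caller now owns the calendar
+}
+```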
+
+Every object must always be deletable/destructable. That is, at a minimum, all
+pointers to owned memory must always be either NULL or point to owned objects.
+
+Internally:
+
+[cmemory.h](http://bugs.icu-project.org/trac/browser/icu/trunk/source/common/cmemory.h)
+defines the `LocalMemory` class for chunks of memory of primitive types which
+will be `uprv_free()`'ed.
+
+[cmemory.h](http://bugs.icu-project.org/trac/browser/icu/trunk/source/common/cmemory.h)
+also defines `MaybeStackArray` and `MaybeStackHeaderAndArray` which automate
+management of arrays.
+
+Use `CharString`
+([charstr.h](http://bugs.icu-project.org/trac/browser/icu/trunk/source/common/charstr.h))
+for `char *` strings that you build and modify.
+
+#### Global Inline Functions
+
+Global functions (non-class member functions) that are declared inline must be
+made static inline. Some compilers will export symbols that are declared inline
+but not static.
+
+#### No Declarations in the for() Loop Head
+
+Iterations through `for()` loops must not use declarations in the first part of
+the loop. There have been two revisions for the scoping of these declarations
+and some compilers do not comply with the latest scoping. Declarations of loop
+variables should be outside these loops.
+
+#### Common or I18N
+
+Decide whether or not the module is part of the common or the i18n API
+collection. Use the appropriate macros. For example, use
+`U_COMMON_IMPLEMENTATION`, `U_I18N_IMPLEMENTATION`, `U_COMMON_API`, `U_I18N_API`.
+See `utypes.h`.
+
+#### Constructor Failure
+
+If there is a reasonable chance that a constructor fails (for example, if the
+constructor relies on loading data), then either it must use and set a
+`UErrorCode` or the class needs to support an `isBogus()`/`setToBogus()` mechanism
+like `UnicodeString` and `UnicodeSet`, and the constructor needs to set the object
+to bogus if it fails.
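+
+Here is a sketch of the `isBogus()`/`setToBogus()` alternative for a
+hypothetical class `MyTable`. Its constructor "loads data" (here it simply
+copies a string) and has no `UErrorCode` parameter. Destruction is safe
+whether or not the object is bogus. The sketch uses `std::malloc()` where ICU
+code would call `uprv_malloc()`.
+
+```C++
+#include <cstdlib>
+#include <cstring>
+
+class MyTable {
+public:
+    // In this sketch, "loading" fails for NULL or empty input, or when allocation fails.
+    explicit MyTable(const char *data) : fData(nullptr), fLength(0) {
+        if (data == nullptr || *data == 0) {
+            setToBogus();
+            return;
+        }
+        std::size_t length = std::strlen(data);
+        fData = static_cast<char *>(std::malloc(length + 1));
+        if (fData == nullptr) {
+            setToBogus();  // report the failure through the bogus state
+            return;
+        }
+        std::memcpy(fData, data, length + 1);
+        fLength = static_cast<int>(length);
+    }
+    ~MyTable() { std::free(fData); }  // always safe, even when bogus
+
+    bool isBogus() const { return fLength < 0; }
+    void setToBogus() {
+        std::free(fData);
+        fData = nullptr;
+        fLength = -1;
+    }
+    int length() const { return isBogus() ? 0 : fLength; }
+
+private:
+    MyTable(const MyTable &);             // copying is not supported in this sketch
+    MyTable &operator=(const MyTable &);
+    char *fData;
+    int fLength;  // -1 marks a bogus object
+};
+```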
+
+#### UVector, UVector32, or UVector64
+
+Use `UVector` to store arrays of `void *`; use `UVector32` to store arrays of
+`int32_t`; use `UVector64` to store arrays of `int64_t`. Historically, `UVector`
+has stored either `int32_t` or `void *`, but now storing `int32_t` in a `UVector`
+is deprecated in favor of `UVector32`.
+
+### C Coding Guidelines
+
+This section describes the C-specific guidelines or conventions to use.
+
+#### Declare and define C APIs with both `U_CAPI` and `U_EXPORT2`
+
+All C APIs need to be **both declared and defined** using the `U_CAPI` and
+`U_EXPORT2` qualifiers.
+
+```C++
+U_CAPI int32_t U_EXPORT2
+u_formatMessage(...);
+```
+
+> :point_right: **Note**: Use `U_CAPI` before and `U_EXPORT2` after the return
+type of exported C functions. Internal functions that are visible outside a
+compilation unit need a `U_CFUNC` before the return type.
+
+#### Subdivide the Name Space
+
+Use prefixes to avoid name collisions. Some of those prefixes contain a 3- (or
+sometimes 4-) letter module identifier. Very general names like
+`u_charDirection()` do not have a module identifier in their prefix.
+
+* For POSIX replacements, the (all lowercase) POSIX function names start with
+  "u_": `u_strlen()`.
+* For other API functions, a 'u' is prepended, followed by the module
+  identifier (if appropriate), an underscore '_', and the **mixed-case**
+  function name. For example, use `u_charDirection()`, `ubidi_setPara()`.
+* For types (struct, enum, union), a "U" is prepended, often as
+  "`U<module identifier>`" directly before the type name, without an
+  underscore. For example, use `UComparisonResult`.
+* For #defined constants and macros, a "U_" is prepended, often as
+  "`U<module identifier>_`" with an underscore, before the uppercase macro
+  name. For example, use `U_ZERO_ERROR`, `U_SUCCESS()`, `UNORM_NFC`.
+
+#### Functions for Constructors and Destructors
+
+Functions that roughly compare to constructors and destructors are called
+`umod_open()` and `umod_close()`. See the following example:
+
+```C++
+U_CAPI UBiDi * U_EXPORT2
+ubidi_open(void);
+
+U_CAPI UBiDi * U_EXPORT2
+ubidi_openSized(int32_t maxLength, int32_t maxRunCount, UErrorCode *pErrorCode);
+
+U_CAPI void U_EXPORT2
+ubidi_close(UBiDi *pBiDi);
+```
+
+Each successful call to a `umod_open()` returns a pointer to an object that must
+be released by the user/owner by calling the matching `umod_close()`.
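+
+Here is a standalone sketch of an open/close pair for a hypothetical
+`MyCounter` service object. A real ICU C API would be declared with
+`U_CAPI`/`U_EXPORT2`, use `UErrorCode`, and allocate with `uprv_malloc()`.
+Here, `static` functions, a stand-in error enum and `std::malloc()` keep the
+example self-contained.
+
+```C++
+#include <cstdint>
+#include <cstdlib>
+
+// Stand-in for UErrorCode in this sketch.
+typedef enum { MY_ZERO_ERROR = 0, MY_MEMORY_ALLOCATION_ERROR = 1 } MyErrorCode;
+
+// Hypothetical service object type (opaque to users in a real C API).
+typedef struct MyCounter {
+    int32_t value;
+} MyCounter;
+
+// "Constructor": returns an object that must be released with mycounter_close(),
+// or NULL with *pErrorCode set on failure.
+static MyCounter *mycounter_open(int32_t initialValue, MyErrorCode *pErrorCode) {
+    if (*pErrorCode != MY_ZERO_ERROR) {
+        return NULL;
+    }
+    MyCounter *counter = (MyCounter *)std::malloc(sizeof(MyCounter));
+    if (counter == NULL) {
+        *pErrorCode = MY_MEMORY_ALLOCATION_ERROR;  // report the failure, never crash
+        return NULL;
+    }
+    counter->value = initialValue;
+    return counter;
+}
+
+static void mycounter_increment(MyCounter *counter) {
+    ++counter->value;  // the service object pointer is not NULL-checked
+}
+
+// "Destructor": accepts NULL so that callers can close unconditionally.
+static void mycounter_close(MyCounter *counter) {
+    std::free(counter);  // free(NULL) is a no-op
+}
+```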
+
+#### C "Service Object" Types and LocalPointer Equivalents
+
+For every C "service object" type (equivalent to C++ class), we want to have a
+[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)
+equivalent, so that C++ code calling the C API can use the specific "smart
+pointer" to implement the "[Resource Acquisition Is Initialization
+(RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
+idiom.
+
+For example, in `ubidi.h` we define the `UBiDi` "service object" type and also
+have the following "smart pointer" definition which will call `ubidi_close()` on
+destruction:
+
+```C++
+// Use config switches like this only after including unicode/utypes.h
+// or another ICU header.
+#if U_SHOW_CPLUSPLUS_API
+
+U_NAMESPACE_BEGIN
+
+/**
+ * class LocalUBiDiPointer
+ * "Smart pointer" class, closes a UBiDi via ubidi_close().
+ * For most methods see the LocalPointerBase base class.
+ *
+ * @see LocalPointerBase
+ * @see LocalPointer
+ * @stable ICU 4.4
+ */
+U_DEFINE_LOCAL_OPEN_POINTER(LocalUBiDiPointer, UBiDi, ubidi_close);
+
+U_NAMESPACE_END
+
+#endif
+```
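+
+To illustrate what such a "smart pointer" does, here is a minimal,
+hypothetical template `MyLocalOpenPointer` that calls a C close function from
+its destructor. It is only similar in spirit to the class that
+`U_DEFINE_LOCAL_OPEN_POINTER` generates, which derives from
+`LocalPointerBase`. The `MyHandle` functions are also hypothetical.
+
+```C++
+#include <cstdlib>
+
+// Hypothetical C-style service object with open/close functions.
+typedef struct MyHandle { int id; } MyHandle;
+static int gOpenCount = 0;  // number of live handles, for demonstration only
+
+static MyHandle *myhandle_open(int id) {
+    MyHandle *h = (MyHandle *)std::malloc(sizeof(MyHandle));
+    if (h != NULL) { h->id = id; ++gOpenCount; }
+    return h;
+}
+static void myhandle_close(MyHandle *h) {
+    if (h != NULL) { --gOpenCount; std::free(h); }
+}
+
+// Owns a T* and calls closeFunction on it when going out of scope.
+template<typename T, void (*closeFunction)(T *)>
+class MyLocalOpenPointer {
+public:
+    explicit MyLocalOpenPointer(T *p = NULL) : ptr(p) {}
+    ~MyLocalOpenPointer() { if (ptr != NULL) { closeFunction(ptr); } }
+    T *getAlias() const { return ptr; }
+    T *orphan() { T *p = ptr; ptr = NULL; return p; }
+    bool isNull() const { return ptr == NULL; }
+private:
+    MyLocalOpenPointer(const MyLocalOpenPointer &);             // not copyable
+    MyLocalOpenPointer &operator=(const MyLocalOpenPointer &);
+    T *ptr;
+};
+
+typedef MyLocalOpenPointer<MyHandle, myhandle_close> LocalMyHandlePointer;
+```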
+
+#### Inline Implementation Functions
+
+Some, but not all, C compilers allow ICU users to declare functions inline
+(which is a C++ language feature) with various keywords. This has advantages for
+implementations because inline functions are much safer and more easily debugged
+than macros.
+
+ICU *used to* use a portable `U_INLINE` declaration macro that could be used for
+inline functions in C. However, this was an unnecessary platform dependency.
+
+We have changed all code that used `U_INLINE` to C++ (.cpp) using "inline", and
+removed the `U_INLINE` definition.
+
+If you find yourself constrained by .c, change it to .cpp.
+
+All functions that are declared inline, or are small enough that an optimizing
+compiler might inline them even without the inline declaration, should be
+defined (implemented) – not just declared – before they are first used. This is
+to enable as much inlining as possible, and also to prevent compiler warnings
+for functions that are declared inline but whose definition is not available
+when they are called.
+
+#### C Equivalents for Classes with Multiple Constructors
+
+In cases like `BreakIterator` and `NumberFormat`, instead of having several
+different 'open' APIs for each kind of instance, use an enum selector.
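+
+Here is a sketch of the enum-selector pattern with hypothetical names. ICU's
+own `ubrk_open()` follows this pattern with its `UBreakIteratorType`
+parameter.
+
+```C++
+#include <cstdlib>
+
+// One selector enum instead of mybrk_openCharacter(), mybrk_openWord(), ...
+typedef enum {
+    MYBRK_CHARACTER = 0,
+    MYBRK_WORD = 1,
+    MYBRK_LINE = 2
+} MyBreakType;
+
+// Hypothetical service object type.
+typedef struct MyBreaker { MyBreakType type; } MyBreaker;
+
+static MyBreaker *mybrk_open(MyBreakType type) {
+    MyBreaker *breaker = (MyBreaker *)std::malloc(sizeof(MyBreaker));
+    if (breaker != NULL) {
+        // A real implementation would load type-specific rules here.
+        breaker->type = type;
+    }
+    return breaker;
+}
+
+static void mybrk_close(MyBreaker *breaker) {
+    std::free(breaker);
+}
+```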
+
+#### Source File Names
+
+Source file names for C begin with a 'u'.
+
+#### Memory APIs Inside ICU
+
+For memory allocation in C implementation files for ICU, use the functions and
+macros in `cmemory.h`. When allocated memory is returned from a C API function,
+there must be a corresponding function (like a `ucnv_close()`) that deallocates
+that memory.
+
+All memory allocations in ICU should be checked for success. In the event of a
+failure (a NULL returned from `uprv_malloc()`), a `U_MEMORY_ALLOCATION_ERROR` status
+should be returned by the ICU function in question. If the allocation failure
+leaves the ICU service in an invalid state, such that subsequent ICU operations
+could also fail, the situation should be flagged so that the subsequent
+operations will fail cleanly. Under no circumstances should a memory allocation
+failure result in a crash in ICU code, or cause incorrect results rather than a
+clean error return from an ICU function.
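+
+A minimal sketch of these rules, assuming simplified stand-ins for ICU's
+`UErrorCode` and using `std::malloc()`/`std::free()` in place of the internal
+`uprv_malloc()`/`uprv_free()` from `cmemory.h`. The `sketchbuf_*` functions are
+hypothetical: the open function checks every allocation, cleans up partially
+built objects, reports failure through the status argument, and is paired with
+a close function.
+
+```C++
+#include <cstdint>
+#include <cstdio>
+#include <cstdlib>
+
+// Stand-ins for a few UErrorCode values (assumption: simplified for this sketch).
+enum SketchErrorCode {
+    SKETCH_ZERO_ERROR = 0,
+    SKETCH_ILLEGAL_ARGUMENT_ERROR,
+    SKETCH_MEMORY_ALLOCATION_ERROR
+};
+
+inline bool sketchFailure(SketchErrorCode code) { return code > SKETCH_ZERO_ERROR; }
+
+struct SketchBuffer {
+    int32_t capacity;
+    uint16_t *units;
+};
+
+// Hypothetical "open" function: returns nullptr and sets *status on any failure.
+SketchBuffer *sketchbuf_open(int32_t capacity, SketchErrorCode *status) {
+    if (sketchFailure(*status)) {
+        return nullptr;  // ICU convention: do nothing if an earlier call already failed
+    }
+    if (capacity <= 0) {
+        *status = SKETCH_ILLEGAL_ARGUMENT_ERROR;
+        return nullptr;
+    }
+    SketchBuffer *buf = static_cast<SketchBuffer *>(std::malloc(sizeof(SketchBuffer)));
+    if (buf == nullptr) {
+        *status = SKETCH_MEMORY_ALLOCATION_ERROR;
+        return nullptr;
+    }
+    buf->capacity = capacity;
+    buf->units = static_cast<uint16_t *>(std::malloc(sizeof(uint16_t) * static_cast<size_t>(capacity)));
+    if (buf->units == nullptr) {
+        std::free(buf);  // do not leak the partially built object
+        *status = SKETCH_MEMORY_ALLOCATION_ERROR;
+        return nullptr;
+    }
+    return buf;
+}
+
+// Matching close function: releases everything sketchbuf_open() allocated.
+void sketchbuf_close(SketchBuffer *buf) {
+    if (buf != nullptr) {
+        std::free(buf->units);
+        std::free(buf);
+    }
+}
+
+int main() {
+    SketchErrorCode status = SKETCH_ZERO_ERROR;
+    SketchBuffer *buf = sketchbuf_open(16, &status);
+    std::printf("failed=%d\n", sketchFailure(status) ? 1 : 0);
+    sketchbuf_close(buf);
+    return 0;
+}
+```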
+
+#### // Comments
+
+C++ style // comments may be used in plain C files and in headers that will be
+included in C files.
+
+## Source Code Strings with Unicode Characters
+
+### `char *` strings in ICU
+
+| Declared type | Encoding | Example | Used with |
+| --- | --- | --- | --- |
+| `char *` | varies with platform | `"Hello"` | Most ICU API functions taking `char *` parameters. Unless otherwise noted, characters are restricted to the "Invariant" set, described below. |
+| `char *` | UTF-8 | `u8"¡Hola!"` | Only functions that are explicitly documented as expecting UTF-8. No restrictions on the characters used. |
+| `UChar *` | UTF-16 | `u"¡Hola!"` | All ICU functions with `UChar *` parameters. |
+| `UChar32` | Code point value | `U'😁'` | Single code point constant. |
+| `wchar_t *` | unknown | `L"Hello"` | Not used with ICU. Unknown encoding, unknown size, not portable. |
+
+ICU source files are UTF-8 encoded, allowing any Unicode character to appear in
+Unicode string or character literals, without the need for escaping. But, for
+clarity, use escapes when plain text would be confusing, e.g. for invisible
+characters.
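+
+A small sketch of the literal forms from the table, assuming a C++11 (or later)
+compiler that reads the source file as UTF-8 (MSVC, for example, needs the
+`/utf-8` option). In C++, ICU's `UChar` is `char16_t` by default, so
+`std::u16string` serves here as a stand-in for a `UChar` buffer. An escape is
+used where the literal character would be invisible.
+
+```C++
+#include <cstdio>
+#include <string>
+
+int main() {
+    // UTF-16 literal written with the character itself (source file is UTF-8).
+    const std::u16string hola = u"¡Hola!";
+    // The same string with an escape for U+00A1 INVERTED EXCLAMATION MARK.
+    const std::u16string holaEscaped = u"\u00A1Hola!";
+    // U+200B ZERO WIDTH SPACE is invisible, so an escape is clearer than the raw character.
+    const std::u16string withZwsp = u"a\u200Bb";
+    // A single supplementary code point as a UTF-32 character literal.
+    const char32_t grin = U'\U0001F601';  // same value as U'😁'
+
+    std::printf("%d %zu %zu 0x%X\n",
+                hola == holaEscaped ? 1 : 0,
+                hola.size(),      // 6 UTF-16 code units
+                withZwsp.size(),  // 3 UTF-16 code units
+                static_cast<unsigned>(grin));
+    return 0;
+}
+```
+
+This prints `1 6 3 0x1F601`.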
+
+For convenience, ICU4C tends to use `char *` strings in places where only
+"invariant characters" (a portable subset of the 7-bit ASCII repertoire) are
+used. This allows locale IDs, charset names, resource bundle item keys and
+similar items to be easily specified as string literals in the source code. The
+same types of strings are also stored as "invariant character" `char *` strings
+in the ICU data files.
+
+ICU has hard coded mapping tables in `source/common/putil.c` to convert invariant
+characters to and from Unicode without using a full ICU converter. These tables
+must match the encoding of string literals in the ICU code as well as in the ICU
+data files.
+
+> :point_right: **Note**: Important: ICU assumes that at least the invariant
+characters always have the same codes as is common on platforms with the same
+charset family (ASCII vs. EBCDIC). **ICU has not been tested on platforms where
+this is not the case.**
+
+Some usage of `char *` strings in ICU assumes the system charset instead of
+invariant characters. Such strings are only handled with the default converter
+(See the following section). The system charset is usually a superset of the
+invariant characters.
+
+The following are the ASCII and EBCDIC byte values (in hexadecimal) for all of
+the invariant characters (see also `unicode/utypes.h`):
+
+| Character(s) | ASCII | EBCDIC |
+| --- | --- | --- |
+| a..i | 61..69 | 81..89 |
+| j..r | 6A..72 | 91..99 |
+| s..z | 73..7A | A2..A9 |
+| A..I | 41..49 | C1..C9 |
+| J..R | 4A..52 | D1..D9 |
+| S..Z | 53..5A | E2..E9 |
+| 0..9 | 30..39 | F0..F9 |
+| (space) | 20 | 40 |
+| " | 22 | 7F |
+| % | 25 | 6C |
+| & | 26 | 50 |
+| ' | 27 | 7D |
+| ( | 28 | 4D |
+| ) | 29 | 5D |
+| \* | 2A | 5C |
+| + | 2B | 4E |
+| , | 2C | 6B |
+| - | 2D | 60 |
+| . | 2E | 4B |
+| / | 2F | 61 |
+| : | 3A | 7A |
+| ; | 3B | 5E |
+| < | 3C | 4C |
+| = | 3D | 7E |
+| > | 3E | 6E |
+| ? | 3F | 6F |
+| _ | 5F | 6D |
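+
+The table can be turned into a simple check. The sketch below uses hypothetical
+function names and is not ICU's internal implementation; it compares byte values
+against the ASCII column, so it gives correct answers on ASCII-family platforms
+only. Note that common characters such as `@`, `#`, `!`, and `~` are *not*
+invariant.
+
+```C++
+#include <cstdio>
+
+// Hypothetical helper: true if byte b is an invariant character in ASCII.
+bool isInvariantAsciiByte(unsigned char b) {
+    if ((b >= 0x61 && b <= 0x7A) ||  // a..z
+        (b >= 0x41 && b <= 0x5A) ||  // A..Z
+        (b >= 0x30 && b <= 0x39)) {  // 0..9
+        return true;
+    }
+    // space " % & ' ( ) * + , - . / : ; < = > ? _
+    static const unsigned char kPunct[] = {
+        0x20, 0x22, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C,
+        0x2D, 0x2E, 0x2F, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F, 0x5F
+    };
+    for (unsigned char p : kPunct) {
+        if (b == p) {
+            return true;
+        }
+    }
+    return false;
+}
+
+// Hypothetical helper: true if every byte of the NUL-terminated string is invariant.
+bool isInvariantAsciiString(const char *s) {
+    for (; *s != '\0'; ++s) {
+        if (!isInvariantAsciiByte(static_cast<unsigned char>(*s))) {
+            return false;
+        }
+    }
+    return true;
+}
+
+int main() {
+    const char *samples[] = {"en_US", "zh-Hant_TW", "a#b"};
+    for (const char *s : samples) {
+        std::printf("%-12s %s\n", s, isInvariantAsciiString(s) ? "invariant" : "not invariant");
+    }
+    return 0;
+}
+```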
+
+### Rules Strings with Unicode Characters
+
+In order to include characters in source code strings that are not part of the
+invariant subset of ASCII, one has to use character escapes. In addition, rules
+strings for collation, etc. need to follow service-specific syntax, which means
+that spaces and ASCII punctuation must be quoted using the following rules:
+
+* Single quotes delineate literal text: `a'>'b` => `a>b`
+* Two single quotes, either between or outside of single quoted text, indicate
+  a literal single quote:
+  * `a''b` => `a'b`
+  * `a'>''<'b` => `a>'<b`
+* A backslash precedes a single literal character: `a\>b` => `a>b`
+* Several standard escape mechanisms, such as `\uXXXX`, are handled by
+  `u_unescape()` and its variants.
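+
+To make the quoting rules concrete, here is a minimal, hypothetical unquoting
+function (`unquoteRuleText` is not an ICU API). It handles only the single-quote
+and backslash rules listed above. It does not process `\uXXXX` escapes (which
+`u_unescape()` handles) or any service-specific syntax, and it treats a backslash
+inside quoted text as literal, which is an assumption of this sketch.
+
+```C++
+#include <cstddef>
+#include <cstdio>
+#include <string>
+
+// Hypothetical helper: removes rule-style quoting and returns the literal text.
+std::string unquoteRuleText(const std::string &s) {
+    std::string out;
+    bool inQuote = false;
+    std::size_t i = 0;
+    while (i < s.size()) {
+        char c = s[i];
+        if (c == '\'') {
+            if (i + 1 < s.size() && s[i + 1] == '\'') {
+                out += '\'';  // '' is a literal quote, inside or outside quoted text
+                i += 2;
+            } else {
+                inQuote = !inQuote;  // a single quote opens or closes quoted text
+                ++i;
+            }
+        } else if (c == '\\' && !inQuote && i + 1 < s.size()) {
+            out += s[i + 1];  // backslash makes the next character literal
+            i += 2;
+        } else {
+            out += c;
+            ++i;
+        }
+    }
+    return out;
+}
+
+int main() {
+    const char *samples[] = {"a'>'b", "a''b", "a'>''<'b", "a\\>b"};
+    for (const char *sample : samples) {
+        std::printf("%s => %s\n", sample, unquoteRuleText(sample).c_str());
+    }
+    return 0;
+}
+```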
+
+> :point_right: **Note**: All of these quoting mechanisms are supported by the
+`RuleBasedTransliterator`. The single quote mechanisms (not backslash, not
+`u_unescape()`) are supported by the format classes. In its infancy,
+`ResourceBundle` supported the `\uXXXX` mechanism and nothing else.
+These quoting mechanisms are the current policy. However, some modules within
+the ICU services are still being updated, and these mechanisms might not yet
+have been applied to all of them.
+
+## Java Coding Conventions Overview
+
+The ICU group uses the following coding guidelines to create software using the
+ICU Java classes and methods.
+
+### Code style
+
+The standard order for modifier keywords on APIs is:
+
+* public static final synchronized strictfp
+* public abstract
+
+Do not use wildcard imports such as "`import java.util.*`". The sort order of
+import statements is `java` / `javax` / `org` / `com`. Within each top-level
+package category, subpackages and classes are sorted alphabetically. We
+recommend that ICU developers use the Eclipse IDE feature \[Source\] -
+\[Organize Imports\] (Ctrl+Shift+O) to organize import statements.
+
+All if/else/for/while/do statements use braces, even if the controlled statement is a
+single line. This is for clarity and to avoid mistakes due to bad nesting of
+control statements, especially during maintenance.
+
+Tabs should not be present in source files.
+
+Indentation is 4 spaces.
+
+Make sure the code is formatted cleanly with regular indentation. Follow Java
+style code conventions, e.g., don't put multiple statements on a single line,
+use mixed-case identifiers for classes and methods and upper case for constants,
+and so on.
+
+The Java source formatting rules described above are included in the Eclipse
+project file. It is recommended to run \[Source\] - \[Format\] (Ctrl+Shift+F) in
+the Eclipse IDE to clean up source files if necessary.
+
+Use UTF-8 encoding (without BOM) for java source files.
+
+Javadoc should be complete and correct when code is checked in, to avoid playing
+catch-up later during the throes of the release. Please javadoc all methods, not
+just external APIs, since this helps with maintenance.
+
+### Code organization
+
+Avoid putting more than one top-level class in a single file. Either use
+separate files or nested classes.
+
+Always define at least one constructor in a public API class. The Java compiler
+automatically generates a no-arg constructor when a class has no explicit
+constructors, and we cannot provide proper API documentation for such default
+constructors.
+
+Do not mix test, tool, and runtime code in the same file. If you need some
+access to private or package methods or data, provide public accessors for them
+and mark them `@internal`. Test code should be placed in `com.ibm.icu.dev.test`
+package, and tools (e.g., code that generates data, source code, or computes
+constants) in `com.ibm.icu.dev.tool` package. Occasionally for very simple cases
+you can leave a few lines of tool code in the main source and comment it out,
+but maintenance is easier if you just comment the location of the tools in the
+source and put the actual code elsewhere.
+
+Avoid creating new interfaces unless you know you need to mix the interface into
+two or more classes that have separate inheritance. Interfaces are impossible to
+modify later in a backwards-compatible way. Abstract classes, on the other hand,
+can add new methods with default behavior. Use interfaces only if they are
+required by the architecture, not just for expediency.
+
+Current releases of ICU4J (since ICU 63) are restricted to Java SE 7 APIs and
+language features.
+
+### ICU Packages
+
+Public APIs should be placed in `com.ibm.icu.text`, `com.ibm.icu.util`, and
+`com.ibm.icu.lang`. For historical reasons and for easier migration from JDK
+classes, there are also APIs in `com.ibm.icu.math` but new APIs should not be
+added there.
+
+APIs used only during development, testing, or tools work should be placed in
+`com.ibm.icu.dev`.
+
+A class or method which is used by public APIs (listed above) but which is not
+itself public can be placed in different places:
+
+1. If it is only used by one class, make it private in that class.
+2. If it is only used by one class and its subclasses, make it protected in
+   that class. In general, also tag it `@internal` unless you are working on a
+   class that supports user-subclassing (rare).
+3. If it is used by multiple classes in one package, make it package private
+   (also known as default access) and mark it `@internal`.
+4. If it is used by multiple packages, make it public and place the class in
+   the `com.ibm.icu.impl` package.
+
+### Error Handling and Exceptions
+
+Errors should be indicated by throwing exceptions, not by returning “bogus”
+values.
+
+If an input parameter is in error, then a new
+`IllegalArgumentException("description")` should be thrown.
+
+Exceptions should be caught only when something must be done, for example
+special cleanup or rethrowing a different exception. If the error “should never
+occur”, then throw a new `RuntimeException("description")` (rare). In this case,
+a comment should be added with a justification.
+
+Use exception chaining: When an exception is caught and a new one created and
+thrown (usually with additional information), the original exception should be
+chained to the new one.
+
+A catch expression should not catch Throwable. Catch expressions should specify
+the most specific subclass of Throwable that applies. If there are two concrete
+subclasses, both should be specified in separate catch statements.
+
+### Binary Data Files
+
+ICU4J uses the same binary data files as ICU4C, in the big-endian/ASCII form.
+The `ICUBinary` class should be used to read them.
+
+Some data sources (for example, compressed Jar files) do not support several
+`InputStream` and related APIs:
+
+* Memory mapping is efficient, but not available for all data sources.
+* Do not depend on `InputStream.available()`: It does not provide reliable
+  information for some data sources. Instead, the length of the data needs to
+  be determined from the data itself.
+* Do not call `mark()` and `reset()` methods on `InputStream` without wrapping the
+  `InputStream` object in a new `BufferedInputStream` object. These methods are
+  not implemented by the `ZipInputStream` class, and their use may result in an
+  `IOException`.
+
+### Compiler Warnings
+
+There should be no compiler warnings when building ICU4J. It is recommended to
+develop using Eclipse, and to fix any problems that are shown in the Eclipse
+Problems panel (below the main window).
+
+When a warning is not avoidable, you should add `@SuppressWarnings` annotations
+with minimum scope.
+
+### Miscellaneous
+
+Objects should not be cast to a class in the `sun.*` packages, because this
+would cause a `SecurityException` when run under a `SecurityManager`. If such a
+cast cannot be avoided, catch the exception and take a default action instead of
+propagating it.
+
+## Adding .c, .cpp and .h files to ICU
+
+In order to add compilable files to ICU, add them to the source code control
+system in the appropriate folder and also to the build environment.
+
+To add these files, use the following steps:
+
+1. Choose one of the ICU libraries:
+   * The common library provides mostly low-level utilities and basic APIs that
+     often do not make use of Locales. Examples are APIs that deal with character
+     properties, the Locale APIs themselves, and ResourceBundle APIs.
+   * The i18n library provides Locale-dependent and -using APIs, such as for
+     collation and formatting, that are most useful for internationalized user
+     input and output.
+2. Put the source code files into the folder `icu/source/library-name`, then add
+   them to the build system:
+   * For most platforms, add the expected .o files to
+     `icu/source/library-name/Makefile.in`, to the OBJECTS variable. Add the
+     **public** header files to the HEADERS variable.
+   * For Microsoft Visual C++ 6.0, add all the source code files to
+     `icu/source/library-name/library-name.dsp`. If you don't have Visual C++, add
+     the filenames to the project file manually.
+3. Add test code to `icu/source/test/cintltst` for C APIs and to
+   `icu/source/test/intltest` for C++ APIs.
+4. Make sure that the API functions are called by the test code (100% API
+   coverage) and that at least 85% of the implementation code is exercised by
+   the tests (>=85% code coverage).
+5. Create test code for C using the `log_err()`, `log_info()`, and `log_verbose()`
+   APIs from `cintltst.h` (which uses `ctest.h`) and check it into the appropriate
+   folder.
+6. In order to get your C test code called, add its top level function and a
+   descriptive test module path to the test system by calling `addTest()`. The
+   function that makes the call to `addTest()` ultimately must be called by
+   `addAllTests()` in `calltest.c`. Groups of tests typically have a common
+   `addGroup()` function that calls `addTest()` for the test functions in its
+   group, according to the common part of the test module path.
+7. Add that test code to the build system as well. Modify `Makefile.in` and the
+   appropriate `.dsp` file (for example, the file for the library code).
+
+## C Test Suite Notes
+
+The cintltst test suite contains all the tests for the International Components
+for Unicode C API. These tests can be run automatically by typing `cintltst` or
+`cintltst -all` at the command line. This relies on the C Test Services
+described below.
+
+### C Test Services
+
+The purpose of the test services is to enable the writing of tests entirely in
+C. The services have been designed to make creating tests or converting old ones
+as simple as possible with a minimum of services overhead. A sample test file,
+"demo.c", is included at the end of this document. For more information
+regarding C test services, please see the `icu4c/source/tools/ctestfw` directory.
+
+### Writing Test Functions
+
+The following shows the possible format of test functions:
+
+```C++
+void some_test()
+{
+}
+```
+
+Output from the test is accomplished with three printf-like functions:
+
+```C++
+void log_err ( const char *fmt, ... );
+void log_info ( const char *fmt, ... );
+void log_verbose ( const char *fmt, ... );
+```
+
+* `log_info()` writes informational messages to the console.
+* `log_verbose()` writes to the console ONLY if the VERBOSE flag is turned
+  on (for example, with the `-v` command-line option). This option is useful
+  for debugging. By default, the VERBOSE flag is turned OFF.
+* `log_err()` should be called when a test failure is detected. The error is
+  then logged and the error count is incremented by one.
+
+To use the tests, link them into a hierarchical structure. The root of the
+structure will be allocated by default.
+
+```C++
+TestNode *root = NULL; /* empty */
+addTest( &root, &some_test, "/test");
+```
+
+Provide `addTest()` with the function pointer for the function that performs the
+test as well as the absolute 'path' to the test. Paths may be up to 127 chars in
+length and may be used to group tests.
+
+The calls to `addTest` must be placed in a function or a hierarchy of functions
+(perhaps mirroring the paths). See the existing cintltst for more details.
+
+### Running the Tests
+
+A subtree may be extracted from another tree of tests for the programmatic
+running of subtests.
+
+```C++
+TestNode* sub;
+sub = getTest(root, "/mytests");
+```
+
+And a tree of tests may be run simply by:
+
+```C++
+runTests( root ); /* or 'sub' */
+```
+
+Similarly, `showTests()` lists out the tests. However, it is easier to use the
+command prompt with the Usage specified below.
+
+### Globals
+
+The command line parser resets the error count and prints a summary of the
+failed tests. If `runTests` is called directly, however, the count must be
+managed manually. `ERROR_COUNT` contains the number of times `log_err()` was
+called; `runTests` resets it to zero before running the tests. `VERBOSITY` must
+be 1 to display `log_verbose()` output; otherwise, it must be 0 (the default).
+
+### Building cintltst
+
+To compile this test suite using Microsoft Visual C++ (MSVC), follow the
+instructions in `icu4c/source/readme.html#HowToInstall` for building the `allC`
+workspace. This builds the libraries as well as the `cintltst` executable.
+
+### Executing cintltst
+
+To run the test suite from the command line, change the directory to
+`icu4c/source/test/cintltst/Debug` for the debug build (or
+`icu4c/source/test/cintltst/Release` for the release build) and then type `cintltst`.
+
+### cintltst Usage
+
+Type `cintltst -h` to view its command line parameters.
+
+```Text
+### Syntax:
+### Usage: [ -l ] [ -v ] [ -verbose] [-a] [ -all] [-n]
+ [-no_err_msg] [ -h] [ /path/to/test ]
+### -l To get a list of test names
+### -all To run all the test
+### -a To run all the test(same as -all)
+### -verbose To turn ON verbosity
+### -v To turn ON verbosity(same as -verbose)
+### -h To print this message
+### -n To turn OFF printing error messages
+### -no_err_msg (same as -n)
+### -[/subtest] To run a subtest
+### For example to run just the utility tests type: cintltest /tsutil)
+### To run just the locale test type: cintltst /tsutil/loctst
+###
+```
+
+The following sample test file, `demo.c`, shows how tests are written and
+registered with the C test framework:
+
+```C++
+/******************** sample ctestfw test ********************
+********* Simply link this with libctestfw or ctestfw.dll ****
+************************* demo.c *****************************/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "ctest.h"
+
+/**
+* Some sample dummy tests.
+* The statics simply show how often each test is called.
+*/
+void mytest(void)
+{
+    static int i = 0;
+    log_info("I am a test[%d]\n", i++);
+}
+
+void mytest_err(void)
+{
+    static int i = 0;
+    log_err("I am a test containing an error[%d]\n", i++);
+    log_err("I am a test containing an error[%d]\n", i++);
+}
+
+void mytest_verbose(void)
+{
+    /* will only show if verbose is on (-v) */
+    log_verbose("I am a verbose test, blabbing about nothing at all.\n");
+}
+
+/**
+* Add your tests from this function
+*/
+
+void add_tests( TestNode** root )
+{
+    addTest(root, &mytest, "/apple/bravo" );
+    addTest(root, &mytest, "/a/b/c/d/mytest");
+    addTest(root, &mytest_err, "/d/e/f/h/junk");
+    addTest(root, &mytest, "/a/b/c/d/another");
+    addTest(root, &mytest, "/a/b/c/etest");
+    addTest(root, &mytest_err, "/a/b/c");
+    addTest(root, &mytest, "/bertrand/andre/damiba");
+    addTest(root, &mytest_err, "/bertrand/andre/OJSimpson");
+    addTest(root, &mytest, "/bertrand/andre/juice/oj");
+    addTest(root, &mytest, "/bertrand/andre/juice/prune");
+    addTest(root, &mytest_verbose, "/verbose");
+
+}
+
+int main(int argc, const char *argv[])
+{
+    TestNode *root = NULL;
+
+    add_tests(&root); /* address of root ptr- will be filled in */
+
+    /* Run the tests. An int is returned suitable for the OS status code.
+    (0 for success, neg for parameter errors, positive for the # of
+    failed tests) */
+    return processArgs( root, argc, argv );
+}
+```
+
+## C++ IntlTest Test Suite Documentation
+
+The IntlTest suite contains all of the tests for the C++ API of International
+Components for Unicode. These tests may be automatically run by typing `intltest`
+at the command line. Since the verbose option prints out a considerable amount
+of information, it is recommended that the output be redirected to a file:
+`intltest -v > testOutput`.
+
+### Building IntlTest
+
+To compile this test suite using MSVC, follow the instructions for building the
+`allCPP` (all C++ interfaces) workspace. This builds the libraries as well as the
+`intltest` executable.
+
+### Executing IntlTest
+
+To run the test suite from the command line, change the directory to
+`icu4c/source/test/intltest/Debug`, then type: `intltest -v >testOutput`. For the
+release build, the executable will reside in the
+`icu4c/source/test/intltest/Release` directory.
+
+### IntlTest Usage
+
+Type `intltest -h` to see the usage:
+
+```Text
+### Syntax:
+### IntlTest [-option1 -option2 ...] [testname1 testname2 ...]
+### where options are: verbose (v), all (a), noerrormsg (n),
+### exhaustive (e) and leaks (l).
+### (Specify either -all (shortcut -a) or a test name).
+### -all will run all of the tests.
+###
+### To get a list of the test names type: intltest LIST
+### To run just the utility tests type: intltest utility
+###
+### Test names can be nested using slashes ("testA/subtest1")
+### For example to list the utility tests type: intltest utility/LIST
+### To run just the Locale test type: intltest utility/LocaleTest
+###
+### A parameter can be specified for a test by appending '@' and the value
+### to the testname.
+```
+
+## C: Testing with Fake Time
+
+The "Fake Time" capability allows ICU4C to be tested as if the hardware clock is
+set to a specific time. This section documents how to use this facility.
+Note that this facility requires the POSIX 'gettimeofday' function to be
+operable.
+
+This facility affects all ICU 'current time' calculations, including date,
+calendar, time zone formats, and relative formats. It doesn't affect any calls
+directly to the underlying operating system.
+
+1. Build ICU with the **`U_DEBUG_FAKETIME`** preprocessor macro set. This can
+   be accomplished with the following line in the file
+   `icu/source/icudefs.local`:
+
+   ```Shell
+   CPPFLAGS+=-DU_DEBUG_FAKETIME
+   ```
+
+2. Determine the `UDate` value (the time value in milliseconds relative to
+   midnight, Jan 1, 1970 GMT; negative values are earlier) that you want to use
+   as the target. For this sample we will use the value `28800000`, which is
+   midnight, Pacific Standard Time, 1/1/1970.
+3. Set the environment variable `U_FAKETIME_START=28800000`.
+4. Now, the first time ICU checks the current time, it will start at midnight
+   1/1/1970 (Pacific time) and roll forward. So, after 10 seconds of program
+   runtime, the clock will appear to be at 12:00:10 AM.
+5. You can test this by running the utility `icuinfo -m`, which will print out
+   the 'Milliseconds since Epoch'.
+6. You can also test this by running the cintltst test
+   `/tsformat/ccaltst/TestCalendar` in verbose mode, which will print out the
+   current time:
+
+   ```Shell
+   $ make check ICUINFO_OPTS=-m U_FAKETIME_START=28800000 CINTLTST_OPTS=-v
+   /tsformat/ccaltst/TestCalendar
+   U_DEBUG_FAKETIME was set at compile time, so the ICU clock will start at a
+   preset value
+   env variable U_FAKETIME_START=28800000 (28800000) for an offset of
+   -1281957858861 ms from the current time 1281986658861
+   PASS: The current date and time fetched is Thursday, January 1, 1970 12:00:00
+   ```
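+
+To pick a `U_FAKETIME_START` value for a different target time, the value can be
+computed from a calendar date and a UTC offset. The sketch below uses
+hypothetical helpers (not ICU APIs) based on the well-known days-from-civil
+algorithm for the proleptic Gregorian calendar. It returns whole milliseconds as
+`int64_t`, whereas ICU's `UDate` type itself is a `double`. It reproduces the
+`28800000` value used above.
+
+```C++
+#include <cstdint>
+#include <cstdio>
+
+// Days since 1970-01-01 for year/month/day in the proleptic Gregorian calendar.
+int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
+    y -= (m <= 2) ? 1 : 0;  // treat Jan and Feb as months 11 and 12 of the previous year
+    const int64_t era = (y >= 0 ? y : y - 399) / 400;
+    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
+    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
+    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
+    return era * 146097 + static_cast<int64_t>(doe) - 719468;
+}
+
+// Hypothetical helper: local wall time with a fixed UTC offset (minutes) -> ms since the epoch.
+int64_t toEpochMillis(int64_t y, unsigned mo, unsigned d,
+                      int h, int mi, int s, int utcOffsetMinutes) {
+    const int64_t localSeconds = daysFromCivil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
+    return (localSeconds - static_cast<int64_t>(utcOffsetMinutes) * 60) * 1000;
+}
+
+int main() {
+    // Midnight, 1970-01-01, Pacific Standard Time (UTC-8).
+    std::printf("U_FAKETIME_START=%lld\n",
+                static_cast<long long>(toEpochMillis(1970, 1, 1, 0, 0, 0, -8 * 60)));
+    return 0;
+}
+```
+
+This prints `U_FAKETIME_START=28800000`.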
+
+## C: Threading Tests
+
+Threading tests for ICU4C functions should be placed under utility/
+`MultithreadTest`, in the files `intltest/tsmthred.h` and `.cpp`. See the
+existing tests in these files for examples.
+
+Tests from this location are automatically run under the [Thread
+Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
+(TSAN) in the ICU continuous build system. TSAN reliably detects data races that
+occur during a test run, even when they would only rarely cause a visible
+failure.
+
+Data races are one of the most common and hardest to debug types of bugs in
+concurrent systems. A data race occurs when two threads access the same variable
+concurrently and at least one of the accesses is a write. The C++11 standard
+defines a data race as undefined behavior.
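+
+As a minimal illustration (not taken from the ICU test suite), the counter below
+is shared by several threads. Declared as a plain `int`, the concurrent
+increments would be a data race that TSAN reports. Declaring it as
+`std::atomic<int>` makes each increment an indivisible read-modify-write, so the
+result is always `numThreads * increments`. ICU code has its own mutex and
+atomic utilities (see `umutex.h`); `std::atomic` is used here only to keep the
+sketch self-contained. Some platforms need `-pthread` to build it.
+
+```C++
+#include <atomic>
+#include <cstdio>
+#include <thread>
+#include <vector>
+
+// Hypothetical test helper: numThreads threads each increment a shared counter.
+int countWithThreads(int numThreads, int increments) {
+    // A plain `int counter = 0;` here would be a data race (undefined behavior).
+    std::atomic<int> counter{0};
+    std::vector<std::thread> threads;
+    for (int t = 0; t < numThreads; ++t) {
+        threads.emplace_back([&counter, increments] {
+            for (int i = 0; i < increments; ++i) {
+                counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
+            }
+        });
+    }
+    for (std::thread &th : threads) {
+        th.join();  // join() makes all increments visible to this thread
+    }
+    return counter.load();
+}
+
+int main() {
+    std::printf("%d\n", countWithThreads(4, 100000));  // prints 400000
+    return 0;
+}
+```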
+
+## Binary Data Formats
+
+ICU services rely heavily on data to perform their functions. Such data is
+available in various more or less structured text file formats, which make it
+easy to update and maintain. For high runtime performance, most data items are
+pre-built into binary formats, i.e., they are parsed and processed once and then
+stored in a format that is used directly during processing.
+
+Most of the data items are pre-built into binary files that are then installed
+on a user's machine. Some data can also be built at runtime but is not
+persistent. In the latter case, a master object should be built once and then
+cloned to avoid the multiple parsing, processing, and building of the same data.
+
+Binary data formats for ICU must be portable across platforms that share the
+same endianness and the same charset family (ASCII vs. EBCDIC). It would be
+possible to handle data from other platform types, but that would require
+load-time or even runtime conversion.
+
+### Data Types
+
+Binary data items are memory-mapped, i.e., they are used as readonly, constant
+data. Their structures must be portable according to the criteria above and
+should be efficiently usable at runtime without building additional runtime data
+structures.
+
+Most native C/C++ data types cannot be used as part of binary data formats
+because their sizes are not fixed across compilers. For example, an int could be
+16/32/64 or even any other number of bits wide. Only types with known, fixed
+widths and semantics may be used.
+
+Use for example:
+
+* `uint8_t`, `uint16_t`, `int32_t` etc.
+* `UBool`: same as `int8_t`
+* `UChar`: for 16-bit Unicode strings
+* `UChar32`: for Unicode code points
+* `char`: for "invariant characters", see `utypes.h`
+
+> :point_right: **Note**: ICU assumes that `char` is an 8-bit byte but makes no
+assumption about its signedness.
+
+**Do not use** for example:
+
+* `short`, `int`, `long`, `unsigned int` etc.: undefined widths
+* `float`, `double`: undefined formats
+* `bool`: undefined width and signedness
+* `enum`: undefined width and signedness
+* `wchar_t`: undefined width, signedness and encoding/charset
+
+Each field in a binary/mappable data format must be aligned naturally. This
+means that a field with a primitive type of size n bytes must be at an n-aligned
+offset from the start of the data block. `UChar` must be 2-aligned, `int32_t` must
+be 4-aligned, etc.
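+
+A minimal sketch of both rules using C++11 `static_assert` (the
+`isNaturallyAligned()` helper is hypothetical): fixed-width types have
+guaranteed sizes, and a field is naturally aligned when its byte offset from the
+start of the data block is a multiple of its size.
+
+```C++
+#include <cstddef>
+#include <cstdint>
+#include <cstdio>
+
+// Exact-width types have guaranteed sizes, so these checks never fire.
+static_assert(sizeof(uint8_t) == 1, "uint8_t must be 1 byte");
+static_assert(sizeof(uint16_t) == 2, "uint16_t must be 2 bytes");
+static_assert(sizeof(int32_t) == 4, "int32_t must be 4 bytes");
+
+// Hypothetical helper: is a field of fieldSize bytes at this offset naturally aligned?
+constexpr bool isNaturallyAligned(std::size_t offset, std::size_t fieldSize) {
+    return offset % fieldSize == 0;
+}
+
+static_assert(isNaturallyAligned(6, sizeof(uint16_t)), "a 16-bit unit at offset 6 is 2-aligned");
+static_assert(!isNaturallyAligned(6, sizeof(int32_t)), "an int32_t at offset 6 is misaligned");
+
+int main() {
+    std::printf("int32_t at offset 8: %s\n",
+                isNaturallyAligned(8, sizeof(int32_t)) ? "aligned" : "misaligned");
+    return 0;
+}
+```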
+
+It is possible to use struct types, but one must make sure that each field is
+naturally aligned, without possible implicit field padding by the compiler —
+assuming a reasonable compiler.
+
+```C++
+// bad because i will be preceded by compiler-dependent padding
+// for proper alignment
+struct BadExample {
+    UBool flag;
+    int32_t i;
+};
+
+// ok with explicitly added padding or generally conscious
+// sequence of types
+struct OKExample {
+    UBool flag;
+    uint8_t pad[3];
+    int32_t i;
+};
+```
+
+Within the binary data, a `struct` type field must be aligned according to its
+widest member field. The struct `OKExample` must be 4-aligned because it contains
+an `int32_t` field. Make padding explicit via additional fields, rather than
+letting the compiler choose optional padding.
+
+Another potential problem with `struct` types, especially in C++, is that some
+compilers provide RTTI for all classes and structs, which inserts a `_vtable`
+pointer before the first declared field. When using `struct` types with
+binary/mappable data in C++, assert in some place in the code that `offsetof` the
+first field is 0. For an example see the genpname tool.
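+
+The layout assumptions above can also be checked at compile time. The sketch
+below assumes C++11 and uses a local `UBool` stand-in (following the "same as
+`int8_t`" definition given above) rather than ICU's header. A `static_assert`
+is one way to perform the suggested `offsetof` check during the build instead
+of at runtime.
+
+```C++
+#include <cstddef>
+#include <cstdint>
+#include <cstdio>
+
+// Local stand-in for ICU's UBool, per the "same as int8_t" note above.
+typedef int8_t UBool;
+
+// Same layout as OKExample above: explicit padding keeps i at a 4-aligned offset.
+struct OKExample {
+    UBool flag;
+    uint8_t pad[3];
+    int32_t i;
+};
+
+// A failing check stops the build instead of producing unportable data.
+static_assert(offsetof(OKExample, flag) == 0, "first field must be at offset 0 (no hidden pointer)");
+static_assert(offsetof(OKExample, i) == 4, "i must be 4-aligned with no implicit padding");
+static_assert(sizeof(OKExample) == 8, "no trailing padding expected");
+static_assert(alignof(OKExample) == alignof(int32_t), "struct aligns like its widest field");
+
+int main() {
+    std::printf("offsetof(i)=%zu sizeof=%zu\n", offsetof(OKExample, i), sizeof(OKExample));
+    return 0;
+}
+```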
+
+### Versioning
+
+ICU data files have a `UDataHeader` structure preceding the actual data. Among
+other fields, it contains a `formatVersion` field with four parts (one `uint8_t`
+each). It is best to use only the first (major) or first and second
+(major/minor) fields in the runtime code to determine binary compatibility,
+i.e., reject a data item only if its `formatVersion` contains an unrecognized
+major (or major/minor) version number. The following parts of the version should
+be used to indicate variations in the format that are backward compatible, or
+carry other information.
+
+For example, the current `uprops.icu` file's `formatVersion` (see the genprops tool
+and `uchar.c`/`uprops.c`) is set to indicate backward-incompatible changes with the
+major version number, backward-compatible additions with the minor version
+number, and shift width constants for the `UTrie` data structure in the third and
+fourth version numbers (these could change independently of the `uprops.icu`
+format).
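+
+In ICU itself, this kind of check is typically done in the
+`UDataMemoryIsAcceptable` callback passed to `udata_openChoice()`, which
+receives the `UDataInfo` containing `formatVersion`. The sketch below is a
+simplified, hypothetical version that takes the four version bytes directly and
+rejects data only when the major version is not the one the code understands.
+
+```C++
+#include <cstdint>
+#include <cstdio>
+
+// Hypothetical check: accept data whose major format version matches the one
+// this code was written for. formatVersion[1..3] describe compatible
+// variations or other information, so they are not used to reject the data.
+bool isAcceptableFormat(const uint8_t formatVersion[4], uint8_t supportedMajor) {
+    return formatVersion[0] == supportedMajor;
+}
+
+int main() {
+    const uint8_t sameMajor[4] = {7, 2, 5, 2};
+    const uint8_t newMajor[4] = {8, 0, 0, 0};
+    std::printf("%d %d\n", isAcceptableFormat(sameMajor, 7), isAcceptableFormat(newMajor, 7));  // prints "1 0"
+    return 0;
+}
+```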
+
+## C/C++ Debugging Hints and Tips
+
+### Makefile-based platforms
+
+* Use `Makefile.local` files (override of `Makefile`), or `icudefs.local` (at the
+  top level, override of `icudefs.mk`) to avoid the need to modify
+  change-controlled source files with debugging information.
+  * Example: **`CPPFLAGS+=-DUDATA_DEBUG`** in `common` to enable data
+    debugging
+  * Example: **`CINTLTST_OPTS=/tscoll`** in the cintltst directory provides
+    arguments to the cintltst test upon `make check`, to only run collation
+    tests.
+    * intltest: INTLTEST_OPTS
+    * cintltst: CINTLTST_OPTS
+    * iotest: IOTEST_OPTS
+    * icuinfo: ICUINFO_OPTS
+    * (letest does not have an OPTS variable as of ICU 4.6.)
+
+### Windows/Microsoft Visual Studio
+
+The following addition to autoexp.dat will cause **`UnicodeString`**s to be
+visible as strings in the debugger without expanding sub-items:
+
+```Text
+;; Copyright (C) 2010 IBM Corporation and Others. All Rights Reserved.
+;; ICU Additions
+;; Add to {VISUAL STUDIO} \Common7\Packages\Debugger\autoexp.dat
+;;   in the [autoexpand] section just before the final [hresult] section.
+;;
+;; Need to change 'icu_##' to the current major+minor (so icu_46 for 4.6.1 etc)
+
+icu_46::UnicodeString {
+    preview        (
+              #if($e.fFlags & 2)   ; stackbuffer
+               (
+                  #(
+                "U= '",
+                [$e.fUnion.fStackBuffer, su],
+                "', len=",
+                [$e.fShortLength, u]
+                ;[$e.fFields.fArray, su]
+               )
+              )
+              #else
+               (
+                  #(
+                "U* '",
+                [$e.fUnion.fFields.fArray, su],
+                "', len=",
+                [$e.fShortLength, u]
+                ;[$e.fFields.fArray, su]
+               )
+              )
+            )
+
+    stringview    (
+              #if($e.fFlags & 2)   ; stackbuffer
+               (
+                  #(
+                "U= '",
+                [$e.fUnion.fStackBuffer, su],
+                "', len=",
+                [$e.fShortLength, u]
+                ;[$e.fFields.fArray, su]
+               )
+              )
+              #else
+               (
+                  #(
+                "U* '",
+                [$e.fUnion.fFields.fArray, su],
+                "', len=",
+                [$e.fShortLength, u]
+                ;[$e.fFields.fArray, su]
+               )
+              )
+            )
+
+}
+;;;
+;;; End ICU Additions
+;;;
+```
diff --git a/docs/userguide/dev/contributions.md b/docs/userguide/dev/contributions.md
new file mode 100644
index 0000000..64eb828
--- /dev/null
+++ b/docs/userguide/dev/contributions.md
@@ -0,0 +1,122 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Contributions to the ICU library
+
+## Why Contribute?
+
+ICU is an open source library that is a de facto industry standard for
+internationalization libraries. Our goal is to provide top-of-the-line i18n
+support on all widely used platforms. By contributing your code to the ICU
+library, you will get the benefit of continuing improvement by the ICU team and
+the community, as well as testing and multi-platform portability. In addition,
+it saves you from having to re-merge your own additions into ICU each time you
+upgrade to a new ICU release.
+
+## Current Process
+
+See <http://site.icu-project.org/processes/contribute>.
+
+## Historical
+
+### Legal Issues (ICU 1.8.1 to ICU 57)
+
+The following process was in place up to ICU 57, while the old ICU license was
+still in use.
+
+### Old Process
+
+In order for your code to be contributed, you need to assign to IBM joint
+copyright ownership in the contribution. You retain joint ownership in the
+contribution without restriction. (For the complete set of terms, please see the
+forms mentioned below.)
+
+The sections below describe two processes, one for one-time contributors and
+one for ongoing contributors. In either case, please complete the form(s)
+electronically and send them to IBM for review. After IBM has reviewed them,
+please print and sign the form(s), mail them, and send the code. The code will
+then be evaluated.
+
+Please consult a legal representative if you do not understand the implications
+of the copyright assignment.
+
+### One-Time Contributors
+
+If you would like to make a contribution only once or infrequently, please use
+the *Joint Copyright Assignment - One-time Contribution* form.
+(<https://github.com/unicode-org/icu-docs/blob/master/legal/contributions/Copyright_Assignment.rtf>).
+The contribution will be identified by a bug ID which is unique to the
+contribution and entered into the form. Therefore, please make sure that there
+is an appropriate bug (or Request For Enhancement) in the ICU bug database, or
+submit one.
+
+The code contribution will be checked into a special part of the ICU source code
+repository and evaluated. The ICU team may request updates, for example for
+better conformance with the ICU [design](../design.md) principles,
+[coding](codingguidelines.md) and testing guidelines, or performance. (See also
+the Requirements (§) above.) Such updates can be contributed without exchanging
+another form: An ICU team member commits related materials into the ICU source
+code repository using the same bug ID that was entered into the copyright
+assignment form.
+
+### Ongoing Contributors
+
+If you are interested in making frequent contributions to ICU, then the ICU
+Project Management Committee may agree to invite you as an ongoing contributor.
+Ongoing contributors may be individuals but are more typically expected to be
+companies with one or more people ("authors") writing different parts of one or
+more contributions.
+
+In this case, the relationship between the contributor and the ICU team is much
+closer: One or more authors belonging to the contributor will have commit access
+to the ICU source code repository. With this direct access come additional
+responsibilities including an understanding that the contributor will work to
+follow the technical Requirements (§) above for contributions, and agreement to
+adhere to the terms of the copyright assignment forms for all future
+contributions.
+
+The process for ongoing contributors involves two types of forms: Initially, and
+only once, an ongoing contributor submits a *Joint Copyright Assignment by
+Ongoing Contributor* form, agreeing to essentially the same terms as in the
+one-time contributor form, for all future contributions. (See the form at
+<https://github.com/unicode-org/icu-docs/blob/master/legal/contributions/Copyright_Assignment_ongoing.rtf>).
+
+The contributor must also send another form, *Addendum to Joint Copyright
+Assignment by Ongoing Contributor: Authors*, for the initial set and each
+addition of authors to ICU contributions, **before** any contributions from
+these authors are committed into the ICU source code repository. (Only new,
+additional authors need to be listed on each such form.) The contributor agrees
+to ensure that all of these authors agree to adhere to the terms of the
+associated *Joint Copyright Assignment by Ongoing Contributor Agreement*. (See
+the Authors Addendum form at
+<https://github.com/unicode-org/icu-docs/blob/master/legal/contributions/Copyright_Assignment_authors.rtf>).
+
+Some of an ongoing contributor's authors will have commit access to the ICU
+source code repository. Their committer IDs need to be established before
+completing the Authors Addendum form, so that these committer IDs can be entered
+there. (The committer IDs should be activated only after the form is received.)
+
+Committer authors commit materials directly into the appropriate parts of the
+ICU source code repository. Contributions from an ongoing contributor are
+identified by their association with the contributor's committer IDs.
+
+### Previous Contributions
+
+All previous "one-off" contributions from non-IBM sources to ICU are listed on
+the code contributions page in ICU's source code repository. The page contains
+links to the softcopies of the Joint Copyright Assignment forms. See
+<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/master/legal/contributions/code_contributions.html>
+
+In addition, the following non-IBM companies are registered as Ongoing
+Contributors:
+
+*   Apple
+*   Google
+
+See the repository folder that contains the contributions page for the full set
+of softcopies of contributor agreements including one-off contributions,
+ongoing-contributor agreements and author-addendum documents to
+ongoing-contributor agreements:
+<https://github.com/unicode-org/icu-docs/tree/master/legal/contributions>
diff --git a/docs/userguide/dev/index.md b/docs/userguide/dev/index.md
new file mode 100644
index 0000000..04a72c3
--- /dev/null
+++ b/docs/userguide/dev/index.md
@@ -0,0 +1,15 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Development
+
+Top-level page for topics of interest to ICU developers. See the subpages
+listed below for details:
+
+*   [Coding Guidelines](codingguidelines.md)
+*   [Contributions to the ICU library](contributions.md)
+*   [Synchronization Issues](sync/index.md)
\ No newline at end of file
diff --git a/docs/userguide/dev/sync/custom.md b/docs/userguide/dev/sync/custom.md
new file mode 100644
index 0000000..d8b411a
--- /dev/null
+++ b/docs/userguide/dev/sync/custom.md
@@ -0,0 +1,226 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Custom ICU4C Synchronization
+
+## Build-Time User-Provided Synchronization
+
+Build-time user-provided synchronization lets platforms with special
+requirements supply their own mutex and one-time initialization
+implementations to ICU. This facility was introduced in ICU 53 and may change
+over time.
+
+The alternative implementations are compiled directly into the ICU libraries.
+Alternative implementations cannot be plugged in at run time.
+
+The tables below show the items that must be defined by a custom ICU
+synchronization implementation. The list includes both functions that are used
+throughout ICU code and additional functions that are used internally by other
+ICU synchronization primitives.
+
+**Low Level Atomics**, a set of platform or compiler dependent typedefs and
+inlines. Provided in the internal header file
+[umutex.h](../../../../icu4c/source/common/umutex.h).
+
+| Type/Function                                         | Description                                                            |
+|-------------------------------------------------------|------------------------------------------------------------------------|
+| typedef u_atomic_int32_t                              | A 32-bit integer type that works with the low-level atomic operations. |
+| umtx_loadAcquire(u_atomic_int32_t &var)               | Atomically load `var` with acquire memory ordering.                   |
+| umtx_storeRelease(u_atomic_int32_t &var, int32_t val) | Atomically store `val` into `var` with release memory ordering.       |
+| umtx_atomic_inc(u_atomic_int32_t &var)                | Atomically increment `var` and return the new value.                   |
+| umtx_atomic_dec(u_atomic_int32_t &var)                | Atomically decrement `var` and return the new value.                   |
+
+**Mutexes**. Type declarations for ICU mutex wrappers. Provided in a header file.
+
+| Type                | Description                                                                                       |
+|---------------------|---------------------------------------------------------------------------------------------------|
+| struct UMutex       | An ICU mutex. All instances will be static. Typically just contains an underlying platform mutex. |
+| U_MUTEX_INITIALIZER | A C style initializer for a static instance of a UMutex.                                          |
+
+**Mutex and InitOnce implementations**. Out-of-line platform-specific code.
+Provided in a .cpp file.
+
+| Function                              | Description                              |
+|---------------------------------------|------------------------------------------|
+| umtx_lock(UMutex *mutex)              | Lock a mutex.                            |
+| umtx_unlock(UMutex* mutex)            | Unlock a mutex.                          |
+| umtx_initImplPreInit(UInitOnce &uio)  | umtx_initOnce() implementation function. |
+| umtx_initImplPostInit(UInitOnce &uio) | umtx_initOnce() implementation function. |
+
+`UInitOnce` and `umtx_initOnce()` are used internally by ICU for thread-safe
+one-time initialization. Their implementation is split into a
+platform-independent part (contained in
+[umutex.h](../../../../icu4c/source/common/umutex.h)),
+and the pair of platform-dependent implementation functions listed above.
+
+**Build Setup**
+
+Compiler preprocessor variables are used to name the custom files to be included
+in the ICU build. If defined, the files are included at the top of the normal
+platform `#ifdef` chains in the ICU sources, and effectively define a new
+platform.
+
+| Macro            | Description                                           |
+|------------------|-------------------------------------------------------|
+| U_USER_ATOMICS_H | Set to the name of the low level atomics header file. |
+| U_USER_MUTEX_H   | Mutexes header file.                                  |
+| U_USER_MUTEX_CPP | Mutexes and InitOnce implementation file.             |
+
+It is possible (and reasonable) to supply only the two mutex files, while
+retaining the ICU default implementation for the low level atomics.
+
+Example ICU configure with user mutexes specified:
+
+    CPPFLAGS='-DU_USER_ATOMICS_H=atomic_c11.h -DU_USER_MUTEX_H=mutex_c11.h -DU_USER_MUTEX_CPP=mutex_c11.cpp' ./runConfigureICU --enable-debug Linux
+
+**Stability**
+
+This interface may change between ICU releases. The required set of functions
+may be extended, or details of the behavior required may be altered.
+
+The types and functions defined by this interface reach deeply into the ICU
+implementation, and we need to retain the ability to make changes should the
+need arise.
+
+**Examples**
+
+The code below shows a complete set of ICU user synchronization files.
+
+This implementation uses C++11 standard library mutexes and atomics. These make
+a convenient reference implementation because the C++11 constructs are well
+defined and straightforward to use.
+
+Similar implementations for POSIX and Windows can be found in files
+`common/umutex.h` and `common/umutex.cpp`, in the platform `#ifdef` chains; these are
+part of the standard ICU distribution.
+
+**Mutex Header**
+```c++
+// Example of an ICU build time customized mutex header.
+//
+// Must define struct UMutex and an initializer that will work with static instances.
+// All UMutex instances in ICU code will be static.
+
+#ifndef ICU_MUTEX_C11_H
+#define ICU_MUTEX_C11_H
+#include <mutex>
+#include <condition_variable>
+struct UMutex {
+    std::mutex fMutex;
+};
+#define U_MUTEX_INITIALIZER {}
+#endif
+```
+
+**Atomics Header**
+```c++
+#include <atomic>
+#include <stdint.h>  // int32_t
+
+typedef std::atomic<int32_t> u_atomic_int32_t;
+#define ATOMIC_INT32_T_INITIALIZER(val) ATOMIC_VAR_INIT(val)
+
+inline int32_t umtx_loadAcquire(u_atomic_int32_t &var) {
+    return var.load(std::memory_order_acquire);
+}
+
+inline void umtx_storeRelease(u_atomic_int32_t &var, int32_t val) {
+    var.store(val, std::memory_order_release);
+}
+
+inline int32_t umtx_atomic_inc(u_atomic_int32_t &var) {
+    return var.fetch_add(1) + 1;
+}
+
+inline int32_t umtx_atomic_dec(u_atomic_int32_t &var) {
+    return var.fetch_sub(1) - 1;
+}
+```
+
+**Mutex and InitOnce implementations**
+```c++
+//
+// Example ICU build time custom mutex cpp file.
+//
+// Must implement these functions:
+// umtx_lock(UMutex *mutex);
+// umtx_unlock(UMutex *mutex);
+// umtx_initImplPreInit(UInitOnce &uio);
+// umtx_initImplPostInit(UInitOnce &uio);
+
+U_CAPI void U_EXPORT2
+umtx_lock(UMutex *mutex) {
+    if (mutex == NULL) {
+        // Note: globalMutex is pre-defined in the platform-independent ICU code.
+        mutex = &globalMutex;
+    }
+    mutex->fMutex.lock();
+}
+
+U_CAPI void U_EXPORT2
+umtx_unlock(UMutex* mutex) {
+    if (mutex == NULL) {
+        mutex = &globalMutex;
+    }
+    mutex->fMutex.unlock();
+}
+
+// A mutex and a condition variable are used by the implementation of umtx_initOnce()
+// The mutex is held only while the state of the InitOnce object is being changed or
+// tested. It is not held while initialization functions are running.
+// Threads needing to block, waiting for an initialization to complete, will wait
+// on the condition variable.
+// All InitOnce objects share a common mutex and condition variable. This means that
+// all blocked threads will wake if any (possibly unrelated) initialization completes.
+// This does no harm: it should be statistically rare, and any spuriously woken
+// threads will check their state and promptly wait again.
+
+static std::mutex initMutex;
+static std::condition_variable initCondition;
+
+// This function is called from umtx_initOnce() when an initial test of a UInitOnce::fState flag
+// reveals that initialization has not completed, that we either need to call the
+// function on this thread, or wait for some other thread to complete the initialization.
+//
+// The actual call to the init function is made inline by template code
+// that knows the C++ types involved. This function returns TRUE if
+// the inline code needs to invoke the Init function, or FALSE if the initialization
+// has completed on another thread.
+//
+// UInitOnce::fState values:
+// 0: Initialization has not yet begun.
+// 1: Initialization is in progress, not yet complete.
+// 2: Initialization is complete.
+//
+UBool umtx_initImplPreInit(UInitOnce &uio) {
+    std::unique_lock<std::mutex> initLock(initMutex);
+    int32_t state = uio.fState;
+    if (state == 0) {
+        umtx_storeRelease(uio.fState, 1);
+        return TRUE; // Caller will next call the init function.
+    } else {
+        while (uio.fState == 1) {
+            // Another thread is currently running the initialization.
+            // Wait until it completes.
+            initCondition.wait(initLock);
+        }
+        U_ASSERT(uio.fState == 2);
+        return FALSE;
+    }
+}
+
+// This function is called from umtx_initOnce() just after an initialization function completes.
+// Its purpose is to set the state of the UInitOnce object to initialized, and to
+// unblock any threads that may be waiting on the initialization.
+//
+// Some threads may be waiting on the condition variable, requiring the notify_all().
+// Some threads may be racing to test the fState flag outside of the mutex,
+// requiring the use of store-release when changing its value.
+
+void umtx_initImplPostInit(UInitOnce &uio) {
+    std::unique_lock<std::mutex> initLock(initMutex);
+    umtx_storeRelease(uio.fState, 2);
+    initCondition.notify_all();
+}
+```
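+
+To see how the two implementation functions fit together, here is a
+self-contained sketch that reimplements the same three-state protocol outside
+ICU and drives it from a hypothetical `initOnce()` template, roughly the role
+the inline `umtx_initOnce()` code plays. `OnceState`, `preInit`, `postInit` and
+`initOnce` are illustrative stand-ins, not ICU API; the lock-free fast path and
+the predicate passed to `wait()` are choices made for this sketch.
+
+```cpp
+#include <atomic>
+#include <condition_variable>
+#include <cstdint>
+#include <mutex>
+
+// Stand-in for UInitOnce. 0: not started, 1: in progress, 2: complete.
+struct OnceState {
+    std::atomic<int32_t> fState{0};
+};
+
+static std::mutex initMutex;
+static std::condition_variable initCondition;
+
+// Stand-in for umtx_initImplPreInit(): true means "caller runs the init function".
+static bool preInit(OnceState &uio) {
+    std::unique_lock<std::mutex> lock(initMutex);
+    if (uio.fState.load(std::memory_order_acquire) == 0) {
+        uio.fState.store(1, std::memory_order_release);
+        return true;
+    }
+    // The predicate re-checks the state after every (possibly spurious) wakeup.
+    initCondition.wait(lock, [&] {
+        return uio.fState.load(std::memory_order_acquire) == 2;
+    });
+    return false;
+}
+
+// Stand-in for umtx_initImplPostInit(): publish completion, wake all waiters.
+static void postInit(OnceState &uio) {
+    std::lock_guard<std::mutex> lock(initMutex);
+    uio.fState.store(2, std::memory_order_release);
+    initCondition.notify_all();
+}
+
+// Hypothetical driver: a single acquire load once initialization is done.
+template <typename Fn>
+void initOnce(OnceState &uio, Fn fn) {
+    if (uio.fState.load(std::memory_order_acquire) == 2) {
+        return;
+    }
+    if (preInit(uio)) {
+        fn();
+        postInit(uio);
+    }
+}
+```
+
+Calling `initOnce()` from several threads runs the init function exactly once;
+later callers either see state 2 on the fast path or block on the condition
+variable until `postInit()` wakes them.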
diff --git a/docs/userguide/dev/sync/index.md b/docs/userguide/dev/sync/index.md
new file mode 100644
index 0000000..fd4abb6
--- /dev/null
+++ b/docs/userguide/dev/sync/index.md
@@ -0,0 +1,71 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Synchronization Issues
+
+## Overview
+
+ICU is designed for use in multi-threaded environments. Guidelines for
+developers using ICU are in the [ICU Design](../../design.md) section of the
+user guide.
+
+Within the ICU implementation, access to shared or global data sometimes must be
+protected in order to provide the threading model promised by the ICU design.
+The information on this page is intended for developers of ICU library code
+itself.
+
+ICU4J uses normal JDK synchronization services.
+
+ICU4C faces a more difficult problem, as there is no standard, fully portable
+set of C or C++ synchronization primitives. Internally, ICU4C provides a small
+set of synchronization operations, and requires that all synchronization needed
+within the ICU library code be implemented using them.
+
+The ICU4C synchronization primitives are for internal use only; they are not
+exported as API to normal users of ICU.
+
+ICU provides implementations of its synchronization functions for Windows, POSIX
+and C++11 platforms, and provides a build-time interface to allow [custom
+implementations](custom.md) for other platforms.
+
+## ICU4C Synchronization Primitives
+
+The functions and types listed below are intended for use throughout the ICU
+library code, wherever synchronization is required. They are defined in the
+internal header
+[umutex.h](../../../../icu4c/source/common/umutex.h).
+
+All synchronization within ICU4C implementation code must use these, and avoid
+direct use of functions provided by a particular operating system or compiler.
+
+For examples of use, search the ICU library code.
+
+**Low Level Atomics**
+
+| Type/Function                          | Description                                                     |
+|----------------------------------------|-----------------------------------------------------------------|
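+| typedef u_atomic_int32_t               | A 32-bit integer type for use with low-level atomic operations. |
+| umtx_atomic_inc(u_atomic_int32_t &var) | Atomically increment `var` and return the new value.            |
+| umtx_atomic_dec(u_atomic_int32_t &var) | Atomically decrement `var` and return the new value.            |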
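+
+A typical use of these two functions is reference counting. The sketch below
+re-declares them as in the C++11 example on the
+[custom synchronization](custom.md) page so that it compiles on its own;
+`RefCounted`, `addRef()` and `removeRef()` are hypothetical names, not ICU
+classes.
+
+```cpp
+#include <atomic>
+#include <stdint.h>
+
+typedef std::atomic<int32_t> u_atomic_int32_t;
+
+// Both return the *new* value, as in the C++11 example atomics header.
+inline int32_t umtx_atomic_inc(u_atomic_int32_t &var) {
+    return var.fetch_add(1) + 1;
+}
+
+inline int32_t umtx_atomic_dec(u_atomic_int32_t &var) {
+    return var.fetch_sub(1) - 1;
+}
+
+// Hypothetical base class for objects shared between threads.
+class RefCounted {
+public:
+    virtual ~RefCounted() = default;
+
+    void addRef() { umtx_atomic_inc(fRefCount); }
+
+    // Drops one reference. Returns true if it was the last one, in which case
+    // the object has been deleted and must not be touched again.
+    bool removeRef() {
+        if (umtx_atomic_dec(fRefCount) == 0) {
+            delete this;
+            return true;
+        }
+        return false;
+    }
+
+private:
+    u_atomic_int32_t fRefCount{0};
+};
+```
+
+Because the decrement returns the new value atomically, exactly one thread
+observes zero and performs the delete.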
+
+**Mutexes**
+
+| Type/Function              | Description                                                        |
+|----------------------------|--------------------------------------------------------------------|
+| struct UMutex              | An ICU mutex. All instances must be static.                        |
+| U_MUTEX_INITIALIZER        | A C style initializer for a UMutex.                                |
+| umtx_lock(UMutex *mutex)   | Lock a mutex.                                                      |
+| umtx_unlock(UMutex* mutex) | Unlock a mutex.                                                    |
+| class Mutex                | C++ Mutex wrapper with automatic lock & unlock. See header mutex.h. |
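+
+The `Mutex` class is the usual way to hold a lock for the duration of a scope.
+The sketch below shows the same scope-based idea with `std::mutex` standing in
+for `UMutex`; `ScopedLock`, `gCacheMutex`, `gCache` and `lookupOrInsert()` are
+hypothetical, and the real class's constructor signature should be checked in
+mutex.h.
+
+```cpp
+#include <map>
+#include <mutex>
+#include <string>
+
+// Hypothetical scope-based wrapper: locks in the constructor, unlocks in the
+// destructor, so every return path releases the lock.
+class ScopedLock {
+public:
+    explicit ScopedLock(std::mutex *m) : fMutex(m) { fMutex->lock(); }
+    ~ScopedLock() { fMutex->unlock(); }
+    ScopedLock(const ScopedLock &) = delete;
+    ScopedLock &operator=(const ScopedLock &) = delete;
+
+private:
+    std::mutex *fMutex;
+};
+
+static std::mutex gCacheMutex;             // static, like all UMutex instances
+static std::map<std::string, int> gCache;  // shared data guarded by gCacheMutex
+
+int lookupOrInsert(const std::string &key, int valueIfAbsent) {
+    ScopedLock lock(&gCacheMutex);         // released automatically on return
+    auto it = gCache.find(key);
+    if (it != gCache.end()) {
+        return it->second;
+    }
+    gCache[key] = valueIfAbsent;
+    return valueIfAbsent;
+}
+```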
+
+**One Time Initialization**
+
+| Type/Function                 | Description                                                                             |
+|-------------------------------|-----------------------------------------------------------------------------------------|
+| struct UInitOnce              | Provides an efficient facility for one-time initialization of static or global objects. |
+| umtx_initOnce(UInitOnce, ...) | A family of initialization functions.                                                   |
+
+All of these functions are for internal ICU implementation use only. They are
+not exported, and not intended for external use.
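+
+In ICU code, one-time initialization pairs a static `UInitOnce` with a static
+init function. Standard C++11 offers a similar facility in `std::once_flag` and
+`std::call_once`, which the sketch below uses to show the shape of the pattern.
+It is an analogy, not ICU code; `gTableOnce`, `gTable`, `initTable()` and
+`getTable()` are hypothetical names.
+
+```cpp
+#include <mutex>
+#include <vector>
+
+static std::once_flag gTableOnce;          // plays the role of a static UInitOnce
+static std::vector<int> *gTable = nullptr; // lazily created shared data
+static int gInitCalls = 0;                 // only to show initTable() runs once
+
+static void initTable() {
+    gTable = new std::vector<int>{1, 2, 3};
+    ++gInitCalls;
+}
+
+// Every caller goes through the once-flag; after the first call this is cheap.
+const std::vector<int> &getTable() {
+    std::call_once(gTableOnce, initTable);
+    return *gTable;
+}
+```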
diff --git a/docs/userguide/editing.md b/docs/userguide/editing.md
new file mode 100644
index 0000000..e856b2b
--- /dev/null
+++ b/docs/userguide/editing.md
@@ -0,0 +1,84 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Editing the ICU User Guide
+
+## Overview
+
+> :construction: **TODO**: Adjust this page for use of GitHub Markdown (since 2020)
+rather than Google Sites.
+See the [migration page](https://docs.google.com/document/d/1uK91cxv1amCrt75TBw1PlCC5wZhJH_w3dW_8unWL9EQ/edit)
+for details and tips.
+
+This version of the ICU User Guide is maintained via Google Sites. The Site
+address is <http://sites.google.com/site/icuprojectuserguide/>
+
+Editors are also usually ICU committers. Edit rights are granted by other Site
+owners and collaborators.
+
+The change from editing of Open Office Writer documents and generating HTML and
+PDF to editing a Google Site simplifies the User Guide maintenance and
+encourages us to keep it more up to date than before, at the cost of not being
+able to easily generate a single PDF document with the entire contents.
+
+## Document Structure
+
+Major chapters have Introduction pages, and further sections in a chapter are
+subpages of that main chapter page. The navigation bar is a manually edited
+sidebar accessible (if you are logged in and have edit rights) from Site
+settings/Change appearance.
+
+Page URLs should use lowercase letters and no hyphens.
+
+See the sitemap linked from the bottom of the navigation bar.
+
+Most pages have an automatic Table of contents. On a new page, after entering
+some contents, return to the very top of the page contents, select Insert/Table
+of contents, save, then change it to Right-aligned and turn on Wrap.
+
+## Common Styles
+
+We want to use common styles for code samples, notes and such. Since Google
+Sites does not offer a site-wide CSS style sheet, please copy special items from
+here, paste and modify their text, rather than creating them from scratch.
+
+For headings, and for standard text styles like **bold**, *italic*,
+~~strike-through~~, ... please use the standard heading styles from Sites.
+
+### Code
+
+**New:** Use the Format menu styles for Code (inline) and Blockquote Code
+(multi-line).
+
+**Obsolete:**
+
+For inline class/type/function/constant names and similar use Sites' Courier New
+font which is close enough to the Courier font we used to use.
+
+For a block of code, please copy/paste the following and edit its contents:
+
+    U16_NEXT(s, i, length, c)
+    U16_PREV(s, start, i, c)
+    U16_APPEND(s, i, length, c, isError)
+
+### Notes
+
+*Endianness is not an issue on this level because the interpretation of an
+integer is fixed within any given platform.*
+
+## Bookmarks & Links
+
+For internal links, please select the Sites page as a destination rather than
+specifying the full URL as a generic web link.
+Unfortunately, Sites makes it hard to define an anchor on a page and create a
+link to that specific anchor (whether from the same page or another one).
+
+*   For links to a specific section on the same page, please remove the link,
+    underline the former link text, and put "(§)" right after it.
+*   For links to a specific section on another page, just link to the page and
+    name the section. Please also put "(§)" right after it.
+
+If and when Sites offers a reasonable way of defining anchors and linking to
+them, we can search our pages for "(§)" and fix the links.
diff --git a/docs/userguide/format_parse/datetime/examples.md b/docs/userguide/format_parse/datetime/examples.md
new file mode 100644
index 0000000..fb0753e
--- /dev/null
+++ b/docs/userguide/format_parse/datetime/examples.md
@@ -0,0 +1,277 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Date and Time Formatting Examples
+
+## Format
+
+The ICU DateFormat interface enables you to format a date in milliseconds into a
+string representation of the date. Also, the interface enables you to parse the
+string back to the internal date representation in milliseconds.
+
+### C++
+
+```cpp
+DateFormat* df = DateFormat::createDateInstance();
+UnicodeString myString;
+UDate myDateArr[] = { 0.0, 100000000.0, 2000000000.0 }; 
+for (int32_t i = 0; i < 3; ++i) {
+  myString.remove();
+  cout << df->format( myDateArr[i], myString ) << endl;
+}
+```
+
+### C
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+#include <unicode/udat.h>
+#include <unicode/ustring.h>
+
+/* 1st example: format the dates in millis 0, 100000000 and 2000000000 */
+UErrorCode status = U_ZERO_ERROR;
+int32_t i, myStrlen = 0;
+UChar* myString;
+char buffer[256];
+static const UChar gmt[] = { 'G', 'M', 'T', 0 };
+UDate myDateArr[] = { 0.0, 100000000.0, 2000000000.0 }; /* test values */
+UDateFormat* df = udat_open(UDAT_DEFAULT, UDAT_DEFAULT, NULL, gmt, -1,
+                            NULL, -1, &status);
+if (U_FAILURE(status)) {
+  return;
+}
+for (i = 0; i < 3; ++i) {
+  /* Preflight: a NULL buffer with capacity 0 returns the required length. */
+  myStrlen = udat_format(df, myDateArr[i], NULL, 0, NULL, &status);
+  if (status == U_BUFFER_OVERFLOW_ERROR) {
+    status = U_ZERO_ERROR;
+    myString = (UChar*)malloc(sizeof(UChar) * (myStrlen + 1));
+    udat_format(df, myDateArr[i], myString, myStrlen + 1, NULL, &status);
+    /* u_austrcpy() converts UChar* to char* using the default codepage */
+    printf("%s\n", u_austrcpy(buffer, myString));
+    free(myString);
+  }
+}
+udat_close(df);
+```
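+
+The C example relies on ICU's preflighting convention: call once with a NULL
+buffer and capacity 0 to learn the required length (reported with
+`U_BUFFER_OVERFLOW_ERROR`), reset the status, allocate, and call again. The
+self-contained sketch below mimics only that buffer contract; `ErrorCode` and
+`formatDigits()` are hypothetical stand-ins for `UErrorCode` and
+`udat_format()`, and no ICU calls are made.
+
+```cpp
+#include <cstdint>
+#include <memory>
+#include <string>
+
+enum ErrorCode { ZERO_ERROR, BUFFER_OVERFLOW_ERROR };  // hypothetical
+
+// Writes the decimal digits of value into dest and returns the full length.
+// Like ICU functions, it does nothing if *status already holds an error, and
+// reports BUFFER_OVERFLOW_ERROR (writing nothing) if dest is too small.
+int32_t formatDigits(uint32_t value, char16_t *dest, int32_t capacity, ErrorCode *status) {
+    if (*status != ZERO_ERROR) {
+        return 0;
+    }
+    const std::string digits = std::to_string(value);
+    const int32_t length = static_cast<int32_t>(digits.size());
+    if (length > capacity) {
+        *status = BUFFER_OVERFLOW_ERROR;
+        return length;
+    }
+    for (int32_t i = 0; i < length; ++i) {
+        dest[i] = static_cast<char16_t>(digits[i]);
+    }
+    if (length < capacity) {
+        dest[length] = 0;  // NUL-terminate when there is room
+    }
+    return length;
+}
+
+std::u16string formatWithPreflight(uint32_t value) {
+    ErrorCode status = ZERO_ERROR;
+    // 1. Preflight with a NULL buffer and capacity 0.
+    const int32_t length = formatDigits(value, nullptr, 0, &status);
+    if (status != BUFFER_OVERFLOW_ERROR) {
+        return std::u16string();
+    }
+    // 2. Reset the expected overflow "error"; otherwise the next call is a no-op.
+    status = ZERO_ERROR;
+    // 3. Allocate length + 1 units so there is room for the NUL, then format.
+    std::unique_ptr<char16_t[]> buffer(new char16_t[length + 1]);
+    formatDigits(value, buffer.get(), length + 1, &status);
+    if (status != ZERO_ERROR) {
+        return std::u16string();
+    }
+    return std::u16string(buffer.get(), length);
+}
+```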
+
+## Parse
+
+To parse a date written for a locale other than the default, pass that locale
+when creating the formatting object, as in the calls below.
+
+### C++
+
+```cpp
+DateFormat* df = DateFormat::createDateInstance
+  ( DateFormat::SHORT, Locale::getFrance());
+```
+
+### C
+
+```c
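+/* 2nd example: parse a date with short French date/time formatter */
+UErrorCode status = U_ZERO_ERROR;
+static const UChar gmt[] = { 'G', 'M', 'T', 0 };
+UDateFormat* df = udat_open(UDAT_SHORT, UDAT_SHORT, "fr_FR", gmt, -1,
+                            NULL, -1, &status);
+int32_t parsepos = 0;
+/* myString is the text to parse, e.g. a string produced by udat_format() */
+UDate myDate = udat_parse(df, myString, u_strlen(myString), &parsepos,
+                          &status);
+udat_close(df);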
+```
+
+### Java
+
+```java
+import java.text.FieldPosition;
+import java.text.ParseException;
+import java.util.Calendar;
+import java.util.Date;
+import java.util.Locale;
+
+import com.ibm.icu.text.DateFormat;
+
+public class TestDateTimeFormat {
+    public void run() {
+
+        // Formatting Dates
+
+        DateFormat dfUS = DateFormat.getDateInstance(DateFormat.FULL, Locale.US);
+        DateFormat dfFrance = DateFormat.getDateInstance(DateFormat.FULL, Locale.FRANCE);
+        StringBuffer sb = new StringBuffer();
+        Calendar c = Calendar.getInstance();
+        Date d = c.getTime();
+        sb = dfUS.format(d, sb, new FieldPosition(0));
+        System.out.println(sb.toString());
+
+        StringBuffer sbf = new StringBuffer();
+        sbf = dfFrance.format(d, sbf, new FieldPosition(0));
+        System.out.println(sbf.toString());
+
+        StringBuffer sbg = new StringBuffer();
+        DateFormat dfg = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.SHORT);
+        FieldPosition pos = new FieldPosition(DateFormat.MINUTE_FIELD);
+        sbg = dfg.format(d, sbg, pos);
+        System.out.println(sbg.toString());
+        System.out.println(sbg.toString().substring(pos.getBeginIndex(), pos.getEndIndex()));
+
+        // Parsing Dates
+
+        String dateString_US = "Thursday, February 7, 2008";
+        String dateString_FRANCE = "jeudi 7 février 2008";
+        try {
+            Date parsedDate_US = dfUS.parse(dateString_US);
+            Date parsedDate_FRANCE = dfFrance.parse(dateString_FRANCE);
+            System.out.println(parsedDate_US.toString());
+            System.out.println(parsedDate_FRANCE.toString());
+        } catch (ParseException pe) {
+            System.out.println("Exception while parsing :" + pe);
+        }
+    }
+
+    public static void main(String args[]) {
+        new TestDateTimeFormat().run();
+    }
+}
+```
+
+## Getting Specific Date Fields
+
+To get the position of a specific field within a formatted date, use the
+FieldPosition class in C++ or the UFieldPosition struct in C.
+
+### C++
+
+```cpp
+UErrorCode status = U_ZERO_ERROR;
+FieldPosition pos(DateFormat::YEAR_FIELD);
+UDate myDate = Calendar::getNow();
+UnicodeString str;
+DateFormat* df = DateFormat::createDateInstance
+  ( DateFormat::LONG, Locale::getFrance());
+
+df->format(myDate, str, pos, status);
+cout << pos.getBeginIndex() << "," << pos.getEndIndex() << endl;
+delete df;
+```
+
+### C
+
+```c
+UErrorCode status = U_ZERO_ERROR;
+UFieldPosition pos;
+UChar *myString = NULL;
+int32_t myStrlen = 0;
+char buffer[1024];
+static const UChar pst[] = { 'P', 'S', 'T', 0 };
+UDate myDate = ucal_getNow();
+
+pos.field = 1; /* UDAT_YEAR_FIELD, same as the DateFormat::EField enum */
+UDateFormat* dfmt = udat_open(UDAT_DEFAULT, UDAT_DEFAULT, NULL, pst, -1,
+                              NULL, -1, &status);
+myStrlen = udat_format(dfmt, myDate, NULL, 0, &pos, &status);
+if (status==U_BUFFER_OVERFLOW_ERROR){
+    status=U_ZERO_ERROR;
+    myString=(UChar*)malloc(sizeof(UChar) * (myStrlen+1) );
+    udat_format(dfmt, myDate, myString, myStrlen+1, &pos, &status);
+}
+if (U_SUCCESS(status) && myString != NULL) {
+    printf("date format: %s\n", u_austrcpy(buffer, myString));
+    /* Assumes one char per UChar after conversion (e.g. ASCII output). */
+    buffer[pos.endIndex] = 0;   /* NUL-terminate just after the field */
+    printf("UFieldPosition position equals %s\n", &buffer[pos.beginIndex]);
+}
+free(myString);
+udat_close(dfmt);
+```
+
+## DateTimePatternGenerator
+
+This class lets you get locale-appropriate patterns for a variety of field
+combinations, such as month plus day. The following illustrates this in Java,
+C++ and C.
+
+### Java
+
+```java
+// set up the generator
+DateTimePatternGenerator generator
+    = DateTimePatternGenerator.getInstance(locale);
+
+// get a pattern for an abbreviated month and day
+final String pattern = generator.getBestPattern("MMMd");
+SimpleDateFormat formatter = new SimpleDateFormat(pattern, locale);
+
+// use it to format (or parse)
+String formatted = formatter.format(new Date());
+// for French, the result is "13 sept."
+```
+
+### C++
+
+```cpp
+// set up the generator
+status = U_ZERO_ERROR;
+DateTimePatternGenerator *generator = DateTimePatternGenerator::createInstance( locale, status);
+if (U_FAILURE(status)) {
+    return;
+}
+    
+// get a pattern for an abbreviated month and day
+UnicodeString pattern = generator->getBestPattern(UnicodeString("MMMd"), status); 
+SimpleDateFormat *formatter = new SimpleDateFormat(pattern, locale, status); 
+
+// use it to format (or parse)
+UnicodeString formatted;
+formatted = formatter->format(Calendar::getNow(), formatted, status); 
+// for French, the result is "13 sept."
+```
+
+### C
+
+```c
+const UChar skeleton[]= {'M', 'M', 'M', 'd', 0};
+
+status=U_ZERO_ERROR;
+generator=udatpg_open(locale, &status);
+if(U_FAILURE(status)) {
+    return;
+}
+
+/* get a pattern for an abbreviated month and day */
+length = udatpg_getBestPattern(generator, skeleton, 4,
+                                pattern, patternCapacity, &status);
+formatter = udat_open(UDAT_IGNORE, UDAT_DEFAULT, locale, NULL, -1, 
+                        pattern, length, &status);
+
+/* use it to format (or parse) */
+formattedCapacity = (int32_t)(sizeof(formatted)/sizeof((formatted)[0]));
+resultLen=udat_format(formatter, ucal_getNow(), formatted, formattedCapacity,
+                        NULL, &status);
+/* for French, the result is "13 sept." */
+```
+
+## Changing the TimeZone Formatting Style
+
+DateTimePatternGenerator also contains some helper functions for parsing
+patterns. Here's an example of replacing the kind of time zone used in a
+pattern.
+
+### Java
+
+```java
+/**
+ * Replace the zone string with a different type, eg v's for z's, etc.
+ * <p>Called with a pattern, such as one gotten from 
+ * <pre>
+ * String pattern = ((SimpleDateFormat)
+ * DateFormat.getTimeInstance(style, locale)).toPattern();
+ * </pre>
+ * @param pattern original pattern to change, such as "HH:mm zzzz"
+ * @param newZone Must be: z, zzzz, Z, ZZZZ, v, vvvv, V, or VVVV
+ * @return the pattern with its zone fields replaced, or the original pattern if no zone field changed
+ */
+public String replaceZoneString(String pattern, String newZone) {
+    DateTimePatternGenerator.FormatParser formatParser =
+        new DateTimePatternGenerator.FormatParser();
+    final List itemList = formatParser.set(pattern).getItems();
+    boolean found = false;
+    for (int i = 0; i < itemList.size(); ++i) {
+        Object item = itemList.get(i);
+        if (item instanceof VariableField) {
+            // the first character of the variable field determines the type,
+            // according to CLDR.
+            String variableField = item.toString();
+            switch (variableField.charAt(0)) {
+            case 'z': case 'Z': case 'v': case 'V':
+                if (!variableField.equals(newZone)) {
+                    found = true;
+                    itemList.set(i, new VariableField(newZone));
+                }
+                break;
+            }
+        }
+    }
+    return found ? formatParser.toString() : pattern;
+}
+```
diff --git a/docs/userguide/format_parse/datetime/index.md b/docs/userguide/format_parse/datetime/index.md
new file mode 100644
index 0000000..fb2c07c
--- /dev/null
+++ b/docs/userguide/format_parse/datetime/index.md
@@ -0,0 +1,371 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Formatting Dates and Times
+
+## Formatting Dates and Times Overview
+
+Date and time formatters are used to convert dates and times from their internal
+representations to textual form and back again in a language-independent manner.
+The date and time formatters use `UDate`, which is the internal representation.
+Converting from the internal representation (milliseconds since midnight UTC,
+January 1, 1970) to text is known as "formatting," and converting from text to
+milliseconds is known as "parsing." These processes involve two mappings:
+
+*   A mapping between a point in time (UDate) and a set of calendar fields,
+    which in turn depends on:
+    *   The rules of a particular calendar system (e.g. Gregorian, Buddhist,
+        Chinese Lunar)
+    *   The time zone
+*   A mapping between a set of calendar fields and a formatted textual
+    representation, which depends on the fields selected for display, their
+    display style, and the conventions of a particular locale.
+
+## DateFormat
+
+DateFormat helps format and parse dates for any locale. Your code can be
+completely independent of the locale conventions for months, days of the week,
+or calendar format.
+
+### Formatting Dates
+
+The DateFormat class in ICU enables you to format a Date in milliseconds
+into a string representation of the date. It also parses the string back to the
+internal Date representation in milliseconds.
+
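+```cpp
+#include <iostream>
+#include <memory>
+#include <unicode/datefmt.h>
+#include <unicode/ustream.h>  // operator<< for UnicodeString
+
+using namespace icu;
+
+std::unique_ptr<DateFormat> df(DateFormat::createDateInstance());
+UnicodeString myString;
+UDate myDateArr[] = { 0.0, 100000000.0, 2000000000.0 };
+for (int32_t i = 0; i < 3; ++i) {
+  myString.remove();  // format() appends to its argument, so clear it first
+  std::cout << df->format(myDateArr[i], myString) << std::endl;
+}
+```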
+
+To format a date for a different locale, pass the locale to the factory method:
+
+```cpp
+DateFormat* df = DateFormat::createDateInstance
+   ( DateFormat::SHORT, Locale::getFrance());
+```
+
+### Parsing Dates
+
+You can also use a DateFormat to parse a string:
+
+```cpp
+UErrorCode status = U_ZERO_ERROR;
+UDate myDate = df->parse(myString, status);
+```
+
+When numeric fields abut one another directly, with no intervening delimiter
+characters, they constitute a run of abutting numeric fields. Such runs are
+parsed specially. For example, the format "HHmmss" parses the input text
+"123456" to 12:34:56, parses the input text "12345" to 1:23:45, and fails to
+parse "1234". In other words, the leftmost field of the run is flexible, while
+the others keep a fixed width. If the parse fails anywhere in the run, then the
+leftmost field is shortened by one character, and the entire run is parsed
+again. This is repeated until either the parse succeeds or the leftmost field is
+one character in length. If the parse still fails at that point, the parse of
+the run fails.
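+
+The retry rule above can be sketched in plain C++. This is a hypothetical
+illustration, not ICU's parser. `parseAbuttingRun` and its `widths` parameter
+are illustrative names. The sketch assumes the whole input is the numeric run,
+treats a wrong digit count as the only kind of failure, and omits range checks
+(such as hours being at most 23).
+
+```cpp
+#include <cassert>
+#include <cctype>
+#include <cstddef>
+#include <optional>
+#include <string>
+#include <vector>
+
+// Hypothetical sketch (not ICU code): parse a run of abutting numeric fields.
+// `widths` holds each field's nominal width, e.g. {2, 2, 2} for "HHmmss".
+// The leftmost field is flexible; every other field keeps its fixed width.
+std::optional<std::vector<int>> parseAbuttingRun(
+    const std::string& text, const std::vector<std::size_t>& widths) {
+  assert(!widths.empty() && widths[0] > 0);
+  for (const char c : text) {
+    if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
+  }
+  // Try the leftmost field at its full width first, then shorten it by one
+  // character after each failed attempt, down to a single character.
+  for (std::size_t first = widths[0]; first >= 1; --first) {
+    std::size_t total = first;
+    for (std::size_t i = 1; i < widths.size(); ++i) total += widths[i];
+    if (total != text.size()) continue;  // attempt fails: shorten and retry
+    std::vector<int> values;
+    std::size_t pos = 0;
+    for (std::size_t i = 0; i < widths.size(); ++i) {
+      const std::size_t w = (i == 0) ? first : widths[i];
+      values.push_back(std::stoi(text.substr(pos, w)));
+      pos += w;
+    }
+    return values;
+  }
+  return std::nullopt;  // still failing with a one-character leftmost field
+}
+```
+
+With `widths` of `{2, 2, 2}` (the "HHmmss" case), "123456" yields 12, 34, 56,
+"12345" yields 1, 23, 45, and "1234" fails, matching the examples above.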
+
+### Producing Normal Date Formats for a Locale
+
+Use createDateInstance to produce the normal date format for a locale. There
+are other static factory methods available. Use createTimeInstance to produce
+the normal time format for a locale. Use createDateTimeInstance to produce a
+DateFormat that formats both date and time. You can pass different options to
+these factory methods to control the length of the result, from SHORT to MEDIUM
+to LONG to FULL. The exact result depends on the locale, but generally:
+
+1.  SHORT is numeric, such as 12/13/52 or 3:30pm
+
+2.  MEDIUM is longer, such as Jan. 12, 1952
+
+3.  LONG is longer, such as January 12, 1952 or 3:30:32pm
+
+4.  FULL is completely specified, such as Tuesday, April 12, 1952 AD or
+    3:30:42pm PST
+
+For more general flexibility, the [DateTimePatternGenerator](#datetimepatterngenerator) can map a
+custom selection of time and date fields, along with various display styles for
+those fields, to a locale-appropriate format that can then be set as the format
+to use by the DateFormat.
+
+### Producing Relative Date Formats for a Locale
+
+ICU currently provides limited support for formatting dates using a “relative”
+style, specified using RELATIVE_SHORT, RELATIVE_MEDIUM, RELATIVE_LONG, or
+RELATIVE_FULL. As currently implemented, relative date formatting only affects
+the formatting of dates within a limited range of calendar days before or after
+the current date, based on the CLDR `<field type="day">`/`<relative>` data; for
+example, in English, "Yesterday", "Today", and "Tomorrow". Within this range,
+the specific relative style currently makes no difference. Outside of this
+range, relative dates are formatted using the corresponding non-relative style
+(SHORT, MEDIUM, etc.). Relative time styles are not currently supported, and
+behave just like the corresponding non-relative style.
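+
+The fallback rule above can be sketched as follows. This is a hypothetical
+illustration, not ICU's implementation. The table of relative names is passed
+in (real data comes from CLDR), and `fallback` stands in for a formatter that
+uses the corresponding non-relative style.
+
+```cpp
+#include <functional>
+#include <map>
+#include <string>
+
+// Hypothetical sketch: a day offset that has relative data uses that phrase;
+// any other offset falls back to the non-relative formatter.
+std::string formatRelativeDay(int dayOffset,
+                              const std::map<int, std::string>& relativeNames,
+                              const std::function<std::string(int)>& fallback) {
+  const auto it = relativeNames.find(dayOffset);
+  if (it != relativeNames.end()) {
+    return it->second;  // e.g. "Yesterday", "Today", "Tomorrow"
+  }
+  return fallback(dayOffset);  // outside the range: non-relative style
+}
+```
+
+With an English-like table of {-1: "Yesterday", 0: "Today", 1: "Tomorrow"},
+an offset of 0 gives "Today", and an offset of 5 is handed to `fallback`.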
+
+### Setting Time Zones
+
+You can set the time zone on the format with `setTimeZone()`. If you want more
+control over the format or parsing, cast the DateFormat you get from the factory
+methods to a SimpleDateFormat. This works for the majority of locales.
+
+> :point_right: **Note**: *Remember to check getDynamicClassID() before carrying out the cast.*
+
+### Working with Positions
+
+You can also use forms of the parse and format methods with ParsePosition and
+FieldPosition to enable you to:
+
+1.  Progressively parse through pieces of a string.
+
+2.  Align any particular field, or find out where it is for selection on the
+    screen.
+
+## SimpleDateFormat
+
+SimpleDateFormat is a concrete class used for formatting and parsing dates in a
+language-independent manner. It allows for formatting, parsing, and
+normalization. It formats or parses a date or time expressed as the standard
+milliseconds since 00:00 GMT, January 1, 1970.
+
+SimpleDateFormat is the only built-in implementation of DateFormat. It provides
+a programmable interface that can be used to produce formatted dates and times
+in a wide variety of formats. The formats include almost all of the most common
+ones.
+
+Create a date-time formatter using the following factory methods rather than
+constructing an instance of SimpleDateFormat. This guarantees that the program
+gets a formatting pattern appropriate for the locale.
+
+1.  `DateFormat::createInstance()` (ICU4J: `getInstance()`)
+
+2.  `DateFormat::createDateInstance()` (ICU4J: `getDateInstance()`)
+
+3.  `DateFormat::createDateTimeInstance()` (ICU4J: `getDateTimeInstance()`)
+
+If you need a more unusual pattern, construct a SimpleDateFormat directly and
+give it an appropriate pattern.
+
+### Date/Time Format Syntax
+
+A date pattern is a string of characters, where specific strings of characters
+are replaced with date and time data from a calendar when formatting or used to
+generate data for a calendar when parsing.
+
+The Date Field Symbol Table below contains the characters used in patterns to
+show the appropriate formats for a given locale, such as yyyy for the year.
+Characters may be used multiple times. For example, if y is used for the year,
+'yy' might produce '99', whereas 'yyyy' produces '1999'. For most numerical
+fields, the number of characters specifies the field width. For example, if h is
+the hour, 'h' might produce '5', but 'hh' produces '05'. For some characters,
+the count specifies whether an abbreviated or full form should be used, but may
+have other choices, as given below.
+
+Two single quotes represent a literal single quote, either inside or outside
+single quotes. Text within single quotes is not interpreted in any way (except
+for two adjacent single quotes). Otherwise all ASCII letters from a to z and A to
+Z are reserved as syntax characters, and require quoting if they are to
+represent literal characters. In addition, certain ASCII punctuation characters
+may become variable in the future (e.g. ":" being interpreted as the time
+separator and "/" as a date separator, and replaced by respective
+locale-sensitive characters in display).
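+
+The quoting rules above can be sketched as a small tokenizer that splits a
+pattern into fields (runs of one repeated ASCII letter) and literal text. This
+is a hypothetical illustration, not ICU's pattern parser. `PatternToken` and
+`tokenizePattern` are illustrative names. The sketch does not validate pattern
+letters and does not report an unterminated quote.
+
+```cpp
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// A token is either a field (a run of one repeated ASCII letter, e.g. "yyyy")
+// or literal text.
+struct PatternToken {
+  bool isField;
+  std::string text;
+};
+
+bool isAsciiLetter(char c) {
+  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
+}
+
+std::vector<PatternToken> tokenizePattern(const std::string& pattern) {
+  std::vector<PatternToken> tokens;
+  std::string literal;  // literal text collected since the last field
+  auto flushLiteral = [&tokens, &literal] {
+    if (!literal.empty()) {
+      tokens.push_back({false, literal});
+      literal.clear();
+    }
+  };
+  bool inQuote = false;
+  for (std::size_t i = 0; i < pattern.size(); ++i) {
+    const char c = pattern[i];
+    if (c == '\'') {
+      if (i + 1 < pattern.size() && pattern[i + 1] == '\'') {
+        literal += '\'';  // two single quotes produce one literal quote
+        ++i;
+      } else {
+        inQuote = !inQuote;  // open or close a quoted section
+      }
+    } else if (!inQuote && isAsciiLetter(c)) {
+      flushLiteral();
+      std::size_t end = i;
+      while (end < pattern.size() && pattern[end] == c) ++end;
+      tokens.push_back({true, pattern.substr(i, end - i)});
+      i = end - 1;  // the loop's ++i moves past the run
+    } else {
+      literal += c;  // quoted text, or a non-letter outside quotes
+    }
+  }
+  flushLiteral();
+  return tokens;
+}
+```
+
+For "hh 'o''clock' a", the sketch yields the field `hh`, the literal
+` o'clock `, and the field `a`.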
+
+"Stand Alone" values refer to those designed to stand on their own, as opposed
+to being with other formatted values. "2nd quarter" would use the stand alone
+format (qqqq), whereas "2nd quarter 2007" would use the regular format (QQQQ
+yyyy).
+
+The pattern characters used in the Date Field Symbol Table are defined by CLDR;
+for more information see [CLDR Date Field Symbol Table](https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table).
+Note that the examples may not reflect current CLDR data.
+
+#### Date Field Symbol Table
+
+| Symbol | Meaning | Pattern | Example Output |
+| --- | --- | --- | --- |
+| G | era designator | G, GG, or GGG<br/>GGGG<br/>GGGGG | AD<br/>Anno Domini<br/>A |
+| y | year | yy<br/>y or yyyy | 96<br/>1996 |
+| Y | year of "Week of Year" | Y | 1997 |
+| u | extended year | u | 4601 |
+| U | cyclic year name, as in Chinese lunar calendar | U | 甲子 |
+| r | related Gregorian year | r | 1996 |
+| Q | quarter | Q<br/>QQ<br/>QQQ<br/>QQQQ<br/>QQQQQ | 2<br/>02<br/>Q2<br/>2nd quarter<br/>2 |
+| q | Stand Alone quarter | q<br/>qq<br/>qqq<br/>qqqq<br/>qqqqq | 2<br/>02<br/>Q2<br/>2nd quarter<br/>2 |
+| M | month in year | M<br/>MM<br/>MMM<br/>MMMM<br/>MMMMM | 9<br/>09<br/>Sep<br/>September<br/>S |
+| L | Stand Alone month in year | L<br/>LL<br/>LLL<br/>LLLL<br/>LLLLL | 9<br/>09<br/>Sep<br/>September<br/>S |
+| w | week of year | w<br/>ww | 27<br/>27 |
+| W | week of month | W | 2 |
+| d | day in month | d<br/>dd | 2<br/>02 |
+| D | day of year | D | 189 |
+| F | day of week in month | F | 2 (2nd Wed in July) |
+| g | modified Julian day | g | 2451334 |
+| E | day of week | E, EE, or EEE<br/>EEEE<br/>EEEEE<br/>EEEEEE | Tue<br/>Tuesday<br/>T<br/>Tu |
+| e | local day of week<br/>(example: if Monday is the 1st day, Tuesday is the 2nd) | e or ee<br/>eee<br/>eeee<br/>eeeee<br/>eeeeee | 2<br/>Tue<br/>Tuesday<br/>T<br/>Tu |
+| c | Stand Alone local day of week | c or cc<br/>ccc<br/>cccc<br/>ccccc<br/>cccccc | 2<br/>Tue<br/>Tuesday<br/>T<br/>Tu |
+| a | am/pm marker | a | pm |
+| h | hour in am/pm (1~12) | h<br/>hh | 7<br/>07 |
+| H | hour in day (0~23) | H<br/>HH | 0<br/>00 |
+| k | hour in day (1~24) | k<br/>kk | 24<br/>24 |
+| K | hour in am/pm (0~11) | K<br/>KK | 0<br/>00 |
+| m | minute in hour | m<br/>mm | 4<br/>04 |
+| s | second in minute | s<br/>ss | 5<br/>05 |
+| S | fractional second - truncates (like other time fields)<br/>to the count of letters when formatting. Appends<br/>zeros if more than 3 letters specified. Truncates at<br/>three significant digits when parsing. | S<br/>SS<br/>SSS<br/>SSSS | 2<br/>23<br/>235<br/>2350 |
+| A | milliseconds in day | A | 61201235 |
+| z | Time Zone: specific non-location | z, zz, or zzz<br/>zzzz | PDT<br/>Pacific Daylight Time |
+| Z | Time Zone: ISO8601 basic hms? / RFC 822<br/>Time Zone: long localized GMT (=OOOO)<br/>Time Zone: ISO8601 extended hms? (=XXXXX) | Z, ZZ, or ZZZ<br/>ZZZZ<br/>ZZZZZ | -0800<br/>GMT-08:00<br/>-08:00, -07:52:58, Z |
+| O | Time Zone: short localized GMT<br/>Time Zone: long localized GMT (=ZZZZ) | O<br/>OOOO | GMT-8<br/>GMT-08:00 |
+| v | Time Zone: generic non-location<br/>(falls back first to VVVV) | v<br/>vvvv | PT<br/>Pacific Time or Los Angeles Time |
+| V | Time Zone: short time zone ID<br/>Time Zone: long time zone ID<br/>Time Zone: time zone exemplar city<br/>Time Zone: generic location (falls back to OOOO) | V<br/>VV<br/>VVV<br/>VVVV | uslax<br/>America/Los_Angeles<br/>Los Angeles<br/>Los Angeles Time |
+| X | Time Zone: ISO8601 basic hm?, with Z for 0<br/>Time Zone: ISO8601 basic hm, with Z<br/>Time Zone: ISO8601 extended hm, with Z<br/>Time Zone: ISO8601 basic hms?, with Z<br/>Time Zone: ISO8601 extended hms?, with Z | X<br/>XX<br/>XXX<br/>XXXX<br/>XXXXX | -08, +0530, Z<br/>-0800, Z<br/>-08:00, Z<br/>-0800, -075258, Z<br/>-08:00, -07:52:58, Z |
+| x | Time Zone: ISO8601 basic hm?, without Z for 0<br/>Time Zone: ISO8601 basic hm, without Z<br/>Time Zone: ISO8601 extended hm, without Z<br/>Time Zone: ISO8601 basic hms?, without Z<br/>Time Zone: ISO8601 extended hms?, without Z | x<br/>xx<br/>xxx<br/>xxxx<br/>xxxxx | -08, +0530<br/>-0800<br/>-08:00<br/>-0800, -075258<br/>-08:00, -07:52:58 |
+| ' | escape for text | ' | (nothing) |
+| '' | two single quotes produce one | '' | ' |
+
+> :point_right: **Note**: *Any characters in the pattern that are not in the ranges of
+['a'..'z'] and ['A'..'Z'] will be treated as quoted text. For instance,
+characters like ':', '.', ' ', '#' and '@' will appear in the resulting time
+text even if they are not enclosed within single quotes. The single quote is used to
+'escape' letters. Two single quotes in a row, whether inside or outside a quoted
+sequence, represent a 'real' single quote.*
+
+> :point_right: **Note**: *A pattern containing any invalid pattern letter results in a failing UErrorCode
+result during formatting or parsing.*
+
+| Format Pattern | Result |
+| --- | --- |
+| yyyy.MM.dd G 'at' HH:mm:ss zzz | 1996.07.10 AD at 15:08:56 PDT |
+| EEE, MMM d, ''yy | Wed, Jul 10, '96 |
+| h:mm a | 12:08 PM |
+| hh 'o''clock' a, zzzz | 12 o'clock PM, Pacific Daylight Time |
+| K:mm a, z | 0:00 PM, PST |
+| yyyyy.MMMM.dd GGG hh:mm aaa | 01996.July.10 AD 12:08 PM |
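+
+The X and x rows in the symbol table differ only in how a zero offset is
+written. The following is a hypothetical sketch of the extended hours:minutes
+forms (`XXX` and `xxx`), not ICU's formatter. It assumes the offset is a whole
+number of minutes with an absolute value under 100 hours.
+
+```cpp
+#include <cstdlib>
+#include <string>
+
+// offsetMinutes is the offset from GMT, e.g. -480 for -08:00.
+// zForZero selects the "X" behavior (zero is written as "Z").
+std::string isoExtendedOffset(int offsetMinutes, bool zForZero) {
+  if (offsetMinutes == 0 && zForZero) return "Z";
+  const char sign = offsetMinutes < 0 ? '-' : '+';
+  const int total = std::abs(offsetMinutes);
+  const int hours = total / 60;
+  const int minutes = total % 60;
+  std::string out(1, sign);
+  out += static_cast<char>('0' + hours / 10);
+  out += static_cast<char>('0' + hours % 10);
+  out += ':';
+  out += static_cast<char>('0' + minutes / 10);
+  out += static_cast<char>('0' + minutes % 10);
+  return out;
+}
+```
+
+An offset of -480 gives "-08:00" in both forms. A zero offset gives "Z" for the
+X form and "+00:00" for the x form.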
+
+### Time Zone Display Names
+
+ICU supports time zone display names defined by the LDML ([Unicode Locale Data
+Markup Language](http://www.unicode.org/reports/tr35/)) specification. Since
+ICU 3.8, the vast majority of localized time zone names are no longer associated
+with individual time zones. Instead, a set of localized time zone names is
+associated with a *metazone*, and one or more individual time zones are mapped to
+the same *metazone*. For example, *metazone* “America_Pacific” has its own
+display name data such as “PST”, “PDT”, “PT”, “Pacific Standard Time”, “Pacific
+Daylight Time”, and “Pacific Time”, and these names are shared by multiple
+individual time zones: “America/Los_Angeles”, “America/Vancouver”,
+“America/Tijuana”, and so on. The mapping from an individual time zone to a
+*metazone* is not a simple 1-to-1 mapping, and it changes from time to time. For
+example, time zone “America/Indiana/Tell_City” uses name data from *metazone*
+“America_Eastern” until April 2, 2006, and from *metazone* “America_Central”
+after that date. So the display name used for “America/Indiana/Tell_City” before
+the date (e.g. “Eastern Time”) differs from the one after the date (e.g.
+“Central Time”).
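+
+The date-dependent mapping described above can be modeled as a list of date
+ranges for each time zone. The sketch below is hypothetical. Real mapping data
+comes from CLDR, not from application code, and the boundaries used in the
+example are simplified to whole days.
+
+```cpp
+#include <string>
+#include <vector>
+
+// Hypothetical model of a zone's metazone history: an ordered list of date
+// ranges. Dates are ISO "YYYY-MM-DD" strings, which compare correctly as text.
+struct MetazoneRange {
+  std::string from;      // inclusive; empty means "from the beginning"
+  std::string to;        // exclusive; empty means "still in effect"
+  std::string metazone;  // e.g. "America_Eastern"
+};
+
+// Returns the metazone in effect on `date`, or an empty string if none is;
+// a formatter would then need a fallback such as the localized GMT format.
+std::string metazoneFor(const std::vector<MetazoneRange>& ranges,
+                        const std::string& date) {
+  for (const auto& range : ranges) {
+    const bool started = range.from.empty() || date >= range.from;
+    const bool notEnded = range.to.empty() || date < range.to;
+    if (started && notEnded) return range.metazone;
+  }
+  return "";
+}
+```
+
+For example, with the ranges `{"", "2006-04-02", "America_Eastern"}` and
+`{"2006-04-02", "", "America_Central"}`, a date in 2005 maps to
+“America_Eastern” and a date in 2010 maps to “America_Central”.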
+
+> :point_right: **Note**: *Prior to ICU 3.8, a localized time zone name (except GMT format) and a time
+zone ID were always in a 1-to-1 relationship. Therefore, a time zone name produced
+by DateFormat could be parsed back to the original time zone. This assumption no
+longer applies to ICU 3.8 and later releases for all time zone format types. If
+your program needs to round-trip a specific time zone ID, you must use the
+generic location format (“VVVV”) explained below.*
+
+There are several different display name types available in the LDML
+specification.
+
+#### Time Zone Display Name Types
+
+| Type | Description | Examples |
+| --- | --- | --- |
+| Generic non-location | Reflects wall time, suited for displaying recurring events, meetings or anywhere people do not want to be overly specific. Available in two length options – long and short. | Pacific Time<br/>PT |
+| Generic partial location | Reflects wall time, used as a fallback format when the generic non-location format is not specific enough. A generic partial location name is constructed from a generic non-location name with a location name. For example, “PT” is shared by multiple time zones via metazone “America_Pacific”. When the GMT offset in the time zone at the given time differs from the preferred time zone of the metazone for the locale, the location name is appended to the generic non-location name to distinguish the time zone from the preferred zone. Available in two length options – long and short. | Pacific Time (Canada)<br/>PT (Yellowknife) |
+| Generic location | Reflects wall time, suited for populating a choice list for time zones. If the time zone is the single time zone available in the region (country), the generic location name is constructed with the region name. Otherwise, the name is constructed from the region name and the city name. Unlike other format types, this name is unique per time zone. | United States (Los Angeles) Time<br/>Italy Time |
+| Specific non-location | Reflects a specific standard or daylight time. Available in two length options – long and short. | Pacific Standard Time<br/>PDT |
+| Localized GMT | A constant, specific offset from GMT in a localized form. | GMT-08:00 |
+| RFC822 GMT | A constant, specific offset from GMT in a locale insensitive format. | -0800 |
+
+Each format type in the above table is used as a primary type or a fallback in
+SimpleDateFormat. The table below explains how ICU time zone format patterns work
+and their characteristics.
+
+#### Time Zone Pattern Usage
+
+| Pattern | Behavior | Round-trip time at daylight transitions(\*) | Round-trip Time Zone | Suggested Usage |
+| --- | --- | --- | --- | --- |
+| z, zz, zzz | Short specific non-location format (e.g. “PST”). If the localized data is not available or the short abbreviation is not commonly used for the locale, localized GMT format is used (e.g. GMT-08:00). | yes | no | For displaying a time with a user-friendly time zone name. |
+| zzzz | Long specific non-location format (e.g. “Pacific Standard Time”). If the localized data is not available, localized GMT format is used (e.g. GMT-08:00). | yes | no | Same as “z”, but longer format. |
+| v | Short generic non-location format (e.g. “PT”). If the localized data is not available or the short abbreviation is not commonly used for the locale, generic location format (e.g. “United States (Los Angeles) Time”) is used. If the localized data comes from a metazone and the GMT offset at the given time in the specified time zone differs from the preferred time zone of the metazone for the locale, generic partial location format (e.g. “PT (Canada)”) is used. | no | no | For displaying a recurring wall time (e.g. events, meetings) or anywhere people do not want to be overly specific. |
+| vvvv | Long generic non-location format (e.g. “Pacific Time”). If the localized data is not available, generic location format (e.g. “United States (Los Angeles) Time”) is used. | no | no | Same as “v”, but longer format. |
+| V | Same as “z”, except using the short abbreviation even if it is not commonly used for the locale. | yes | no | Same as “z”. |
+| VVVV | Generic location format (e.g. “United States (Los Angeles) Time”). | no | yes | For populating a choice list for time zones, because it supports 1-to-1 name/zone ID mapping and is more uniform than other text formats. Also, this is the only pattern supporting time zone round-trip. If your program needs to preserve the original time zone information, use this pattern. |
+| Z, ZZ, ZZZ | RFC822 GMT format (e.g. “-0800”). | yes | no | For formatting a time for non-user-facing data. |
+| ZZZZ | Localized GMT format (e.g. “GMT-08:00”). | yes | no | For displaying a time in UI in a uniform manner. |
+
+\* At a transition from daylight saving time to standard time, there is a wall
+time interval that occurs twice.
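+
+The following hypothetical sketch illustrates why generic names cannot
+round-trip at such a transition. It lists every UTC instant that displays as a
+given local wall time. All values are minutes from a shared reference, where
+UTC = local - offset. A real program would rely on a time zone library rather
+than this arithmetic.
+
+```cpp
+#include <vector>
+
+// Hypothetical sketch: the UTC instants that display as `localMinutes` on a
+// day with a daylight-to-standard transition at `transitionUtc`.
+// Offsets are minutes from GMT (e.g. -420 for PDT, -480 for PST).
+std::vector<long> utcCandidates(long localMinutes, int standardOffset,
+                                int daylightOffset, long transitionUtc) {
+  std::vector<long> result;
+  // Daylight time applies to instants before the transition...
+  const long asDaylight = localMinutes - daylightOffset;
+  if (asDaylight < transitionUtc) result.push_back(asDaylight);
+  // ...and standard time applies to instants at or after it.
+  const long asStandard = localMinutes - standardOffset;
+  if (asStandard >= transitionUtc) result.push_back(asStandard);
+  return result;
+}
+```
+
+Take Pacific offsets (-420 for PDT, -480 for PST) and a transition at 09:00 UTC
+(minute 540). Local 01:30 (minute 90) maps to two instants, 08:30 and 09:30 UTC,
+while local 03:00 maps to only one. A generic name such as “PT” cannot tell the
+two 01:30s apart, but “PDT” and “PST” can.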
+
+## DateTimePatternGenerator
+
+The DateTimePatternGenerator class provides a way to map a request for a set of
+date/time fields, along with their width, to a locale-appropriate format
+pattern. The request is in the form of a “skeleton” which just contains pattern
+letters for the desired fields using the representation for the desired width.
+In a skeleton, anything other than a pattern letter is ignored, field order is
+insignificant, and there are two special additional pattern letters that may be
+used: 'j' requests the preferred hour-cycle type for the locale (it gets mapped
+to one of 'H', 'h', 'k', or 'K'); 'J' is similar but requests no AM/PM marker
+even if the locale’s preferred hour-cycle type is 'h' or 'K'.
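+
+The two skeleton rules above can be illustrated by reducing a skeleton to a
+canonical key: non-letters are ignored and field order is insignificant. This is
+a hypothetical sketch, not ICU's matching algorithm, and `canonicalSkeleton` is
+an illustrative name. It only shows why "MMMMdjmm" and "jmm, d MMMM" request the
+same fields.
+
+```cpp
+#include <algorithm>
+#include <string>
+
+// Hypothetical sketch: keep only ASCII pattern letters, then sort them so
+// that equivalent skeletons compare equal regardless of field order.
+std::string canonicalSkeleton(const std::string& skeleton) {
+  std::string letters;
+  for (const char c : skeleton) {
+    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) letters += c;
+  }
+  std::sort(letters.begin(), letters.end());  // brings equal letters together
+  return letters;
+}
+```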
+
+For example, a skeleton of “MMMMdjmm” might result in the following format
+patterns for different locales:
+
+| locale | format pattern for skeleton “MMMMdjmm” | example            |
+| ------ | -------------------------------------- | ------------------ |
+| en_US  | "MMMM d 'at' h:mm a"                   | April 2 at 5:00 PM |
+| es_ES  | "d 'de' MMMM, H:mm"                    | 2 de abril, 17:00  |
+| ja_JP  | "M月d日 H:mm"                            | 4月2日 17:00         |
+
+The most important DateTimePatternGenerator methods are the varieties of
+getBestPattern.
+
+Note that the fields in the format pattern may be adjusted as appropriate for
+the locale and may not exactly match those in the skeleton. For example:
+
+*   In Russian (locale "ru"), the skeleton "yMMMM" will produce the format
+    pattern "LLLL y" (or "LLLL y 'г'.") since a month name without a day number
+    must be in nominative form, as indicated by LLLL.
+*   When using the Japanese calendar in the Japanese locale (locale
+    "ja@calendar=japanese"), the skeleton "yMMMd" will produce the format
+    pattern "Gy年M月d日" since the era must always be shown with the year in the
+    Japanese calendar.
+
+## DateFormatSymbols
+
+DateFormatSymbols is a public class for encapsulating localizable date-time
+formatting data, including time zone data. DateFormatSymbols is used by
+DateFormat and SimpleDateFormat.
+
+DateFormatSymbols specifies the exact character strings to use for various parts
+of a date or time. For example, the names of the months and days of the week and
+the strings for AM and PM are controlled by DateFormatSymbols.
+
+Create a date-time formatter using the `createTimeInstance`, `createDateInstance`,
+or `createDateTimeInstance` methods in DateFormat. Each of these methods can
+return a date/time formatter initialized with a default format pattern, along
+with the date-time formatting data for a given or default locale. After a
+formatter is created, you can modify its format pattern using `applyPattern`,
+which is defined on SimpleDateFormat.
+
+If you want to create a date-time formatter with a particular format pattern and
+locale, use one of the SimpleDateFormat constructors:
+
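+```cpp
+UErrorCode status = U_ZERO_ERROR;
+UnicodeString aPattern(u"GyyyyMMddHHmmssSSZ");
+// SimpleDateFormat adopts (takes ownership of) the DateFormatSymbols object.
+SimpleDateFormat* sdf = new SimpleDateFormat(
+    aPattern, new DateFormatSymbols(Locale::getUS(), status), status);
+```
+
+This loads the appropriate date-time formatting data from the locale.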
+
+## Programming Examples
+
+See [date and time formatting examples](examples.md).
diff --git a/docs/userguide/format_parse/formatted_value.md b/docs/userguide/format_parse/formatted_value.md
index 00c27d2..4c60b09 100644
--- a/docs/userguide/format_parse/formatted_value.md
+++ b/docs/userguide/format_parse/formatted_value.md
@@ -11,9 +11,9 @@
 in Java, C++, and C.  For more details and a list of all implementing classes,
 refer to the API docs:
 
-- [C++ FormattedValue](http://icu-project.org/apiref/icu4c/classicu_1_1FormattedValue.html)
-- [C UFormattedValue](http://icu-project.org/apiref/icu4c/globals_u.html) -- search for "resultAsValue"
-- [Java FormattedValue](http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/FormattedValue.html)
+- [C++ FormattedValue](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1FormattedValue.html)
+- [C UFormattedValue](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/globals_u.html) -- search for "resultAsValue"
+- [Java FormattedValue](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/FormattedValue.html)
 
 ## Nested Span Fields
 
diff --git a/docs/userguide/format_parse/index.md b/docs/userguide/format_parse/index.md
new file mode 100644
index 0000000..2f91f93
--- /dev/null
+++ b/docs/userguide/format_parse/index.md
@@ -0,0 +1,210 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Formatting and Parsing
+
+## Overview
+
+Formatters translate between binary data and human-readable textual
+representations of these values. For example, you cannot display the computer
+representation of the number 103. You can only display the numeral 103 as a
+textual representation (using three text characters). The result from a
+formatter is a string that contains text that the user will recognize as
+representing the internal value. A formatter can also parse a string by
+converting a textual representation of some value back into its internal
+representation. For example, it reads the characters 1, 0 and 3 followed by
+something other than a digit, and produces the value 103 as an internal binary
+representation.
+
+These classes encapsulate information about the display of localized times,
+days, numbers, currencies, and messages. Formatting classes do both formatting
+and parsing and allow the separation of the data that the end-user sees from the
+code. Separating the program code from the data allows a program to be more
+easily localized. Formatting is converting a date, time, number, message or
+other object from its internal representation into a string. Parsing is the
+reverse operation. It is the process of converting a string to an internal
+representation of the date, time, number, message or other object.
+
+Using the formatting classes is an important step in internationalizing your
+software because the `format()` and `parse()` methods in each of the classes make
+your software language neutral, by replacing implicit conversions with explicit
+formatting calls.
+
+## Internationalization Formatting Tips
+
+This section discusses some of the ways you can format and parse numbers,
+currencies, dates, times and text messages in your program so that the data is
+separate from the code and can be easily localized. This is the information your
+users see on their computer screens, so it needs to be in a language and format
+that conforms to their local conventions.
+
+Some things you need to keep in mind while you are creating your code are the
+following:
+
+*   Keep your code and your data separate
+
+*   Format the data in a locale-sensitive manner
+
+*   Keep your code locale-independent
+
+*   Avoid writing special routines to handle specific locales
+
+*   String objects formatted by `format()` are parseable by the `parse()` method\*
+
+> :point_right: **Note**: Although parsing is supported in several legacy ICU APIs,
+it is generally considered bad practice to parse localized strings.
+For more information, read [Why You Should Not Parse
+Localized Strings](https://blog.sffc.xyz/post/190943794505/why-you-should-not-parse-localized-strings).
+
+### Numbers and Currencies
+
+Programs store and operate on numbers using a locale-independent binary
+representation. When displaying or printing a number it is converted to a
+locale-specific string. For example, the number 12345.67 is "12,345.67" in the
+US, "12 345,67" in France and "12.345,67" in Germany.
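+
+To make these differences concrete, the sketch below produces the three
+strings above from one value by changing only the separators. It is a
+hypothetical illustration, not how `NumberFormat` works. Real locales also
+differ in grouping sizes, digits, sign placement, and the exact space character
+used, which is why you should rely on `NumberFormat` rather than code like this.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <string>
+
+// Illustration only: format a non-negative amount given in hundredths
+// (1234567 means 12345.67) with the supplied grouping and decimal separators,
+// using groups of three digits.
+std::string formatHundredths(long long hundredths, const std::string& grouping,
+                             const std::string& decimal) {
+  assert(hundredths >= 0);
+  const std::string digits = std::to_string(hundredths / 100);
+  std::string out;
+  for (std::size_t i = 0; i < digits.size(); ++i) {
+    // Insert a separator wherever a multiple of three digits remains.
+    if (i > 0 && (digits.size() - i) % 3 == 0) out += grouping;
+    out += digits[i];
+  }
+  const long long fraction = hundredths % 100;
+  out += decimal;
+  out += static_cast<char>('0' + fraction / 10);
+  out += static_cast<char>('0' + fraction % 10);
+  return out;
+}
+```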
+
+By invoking the methods provided by the `NumberFormat` class, you can format
+numbers, currencies, and percentages according to the specified or default
+locale. `NumberFormat` is locale-sensitive so you need to create a new
+`NumberFormat` for each locale. `NumberFormat` methods format primitive-type
+numbers, such as double, and output the number as a locale-specific string.
+
+For currencies you call `getCurrencyInstance` to create a formatter that returns a
+string with the formatted number and the appropriate currency sign. Of course,
+the `NumberFormat` class is unaware of exchange rates, so the number output is the
+same regardless of the specified currency. This means that the same number has
+different monetary values depending on the currency locale. If the number is
+9988776.65 the results will be:
+
+*   9 988 776,65 € in France
+
+*   9.988.776,65 € in Germany
+
+*   $9,988,776.65 in the United States
+
+In order to format percentages, create a locale-specific formatter and call the
+`getPercentInstance` method. With this formatter, a decimal fraction such as 0.75
+is displayed as 75%.
+
+#### Customizing Number Formats
+
+If you need to customize a number format, you can use the DecimalFormat and
+DecimalFormatSymbols classes described in the [Formatting
+Numbers](numbers/index.md) chapter. This is not usually necessary and
+it makes your code much more complex, but it is available for those rare
+instances where you need it. In general, you would do this by explicitly
+specifying the number format pattern.
+
+If you need to format or parse spelled-out numbers, you can use the
+RuleBasedNumberFormat class (see the [Formatting
+Numbers](numbers/index.md) chapter). You can instantiate a default
+formatter for a locale or, using the RuleBasedNumberFormat rule syntax,
+specify your own.
+
+Using NumberFormat class methods (see the [Formatting
+Numbers](numbers/index.md) chapter) with a predefined locale is the
+easiest and most accurate way to format numbers and currencies.
+
+> :point_right: **Note**: *See [Properties and ICU Rule Syntax](../strings/properties.md) for
+information regarding syntax characters.*
+
+### Dates and Times
+
+You display or print a Date by first converting it to a locale-specific string
+that conforms to the conventions of the end user's Locale. For example, Germans
+recognize 20.4.98 as a valid date, and Americans recognize 4/20/98.
+
+> :point_right: **Note**: *The appropriate Calendar support is required for different locales. For
+example, the Buddhist calendar is the official calendar in Thailand so the
+typical assumption of Gregorian Calendar usage should not be made. ICU will pick
+the appropriate Calendar based on the locale you supply when opening a Calendar
+or DateFormat.*
+
+### Messages
+
+Message format helps make the order of display elements localizable. It helps
+address problems of grammatical differences in languages. For example, consider
+the sentence, "I go to work by car every day." In Japanese, the grammatical
+equivalent can be "Every day, I to work by car go." Another example is plurals
+in text, for example, "no space for rent, one room for rent and many rooms for
+rent," where "for rent" is the only constant text among the three.
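+
+The reordering problem can be illustrated with positional placeholders in the
+style of MessageFormat's `{0}` and `{1}` arguments. The function below is a
+hypothetical sketch, not MessageFormat. It handles only single-digit indices,
+leaves unknown placeholders unchanged, and supports no quoting, plurals, or
+nested formats.
+
+```cpp
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Hypothetical sketch: replace each "{n}" (single digit) with args[n], so a
+// translated pattern can reorder arguments without any change to the caller.
+std::string substitute(const std::string& pattern,
+                       const std::vector<std::string>& args) {
+  std::string out;
+  for (std::size_t i = 0; i < pattern.size(); ++i) {
+    if (pattern[i] == '{' && i + 2 < pattern.size() &&
+        pattern[i + 1] >= '0' && pattern[i + 1] <= '9' &&
+        pattern[i + 2] == '}') {
+      const std::size_t index = static_cast<std::size_t>(pattern[i + 1] - '0');
+      out += index < args.size() ? args[index] : pattern.substr(i, 3);
+      i += 2;  // skip the rest of "{n}"
+    } else {
+      out += pattern[i];
+    }
+  }
+  return out;
+}
+```
+
+Given the arguments `{"car", "work"}`, the English pattern
+"I go to {1} by {0} every day." and the reordered pattern
+"Every day, I to {1} by {0} go." both come out correctly from the same call.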
+
+## Formatting and Parsing Classes
+
+ICU provides the following classes, grouped into four major areas, for
+formatting numbers, dates and messages:
+
+### General Formatting
+
+*   `Format`:
+    The abstract superclass of all format classes. It provides the basic methods
+    for formatting and parsing numbers, dates, strings and other objects.
+
+*   `FieldPosition`:
+    A concrete class for holding the field constant and the begin and end
+    indices for number and date fields.
+
+*   `ParsePosition`:
+    A concrete class for holding the parse position in a string during parsing.
+
+*   `Formattable`:
+    Formattable objects can be passed to the Format class or its subclasses for
+    formatting. It encapsulates a polymorphic piece of data to be formatted and
+    is used with MessageFormat. Formattable is used by some formatting
+    operations to provide a single "type" that encompasses all formattable
+    values (e.g., it can hold a number, a date, or a string, and so on).
+
+*   `UParseError`:
+    UParseError is used to return detailed information about parsing errors.
+    It is used by the ICU parsing engines that parse long rules, patterns, or
+    programs. This is helpful when the text being parsed is long enough that
+    more information than a UErrorCode is needed to localize the error.
+
+**Formatting Numbers**
+
+*   [NumberFormat](formatparse/numbers/index.md) (§)
+    The abstract superclass that provides the basic fields and methods for
+    formatting Number objects and number primitives to localized strings and
+    parsing localized strings to Number objects.
+
+*   [DecimalFormat](formatparse/numbers/index.md) (§)
+    A concrete class for formatting Number objects and number primitives to
+    localized strings and parsing localized strings to Number objects, in base
+    10.
+
+*   [RuleBasedNumberFormat](formatparse/numbers/index.md) (§)
+    A concrete class for formatting Number objects and number primitives to
+    localized text, especially spelled-out format such as found in check writing
+    (e.g. "two hundred and thirty-four"), and parsing text into Number objects.
+
+*   [DecimalFormatSymbols](formatparse/numbers/index.md) (§)
+    A concrete class for accessing localized number strings, such as the
+    grouping separators, decimal separator, and percent sign. Used by
+    DecimalFormat.
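+
+The division of labor between a `DecimalFormat` pattern and
+`DecimalFormatSymbols` can be sketched as follows. The sketch assumes the JDK's
+`java.text.DecimalFormat` rather than ICU4J's `com.ibm.icu.text.DecimalFormat`,
+so it runs on a plain JDK. `DecimalFormatDemo` and its methods are illustrative
+names only.
+
+```java
+import java.text.DecimalFormat;
+import java.text.DecimalFormatSymbols;
+import java.text.ParsePosition;
+import java.util.Locale;
+
+class DecimalFormatDemo {
+    // The pattern fixes the layout (group every 3 digits, exactly 2 fraction
+    // digits); DecimalFormatSymbols supplies the locale's separator characters.
+    static final String PATTERN = "#,##0.00";
+
+    static DecimalFormat forLocale(Locale locale) {
+        return new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(locale));
+    }
+
+    static String format(double value, Locale locale) {
+        return forLocale(locale).format(value);
+    }
+
+    // Returns null if `text` does not start with a number.
+    static Number parse(String text, Locale locale) {
+        return forLocale(locale).parse(text, new ParsePosition(0));
+    }
+
+    public static void main(String[] args) {
+        System.out.println(format(1234.5, Locale.US));         // 1,234.50
+        System.out.println(format(1234.5, Locale.GERMANY));    // 1.234,50
+        System.out.println(parse("1.234,50", Locale.GERMANY)); // 1234.5
+    }
+}
+```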
+
+### Formatting Dates and Times
+
+*   [DateFormat](formatparse/datetime/index.md) (§)
+    The abstract superclass that provides the basic fields and methods for
+    formatting Date objects to localized strings and parsing date and time
+    strings to Date objects.
+
+*   [SimpleDateFormat](formatparse/datetime/index.md) (§)
+    A concrete class for formatting Date objects to localized strings and
+    parsing date and time strings to Date objects, using a GregorianCalendar.
+
+*   [DateFormatSymbols](formatparse/datetime/index.md) (§)
+    A concrete class for accessing localized date-time formatting strings, such
+    as names of the months, days of the week and the time zone.
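+
+Here is a minimal sketch of formatting, parsing, and looking up localized month
+names. It is written against the JDK's `java.text.SimpleDateFormat` and
+`java.text.DateFormatSymbols` so that it runs without ICU4J. The ICU4J classes
+of the same names live in `com.ibm.icu.text` and use `com.ibm.icu.util.TimeZone`,
+so the time-zone call would differ there. `DateFormatDemo` is a hypothetical
+name.
+
+```java
+import java.text.DateFormatSymbols;
+import java.text.ParsePosition;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Locale;
+import java.util.TimeZone;
+
+class DateFormatDemo {
+    // A fixed pattern plus an explicit time zone makes the output independent
+    // of the host's default zone, so formatting and parsing round-trip.
+    static SimpleDateFormat isoDay() {
+        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
+        f.setTimeZone(TimeZone.getTimeZone("UTC"));
+        return f;
+    }
+
+    static String format(long epochMillis) {
+        return isoDay().format(new Date(epochMillis));
+    }
+
+    // Returns null if the text cannot be parsed.
+    static Date parse(String text) {
+        return isoDay().parse(text, new ParsePosition(0));
+    }
+
+    // DateFormatSymbols holds the localized names used for patterns such as
+    // "MMMM"; index 0 is January.
+    static String monthName(Locale locale, int month) {
+        return new DateFormatSymbols(locale).getMonths()[month];
+    }
+
+    public static void main(String[] args) {
+        System.out.println(format(0L));                     // 1970-01-01
+        System.out.println(parse("1970-01-02").getTime());  // 86400000
+        System.out.println(monthName(Locale.US, 2));        // March
+        System.out.println(monthName(Locale.FRANCE, 2));    // mars
+    }
+}
+```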
+
+### Formatting Messages
+
+*   [MessageFormat](formatparse/messages/index.md) (§)
+    A concrete class for producing a language-specific user message that
+    contains numbers, currency, percentages, date, time and string variables.
+
+*   [ChoiceFormat](formatparse/messages/index.md) (§)
+    A concrete class for mapping strings to ranges of numbers and for handling
+    plurals and names series in user messages.
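+
+The following minimal sketch shows how `MessageFormat` delegates to
+`ChoiceFormat` through a `choice` argument. It uses the JDK's
+`java.text.MessageFormat`, which supports `choice` arguments but not `plural`
+arguments. With ICU, a `plural` argument is the better tool for count-dependent
+messages, so treat this only as an illustration of the mechanism.
+`MessageFormatDemo` is a hypothetical name.
+
+```java
+import java.text.MessageFormat;
+import java.util.Locale;
+
+class MessageFormatDemo {
+    // {0,choice,...} selects a sub-message by numeric range:
+    //   "0#" applies for 0 <= n < 1, "1#" for exactly n == 1, "1<" for n > 1.
+    // The selected text may itself contain arguments such as {0,number,integer}.
+    static final String PATTERN =
+        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";
+
+    static String describe(int count) {
+        MessageFormat mf = new MessageFormat(PATTERN, Locale.US);
+        return mf.format(new Object[] { count });
+    }
+
+    public static void main(String[] args) {
+        for (int n : new int[] { 0, 1, 3, 1234 }) {
+            System.out.println(describe(n));
+        }
+        // There are no files.
+        // There is one file.
+        // There are 3 files.
+        // There are 1,234 files.
+    }
+}
+```
+
+The numeric ranges in this pattern encode English plural behavior, so every
+language would need its own ranges; an ICU `plural` argument selects by the
+locale's plural rules instead.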
diff --git a/docs/userguide/format_parse/messages/examples.md b/docs/userguide/format_parse/messages/examples.md
new file mode 100644
index 0000000..9eadecf
--- /dev/null
+++ b/docs/userguide/format_parse/messages/examples.md
@@ -0,0 +1,381 @@
+<!--
+© 2020 and later: Unicode, Inc. and others.
+License & terms of use: http://www.unicode.org/copyright.html
+-->
+
+# Message Formatting Examples
+
+## MessageFormat Class
+
+ICU's MessageFormat class can be used to format messages in a locale-independent
+manner, which makes it suitable for localizing user interface (UI) strings.
+
+### C++
+
+```cpp
+#include <cstdio>
+#include <unicode/fieldpos.h>
+#include <unicode/fmtable.h>
+#include <unicode/msgfmt.h>
+#include <unicode/resbund.h>
+#include <unicode/unistr.h>
+
+using namespace icu;
+
+/* The strings below can be isolated into a resource bundle
+ * and retrieved dynamically.
+ */
+#define LANGUAGE_NAMES  "{0}<{1}languages {2}>\n"
+#define LANG_ATTRIB     "{0}<language id=\"{1}\" >{2}</language>\n"
+#define MONTH_NAMES     "{0}<monthNames>\n"
+#define END_MONTH_NAMES "{0}</monthNames>\n"
+#define MONTH           "{0}<month id=\"{1}\">{2}</month>\n"
+#define MONTH_ABBR      "{0}<monthAbbr>\n"
+#define END_MONTH_ABBR  "{0}</monthAbbr>\n"
+#define XML_END_SLASH   "/"
+
+// CXMLGenerator's declaration is not shown. It is assumed to declare both
+// formatString() overloads and the members mError (UErrorCode), mRBundle
+// (ResourceBundle), indentOffset (UnicodeString) and chopIndent().
+
+// Formats `pattern` with `count` arguments.
+UnicodeString CXMLGenerator::formatString(const UnicodeString& pattern,
+                                          const Formattable* args,
+                                          int32_t count) {
+    UnicodeString result;
+    MessageFormat format(pattern, mError);
+    FieldPosition fpos;  // field positions are not needed here
+    format.format(args, count, result, fpos, mError);
+    if (U_FAILURE(mError)) {
+        return UnicodeString("Illegal argument");
+    }
+    return result;
+}
+
+// Convenience overload for patterns that only use {0}.
+UnicodeString CXMLGenerator::formatString(const UnicodeString& pattern,
+                                          const UnicodeString& argument) {
+    Formattable args[] = { argument };
+    return formatString(pattern, args, 1);
+}
+
+void CXMLGenerator::writeLanguage(UnicodeString& xmlString) {
+    if (U_FAILURE(mError)) {
+        return;
+    }
+
+    // "Languages" is a table mapping language tags to display names.
+    ResourceBundle languages = mRBundle.get("Languages", mError);
+    if (U_SUCCESS(mError) && mError != U_USING_DEFAULT_WARNING) {
+        Formattable args[] = { indentOffset, "", "" };
+        xmlString = formatString(UnicodeString(LANGUAGE_NAMES), args, 3);
+        indentOffset.append("\t");
+        while (languages.hasNext()) {
+            const char* tag = nullptr;
+            UnicodeString name = languages.getNextString(&tag, mError);
+            if (U_FAILURE(mError)) {
+                break;
+            }
+            args[0] = indentOffset;
+            args[1] = UnicodeString(tag, -1, US_INV);  // keys are invariant chars
+            args[2] = name;
+            xmlString.append(formatString(UnicodeString(LANG_ATTRIB), args, 3));
+        }
+
+        chopIndent();
+        args[0] = indentOffset;
+        args[1] = UnicodeString(XML_END_SLASH);
+        args[2] = "";
+        xmlString.append(formatString(UnicodeString(LANGUAGE_NAMES), args, 3));
+        return;
+    }
+    mError = U_ZERO_ERROR;
+    xmlString.remove();
+}
+
+void CXMLGenerator::writeMonthNames(UnicodeString& xmlString) {
+    // "MonthNames" is assumed to be an array of strings in mRBundle.
+    ResourceBundle longMonths = mRBundle.get("MonthNames", mError);
+    if (U_SUCCESS(mError) && mError != U_USING_DEFAULT_WARNING) {
+        xmlString.append(formatString(UnicodeString(MONTH_NAMES), indentOffset));
+        indentOffset.append("\t");
+        int32_t lNum = longMonths.getSize();
+        for (int32_t i = 0; i < lNum; i++) {
+            char id[12];  // big enough for any int32_t plus the terminating NUL
+            snprintf(id, sizeof(id), "%d", (int)(i + 1));
+            Formattable args[] = { indentOffset, UnicodeString(id, -1, US_INV),
+                                   longMonths.getStringEx(i, mError) };
+            xmlString.append(formatString(UnicodeString(MONTH), args, 3));
+        }
+        chopIndent();
+        xmlString.append(formatString(UnicodeString(END_MONTH_NAMES), indentOffset));
+        mError = U_ZERO_ERROR;
+        return;
+    }
+    xmlString.remove();
+    mError = U_ZERO_ERROR;
+}
+```
+
+### C
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+#include <unicode/ucal.h>
+#include <unicode/umsg.h>
+#include <unicode/ures.h>
+#include <unicode/ustring.h>
+
+/* Converts a UChar* string to a newly allocated char* string in the default
+ * codepage. The caller must free() the result. */
+char *austrdup(const UChar* unichars)
+{
+    int32_t length;
+    char *newString;
+
+    length    = u_strlen(unichars);
+    newString = (char*)malloc(sizeof(char) * 4 * (length + 1));
+    if (newString == NULL)
+        return NULL;
+
+    u_austrcpy(newString, unichars);
+
+    return newString;
+}
+
+void msgSample1(){
+    UChar *result = NULL;
+    UChar str[32], tzID[4], pattern[100];
+    int32_t resultLengthOut, resultlength;
+    UCalendar *cal;
+    UDate d1;
+    char *narrow;
+    UErrorCode status = U_ZERO_ERROR;
+
+    u_uastrcpy(str, "disturbance in force");
+    u_uastrcpy(tzID, "PST");
+    cal = ucal_open(tzID, u_strlen(tzID), "en_US", UCAL_TRADITIONAL, &status);
+    ucal_setDateTime(cal, 1999, UCAL_MARCH, 18, 0, 0, 0, &status);
+    d1 = ucal_getMillis(cal, &status);
+    ucal_close(cal);
+    u_uastrcpy(pattern, "On {0,date,long}, there was a {1} on planet {2,number,integer}");
+
+    /* Preflight with a NULL buffer to get the required length. */
+    resultlength = 0;
+    resultLengthOut = u_formatMessage("en_US", pattern, u_strlen(pattern), NULL,
+                                      resultlength, &status, d1, str, 7);
+    if (status == U_BUFFER_OVERFLOW_ERROR) {
+        status = U_ZERO_ERROR;
+        resultlength = resultLengthOut + 1;
+        result = (UChar*)malloc(sizeof(UChar) * resultlength);
+        u_formatMessage("en_US", pattern, u_strlen(pattern), result,
+                        resultlength, &status, d1, str, 7);
+    }
+    if (U_SUCCESS(status) && result != NULL) {
+        narrow = austrdup(result);
+        if (narrow != NULL) {
+            printf("%s\n", narrow);
+            free(narrow);
+        }
+    }
+    free(result);
+}
+```
+
+The following is a more practical sample, which retrieves data from a resource
+bundle and feeds it to `u_formatMessage()` to produce formatted strings. It
+reuses `austrdup()` from the previous sample.
+
+```c
+void msgSample3(){
+    /* These constant strings could also be stored in a resource bundle and
+     * retrieved at formatting time, e.g.:
+     * UResourceBundle* myResB = ures_open("myResources", currentLocale, &err);
+     * const UChar* langAttrib = ures_getStringByKey(myResB, "LANG_ATTRIB", &len, &err);
+     */
+    UChar langAttrib[64], indentOffset[8], tag[32];
+    UChar *result = NULL;
+    UResourceBundle *pResB, *pDeltaResB = NULL;
+    UErrorCode err = U_ZERO_ERROR;
+
+    u_uastrcpy(langAttrib, "{0}<language id=\"{1}\" >{2}</language>\n");
+    u_uastrcpy(indentOffset, "\t\t\t");
+
+    /* The "Languages" table lives in ICU's "lang" data tree. */
+    pResB = ures_open(U_ICUDATA_NAME U_TREE_SEPARATOR_STRING "lang", "en", &err);
+    if (U_FAILURE(err)) {
+        return;
+    }
+    pDeltaResB = ures_getByKey(pResB, "Languages", NULL, &err);
+
+    while (U_SUCCESS(err) && ures_hasNext(pDeltaResB)) {
+        const char *key = NULL;
+        int32_t len = 0, resultLength, resultLengthOut;
+        const UChar *value = ures_getNextString(pDeltaResB, &len, &key, &err);
+        if (U_FAILURE(err))
+            break;
+
+        /* Keys are char*, but {1} expects a UChar* string argument. */
+        u_uastrncpy(tag, key, 31);
+        tag[31] = 0;
+
+        /* Preflight with a NULL buffer to get the required length. */
+        resultLength = 0;
+        resultLengthOut = u_formatMessage("en_US", langAttrib, u_strlen(langAttrib),
+                                          NULL, resultLength, &err,
+                                          indentOffset, tag, value);
+        if (err == U_BUFFER_OVERFLOW_ERROR) {
+            UChar *grown;
+            err = U_ZERO_ERROR;
+            resultLength = resultLengthOut + 1;
+            grown = (UChar*)realloc(result, sizeof(UChar) * resultLength);
+            if (grown == NULL)
+                break;
+            result = grown;
+            u_formatMessage("en_US", langAttrib, u_strlen(langAttrib),
+                            result, resultLength, &err,
+                            indentOffset, tag, value);
+        }
+        if (U_SUCCESS(err) && result != NULL) {
+            char *narrow = austrdup(result);
+            if (narrow != NULL) {
+                printf("%s", narrow);  /* the pattern already ends in "\n" */
+                free(narrow);
+            }
+        }
+    }
+
+    free(result);
+    ures_close(pDeltaResB);
+    ures_close(pResB);
+}
+```
+
+### Java
+
+```java
+import com.ibm.icu.text.*;
+import java.util.Date;
+import java.text.FieldPosition;
+
+public class TestMessageFormat {
+    public void runTest() {
+        // {0} is formatted twice: once as a time and once as a date.
+        String format = "At {0,time,::jmm} on {0,date,::dMMMM}, there was {1} on planet {2,number,integer}.";
+        MessageFormat mf = new MessageFormat(format);
+        Object[] objectsToFormat = { new Date(), "a Disturbance in the Force", 5 };
+        FieldPosition fp = new FieldPosition(1);
+        StringBuffer sb = new StringBuffer();
+        try {
+            sb = mf.format(objectsToFormat, sb, fp);
+            System.out.println(sb.toString());
+        } catch (IllegalArgumentException e) {
+            System.out.println("Exception during formatting: " + e);
+        }
+    }
+
+    public static void main(String[] args) {
+        try {
+            new TestMessageFormat().runTest();
+        } catch (Exception e) {
+            System.out.println("Exception of type: " + e);
+        }
+    }
+}
+```
+
+## ChoiceFormat Class
+
+**Important:** The following documentation is outdated. *ChoiceFormat is
+probably not what you need. Please use MessageFormat with plural arguments for
+proper plural selection, and select arguments for simple selection among a fixed
+set of choices!*
+
+ICU's ChoiceFormat class provides more flexibility than the printf() and scanf()
+style functions for formatting UI strings. This interface can be useful if you
+would like a message to change according to the number of items you are
+displaying. Note: Some Asian languages do not have plural words or phrases.
+
+### C++