
 <html>

 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <meta http-equiv="Content-Language" content="en-us">
 <link rel="stylesheet" href="charts.css" type="text/css">
 <meta name="GENERATOR" content="Microsoft FrontPage 4.0">
 <meta name="ProgId" content="FrontPage.Editor.Document">
 <title>UCA Chart Help</title>
 <base target="main">
 </head>

 <body>

 <h2 align="center">UCA Chart Help</h2>
 <p>This set of charts shows the Unicode Collation Algorithm values for Unicode
 characters. The characters are arranged in the following groups:</p>
 <table cellspacing="0" cellpadding="4">
   <tr>
     <th align="left"><i>Null</i></th>
     <th class="x">Completely ignorable (primary, secondary and tertiary levels)<br>
       These include control codes and various formatting codes.</th>
   </tr>
   <tr>
     <th align="left"><i>Ignorable</i></th>
     <th class="x">Ignorable at a primary level, but not at a secondary or
       tertiary level.<br>
       These include most accents and diacritics.</th>
   </tr>
   <tr>
     <th align="left"><i>Variable</i></th>
     <th class="x">Characters that can be made ignorable by a programmatic
       switch.<br>
       These include spaces, punctuation marks, and most symbols.</th>
   </tr>
   <tr>
     <th align="left"><i>Common</i></th>
     <th class="x">Characters that are none of the above, but not considered
       letters.<br>
       These include numbers, currency symbols, etc.</th>
   </tr>
   <tr>
     <th align="left"><i>Letters</i></th>
     <th class="x">According to script</th>
   </tr>
   <tr>
     <th align="left"><i>Unsupported</i></th>
     <th class="x">Not explicitly supported in this version of UCA; uses
       code-point order</th>
   </tr>
 </table>
 <p>The characters* within each group are arranged in cells. The color of the
 cell indicates the strength of the difference between that character and the <i>previous</i>
 character in the chart, as follows.</p>
 <table cellspacing="0" cellpadding="4">
   <tr>
     <th colspan="2"><font size="3"><u>No Expansion</u></font></th>
     <th rowspan="5">&nbsp;</th>
     <th colspan="2"><font size="3"><u>Expansion</u></font></th>
   </tr>
   <tr>
     <td class="p">a<br>
       <tt>0061</tt></td>
     <th class="x">Primary difference</th>
     <td class="ep">ǳ<br>
       <tt>01F3</tt></td>
     <th class="x">Primary difference</th>
   </tr>
   <tr>
     <td class="s">á<br>
       <tt>00E1</tt></td>
     <th class="x">Secondary difference</th>
     <td class="es">Ǳ<br>
       <tt>01F1</tt></td>
     <th class="x">Secondary difference</th>
   </tr>
   <tr>
     <td class="t">A<br>
       <tt>0041</tt></td>
     <th class="x">Tertiary difference</th>
     <td class="et">ǲ<br>
       <tt>01F2</tt></td>
     <th class="x">Tertiary difference</th>
   </tr>
   <tr>
     <td class="q">Å<br>
       <tt>212B</tt></td>
     <th class="x">Quaternary difference<br>
       or no difference</th>
     <td class="eq">&nbsp;</td>
     <th class="x">Quaternary difference<br>
       or no difference</th>
   </tr>
 </table>
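 <p>As a rough illustration of these difference levels, the following Java
 sketch reports the first level at which two strings differ. It relies on the
 JDK's <tt>java.text.Collator</tt>, which is an assumption made to keep the
 example self-contained: its rules are not the UCA table shown in these charts,
 but it exposes the same primary, secondary and tertiary strengths. The class
 and method names are hypothetical.</p>

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of the chart tooling.
class StrengthDemo {

    /** Returns true if the two strings compare as different at the given strength. */
    static boolean differsAt(int strength, String left, String right) {
        // Assumption: the JDK root collator stands in for a UCA collator.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(left, right) != 0;
    }

    /** Names the first level (primary, secondary, tertiary) at which the strings differ. */
    static String firstDifference(String left, String right) {
        if (differsAt(Collator.PRIMARY, left, right)) {
            return "primary";
        }
        if (differsAt(Collator.SECONDARY, left, right)) {
            return "secondary";
        }
        if (differsAt(Collator.TERTIARY, left, right)) {
            return "tertiary";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(firstDifference("a", "b"));      // primary
        System.out.println(firstDifference("a", "\u00E1")); // secondary (a vs. a with acute)
        System.out.println(firstDifference("a", "A"));      // tertiary
    }
}
```

 <p>In the charts, the same kind of comparison is made between each character
 and the <i>previous</i> character, and the result selects the cell color.</p>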
 <blockquote>
   <p align="left"><b>Note: </b>If tooltips are enabled in your browser, pausing
   the mouse over any cell shows the name of the character and a representation
   of its sort key. In this representation, the weight levels are separated by
   &quot;|&quot;.</p>
 </blockquote>
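 <p>The idea of a sort key whose weight levels are separated by &quot;|&quot;
 can be sketched in Java as follows. This is only an illustration under stated
 assumptions: it reads weights from the JDK's <tt>java.text.CollationElementIterator</tt>,
 so the numbers are the JDK's own and not the UCA values shown in the tooltips,
 and the exact tooltip format may differ. The class and method names are
 hypothetical.</p>

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper, not part of the chart tooling.
class SortKeyDemo {

    /** Formats the weights of text as "primaries | secondaries | tertiaries". */
    static String describe(String text) {
        Collator base = Collator.getInstance(Locale.ROOT);
        if (!(base instanceof RuleBasedCollator)) {
            // Assumption: the JDK returns a RuleBasedCollator for the root locale.
            throw new IllegalStateException("expected a RuleBasedCollator");
        }
        RuleBasedCollator collator = (RuleBasedCollator) base;

        StringBuilder primary = new StringBuilder();
        StringBuilder secondary = new StringBuilder();
        StringBuilder tertiary = new StringBuilder();

        // Walk the collation elements and collect each level separately.
        CollationElementIterator elements = collator.getCollationElementIterator(text);
        int element = elements.next();
        while (element != CollationElementIterator.NULLORDER) {
            primary.append(String.format("%04X ", CollationElementIterator.primaryOrder(element)));
            secondary.append(String.format("%02X ", CollationElementIterator.secondaryOrder(element)));
            tertiary.append(String.format("%02X ", CollationElementIterator.tertiaryOrder(element)));
            element = elements.next();
        }

        // Levels are joined with "|", mirroring the separator used in the tooltips.
        return primary + "| " + secondary + "| " + tertiary.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("a"));
        System.out.println(describe("A"));
    }
}
```

 <p>With this sketch, &quot;a&quot; and &quot;A&quot; share the same primary
 segment and differ only after it, which matches the tertiary difference shown
 in the color key.</p>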
 <table>
   <tr>
     <th>*</th>
     <th class="x">In some cases, the UCA data table also includes contractions.<br>
       They can be recognized by the multiple code point numbers, as in the
       following:</th>
     <td class="p">ஔ<br>
       <tt>0B92 0BD7</tt></td>
   </tr>
 </table>
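 <p>The multi-code-point label of such a cell can be reproduced with a short
 Java sketch. It formats every code point of a string as four hexadecimal
 digits, and uses the fact that <tt>0B92 0BD7</tt> is the canonical
 decomposition of U+0B94 TAMIL LETTER AU. The class and method names are
 hypothetical and are not part of the chart tooling.</p>

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the chart tooling.
class CodePointLabel {

    /** Returns the code points of text as space-separated, zero-padded hex values. */
    static String label(String text) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        while (index != text.length()) {
            int codePoint = text.codePointAt(index);
            if (out.length() != 0) {
                out.append(' ');
            }
            out.append(String.format("%04X", codePoint));
            // Advance by one or two chars so supplementary code points stay whole.
            index += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0B94 decomposes canonically to the two-code-point sequence in the chart.
        String decomposed = Normalizer.normalize("\u0B94", Normalizer.Form.NFD);
        System.out.println(label(decomposed)); // 0B92 0BD7
        System.out.println(label("a"));        // 0061
    }
}
```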
 <h3><b>Notes</b></h3>
 <ul>
   <li>The UCA results are versioned <i>both</i> by the version of the UCA <i>and</i>
     by the version of The Unicode Standard used to process the data.</li>
   <li>These charts show only one of the alternatives for handling variable
     characters (punctuation): the one in which these characters are <b>non-ignorable</b>.</li>
   <li>Characters from large blocks, such as CJK Ideographs, Hangul Syllables,
     Private Use Area, etc., are represented by a sampling.</li>
   <li>Some unassigned code points, noncharacters and other edge cases are also
     added to the list for comparison.</li>
   <li>For more information, see <a href="http://www.unicode.org/unicode/reports/tr10/" target="_top">UTS
     #10: Unicode Collation Algorithm</a>.</li>
 </ul>

 </body>

 </html>