docs/usermanual-clusters.xml - third_party/harfbuzz - Git at Google

 <?xml version="1.0"?>
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
   <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
   <!ENTITY version SYSTEM "version.xml">
 ]>
 <chapter id="clusters">
   <title>Clusters</title>
   <section id="clusters">
     <title>Clusters</title>
     <para>
       In text shaping, a <emphasis>cluster</emphasis> is a sequence of
       characters that needs to be treated as a single, indivisible
       unit.
     </para>
     <para>
       A cluster is distinct from a <emphasis>grapheme</emphasis>,
       which is the smallest unit of a writing system or script,
       because clusters are only relevant for script shaping and the
       layout of glyphs.
     </para>
     <para>
       For example, a grapheme may be a letter, a number, a logogram,
       or a symbol. When two letters form a ligature, however, they
       combine into a single glyph. They are therefore part of the same
       cluster and are treated as a unit &mdash; even though the two
       original, underlying letters are separate graphemes.
     </para>
     <para>
       During the shaping process, there are several shaping operations
       that may merge adjacent characters (for example, when two code
       points form a ligature or a conjunct form and are replaced by a
       single glyph) or split one character into several (for example,
       when decomposing a code point through the
       <literal>ccmp</literal> feature).
     </para>
     <para>
       HarfBuzz tracks clusters independently from how these
       shaping operations affect the individual glyphs that comprise the
       output HarfBuzz returns in a buffer. Consequently,
       a client program using HarfBuzz can utilize the cluster
       information to implement features such as:
     </para>
     <itemizedlist>
       <listitem>
 	<para>
 	  Correctly positioning the cursor within a shaped text run,
 	  even when characters have formed ligatures, composed or
 	  decomposed, reordered, or undergone other shaping operations.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  Correctly highlighting a text selection that includes some,
 	  but not all, of the characters in a word.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  Applying text attributes (such as color or underlining) to
 	  part, but not all, of a word.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  Generating output document formats (such as PDF) with
 	  embedded text that can be fully extracted.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  Determining the mapping between input characters and output
 	  glyphs, such as which glyphs are ligatures.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  Performing line-breaking, justification, and other
 	  line-level or paragraph-level operations that must be done
 	  after shaping is complete, but which require character-level
 	  properties.
 	</para>
       </listitem>
     </itemizedlist>
     <para>
       When you add text to a HarfBuzz buffer, each code point must be
       assigned a <emphasis>cluster value</emphasis>.
     </para>
     <para>
       This cluster value is an arbitrary number; HarfBuzz uses it only
       to distinguish between clusters. Many client programs will use
       the index of each code point in the input text stream as the
       cluster value. This is for the sake of convenience; the actual
       value does not matter.
     </para>
     <para>
       Client programs can choose how HarfBuzz handles clusters during
       shaping by setting the
       <literal>cluster_level</literal> of the
       buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
       clustering support for this property:
     </para>
     <itemizedlist>
       <listitem>
 	<para><emphasis>Level 0</emphasis> is the default and
 	reproduces the behavior of the old HarfBuzz library.
 	</para>
 	<para>
 	  The distinguishing feature of level 0 behavior is that, at
 	  the beginning of processing the buffer, all code points that
 	  are categorized as <emphasis>marks</emphasis>,
 	  <emphasis>modifier symbols</emphasis>, or
 	  <emphasis>Emoji extended pictographic</emphasis> modifiers,
 	  as well as the <emphasis>Zero Width Joiner</emphasis> and
 	  <emphasis>Zero Width Non-Joiner</emphasis> code points, are
 	  assigned the cluster value of the closest preceding code
 	  point from <emphasis>different</emphasis> category.
 	</para>
 	<para>
 	  In essence, whenever a base character is followed by a mark
 	  character or a sequence of mark characters, those marks are
 	  reassigned to the same initial cluster value as the base
 	  character. This reassignment is referred to as
 	  "merging" the affected clusters. This behavior is based on
 	  the Grapheme Cluster Boundary specification in <ulink
 	  url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
 	  Technical Report 29</ulink>.
 	</para>
 	<para>
 	  Client programs can specify level 0 behavior for a buffer by
 	  setting its <literal>cluster_level</literal> to
 	  <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  <emphasis>Level 1</emphasis> tweaks the old behavior
 	  slightly to produce better results. Therefore, level 1
 	  clustering is recommended for code that is not required to
 	  implement backward compatibility with the old HarfBuzz.
 	</para>
 	<para>
 	  Level 1 differs from level 0 by not merging the
 	  clusters of marks and other modifier code points with the
 	  preceding "base" code point's cluster. By preserving the
 	  separate cluster values of these marks and modifier code
 	  points, script shapers can perform additional operations
 	  that might lead to improved results (for example, reordering
 	  a sequence of marks).
 	</para>
 	<para>
 	  Client programs can specify level 1 behavior for a buffer by
 	  setting its <literal>cluster_level</literal> to
 	  <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  <emphasis>Level 2</emphasis> differs significantly in how it
 	  treats cluster values. In level 2, HarfBuzz never merges
 	  clusters.
 	</para>
 	<para>
 	  This difference can be seen most clearly when HarfBuzz processes
 	  ligature substitutions and glyph decompositions. In level 0
 	  and level 1, ligatures and glyph decomposition both involve
 	  merging clusters; in level 2, neither of these operations
 	  triggers a merge.
 	</para>
 	<para>
 	  Client programs can specify level 2 behavior for a buffer by
 	  setting its <literal>cluster_level</literal> to
 	  <literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
 	</para>
       </listitem>
     </itemizedlist>
     <para>
       As mentioned earlier, client programs using HarfBuzz often
       assign initial cluster values in a buffer by reusing the indices
       of the code points in the input text. This gives a sequence of
       cluster values that is monotonically increasing (for example,
       0,1,2,3,4,5).
     </para>
     <para>
       It is not <emphasis>required</emphasis> that the cluster values
       in a buffer be monotonically increasing. However, if the initial
       cluster values in a buffer are monotonic and the buffer is
       configured to use cluster level 0 or 1, then HarfBuzz
       guarantees that the final cluster values in the shaped buffer
       will also be monotonic. No such guarantee is made for cluster
       level 2.
     </para>
     <para>
       In levels 0 and 1, HarfBuzz implements the following conceptual
       model for cluster values:
     </para>
     <itemizedlist spacing="compact">
       <listitem>
 	<para>
           If the sequence of input cluster values is monotonic, the
 	  sequence of cluster values will remain monotonic.
 	</para>
       </listitem>
       <listitem>
 	<para>
           Each cluster value represents a single cluster.
 	</para>
       </listitem>
       <listitem>
 	<para>
           Each cluster contains one or more glyphs and one or more
           characters.
 	</para>
       </listitem>
     </itemizedlist>
     <para>
       In practice, this model offers several benefits. Assuming that
       the initial cluster values were monotonically increasing
       and distinct before shaping began, then, in the final output:
     </para>
     <itemizedlist spacing="compact">
       <listitem>
 	<para>
 	  All adjacent glyphs having the same final cluster
 	  value belong to the same cluster.
 	</para>
       </listitem>
       <listitem>
 	<para>
           Each character belongs to the cluster that has the highest
 	  cluster value <emphasis>not larger than</emphasis> its
 	  initial cluster value.
 	</para>
       </listitem>
     </itemizedlist>

   </section>
   <section id="a-clustering-example-for-levels-0-and-1">
     <title>A clustering example for levels 0 and 1</title>
     <para>
       The guarantees and benefits of level 0 and level 1 can be seen
       with some examples. First, let us examine what happens with cluster
       values when shaping involves cluster merging with ligatures and
       decomposition.
     </para>
     <para>
       Let's say we start with the following character sequence (top row) and
       initial cluster values (bottom row):
     </para>
     <programlisting>
       A,B,C,D,E
       0,1,2,3,4
     </programlisting>
     <para>
       During shaping, HarfBuzz maps these characters to glyphs from
       the font. For simplicity, let us assume that each character maps
       to the corresponding, identical-looking glyph:
     </para>
     <programlisting>
       A,B,C,D,E
       0,1,2,3,4
     </programlisting>
     <para>
       Now if, for example, <literal>B</literal> and <literal>C</literal>
       form a ligature, then the clusters to which they belong
       &quot;merge&quot;. This merged cluster takes for its cluster
       value the minimum of all the cluster values of the clusters that
       went in to the ligature. In this case, we get:
     </para>
     <programlisting>
       A,BC,D,E
       0,1 ,3,4
     </programlisting>
     <para>
       because 1 is the minimum of the set {1,2}, which were the
       cluster values of <literal>B</literal> and
       <literal>C</literal>.
     </para>
     <para>
       Next, let us say that the <literal>BC</literal> ligature glyph
       decomposes into three components, and <literal>D</literal> also
       decomposes into two components. These components each inherit the
       cluster value of their parent:
     </para>
     <programlisting>
       A,BC0,BC1,BC2,D0,D1,E
       0,1  ,1  ,1  ,3 ,3 ,4
     </programlisting>
     <para>
       Next, if <literal>BC2</literal> and <literal>D0</literal> form a
       ligature, then their clusters (cluster values 1 and 3) merge into
       <literal>min(1,3) = 1</literal>:
     </para>
     <programlisting>
       A,BC0,BC1,BC2D0,D1,E
       0,1  ,1  ,1    ,1 ,4
     </programlisting>
     <para>
       At this point, cluster 1 means: the character sequence
       <literal>BCD</literal> is represented by glyphs
       <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
       further.
     </para>
   </section>
   <section id="reordering-in-levels-0-and-1">
     <title>Reordering in levels 0 and 1</title>
     <para>
       Another common operation in the more complex shapers is glyph
       reordering. In order to maintain a monotonic cluster sequence
       when glyph reordering takes place, HarfBuzz merges the clusters
       of everything in the reordering sequence.
     </para>
     <para>
       For example, let us again start with the character sequence (top
       row) and initial cluster values (bottom row):
     </para>
     <programlisting>
       A,B,C,D,E
       0,1,2,3,4
     </programlisting>
     <para>
       If <literal>D</literal> is reordered to before <literal>B</literal>,
       then HarfBuzz merges the <literal>B</literal>,
       <literal>C</literal>, and <literal>D</literal> clusters, and we
       get:
     </para>
     <programlisting>
       A,D,B,C,E
       0,1,1,1,4
     </programlisting>
     <para>
       This is clearly not ideal, but it is the only sensible way to
       maintain a monotonic sequence of cluster values and retain the
       true relationship between glyphs and characters.
     </para>
   </section>
   <section id="the-distinction-between-levels-0-and-1">
     <title>The distinction between levels 0 and 1</title>
     <para>
       The preceding examples demonstrate the main effects of using
       cluster levels 0 and 1. The only difference between the two
       levels is this: in level 0, at the very beginning of the shaping
       process, HarfBuzz also merges clusters between any base character
       and all Unicode marks (combining or not) that follow it.
     </para>
     <para>
       For example, let us start with the following character sequence
       (top row) and accompanying initial cluster values (bottom row):
     </para>
     <programlisting>
       A,acute,B
       0,1    ,2
     </programlisting>
     <para>
       The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
       using cluster level 0 on this sequence, then the
       <literal>A</literal> and <literal>acute</literal> clusters will
       merge, and the result will become:
     </para>
     <programlisting>
       A,acute,B
       0,0    ,2
     </programlisting>
     <para>
       This initial cluster merging is the default behavior of the
       Windows shaping engine, and the old HarfBuzz codebase copied
       that behavior to maintain compatibility. Consequently, it has
       remained the default behavior in the new HarfBuzz codebase.
     </para>
     <para>
       But this initial cluster-merging behavior makes it impossible to
       color diacritic marks differently from their base
       characters. That is why, in level 1, HarfBuzz does not perform
       the initial merging step.
     </para>
     <para>
       For client programs that rely on HarfBuzz cluster values to
       perform cursor positioning, level 0 is more convenient. But
       relying on cluster boundaries for cursor positioning is wrong: cursor
       positions should be determined based on Unicode grapheme
       boundaries, not on shaping-cluster boundaries. As such, level 1
       clusters are preferred.
     </para>
     <para>
       One last note about levels 0 and 1. HarfBuzz currently does not allow a
       <literal>MultipleSubst</literal> lookup to replace a glyph with zero
       glyphs (in other words, to delete a glyph). But, in some other situations,
       glyphs can be deleted. In those cases, if the glyph being deleted is
       the last glyph of its cluster, HarfBuzz makes sure to merge the cluster
       with a neighboring cluster.
     </para>
     <para>
       This is done primarily to make sure that the starting cluster of the
       text always has the cluster index pointing to the start of the text
       for the run; more than one client currently relies on this
       guarantee.
     </para>
     <para>
       Incidentally, Apple's CoreText does something else to maintain the
       same promise: it inserts a glyph with id 65535 at the beginning of
       the glyph string if the glyph corresponding to the first character
       in the run was deleted. HarfBuzz might do something similar in the
       future.
     </para>
   </section>
   <section id="level-2">
     <title>Level 2</title>
     <para>
       HarfBuzz's level 2 cluster behavior uses a significantly
       different model than that of level 0 and level 1.
     </para>
     <para>
       The level 2 behavior is easy to describe, but it may be
       difficult to understand in practical terms. In brief, level 2
       performs no merging of clusters whatsoever.
     </para>
     <para>
       When glyphs form a ligature (or when some other feature
       substitutes multiple glyphs with one glyph), the cluster value
       of the first glyph is retained as the cluster value for the
       ligature. However, no subsequent clusters &mdash; including
       marks and modifiers &mdash; are affected.
     </para>
     <para>
       Level 2 cluster behavior is less complex than level 0 or level
       1, but there are a few cases in which processing cluster values
       produced at level 2 may be tricky.
     </para>
     <section id="ligatures-with-combining-marks-in-level-2">
       <title>Ligatures with combining marks in level 2</title>
       <para>
 	The first example of how HarfBuzz's level 2 cluster behavior
 	can be tricky is when the text to be shaped includes combining
 	marks attached to ligatures.
       </para>
       <para>
 	Let us start with an input sequence with the following
 	characters (top row) and initial cluster values (bottom row):
       </para>
       <programlisting>
 	A,acute,B,breve,C,circumflex
 	0,1    ,2,3    ,4,5
       </programlisting>
       <para>
 	If the sequence <literal>A,B,C</literal> forms a ligature,
 	then these are the cluster values HarfBuzz will return under
 	the various cluster levels:
       </para>
       <para>
 	Level 0:
       </para>
       <programlisting>
 	ABC,acute,breve,circumflex
 	0  ,0    ,0    ,0
       </programlisting>
       <para>
 	Level 1:
       </para>
       <programlisting>
 	ABC,acute,breve,circumflex
 	0  ,0    ,0    ,5
       </programlisting>
       <para>
 	Level 2:
       </para>
       <programlisting>
 	ABC,acute,breve,circumflex
 	0  ,1    ,3    ,5
       </programlisting>
       <para>
 	Making sense of the level 2 result is the hardest for a client
 	program, because there is nothing in the cluster values that
 	indicates that <literal>B</literal> and <literal>C</literal>
 	formed a ligature with <literal>A</literal>.
       </para>
       <para>
 	In contrast, the "merged" cluster values of the mark glyphs
 	that are seen in the level 0 and level 1 output are evidence
 	that a ligature substitution took place.
       </para>
     </section>
     <section id="reordering-in-level-2">
       <title>Reordering in level 2</title>
       <para>
 	Another example of how HarfBuzz's level 2 cluster behavior
 	can be tricky is when glyphs reorder. Consider an input sequence
 	with the following characters (top row) and initial cluster
 	values (bottom row):
       </para>
       <programlisting>
 	A,B,C,D,E
 	0,1,2,3,4
       </programlisting>
       <para>
 	Now imagine <literal>D</literal> moves before
 	<literal>B</literal> in a reordering operation. The cluster
 	values will then be:
       </para>
       <programlisting>
 	A,D,B,C,E
 	0,3,1,2,4
       </programlisting>
       <para>
 	Next, if <literal>D</literal> forms a ligature with
 	<literal>B</literal>, the output is:
       </para>
       <programlisting>
 	A,DB,C,E
 	0,3 ,2,4
       </programlisting>
       <para>
 	However, in a different scenario, in which the shaping rules
 	of the script instead caused <literal>A</literal> and
 	<literal>B</literal> to form a ligature
 	<emphasis>before</emphasis> the <literal>D</literal> reordered, the
 	result would be:
       </para>
       <programlisting>
 	AB,D,C,E
 	0 ,3,2,4
       </programlisting>
       <para>
 	There is no way for a client program to differentiate between
 	these two scenarios based on the cluster values
 	alone. Consequently, client programs that use level 2 might
 	need to undertake additional work in order to manage cursor
 	positioning, text attributes, or other desired features.
       </para>
     </section>
     <section id="other-considerations-in-level-2">
       <title>Other considerations in level 2</title>
       <para>
 	There may be other problems encountered with ligatures under
 	level 2, such as if the direction of the text is forced to
 	opposite of its natural direction (for example, left-to-right
 	Arabic). But, generally speaking, these other scenarios are
 	minor corner cases that are too obscure for most client
 	programs to need to worry about.
       </para>
     </section>
   </section>
 </chapter>
	<?xml version="1.0"?>
	<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
	"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
	<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
	<!ENTITY version SYSTEM "version.xml">
	]>
	<chapter id="clusters">
	<title>Clusters</title>
	<section id="clusters">
	<title>Clusters</title>
	<para>
	In text shaping, a <emphasis>cluster</emphasis> is a sequence of
	characters that needs to be treated as a single, indivisible
	unit.
	</para>
	<para>
	A cluster is distinct from a <emphasis>grapheme</emphasis>,
	which is the smallest unit of a writing system or script,
	because clusters are only relevant for script shaping and the
	layout of glyphs.
	</para>
	<para>
	For example, a grapheme may be a letter, a number, a logogram,
	or a symbol. When two letters form a ligature, however, they
	combine into a single glyph. They are therefore part of the same
	cluster and are treated as a unit — even though the two
	original, underlying letters are separate graphemes.
	</para>
	<para>
	During the shaping process, there are several shaping operations
	that may merge adjacent characters (for example, when two code
	points form a ligature or a conjunct form and are replaced by a
	single glyph) or split one character into several (for example,
	when decomposing a code point through the
	<literal>ccmp</literal> feature).
	</para>
	<para>
	HarfBuzz tracks clusters independently from how these
	shaping operations affect the individual glyphs that comprise the
	output HarfBuzz returns in a buffer. Consequently,
	a client program using HarfBuzz can utilize the cluster
	information to implement features such as:
	</para>
	<itemizedlist>
	<listitem>
	<para>
	Correctly positioning the cursor within a shaped text run,
	even when characters have formed ligatures, composed or
	decomposed, reordered, or undergone other shaping operations.
	</para>
	</listitem>
	<listitem>
	<para>
	Correctly highlighting a text selection that includes some,
	but not all, of the characters in a word.
	</para>
	</listitem>
	<listitem>
	<para>
	Applying text attributes (such as color or underlining) to
	part, but not all, of a word.
	</para>
	</listitem>
	<listitem>
	<para>
	Generating output document formats (such as PDF) with
	embedded text that can be fully extracted.
	</para>
	</listitem>
	<listitem>
	<para>
	Determining the mapping between input characters and output
	glyphs, such as which glyphs are ligatures.
	</para>
	</listitem>
	<listitem>
	<para>
	Performing line-breaking, justification, and other
	line-level or paragraph-level operations that must be done
	after shaping is complete, but which require character-level
	properties.
	</para>
	</listitem>
	</itemizedlist>
	<para>
	When you add text to a HarfBuzz buffer, each code point must be
	assigned a <emphasis>cluster value</emphasis>.
	</para>
	<para>
	This cluster value is an arbitrary number; HarfBuzz uses it only
	to distinguish between clusters. Many client programs will use
	the index of each code point in the input text stream as the
	cluster value. This is for the sake of convenience; the actual
	value does not matter.
	</para>
	<para>
	Client programs can choose how HarfBuzz handles clusters during
	shaping by setting the
	<literal>cluster_level</literal> of the
	buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
	clustering support for this property:
	</para>
	<itemizedlist>
	<listitem>
	<para><emphasis>Level 0</emphasis> is the default and
	reproduces the behavior of the old HarfBuzz library.
	</para>
	<para>
	The distinguishing feature of level 0 behavior is that, at
	the beginning of processing the buffer, all code points that
	are categorized as <emphasis>marks</emphasis>,
	<emphasis>modifier symbols</emphasis>, or
	<emphasis>Emoji extended pictographic</emphasis> modifiers,
	as well as the <emphasis>Zero Width Joiner</emphasis> and
	<emphasis>Zero Width Non-Joiner</emphasis> code points, are
	assigned the cluster value of the closest preceding code
	point from <emphasis>different</emphasis> category.
	</para>
	<para>
	In essence, whenever a base character is followed by a mark
	character or a sequence of mark characters, those marks are
	reassigned to the same initial cluster value as the base
	character. This reassignment is referred to as
	"merging" the affected clusters. This behavior is based on
	the Grapheme Cluster Boundary specification in <ulink
	url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
	Technical Report 29</ulink>.
	</para>
	<para>
	Client programs can specify level 0 behavior for a buffer by
	setting its <literal>cluster_level</literal> to
	<literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
	</para>
	</listitem>
	<listitem>
	<para>
	<emphasis>Level 1</emphasis> tweaks the old behavior
	slightly to produce better results. Therefore, level 1
	clustering is recommended for code that is not required to
	implement backward compatibility with the old HarfBuzz.
	</para>
	<para>
	Level 1 differs from level 0 by not merging the
	clusters of marks and other modifier code points with the
	preceding "base" code point's cluster. By preserving the
	separate cluster values of these marks and modifier code
	points, script shapers can perform additional operations
	that might lead to improved results (for example, reordering
	a sequence of marks).
	</para>
	<para>
	Client programs can specify level 1 behavior for a buffer by
	setting its <literal>cluster_level</literal> to
	<literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
	</para>
	</listitem>
	<listitem>
	<para>
	<emphasis>Level 2</emphasis> differs significantly in how it
	treats cluster values. In level 2, HarfBuzz never merges
	clusters.
	</para>
	<para>
	This difference can be seen most clearly when HarfBuzz processes
	ligature substitutions and glyph decompositions. In level 0
	and level 1, ligatures and glyph decomposition both involve
	merging clusters; in level 2, neither of these operations
	triggers a merge.
	</para>
	<para>
	Client programs can specify level 2 behavior for a buffer by
	setting its <literal>cluster_level</literal> to
	<literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
	</para>
	</listitem>
	</itemizedlist>
	<para>
	As mentioned earlier, client programs using HarfBuzz often
	assign initial cluster values in a buffer by reusing the indices
	of the code points in the input text. This gives a sequence of
	cluster values that is monotonically increasing (for example,
	0,1,2,3,4,5).
	</para>
	<para>
	It is not <emphasis>required</emphasis> that the cluster values
	in a buffer be monotonically increasing. However, if the initial
	cluster values in a buffer are monotonic and the buffer is
	configured to use cluster level 0 or 1, then HarfBuzz
	guarantees that the final cluster values in the shaped buffer
	will also be monotonic. No such guarantee is made for cluster
	level 2.
	</para>
	<para>
	In levels 0 and 1, HarfBuzz implements the following conceptual
	model for cluster values:
	</para>
	<itemizedlist spacing="compact">
	<listitem>
	<para>
	If the sequence of input cluster values is monotonic, the
	sequence of cluster values will remain monotonic.
	</para>
	</listitem>
	<listitem>
	<para>
	Each cluster value represents a single cluster.
	</para>
	</listitem>
	<listitem>
	<para>
	Each cluster contains one or more glyphs and one or more
	characters.
	</para>
	</listitem>
	</itemizedlist>
	<para>
	In practice, this model offers several benefits. Assuming that
	the initial cluster values were monotonically increasing
	and distinct before shaping began, then, in the final output:
	</para>
	<itemizedlist spacing="compact">
	<listitem>
	<para>
	All adjacent glyphs having the same final cluster
	value belong to the same cluster.
	</para>
	</listitem>
	<listitem>
	<para>
	Each character belongs to the cluster that has the highest
	cluster value <emphasis>not larger than</emphasis> its
	initial cluster value.
	</para>
	</listitem>
	</itemizedlist>

	</section>
	<section id="a-clustering-example-for-levels-0-and-1">
	<title>A clustering example for levels 0 and 1</title>
	<para>
	The guarantees and benefits of level 0 and level 1 can be seen
	with some examples. First, let us examine what happens with cluster
	values when shaping involves cluster merging with ligatures and
	decomposition.
	</para>
	<para>
	Let's say we start with the following character sequence (top row) and
	initial cluster values (bottom row):
	</para>
	<programlisting>
	A,B,C,D,E
	0,1,2,3,4
	</programlisting>
	<para>
	During shaping, HarfBuzz maps these characters to glyphs from
	the font. For simplicity, let us assume that each character maps
	to the corresponding, identical-looking glyph:
	</para>
	<programlisting>
	A,B,C,D,E
	0,1,2,3,4
	</programlisting>
	<para>
	Now if, for example, <literal>B</literal> and <literal>C</literal>
	form a ligature, then the clusters to which they belong
	"merge". This merged cluster takes for its cluster
	value the minimum of all the cluster values of the clusters that
	went in to the ligature. In this case, we get:
	</para>
	<programlisting>
	A,BC,D,E
	0,1 ,3,4
	</programlisting>
	<para>
	because 1 is the minimum of the set {1,2}, which were the
	cluster values of <literal>B</literal> and
	<literal>C</literal>.
	</para>
	<para>
	Next, let us say that the <literal>BC</literal> ligature glyph
	decomposes into three components, and <literal>D</literal> also
	decomposes into two components. These components each inherit the
	cluster value of their parent:
	</para>
	<programlisting>
	A,BC0,BC1,BC2,D0,D1,E
	0,1 ,1 ,1 ,3 ,3 ,4
	</programlisting>
	<para>
	Next, if <literal>BC2</literal> and <literal>D0</literal> form a
	ligature, then their clusters (cluster values 1 and 3) merge into
	<literal>min(1,3) = 1</literal>:
	</para>
	<programlisting>
	A,BC0,BC1,BC2D0,D1,E
	0,1 ,1 ,1 ,1 ,4
	</programlisting>
	<para>
	At this point, cluster 1 means: the character sequence
	<literal>BCD</literal> is represented by glyphs
	<literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
	further.
	</para>
	</section>
	<section id="reordering-in-levels-0-and-1">
	<title>Reordering in levels 0 and 1</title>
	<para>
	Another common operation in the more complex shapers is glyph
	reordering. In order to maintain a monotonic cluster sequence
	when glyph reordering takes place, HarfBuzz merges the clusters
	of everything in the reordering sequence.
	</para>
	<para>
	For example, let us again start with the character sequence (top
	row) and initial cluster values (bottom row):
	</para>
	<programlisting>
	A,B,C,D,E
	0,1,2,3,4
	</programlisting>
	<para>
	If <literal>D</literal> is reordered to before <literal>B</literal>,
	then HarfBuzz merges the <literal>B</literal>,
	<literal>C</literal>, and <literal>D</literal> clusters, and we
	get:
	</para>
	<programlisting>
	A,D,B,C,E
	0,1,1,1,4
	</programlisting>
	<para>
	This is clearly not ideal, but it is the only sensible way to
	maintain a monotonic sequence of cluster values and retain the
	true relationship between glyphs and characters.
	</para>
	</section>
	<section id="the-distinction-between-levels-0-and-1">
	<title>The distinction between levels 0 and 1</title>
	<para>
	The preceding examples demonstrate the main effects of using
	cluster levels 0 and 1. The only difference between the two
	levels is this: in level 0, at the very beginning of the shaping
	process, HarfBuzz also merges clusters between any base character
	and all Unicode marks (combining or not) that follow it.
	</para>
	<para>
	For example, let us start with the following character sequence
	(top row) and accompanying initial cluster values (bottom row):
	</para>
	<programlisting>
	A,acute,B
	0,1 ,2
	</programlisting>
	<para>
	The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
	using cluster level 0 on this sequence, then the
	<literal>A</literal> and <literal>acute</literal> clusters will
	merge, and the result will become:
	</para>
	<programlisting>
	A,acute,B
	0,0 ,2
	</programlisting>
	<para>
	This initial cluster merging is the default behavior of the
	Windows shaping engine, and the old HarfBuzz codebase copied
	that behavior to maintain compatibility. Consequently, it has
	remained the default behavior in the new HarfBuzz codebase.
	</para>
	<para>
	But this initial cluster-merging behavior makes it impossible to
	color diacritic marks differently from their base
	characters. That is why, in level 1, HarfBuzz does not perform
	the initial merging step.
	</para>
	<para>
	For client programs that rely on HarfBuzz cluster values to
	perform cursor positioning, level 0 is more convenient. But
	relying on cluster boundaries for cursor positioning is wrong: cursor
	positions should be determined based on Unicode grapheme
	boundaries, not on shaping-cluster boundaries. As such, level 1
	clusters are preferred.
	</para>
	<para>
	One last note about levels 0 and 1. HarfBuzz currently does not allow a
	<literal>MultipleSubst</literal> lookup to replace a glyph with zero
	glyphs (in other words, to delete a glyph). But, in some other situations,
	glyphs can be deleted. In those cases, if the glyph being deleted is
	the last glyph of its cluster, HarfBuzz makes sure to merge the cluster
	with a neighboring cluster.
	</para>
	<para>
	This is done primarily to make sure that the starting cluster of the
	text always has the cluster index pointing to the start of the text
	for the run; more than one client currently relies on this
	guarantee.
	</para>
	<para>
	Incidentally, Apple's CoreText does something else to maintain the
	same promise: it inserts a glyph with id 65535 at the beginning of
	the glyph string if the glyph corresponding to the first character
	in the run was deleted. HarfBuzz might do something similar in the
	future.
	</para>
	</section>
	<section id="level-2">
	<title>Level 2</title>
	<para>
	HarfBuzz's level 2 cluster behavior uses a significantly
	different model than that of level 0 and level 1.
	</para>
	<para>
	The level 2 behavior is easy to describe, but it may be
	difficult to understand in practical terms. In brief, level 2
	performs no merging of clusters whatsoever.
	</para>
	<para>
	When glyphs form a ligature (or when some other feature
	substitutes multiple glyphs with one glyph), the cluster value
	of the first glyph is retained as the cluster value for the
	ligature. However, no subsequent clusters — including
	marks and modifiers — are affected.
	</para>
	<para>
	Level 2 cluster behavior is less complex than level 0 or level
	1, but there are a few cases in which processing cluster values
	produced at level 2 may be tricky.
	</para>
	<section id="ligatures-with-combining-marks-in-level-2">
	<title>Ligatures with combining marks in level 2</title>
	<para>
	The first example of how HarfBuzz's level 2 cluster behavior
	can be tricky is when the text to be shaped includes combining
	marks attached to ligatures.
	</para>
	<para>
	Let us start with an input sequence with the following
	characters (top row) and initial cluster values (bottom row):
	</para>
	<programlisting>
	A,acute,B,breve,C,circumflex
	0,1 ,2,3 ,4,5
	</programlisting>
	<para>
	If the sequence <literal>A,B,C</literal> forms a ligature,
	then these are the cluster values HarfBuzz will return under
	the various cluster levels:
	</para>
	<para>
	Level 0:
	</para>
	<programlisting>
	ABC,acute,breve,circumflex
	0 ,0 ,0 ,0
	</programlisting>
	<para>
	Level 1:
	</para>
	<programlisting>
	ABC,acute,breve,circumflex
	0 ,0 ,0 ,5
	</programlisting>
	<para>
	Level 2:
	</para>
	<programlisting>
	ABC,acute,breve,circumflex
	0 ,1 ,3 ,5
	</programlisting>
	<para>
	Making sense of the level 2 result is the hardest for a client
	program, because there is nothing in the cluster values that
	indicates that <literal>B</literal> and <literal>C</literal>
	formed a ligature with <literal>A</literal>.
	</para>
	<para>
	In contrast, the "merged" cluster values of the mark glyphs
	that are seen in the level 0 and level 1 output are evidence
	that a ligature substitution took place.
	</para>
	</section>
	<section id="reordering-in-level-2">
	<title>Reordering in level 2</title>
	<para>
	Another example of how HarfBuzz's level 2 cluster behavior
	can be tricky is when glyphs reorder. Consider an input sequence
	with the following characters (top row) and initial cluster
	values (bottom row):
	</para>
	<programlisting>
	A,B,C,D,E
	0,1,2,3,4
	</programlisting>
	<para>
	Now imagine <literal>D</literal> moves before
	<literal>B</literal> in a reordering operation. The cluster
	values will then be:
	</para>
	<programlisting>
	A,D,B,C,E
	0,3,1,2,4
	</programlisting>
	<para>
	Next, if <literal>D</literal> forms a ligature with
	<literal>B</literal>, the output is:
	</para>
	<programlisting>
	A,DB,C,E
	0,3 ,2,4
	</programlisting>
	<para>
	However, in a different scenario, in which the shaping rules
	of the script instead caused <literal>A</literal> and
	<literal>B</literal> to form a ligature
	<emphasis>before</emphasis> the <literal>D</literal> reordered, the
	result would be:
	</para>
	<programlisting>
	AB,D,C,E
	0 ,3,2,4
	</programlisting>
	<para>
	There is no way for a client program to differentiate between
	these two scenarios based on the cluster values
	alone. Consequently, client programs that use level 2 might
	need to undertake additional work in order to manage cursor
	positioning, text attributes, or other desired features.
	</para>
	</section>
	<section id="other-considerations-in-level-2">
	<title>Other considerations in level 2</title>
	<para>
	There may be other problems encountered with ligatures under
	level 2, such as if the direction of the text is forced to
	opposite of its natural direction (for example, left-to-right
	Arabic). But, generally speaking, these other scenarios are
	minor corner cases that are too obscure for most client
	programs to need to worry about.
	</para>
	</section>
	</section>
	</chapter>