 ---
 layout: default
 title: StringPrep
 nav_order: 7
 parent: Chars and Strings
 ---
 <!--
 © 2020 and later: Unicode, Inc. and others.
 License & terms of use: http://www.unicode.org/copyright.html
 -->

 # StringPrep

 ## Overview

 Comparing strings in a consistent manner becomes imperative when a large
 repertoire of characters such as Unicode is used in network protocols.
 StringPrep provides sets of rules for use of Unicode and syntax for prevention
 of spoofing. The implementation of StringPrep and IDNA services and their usage
 in ICU is described below.

 ## StringPrep

 StringPrep, the process of preparing Unicode strings for use in network
 protocols, is defined in RFC 3454 (<http://www.rfc-editor.org/rfc/rfc3454.txt>).
 The RFC defines a broad framework and rules for processing the strings.

 Protocols that prescribe use of StringPrep must define a profile of StringPrep,
 whose applicability is limited to the protocol. A profile is a set of rules and
 data tables that describe how the strings should be prepared. A profile
 can turn normalization and checking for bidirectional characters on or off.
 It can also add or remove mappings, unassigned code points, and
 prohibited code points from the tables provided.

 StringPrep uses Unicode Version 3.2 and defines a set of tables for use by the
 profiles. The profiles can choose to include or exclude tables or code points
 from the tables defined by the RFC.

 StringPrep defines tables that can be broadly classified as follows:

 1.  *Unassigned Table*: Contains code points that are unassigned in Unicode
     Version 3.2. Unassigned code points may be allowed or disallowed in the
     output string depending on the application. The table in Appendix A.1 of the
     RFC contains the code points.

 2.  *Mapping Tables*: Contain code points that are commonly deleted from the
     output and code points that are case mapped. The mapping tables B.1 and
     B.2 in Appendix B of the RFC contain these code points.

 3.  *Prohibited Tables*: Contain code points that are prohibited from the
     output string. Control codes, private use area code points, non-character
     code points, surrogate code points, tagging and deprecated code points are
     included in these tables. There are nine prohibited tables in Appendix C of
     the RFC, namely C.1, C.2, C.3, C.4, C.5, C.6, C.7, C.8 and C.9.

 The procedure for preparing strings for use can be described in the following
 steps:

 1.  *Map*: For each code point in the input, check whether it has a mapping
     defined in the mapping table. If so, replace it with the mapping in the
     output.

 2.  *Normalize*: Normalize the output of step 1 using Unicode Normalization Form
     NFKC, if the option is set. The normalization algorithm must conform to
     UAX #15.

 3.  *Prohibit*: For each code point in the output of step 2, check whether the
     code point is present in the prohibited table. If so, fail and return an
     error.

 4.  *Check BiDi*: Check for code points with strong right-to-left directionality
     in the output of step 3. If any are present, check that the string satisfies
     the rules for bidirectional strings specified in the RFC.
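The four steps above can be sketched in plain Java. This is a minimal illustration, not ICU's implementation: the class `StringPrepSketch`, its method names, and the tiny `MAPPINGS` and `PROHIBITED` tables are hypothetical stand-ins for the full RFC 3454 tables. The JDK's `java.text.Normalizer` follows the Unicode version of the running JDK rather than Unicode 3.2, so results can differ from a conforming implementation for some code points. The BiDi check follows section 6 of RFC 3454: a string containing a right-to-left character must not contain a left-to-right character, and must start and end with a right-to-left character.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.Set;

class StringPrepSketch {
    // Hypothetical, heavily truncated stand-ins for the RFC 3454 tables.
    static final Map<Integer, String> MAPPINGS = Map.of(
            0x00AD, "",    // SOFT HYPHEN is deleted (as in table B.1)
            0x0041, "a",   // 'A' is case mapped to 'a' (as in table B.2)
            0x00DF, "ss"); // sharp s is case mapped to "ss" (as in table B.2)
    static final Set<Integer> PROHIBITED = Set.of(0x007F, 0xE000, 0xFFFE);

    static boolean isRandAL(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    static String prepare(String input, boolean normalize, boolean checkBidi) {
        // Step 1: Map. Replace each code point that has an entry in the mapping table.
        StringBuilder mapped = new StringBuilder();
        for (int cp : input.codePoints().toArray()) {
            String replacement = MAPPINGS.get(cp);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.appendCodePoint(cp);
            }
        }
        // Step 2: Normalize with NFKC, if the profile turns normalization on.
        String out = normalize
                ? Normalizer.normalize(mapped, Normalizer.Form.NFKC)
                : mapped.toString();
        // Step 3: Prohibit. Fail on the first prohibited code point.
        int[] cps = out.codePoints().toArray();
        for (int cp : cps) {
            if (PROHIBITED.contains(cp)) {
                throw new IllegalArgumentException(
                        String.format("prohibited code point U+%04X", cp));
            }
        }
        // Step 4: Check BiDi, if the profile turns the check on.
        if (checkBidi && cps.length > 0) {
            boolean hasRandAL = false;
            boolean hasL = false;
            for (int cp : cps) {
                hasRandAL |= isRandAL(cp);
                hasL |= Character.getDirectionality(cp)
                        == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
            }
            if (hasRandAL
                    && (hasL || !isRandAL(cps[0]) || !isRandAL(cps[cps.length - 1]))) {
                throw new IllegalArgumentException("bidirectional rules violated");
            }
        }
        return out;
    }
}
```

With these toy tables, `StringPrepSketch.prepare("A\u00ADb", true, true)` maps `A` to `a`, deletes the soft hyphen, and returns `ab`.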

 ## NamePrep

 NamePrep is a profile of StringPrep for use in IDNA. This profile is defined in
 RFC 3491 (<http://www.rfc-editor.org/rfc/rfc3491.txt>).

 The profile specifies the following rules:

 1.  *Map*: Include all code point mappings specified in StringPrep.

 2.  *Normalize*: Normalize the output of step 1 according to NFKC.

 3.  *Prohibit*: Prohibit all code points specified as prohibited in StringPrep,
     except for the space (U+0020) code point, from the output of step 2.

 4.  *Check BiDi*: Check for bidirectional code points and process according to
     the rules specified in StringPrep.

 ## Punycode

 Punycode is an encoding scheme for Unicode for use in IDNA. Punycode converts
 Unicode text to a unique sequence of ASCII text and back to Unicode. It is an
 ASCII Compatible Encoding (ACE). Punycode is described in RFC 3492
 (<http://www.rfc-editor.org/rfc/rfc3492.txt>).

 The Punycode algorithm is an instance of the general Bootstring algorithm, which
 allows strings composed of a smaller set of code points to uniquely represent any
 string of code points from a larger set. Punycode represents Unicode code points
 from U+0000 to U+10FFFF by using the smaller ASCII set U+0000 to U+007F. The
 algorithm can also preserve case information of the code points in the larger set
 while encoding and decoding. This feature, however, is not used in IDNA.
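To make the Bootstring idea concrete, here is a minimal sketch of the Punycode encoder, written from the pseudocode and parameter values in RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128). The class name `PunycodeSketch` is hypothetical, the optional case-preservation feature is not implemented, and arithmetic overflow is reported through `Math.addExact` and `Math.multiplyExact` rather than a dedicated error code. It is not ICU's implementation.

```java
class PunycodeSketch {
    static final int BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
    static final int INITIAL_BIAS = 72, INITIAL_N = 128;

    // Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'.
    static char digit(int d) {
        return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
    }

    // Bias adaptation function from RFC 3492, section 6.1.
    static int adapt(int delta, int numPoints, boolean firstTime) {
        delta = firstTime ? delta / DAMP : delta / 2;
        delta += delta / numPoints;
        int k = 0;
        while (delta > ((BASE - TMIN) * TMAX) / 2) {
            delta /= BASE - TMIN;
            k += BASE;
        }
        return k + ((BASE - TMIN + 1) * delta) / (delta + SKEW);
    }

    static String encode(String input) {
        int[] cps = input.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        // Copy the basic (ASCII) code points first, in order.
        for (int c : cps) {
            if (c < 0x80) out.append((char) c);
        }
        int b = out.length();
        int h = b;
        if (b > 0) out.append('-'); // delimiter between literal and encoded parts
        int n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
        while (h < cps.length) {
            // Find the smallest code point not yet handled.
            int m = Integer.MAX_VALUE;
            for (int c : cps) {
                if (c >= n && c < m) m = c;
            }
            delta = Math.addExact(delta, Math.multiplyExact(m - n, h + 1));
            n = m;
            for (int c : cps) {
                if (c < n) delta = Math.addExact(delta, 1);
                if (c == n) {
                    // Emit delta as a generalized variable-length integer.
                    int q = delta;
                    for (int k = BASE; ; k += BASE) {
                        int t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                        if (q < t) break;
                        out.append(digit(t + (q - t) % (BASE - t)));
                        q = (q - t) / (BASE - t);
                    }
                    out.append(digit(q));
                    bias = adapt(delta, h + 1, h == b);
                    delta = 0;
                    h++;
                }
            }
            delta++;
            n++;
        }
        return out.toString();
    }
}
```

For example, `PunycodeSketch.encode("b\u00FCcher")` returns `bcher-kva`; IDNA then adds the ACE prefix to form `xn--bcher-kva`.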

 ## Internationalizing Domain Names in Applications (IDNA)

 The Domain Name System (DNS) protocol defines the procedure for matching
 ASCII strings case-insensitively to the names in the lookup tables containing
 the mapping of IP (Internet Protocol) addresses to server names. When Unicode is
 used instead of ASCII in server names, two requirements arise that need to be
 handled differently. When the server name is displayed to the user,
 Unicode text should be displayed. When Unicode text is stored in lookup tables,
 the text should be the ASCII equivalent, for compatibility with the older DNS
 protocol and the resolver libraries. The IDNA protocol, defined by RFC 3490
 (<http://www.rfc-editor.org/rfc/rfc3490.txt>), satisfies both
 requirements.

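 Server names stored in the DNS lookup tables are usually formed by concatenating
 domain labels with a label separator, for example `www.example.com`, where
 `www`, `example`, and `com` are the labels and the Full Stop (U+002E) is the
 separator.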

 The protocol defines operations to be performed on domain labels before the
 names are stored in the lookup tables and before the names fetched from lookup
 tables are displayed to the user. The operations are:

 1.  ToASCII: This operation is performed on domain labels before sending the
     name to a resolver and before storing the name in the DNS lookup table. The
     domain labels are processed by the StringPrep algorithm using the rules
     specified by the NamePrep profile. The output of this step is then encoded
     using Punycode, and an ACE prefix is added to denote that the text is encoded
     using Punycode. IDNA uses “xn--” before the encoded label.

 2.  ToUnicode: This operation is performed on domain labels before displaying
     the names to users. If the domain label is prefixed with the ACE prefix
     for IDNA, then the label excluding the prefix is decoded using Punycode. The
     output of the Punycode decoder is verified by applying the ToASCII operation
     and comparing the output with the input to the ToUnicode operation.
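The two operations can be tried without ICU by using the JDK's `java.net.IDN` class, which implements the RFC 3490 ToASCII and ToUnicode operations. This is shown only as an illustration of the protocol behavior; it is not the ICU API described later in this page, and the class name `IdnaRoundTrip` is hypothetical.

```java
import java.net.IDN;

class IdnaRoundTrip {
    // ToASCII: NamePrep each label, Punycode-encode it, and add the "xn--" prefix.
    static String toAscii(String name) {
        return IDN.toASCII(name);
    }

    // ToUnicode: decode labels that carry the ACE prefix back to Unicode.
    static String toUnicode(String name) {
        return IDN.toUnicode(name);
    }

    public static void main(String[] args) {
        String ace = toAscii("B\u00FCcher.example");
        System.out.println(ace);            // xn--bcher-kva.example
        System.out.println(toUnicode(ace)); // bücher.example (NamePrep lowercased the label)
    }
}
```

Note that the round trip does not restore the original capital letter, because NamePrep case maps the label before it is encoded.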

 Unicode contains code points that are visually similar to the ASCII Full Stop
 (U+002E). These code points must be treated as label separators when performing
 the ToASCII operation. These code points are:

 1.  Ideographic Full Stop (U+3002)

 2.  Fullwidth Full Stop (U+FF0E)

 3.  Halfwidth Ideographic Full Stop (U+FF61)
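As a small illustration of this rule, the sketch below splits a domain name into labels on any of the four separators before the labels would be handed to ToASCII. The class and method names are hypothetical and the splitting is done with a plain regular expression, not with an ICU API.

```java
class LabelSeparators {
    // U+002E plus the three code points listed above.
    static final String SEPARATORS = "[.\u3002\uFF0E\uFF61]";

    // Split a domain name into labels; -1 keeps a trailing empty label (root).
    static String[] splitLabels(String domain) {
        return domain.split(SEPARATORS, -1);
    }

    // Rejoin the labels with the ASCII Full Stop, as ToASCII output would use.
    static String withAsciiDots(String domain) {
        return String.join(".", splitLabels(domain));
    }
}
```

For example, `LabelSeparators.withAsciiDots("www\u3002example\uFF0Ecom")` returns `www.example.com`.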

 Unassigned code points in Unicode Version 3.2, as given in the StringPrep
 tables, are treated differently depending on how the processed string is used.
 For query operations, where a registrar is asked about the availability of a
 certain domain name, unassigned code points are allowed to be present in the
 string. For storing the string in DNS lookup tables, unassigned code points are
 prohibited from the input.

 IDNA specifies that the ToUnicode and ToASCII operations have an option to
 verify that labels contain only Letter-Digit-Hyphen code points, in adherence
 to the STD3 ASCII rules.
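The two options described above (allowing unassigned code points, and enforcing the STD3 ASCII rules) are exposed as flags in most IDNA implementations. The sketch below uses the JDK's `java.net.IDN` flags `ALLOW_UNASSIGNED` and `USE_STD3_ASCII_RULES` to show the distinction; the ICU equivalents appear in the snippets later in this page. The class and method names are hypothetical.

```java
import java.net.IDN;

class IdnaFlagsSketch {
    // Query strings: unassigned code points are tolerated.
    static String forQuery(String name) {
        return IDN.toASCII(name, IDN.ALLOW_UNASSIGNED);
    }

    // Stored strings: no flags, so unassigned code points are rejected.
    static String forStorage(String name) {
        return IDN.toASCII(name, 0);
    }

    // True if every label passes the Letter-Digit-Hyphen (STD3) check.
    static boolean passesStd3(String name) {
        try {
            IDN.toASCII(name, IDN.USE_STD3_ASCII_RULES);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A name such as `under_score.example` is accepted without the STD3 flag but fails `passesStd3`, because the underscore is not a letter, digit, or hyphen.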

 IDNA specifies that domain labels are equivalent if and only if the outputs of
 the ToASCII operation on the labels match using a case-insensitive ASCII
 comparison.
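The equivalence rule can be expressed directly in code. The sketch below again uses the JDK's `java.net.IDN` for ToASCII, as an illustration only; the class and method names are hypothetical. Because ToASCII output is pure ASCII, `String.equalsIgnoreCase` behaves as a case-insensitive ASCII comparison here.

```java
import java.net.IDN;

class IdnaEquivalence {
    // Two names are equivalent if their ToASCII forms match, ignoring ASCII case.
    static boolean sameDomain(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }
}
```

For example, `sameDomain("B\u00DCCHER.example", "xn--bcher-kva.EXAMPLE")` is true, while `sameDomain("b\u00FCcher.example", "bucher.example")` is false.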

 ## StringPrep Service in ICU

 The StringPrep service in ICU is data driven. The service is based on the
 Open-Use-Close pattern: a StringPrep profile is opened, strings are
 processed according to the rules specified in the profile, and the profile is
 closed once it is no longer needed.

 Tools for filtering RFC 3454 and producing a rule file that can be compiled into
 a binary format containing all the information required by the service are
 provided.

 The procedure for producing a StringPrep profile data file is as follows:

 1.  Run the filterRFC3454.pl Perl tool to filter the RFC file and produce a rule
     file. The text file produced can be edited by the clients to add/delete
     mappings or add/delete prohibited code points.

 2.  Run the gensprep tool to compile the rule file into a binary format. The
     options to turn on normalization of strings and checking of bidirectional
     code points are passed as command line options to the tool. This tool
     produces a binary profile file with the extension “spp”.

 3.  Open the StringPrep profile with the path to the binary and the name of the
     binary profile file as the arguments to the open call. The profile data
     files are memory mapped and cached for optimum performance.

 ### Code Snippets

 > :point_right: **Note**: The code snippets demonstrate the usage of the APIs. Applications should
 keep the profile object around for reuse, instead of opening and closing the
 profile each time.

 #### C++

     UErrorCode status = U_ZERO_ERROR;
     UParseError parseError;
     /* open the StringPrep profile */
     UStringPrepProfile* nameprep = usprep_open("/usr/joe/mydata",
                                                "nfscsi", &status);
     if(U_FAILURE(status)) {
         /* handle the error */
     }
     /* prepare the string for use according
      * to the rules specified in the profile
      */
     int32_t retLen = usprep_prepare(src, srcLength, dest,
                                     destCapacity, USPREP_ALLOW_UNASSIGNED,
                                     nameprep, &parseError, &status);
     if(U_FAILURE(status)) {
         /* handle the error */
     }
     /* close the profile */
     usprep_close(nameprep);

 #### Java

     // not static final: the field is assigned in the constructor below
     private StringPrep nfscsi = null;
     //singleton instance
     private static final NFSCSIStringPrep prep=new NFSCSIStringPrep();
     private NFSCSIStringPrep() {
         try {
             InputStream nfscsiFile = TestUtil.getDataStream("nfscsi.spp");
             nfscsi = new StringPrep(nfscsiFile);
             nfscsiFile.close();
         } catch(IOException e) {
             throw new RuntimeException(e.toString());
         }
     }
     private static byte[] prepare(byte[] src, StringPrep prep)
             throws StringPrepParseException, UnsupportedEncodingException {
         String s = new String(src, "UTF-8");
         UCharacterIterator iter = UCharacterIterator.getInstance(s);
         StringBuffer out = prep.prepare(iter,StringPrep.DEFAULT);
         return out.toString().getBytes("UTF-8");
     }

 ## IDNA API in ICU

 ICU provides APIs for performing the ToASCII, ToUnicode and compare operations
 as defined by RFC 3490. Convenience methods for comparing IDNs are also
 provided. These APIs follow ICU policies for string manipulation and coding
 guidelines.

 ### Code Snippets

 > :point_right: **Note**: The code snippets demonstrate the usage of the APIs. Applications should
 keep the profile object around for reuse, instead of opening and closing the
 profile each time.

 ### ToASCII operation

 ***C***

     UChar* dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
     destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                             UIDNA_DEFAULT, &parseError, &status);
     if(status == U_BUFFER_OVERFLOW_ERROR) {
         status = U_ZERO_ERROR;
         destCapacity= destLen + 1; /* for the terminating Null */
         free(dest); /* free the memory */
         /* allocate destCapacity units so the terminating Null fits */
         dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
         destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                                 UIDNA_DEFAULT, &parseError, &status);
     }
     if(U_FAILURE(status)) {
         /* handle the error */
     }
     /* do interesting stuff with output*/

 ***Java***

     try {
         StringBuffer out= IDNA.convertToASCII(inBuf,IDNA.DEFAULT);
     } catch(StringPrepParseException ex) {
         /*handle the exception*/
     }

 ### ToUnicode operation

 ***C***

     UChar * dest = (UChar *) malloc(destCapacity * U_SIZEOF_UCHAR);
     destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                               UIDNA_DEFAULT,
                               &parseError, &status);
     if(status == U_BUFFER_OVERFLOW_ERROR) {
         status = U_ZERO_ERROR;
         destCapacity= destLen + 1; /* for the terminating Null */
         /* free the memory */
         free(dest);
         /* allocate destCapacity units so the terminating Null fits */
         dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
         destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                                   UIDNA_DEFAULT, &parseError, &status);
     }
     if(U_FAILURE(status)) {
         /* handle the error */
     }
     /* do interesting stuff with output*/

 ***Java***

     try {
         StringBuffer out= IDNA.convertToUnicode(inBuf,IDNA.DEFAULT);
     } catch(StringPrepParseException ex) {
         // handle the exception
     }

 ### compare operation

 ***C***

     int32_t rc = uidna_compare(source1, length1,
                                source2, length2,
                                UIDNA_DEFAULT,
                                &status);
     if(U_FAILURE(status)) {
         /* handle the error; rc is not meaningful */
     } else if(rc==0) {
         /* the IDNs are same ... do something interesting */
     } else {
         /* the IDNs are different ... do something */
     }

 ***Java***

     try {
         int retVal = IDNA.compare(s1,s2,IDNA.DEFAULT);
         // do something interesting with retVal
     } catch(StringPrepParseException e) {
        // handle the exception
     }

 ## Design Considerations

 StringPrep profiles exhibit the following characteristics:

 1.  The profiles contain information about code points. StringPrep allows
     profiles to add/delete code points or mappings.

 2.  Options such as turning normalization and checking for bidirectional code
     points on or off are the properties of the profiles.

 3.  The StringPrep algorithm is not overridden by the profile.

 4.  Once defined, the profiles do not change.

 The StringPrep profiles are used in network protocols, so runtime performance is
 important.

 Many profiles have been and are being defined, so applications should be able to
 plug in arbitrary profiles and get the desired result out of the framework.

 ICU is designed for this usage by providing build-time tools for arbitrary
 StringPrep profile definitions, and loading them from application-supplied data
 in binary form with data structures optimized for runtime use.

 ## Demo

 A web application at <https://icu4c-demos.unicode.org/icu-bin/idnbrowser>
 illustrates the use of the IDNA API. The source code for the application is
 available at <https://github.com/unicode-org/icu-demos/tree/main/idnbrowser>.

 ## Appendix

 #### NFS Version 4 Profiles

 Network File System Version 4, defined by RFC 3530
 (<http://www.rfc-editor.org/rfc/rfc3530.txt>), defines the use of Unicode text in
 the protocol. ICU provides the requisite profiles as part of the test suite, and
 code for processing the strings according to the profiles as part of the samples.

 The RFC defines three profiles:

 1.  *nfs4_cs_prep Profile*: This profile is used for preparing file and path
     name strings. Normalization of code points and checking for bidirectional
     code points are turned off. Case mappings are included if the NFS
     implementation supports case insensitive file and path names.

 2.  *nfs4_cis_prep Profile*: This profile is used for preparing NFS server
     names. Normalization of code points and checking for bidirectional code
     points are turned on. This profile is equivalent to NamePrep profile.

 3.  *nfs4_mixed_prep Profile*: This profile is used for preparing strings in the
     Access Control Entries of NFS servers. These strings consist of two parts,
     prefix and suffix, separated by '@' (U+0040). The prefix is processed with
     case mappings turned off and the suffix is processed with case mappings
     turned on. Normalization of code points and checking for bidirectional code
     points are turned on.
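The prefix and suffix handling of the mixed profile can be sketched as follows. This is an illustration under stated assumptions, not ICU's sample code: the class `MixedPrepSketch` is hypothetical, the string is split at the last '@', a string without '@' is treated as a prefix only, `toLowerCase(Locale.ROOT)` stands in for the StringPrep case mapping table, and the prohibited and BiDi checks are omitted.

```java
import java.text.Normalizer;
import java.util.Locale;

class MixedPrepSketch {
    // Prefix: normalization on, case mappings off.
    static String preparePrefix(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Suffix: case mappings on (approximated by lowercasing), then normalization.
    static String prepareSuffix(String s) {
        return Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFKC);
    }

    static String mixedPrepare(String input) {
        int at = input.lastIndexOf('@'); // assumption: split at the last '@'
        if (at < 0) {
            return preparePrefix(input); // assumption: no suffix present
        }
        String prefix = input.substring(0, at);
        String suffix = input.substring(at + 1);
        return preparePrefix(prefix) + "@" + prepareSuffix(suffix);
    }
}
```

For example, `MixedPrepSketch.mixedPrepare("Alice@EXAMPLE.COM")` keeps the case of the prefix and returns `Alice@example.com`.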

 #### XMPP Profiles

 Extensible Messaging and Presence Protocol (XMPP) is an XML based protocol for
 near real-time extensible messaging and presence. This protocol defines use of
 two StringPrep profiles:

 1.  *ResourcePrep Profile*: This profile is used for processing the resource
     identifiers within XMPP. Normalization of code points and checking of
     bidirectional code points are turned on. Case mappings are excluded. The
     space code point (U+0020) is excluded from the prohibited code points set.

 2.  *NodePrep Profile*: This profile is used for processing the node identifiers
     within XMPP. Normalization of code points and checking of bidirectional code
     points are turned on. Case mappings are included. All code points specified
     as prohibited in StringPrep are prohibited. Additional code points are added
     to the prohibited set.