---
layout: default
title: Formatting
nav_order: 1000
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->
# Formatting and Parsing
{: .no_toc }
## Contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
## Overview
Formatters translate between binary data and human-readable textual
representations of these values. For example, you cannot display the computer
representation of the number 103. You can only display the numeral 103 as a
textual representation (using three text characters). The result from a
formatter is a string that contains text that the user will recognize as
representing the internal value. A formatter can also parse a string by
converting a textual representation of some value back into its internal
representation. For example, it reads the characters 1, 0 and 3 followed by
something other than a digit, and produces the value 103 as an internal binary
representation.
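
The round trip described above can be sketched in a few lines. This is a minimal illustration, not ICU's own sample code: it assumes the JDK's `java.text.NumberFormat` as a dependency-free stand-in (ICU4J offers a like-named class in `com.ibm.icu.text`), and the class name `RoundTripSketch` and its helper methods are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class RoundTripSketch {
    // Formatting: internal binary value -> characters the user can read.
    static String toText(long value) {
        return NumberFormat.getIntegerInstance(Locale.US).format(value);
    }

    // Parsing: consume characters from the start of the text and stop at the
    // first one that cannot be part of the number.
    static long toValue(String text) {
        ParsePosition position = new ParsePosition(0);
        Number parsed = NumberFormat.getIntegerInstance(Locale.US).parse(text, position);
        if (parsed == null) {
            throw new IllegalArgumentException(
                "No number found at index " + position.getErrorIndex());
        }
        return parsed.longValue();
    }

    public static void main(String[] args) {
        System.out.println(toText(103L));          // the three characters 1, 0, 3
        System.out.println(toValue("103 apples")); // parsing stops at the space
    }
}
```
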
The formatting classes encapsulate information about the display of localized
times, days, numbers, currencies, and messages. They do both formatting and
parsing, and they let you separate the data that the end user sees from the
code, which makes a program easier to localize. Formatting converts a date,
time, number, message or other object from its internal representation into a
string. Parsing is the reverse operation: it converts a string into the
internal representation of the date, time, number, message or other object.
Using the formatting classes is an important step in internationalizing your
software because the `format()` and `parse()` methods in each of the classes make
your software language-neutral by replacing implicit conversions with explicit
formatting calls.
## Internationalization Formatting Tips
This section discusses some of the ways you can format and parse numbers,
currencies, dates, times and text messages in your program so that the data is
separate from the code and can be easily localized. This is the information your
users see on their computer screens, so it needs to be in a language and format
that conforms to their local conventions.
Keep the following in mind while you write your code:
* Keep your code and your data separate
* Format the data in a locale-sensitive manner
* Keep your code locale-independent
* Avoid writing special routines to handle specific locales
* String objects formatted by `format()` are parseable by the `parse()` method\*
> :point_right: **Note**: Although parsing is supported in several legacy ICU APIs,
it is generally considered bad practice to parse localized strings.
For more information, read [Why You Should Not Parse
Localized Strings](https://blog.sffc.xyz/post/190943794505/why-you-should-not-parse-localized-strings).
### Numbers and Currencies
Programs store and operate on numbers using a locale-independent binary
representation. Before a number is displayed or printed, it is converted to a
locale-specific string. For example, the number 12345.67 is "12,345.67" in the
US, "12 345,67" in France and "12.345,67" in Germany.
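
As a rough sketch of this conversion, the snippet below formats the same `double` for three locales. It assumes the JDK's `java.text.NumberFormat` as a stand-in for the ICU class of the same name; `LocalizedNumberSketch` is a hypothetical name. The exact grouping character used for French (a no-break space or a narrow no-break space) depends on the locale data version, so the code does not hard-code it.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class LocalizedNumberSketch {
    // One binary value in, one locale-specific string out.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 12345.67;
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE, Locale.GERMANY}) {
            // Separators differ per locale; the stored value never changes.
            System.out.println(locale.toLanguageTag() + ": " + format(value, locale));
        }
    }
}
```
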
By invoking the methods provided by the `NumberFormat` class, you can format
numbers, currencies, and percentages according to the specified or default
locale. `NumberFormat` is locale-sensitive, so you need to create a new
`NumberFormat` for each locale. `NumberFormat` methods take primitive-type
numbers, such as `double`, and output each number as a locale-specific string.
For currencies, call `getCurrencyInstance` to create a formatter that returns a
string with the formatted number and the appropriate currency sign. The
`NumberFormat` class is unaware of exchange rates, so the numeric value in the
output is the same regardless of the specified currency. This means that the
same number has different monetary values depending on the currency locale. If
the number is 9988776.65, the results will be:
* 9 988 776,65 € in France
* 9.988.776,65 € in Germany
* $9,988,776.65 in the United States
To format percentages, call the `getPercentInstance` method to create a
locale-specific formatter. With this formatter, a decimal fraction such as 0.75
is displayed as 75%.
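
The currency and percent factories can be sketched as follows. This again assumes the JDK's `java.text.NumberFormat`, whose `getCurrencyInstance` and `getPercentInstance` factory names match the ones mentioned above; the class `CurrencyPercentSketch` is hypothetical. Spacing around the currency and percent signs varies with the locale data version, so only the US results are noted in comments.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class CurrencyPercentSketch {
    // Adds the locale's currency sign; no exchange-rate conversion happens.
    static String currency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Scales a fraction by 100 and appends the locale's percent sign.
    static String percent(double fraction, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(fraction);
    }

    public static void main(String[] args) {
        System.out.println(currency(9988776.65, Locale.US));      // $9,988,776.65
        System.out.println(currency(9988776.65, Locale.GERMANY)); // same digits, euro sign
        System.out.println(percent(0.75, Locale.US));             // 75%
    }
}
```
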
#### Customizing Number Formats
If you need to customize a number format, you can use the `DecimalFormat` and
`DecimalFormatSymbols` classes described in the [Formatting
Numbers](numbers/index#formatting-numbers) chapter. This is not usually necessary and
it makes your code much more complex, but it is available for those rare
instances where you need it. In general, you would do this by explicitly
specifying the number format pattern.
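
A minimal sketch of an explicit pattern, assuming the JDK's `java.text.DecimalFormat` and `DecimalFormatSymbols` as stand-ins for the ICU classes of the same names. The pattern `#,##0.000` and the class name `CustomPatternSketch` are chosen only for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class CustomPatternSketch {
    // The pattern fixes the layout (grouping, digit counts); the symbols
    // object supplies the locale's actual separator characters.
    static String format(double value, String pattern, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.5, "#,##0.000", Locale.US));      // 1,234.500
        System.out.println(format(1234.5, "#,##0.000", Locale.GERMANY)); // 1.234,500
    }
}
```

The pattern stays locale-independent while the symbols carry the localization, which is why the same pattern string yields different separators.
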
If you need to format or parse spelled-out numbers, you can use the
`RuleBasedNumberFormat` class (see the [Formatting Numbers](numbers/index#formatting-numbers) chapter).
You can instantiate a default formatter for a locale, or you can specify your
own rules using the `RuleBasedNumberFormat` rule syntax.
Using `NumberFormat` class methods (see the [Formatting Numbers](numbers/index#formatting-numbers) chapter)
with a predefined locale is the easiest and the most accurate way to format numbers and currencies.
> :point_right: **Note**: *See [Properties and ICU Rule Syntax](../strings/properties) for
information regarding syntax characters.*
### Date and Times
You display or print a Date by first converting it to a locale-specific string
that conforms to the conventions of the end user's Locale. For example, Germans
recognize 20.4.98 as a valid date, and Americans recognize 4/20/98.
> :point_right: **Note**: *Different locales require the appropriate calendar support. For
example, the Buddhist calendar is the official calendar in Thailand, so you
should not assume that the Gregorian calendar is in use. ICU will pick
the appropriate calendar based on the locale you supply when opening a `Calendar`
or `DateFormat`.*
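
The following sketch shows both points: the same date rendered with different locale conventions, and a locale-selected calendar reporting a different year. It assumes the JDK's `java.text.DateFormat` and `java.util.Calendar` as stand-ins for the ICU classes; the JDK happens to select a Buddhist calendar for the `th-TH` locale, which is JDK behavior and not a statement about ICU's implementation. The class name `LocalizedDateSketch` is hypothetical.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper class; JDK classes are used here as stand-ins for ICU4J.
class LocalizedDateSketch {
    // 20 April 1998 at midnight in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(1998, Calendar.APRIL, 20).getTime();
    }

    // Short date style following the conventions of the given locale.
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    // Year as counted by whichever calendar the locale selects.
    static int yearInLocaleCalendar(Date date, Locale locale) {
        Calendar calendar = Calendar.getInstance(locale);
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        System.out.println(shortDate(date, Locale.US));      // month first
        System.out.println(shortDate(date, Locale.GERMANY)); // day first, dot-separated
        // Buddhist era year, 543 years ahead of the Gregorian year.
        System.out.println(yearInLocaleCalendar(date, Locale.forLanguageTag("th-TH")));
    }
}
```
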
### Messages
Message format helps make the order of display elements localizable. It helps
address problems of grammatical differences between languages. For example, consider
the sentence, "I go to work by car every day." In Japanese, the grammatical
equivalent can be "Every day, I to work by car go." Another example is
plurals in text, such as "no space for rent, one room for rent and many
rooms for rent," where "for rent" is the only constant text among the three.
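
Both ideas can be sketched with numbered placeholders: a translator can reorder the arguments inside the pattern without any code change, and a choice sub-pattern can select the text that varies with a number. This sketch assumes the JDK's `java.text.MessageFormat` as a stand-in for the ICU class of the same name; the patterns, the sample names, and the class `MessageSketch` are made up for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class MessageSketch {
    // Illustrative pattern: only the part before "for rent" varies with {0}.
    static final String ROOMS =
        "{0,choice,0#no rooms|1#one room|1<{0,number,integer} rooms} for rent";

    // Substitutes the arguments into the pattern by index, not by position.
    static String fill(String pattern, Object... arguments) {
        return new MessageFormat(pattern, Locale.US).format(arguments);
    }

    public static void main(String[] args) {
        // Same arguments, different word order, chosen entirely by the pattern.
        System.out.println(fill("{0} saw {1}.", "Ann", "Bob"));
        System.out.println(fill("{1} was seen by {0}.", "Ann", "Bob"));
        // The number selects which variant of the text is used.
        System.out.println(fill(ROOMS, 0));
        System.out.println(fill(ROOMS, 1));
        System.out.println(fill(ROOMS, 5));
    }
}
```
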
## Formatting and Parsing Classes
ICU groups its classes for formatting numbers, dates and messages into four
major areas, with fourteen classes in total:
### General Formatting
* `Format`:
The abstract superclass of all format classes. It provides the basic methods
for formatting and parsing numbers, dates, strings and other objects.
* `FieldPosition`:
A concrete class for holding the field constant and the begin and end
indices for number and date fields.
* `ParsePosition`:
A concrete class for holding the parse position in a string during parsing.
* `Formattable`:
`Formattable` objects can be passed to the `Format` class or its subclasses for
formatting. It encapsulates a polymorphic piece of data to be formatted and
is used with `MessageFormat`. `Formattable` is used by some formatting
operations to provide a single "type" that encompasses all formattable
values (e.g., it can hold a number, a date, or a string, and so on).
* `UParseError`:
`UParseError` is used to return detailed information about parsing errors.
It is used by the ICU parsing engines that parse long rules, patterns, or
programs. This is helpful when the text being parsed is long enough that
more information than a `UErrorCode` is needed to locate the error.
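
To illustrate how a `FieldPosition` reports begin and end indices, the sketch below asks a number formatter where the integer part of its output sits. It assumes the JDK's `java.text.FieldPosition` and `NumberFormat` as stand-ins for the ICU classes; `Formattable` and `UParseError` have no JDK counterpart and are not shown. The class name `FieldPositionSketch` is hypothetical.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class; java.text is used here as a stand-in for ICU4J.
class FieldPositionSketch {
    // Formats the value, then uses the indices recorded in the FieldPosition
    // to cut the integer part out of the formatted string.
    static String integerPart(double value, Locale locale) {
        FieldPosition field = new FieldPosition(NumberFormat.INTEGER_FIELD);
        StringBuffer formatted = new StringBuffer();
        NumberFormat.getNumberInstance(locale).format(value, formatted, field);
        return formatted.substring(field.getBeginIndex(), field.getEndIndex());
    }

    public static void main(String[] args) {
        // The integer field includes grouping separators but not the fraction.
        System.out.println(integerPart(12345.67, Locale.US)); // 12,345
    }
}
```
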
### Formatting Numbers
* [`NumberFormat`](numbers/legacy-numberformat#numberformat)
The abstract superclass that provides the basic fields and methods for
formatting `Number` objects and number primitives to localized strings and
parsing localized strings to `Number` objects.
* [`DecimalFormat`](numbers/legacy-numberformat#decimalformat)
A concrete class for formatting `Number` objects and number primitives to
localized strings and parsing localized strings to `Number` objects, in base 10.
* [`RuleBasedNumberFormat`](numbers/rbnf)
A concrete class for formatting `Number` objects and number primitives to
localized text, especially spelled-out format such as found in check writing
(e.g. "two hundred and thirty-four"), and parsing text into `Number` objects.
* [`DecimalFormatSymbols`](numbers/legacy-numberformat#decimalformatsymbols)
A concrete class for accessing localized number strings, such as the
grouping separators, decimal separator, and percent sign. Used by
`DecimalFormat`.
### Formatting Dates and Times
* [`DateFormat`](datetime/index#dateformat)
The abstract superclass that provides the basic fields and methods for
formatting `Date` objects to localized strings and parsing date and time
strings to `Date` objects.
* [`SimpleDateFormat`](datetime/index#simpledateformat)
A concrete class for formatting `Date` objects to localized strings and
parsing date and time strings to `Date` objects, using a `GregorianCalendar`.
* [`DateFormatSymbols`](datetime/index#dateformatsymbols)
A concrete class for accessing localized date-time formatting strings, such
as names of the months, days of the week and the time zone.
### Formatting Messages
* [`MessageFormat`](messages/index#messageformat)
A concrete class for producing a language-specific user message that
contains numbers, currency, percentages, date, time and string variables.
* [`ChoiceFormat`](messages/examples#choiceformat-class)
A concrete class for mapping strings to ranges of numbers and for handling
plurals and name series in user messages.
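
As a small illustration of mapping number ranges to strings, the sketch below builds a name series for the days of the week. It assumes the JDK's `java.text.ChoiceFormat` as a stand-in for the ICU class of the same name; the day numbering, the abbreviations, and the class `ChoiceSketch` are assumptions made for the example.

```java
import java.text.ChoiceFormat;

// Hypothetical helper class; java.text is used here as a stand-in for ICU.
class ChoiceSketch {
    // Each limit opens a range that runs up to the next limit:
    // [1, 2) maps to "Sun", [2, 3) maps to "Mon", and so on.
    static String dayName(int dayOfWeek) {
        ChoiceFormat names = new ChoiceFormat(
            new double[] {1, 2, 3, 4, 5, 6, 7},
            new String[] {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"});
        return names.format(dayOfWeek);
    }

    public static void main(String[] args) {
        System.out.println(dayName(1)); // Sun
        System.out.println(dayName(7)); // Sat
    }
}
```
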