# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Hira_Kana.txt
# Generated from CLDR
#
# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents. We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent. However, this is a non-
# roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents. We convert them to normal
# Hiragana ka/ke (304B,3051). This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel. This is a one-way information-
# losing transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
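# Illustrative examples of kinks 1 and 2, as described
# above (a sketch for readers, not generated data):
# Katakana ヷ (30F7) → Hiragana わ (308F) + 3099
# Katakana ヵ (30F5) → Hiragana か (304B)
# Hiragana か (304B) → Katakana カ (30AB), not ヵ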
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
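ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;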
# One-way Katakana-Hiragana transform of small Katakana
# ka/ke to normal Hiragana ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 30A1) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small vowel as another small Hiragana vowel, so the
# doubled vowel is always written as a large Hiragana
# vowel. In natural text this should never occur
# anyway. If a 30FC is seen without a preceding vowel
# sound (e.g., after n 30F3) we do not change it.
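# Illustrative examples (a sketch of the behavior
# described above, not generated data):
# Katakana ラーメン → Hiragana らあめん
# Hiragana らあめん → Katakana ラアメン (the 30FC
# is not restored)
# Katakana ンー → Hiragana んー (no preceding vowel
# sound, so the 30FC is left unchanged)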
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
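$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];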
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof