# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: sat_Olck_sat_FONIPA.txt
# Generated from CLDR
#
# Santali (Ol Chiki) → Santali (International Phonetic Alphabet)
#
# Output
# ------
# m mː n nː ɳ ɳː ɲ ɲː ŋ ŋː
# p pʰ pʼ b bʰ t tʰ tʼ d dʰ ʈ ʈʰ ɖ ɖʰ c cʰ cʼ k kʰ kʼ ɡ ʔ
# s sː h
# d\u0361ʒ
# ɽ r
# l lː
# w wː w\u0303 w\u0303ː
#
# i iː ĩ ĩː u uː ũ ũː
# e eː ẽ ẽː ə əː ə\u0303 ə\u0303ː o oː õ õː
# ɛ ɛː ɛ\u0303 ɛ\u0303ː ɔ ɔː ɔ\u0303 ɔ\u0303ː
# a aː ã ãː
#
# References
# ----------
# [1] Michael Everson: Final proposal to encode the Ol Chiki script
# in the UCS. ISO/IEC JTC1/SC2/WG2 Working Group Document N2984R,
# September 21, 2005. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2984.pdf
#
# [2] George L. Campbell: Compendium of the World's Languages.
# Volume 2: Ladakhi to Zuni. ISBN 0-415-20297-3. Taylor & Francis, 2000.
# Pages 1454 to 1458.
#
# Notes
# -----
# According to [1] (page 3), U+1C7D AHAD (ᱽ) can only follow the four
# ejective consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/; these become
# ᱵᱽ /b/, ᱡᱽ /d\u0361ʒ/, ᱫᱽ /d/, and ᱜᱽ /ɡ/. In online texts, however,
# we have occasionally encountered ᱽ following non-ejective plosives,
# for example after ᱯ /p/. These may be typos. Our rules try to be
# resilient and handle ᱯᱽ as /b/.
#
# According to [1] (page 2), U+1C7C PHAARKAA follows the four “glottal”
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/ (these are actually
# ejective, not glottal). In online texts, however, we have frequently
# encountered ᱼ following non-ejective consonants.
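#
# The rules below use the $inword set as a "before" context, as in
# `$inword {X} → y`: the letter X is rewritten only when the preceding
# character is a letter or mark, that is, when X is not word-initial.
# Illustrative readings, derived from the rules in this file rather than
# from CLDR test data:
#   ᱜ  (word-initial)      → kʼ
#   ᱟᱜ (ᱜ after a letter)  → aɡ
#   ᱜᱼ (with phaarkaa)     → kʼ
#   ᱜᱽ (with ahad)         → ɡ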
$inword = [[:L:][:M:]];
# Some online texts use a decomposed form of U+1C7A MU-GAAHLAA TTUDDAG.
ᱹᱸ → ᱺ ;
ᱸᱹ → ᱺ ;
::null();
# To simplify the rules below, enforce a uniform ordering of marks.
ᱻᱹ → ᱹᱻ ;
ᱻᱸ → ᱸᱻ ;
ᱻᱺ → ᱺᱻ ;
ᱼᱹ → ᱹᱼ ;
ᱼᱸ → ᱸᱼ ;
ᱼᱺ → ᱺᱼ ;
::null();
# Some online texts use U+1C7C PHAARKAA instead of U+1C7B RELAA for indicating
# long phonemes, presumably because the graphemes look similar in some fonts.
# Since phaarkaa is used for voicing ejectives and plosives (which cannot
# be lengthened), we rewrite phaarkaa to relaa.
[ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ] [ᱹᱸᱺ]* {ᱼ} → ᱻ ;
::null();
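# Worked example of the three preparatory passes above. This is an
# illustrative trace derived from the rules in this file, not CLDR test
# data; it assumes each ::null() ends a pass over the whole text.
#   input:                        ᱟᱼᱹ
#   pass 1 (compose ᱹᱸ, ᱸᱹ):      unchanged
#   pass 2 (reorder marks):       ᱟᱹᱼ
#   pass 3 (phaarkaa to relaa):   ᱟᱹᱻ
# The main rules below then map ᱟᱹᱻ to əː.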
ᱚᱹᱻ → ɔː ;
ᱚᱹ → ɔ ;
ᱚᱸᱻ → ɔ\u0303ː ;
ᱚᱸ → ɔ\u0303 ;
ᱚᱺᱻ → ɔ\u0303ː ;
ᱚᱺ → ɔ\u0303 ;
ᱚᱻ → ɔː ;
ᱚ → ɔ ;
ᱛᱼ → t ;
ᱛᱷ → tʰ ;
ᱛᱽ → d ;
$inword {ᱛ} → d ;
ᱛ → t ;
ᱜᱼ → kʼ ;
ᱜᱷ → kʰ ;
ᱜᱽ → ɡ ;
$inword {ᱜ} → ɡ ;
ᱜ → kʼ ;
ᱝᱻ → ŋː ;
ᱝ → ŋ ;
ᱞᱻ → lː ;
ᱞ → l ;
ᱟᱹᱻ → əː ;
ᱟᱹ → ə ;
ᱟᱸᱻ → ãː ;
ᱟᱸ → ã ;
ᱟᱺᱻ → ə\u0303ː ;
ᱟᱺ → ə\u0303 ;
ᱟᱻ → aː ;
ᱟ → a ;
ᱠᱼ → k ;
ᱠᱷ → kʰ ;
ᱠᱽ → ɡ ;
ᱠ → k ;
ᱡᱼ → cʼ ;
ᱡᱷ → cʰ ;
ᱡᱽ → d\u0361ʒ ;
$inword {ᱡ} → d\u0361ʒ ;
ᱡ → cʼ ;
ᱢᱻ → mː ;
ᱢ → m ;
# According to [1], ᱣ is sometimes /v/ and sometimes /w/.
# TODO: Find out if there is a rule for this.
ᱣᱸ → w\u0303 ;
ᱣ → w ;
ᱤᱹᱻ → iː ;
ᱤᱹ → i ;
ᱤᱸᱻ → ĩː ;
ᱤᱸ → ĩ ;
ᱤᱺᱻ → ĩː ;
ᱤᱺ → ĩ ;
ᱤᱻ → iː ;
ᱤ → i ;
ᱥᱻ → sː ;
ᱥ → s ;
# According to [1], ᱦ is sometimes /h/ and sometimes /ʔ/.
# TODO: Find out if there is a rule for this.
ᱦ → h ;
ᱧᱻ → ɲː ;
ᱧ → ɲ ;
ᱨᱻ → r ;
ᱨ → r ;
ᱩᱹᱻ → uː ;
ᱩᱹ → u ;
ᱩᱸᱻ → ũː ;
ᱩᱸ → ũ ;
ᱩᱺᱻ → ũː ;
ᱩᱺ → ũ ;
ᱩᱻ → uː ;
ᱩ → u ;
ᱪᱼ → c ;
ᱪᱷ → cʰ ;
ᱪᱽ → d\u0361ʒ ;
ᱪ → c ;
ᱫᱼ → tʼ ;
ᱫᱷ → tʰ ;
ᱫᱽ → d ;
$inword {ᱫ} → d ;
ᱫ → tʼ ;
ᱬᱻ → ɳː ;
ᱬ → ɳ ;
# TODO: ᱵᱷᱭᱨᱚᱵ → bʰhrɔb seems unlikely; would be good to verify.
ᱭ → h ;
ᱮᱹᱻ → ɛː ;
ᱮᱹ → ɛ ;
ᱮᱺᱻ → ɛ\u0303ː ;
ᱮᱺ → ɛ\u0303 ;
ᱮᱸᱻ → ẽː ;
ᱮᱸ → ẽ ;
ᱮᱻ → eː ;
ᱮ → e ;
ᱯᱼ → p ;
ᱯᱷ → pʰ ;
ᱯᱽ → b ;
ᱯ → p ;
ᱰᱷ → ɖʰ ;
ᱰ → ɖ ;
ᱱᱻ → nː ;
ᱱ → n ;
ᱲᱻ → ɽ ;
ᱲ → ɽ ;
ᱳᱸᱻ → õː ;
ᱳᱸ → õ ;
ᱳᱻ → oː ;
ᱳ → o ;
ᱴᱼ → ʈ ;
ᱴᱷ → ʈʰ ;
ᱴᱽ → ɖ ;
ᱴ → ʈ ;
ᱵᱼ → pʼ ;
ᱵᱷ → bʰ ;
ᱵᱽ → b ;
$inword {ᱵ} → b ;
ᱵ → pʼ ;
ᱶᱻ → w\u0303ː ;
ᱶ → w\u0303 ;