commit | 14bcaaf58eefc2c139350253d3cdfb1940bca7e4 | |
---|---|---|
author | Andy Heninger <andy.heninger@gmail.com> | Thu Feb 13 21:40:28 2020 -0800 |
committer | Andy Heninger <andy.heninger@gmail.com> | Tue Feb 18 18:28:10 2020 -0800 |
tree | 8d1f838d924139d1a853c2a70cba9606aca3bf56 | |
parent | ed9ea2e7accc316d4086ac377f622ca90f2c016d |
ICU-20876 Regex Grapheme Cluster matching with Break Iterators.

Change the implementation of grapheme cluster matching in regex to use an ICU break iterator instead of a small one-off state machine. The old implementation had fallen behind the Unicode UAX #29 specification for grapheme clusters and could not easily be updated.

The new implementation follows the same general pattern that is used for finding word boundaries with an ICU break iterator. While reviewing that code, a few improvements to the handling of ICU error codes were also made.

Note that this change adds a new dependency on break iteration. Regex patterns that previously worked with ICU builds configured without break iteration will now fail, but only if they use `\X` to match grapheme clusters.
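As a rough illustration of the approach described in the commit message, the sketch below implements a `\X`-style match step by asking a character break iterator for the next boundary after the current position. So that it runs without ICU on the classpath, it uses the JDK's `java.text.BreakIterator` as a stand-in for ICU's own `BreakIterator` (ICU4J's `com.ibm.icu.text.BreakIterator` offers the same `getCharacterInstance()` and `following()` calls). The class `GraphemeMatcher` and its methods are hypothetical; this is not ICU's regex code. One caveat: older JDK releases did not implement the full UAX #29 extended grapheme cluster rules in this iterator, so the example sticks to a base letter plus a combining accent, which is a single cluster under both old and new rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: shows how a regex engine could implement \X by
// consuming input up to the next grapheme cluster boundary reported by a
// character break iterator. Not ICU's actual regex code.
class GraphemeMatcher {
    private final String text;
    private final BreakIterator clusters;

    GraphemeMatcher(String text) {
        this.text = text;
        // Created once and reused for every match step.
        this.clusters = BreakIterator.getCharacterInstance();
        this.clusters.setText(text);
    }

    // One \X match step at pos: returns the end index of the cluster that
    // starts at pos, or -1 (no match) when pos is already at the end.
    int matchCluster(int pos) {
        if (pos >= text.length()) {
            return -1;
        }
        // following() returns the first boundary strictly after pos.
        return clusters.following(pos);
    }

    // Applies matchCluster repeatedly, splitting the whole input into clusters.
    List<String> splitClusters() {
        List<String> out = new ArrayList<>();
        int pos = 0;
        int end;
        while ((end = matchCluster(pos)) != -1) {
            out.add(text.substring(pos, end));
            pos = end;
        }
        return out;
    }

    // Cross-check: java.util.regex (Java 9 and later) supports \X directly.
    static int countWithRegex(String text) {
        Matcher m = Pattern.compile("\\X").matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT is one cluster; "x" is another.
        String s = "e\u0301x";
        GraphemeMatcher gm = new GraphemeMatcher(s);
        System.out.println(gm.matchCluster(0));          // 2
        System.out.println(gm.splitClusters().size());   // 2
        System.out.println(countWithRegex(s));           // 2
    }
}
```

Keeping one iterator per matcher, rather than creating a new one for every `\X`, is the kind of caching a matcher would typically also do for its word-break iterator; the exact structure inside ICU is not shown here.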
This is the repository for the International Components for Unicode. The ICU project is under the stewardship of The Unicode Consortium.
Build | Status |
---|---|
TravisCI | |
Azure Pipelines | |
Azure Pipelines (Exhaustive Tests) | |
AppVeyor | |
Fuzzing | |
Directory | Description |
---|---|
icu4c/ | ICU for C/C++ |
icu4j/ | ICU for Java |
tools/ | Tools |
vendor/ | Vendor dependencies |

Please see ./icu4c/LICENSE. (C and J are under an identical license file.)
Copyright © 2016 and later Unicode, Inc. and others. All Rights Reserved. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. Terms of Use and License