ICU-11548 Improve regex static UnicodeSets handling

Compiled regular expression patterns make use of several shared
UnicodeSets. This change simplifies the creation and use of these
static UnicodeSets.

- Pointer fields to the static sets are removed from the compiled patterns,
  and the static variables are accessed directly. The deleted pointers
  were a holdover from earlier code that did not use shared statics.

- The UnicodeSet pattern literals are changed from hex constants to
  u"string literals".

- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
  to the number of UnicodeSets actually required. Doing this required
  a change to regexcst.pl to export the required size. Changing and
  rerunning this Perl script resulted in massive but benign changes to
  the generated file regexcst.h, because Perl's hash enumeration order
  has changed since the file was last regenerated.

- UnicodeSets are frozen when possible, which should result in faster
  matching.
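
As a rough illustration of two of the points above, the following
standalone sketch uses only the C++ standard library, not the ICU
sources. All names in it (kHexPattern, kLiteralPattern, SetIndex,
StaticSets, and the two helper functions) are hypothetical and chosen
for illustration; the pattern "[a-z]" is likewise just an example. It
shows that a pattern spelled as hex code units and the same pattern
written as a u"string literal" are identical, and that an array sized
from a trailing enum value tracks the number of sets automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Old style (illustrative): the pattern "[a-z]" spelled out as UTF-16
// code units in hex, with an explicit terminating zero.
static const char16_t kHexPattern[] = {0x5b, 0x61, 0x2d, 0x7a, 0x5d, 0};

// New style (illustrative): the same pattern as a C++11 u"..." literal.
static const char16_t kLiteralPattern[] = u"[a-z]";

// Hypothetical set indexes. The final enumerator is not a set; it is
// the count, so it grows automatically when a set is added above it.
enum SetIndex {
    kWordSet = 0,
    kSpaceSet,
    kDigitSet,
    kSetCount  // must stay last
};

// Hypothetical holder for the shared sets. The array is sized from the
// enum rather than from a hard-coded constant such as 10.
struct StaticSets {
    std::u16string patterns[kSetCount];
};

// Returns true if the hex-constant and string-literal forms are equal.
bool patternsMatch() {
    return std::u16string(kHexPattern) == std::u16string(kLiteralPattern);
}

// Returns the number of elements in StaticSets::patterns.
std::size_t setArrayLength() {
    StaticSets sets;
    return sizeof(sets.patterns) / sizeof(sets.patterns[0]);
}
```

In this sketch, adding a new enumerator before kSetCount resizes the
array without touching any other declaration, which is the property the
exported size from regexcst.pl is meant to provide.
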
9 files changed
tree: 3b96b0505aedd3b738bd95bbac59246c1e91d7da
  1. .ci-builds/
  2. .github/
  3. docs/
  4. icu4c/
  5. icu4j/
  6. tools/
  7. vendor/
  8. .appveyor.yml
  9. .cpyskip.txt
  10. .gitattributes
  11. .gitignore
  12. .travis.yml
  13. KEYS
  14. README.md
README.md

International Components for Unicode

This is the repository for the International Components for Unicode. The ICU project is under the stewardship of The Unicode Consortium.

Subdirectories and Information

License

Please see ./icu4c/LICENSE (C and J are under an identical license file.)

Copyright © 2016 and later Unicode, Inc. and others. All Rights Reserved. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. Terms of Use and License