Implement bidirectional bracket support
The single rule N0 in the Unicode Bidirectional Algorithm may not
sound like much, but it packs quite a punch and required some deep
work.
The work was not made any simpler by the specification (UAX #9), which
is convoluted and hard to follow. However, it helps to have experience
from the other algorithms, and the automatic tests allow very broad
confirmation of proper function.
In particular, the following changes needed to be made. The generator
had to be modified as follows:
- It now implements a decomposition to match canonically equivalent
brackets. This requires UnicodeData.txt to be present,
but what matters is that the end result is fast and small.
- The LUT printing now detects the type automatically, because it is
just too fragile otherwise.
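To illustrate the idea behind the type detection, here is a minimal
sketch. It is not the generator's actual code: lut_type_name() is a
hypothetical helper, and it assumes the table entries are available as
int_least64_t values. It scans the table once and picks the narrowest
least-width integer type that can hold every entry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the commit): return the name of
 * the narrowest least-width integer type that holds all LUT entries.
 */
static const char *
lut_type_name(const int_least64_t *lut, size_t len)
{
	int_least64_t min = 0, max = 0;
	size_t i;

	/* determine the value range of the table */
	for (i = 0; i < len; i++) {
		if (lut[i] < min) {
			min = lut[i];
		}
		if (lut[i] > max) {
			max = lut[i];
		}
	}

	if (min >= 0) {
		/* no negative entries, an unsigned type suffices */
		if (max <= UINT8_MAX) {
			return "uint_least8_t";
		} else if (max <= UINT16_MAX) {
			return "uint_least16_t";
		} else if (max <= UINT32_MAX) {
			return "uint_least32_t";
		}
		return "uint_least64_t";
	}

	/* at least one negative entry, a signed type is needed */
	if (min >= INT8_MIN && max <= INT8_MAX) {
		return "int_least8_t";
	} else if (min >= INT16_MIN && max <= INT16_MAX) {
		return "int_least16_t";
	} else if (min >= INT32_MIN && max <= INT32_MAX) {
		return "int_least32_t";
	}
	return "int_least64_t";
}
```

Deriving the type from the data means a table that gains a larger or
a negative entry cannot silently overflow a hand-picked type.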
The implementation of the algorithm itself had the following changes:
- The last-strong-type property of an isolate runner has been
refactored to be stateless. Otherwise, you can end up with
subtle bugs where strong types are added beforehand, yielding
a TOCTOU problem.
- The bracket parsing makes use of a novel FIFO structure that
combines the best of both worlds: a stack and a naive
implementation.
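For reference, the bracket pairing that N0 builds on (definition BD16)
can be sketched with the textbook stack approach. This is explicitly
not the FIFO structure used in this commit. All names are hypothetical,
the bracket table is a toy subset of BidiBrackets.txt, and the bidi
classes of the characters are ignored. The canon() step shows why the
decomposition matters: U+2329 and U+232A are canonically equivalent to
U+3008 and U+3009 and have to match them. The caller is assumed to
provide room for len / 2 pairs in out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 63 /* stack size given in BD16 */

struct pair {
	size_t open, close;
};

/* map U+2329/U+232A to their canonical equivalents U+3008/U+3009 */
static uint_least32_t
canon(uint_least32_t cp)
{
	return (cp == 0x2329) ? 0x3008 : (cp == 0x232A) ? 0x3009 : cp;
}

/* return the closing counterpart of an opening bracket, 0 otherwise */
static uint_least32_t
closing_of(uint_least32_t cp)
{
	switch (canon(cp)) {
	case 0x28:   return 0x29;   /* ( ) */
	case 0x5B:   return 0x5D;   /* [ ] */
	case 0x7B:   return 0x7D;   /* { } */
	case 0x3008: return 0x3009; /* angle brackets */
	}
	return 0;
}

static int
is_closing(uint_least32_t cp)
{
	cp = canon(cp);
	return cp == 0x29 || cp == 0x5D || cp == 0x7D || cp == 0x3009;
}

/* fill out with the bracket pairs, sorted by opening position */
static size_t
find_pairs(const uint_least32_t *cp, size_t len, struct pair *out)
{
	struct { uint_least32_t want; size_t pos; } stack[MAX_DEPTH];
	struct pair tmp;
	size_t depth = 0, npairs = 0, i, j;
	uint_least32_t c;

	for (i = 0; i < len; i++) {
		if ((c = closing_of(cp[i])) != 0) {
			if (depth == MAX_DEPTH) {
				break; /* sketch: stop scanning when full */
			}
			stack[depth].want = c;
			stack[depth].pos = i;
			depth++;
		} else if (is_closing(cp[i])) {
			/* search from the top for a matching opener */
			for (j = depth; j > 0; j--) {
				if (stack[j - 1].want == canon(cp[i])) {
					out[npairs].open = stack[j - 1].pos;
					out[npairs].close = i;
					npairs++;
					depth = j - 1; /* pop through match */
					break;
				}
			}
		}
	}

	/* insertion sort by opening position */
	for (i = 1; i < npairs; i++) {
		tmp = out[i];
		for (j = i; j > 0 && out[j - 1].open > tmp.open; j--) {
			out[j] = out[j - 1];
		}
		out[j] = tmp;
	}

	return npairs;
}
```

For the input "( [ ) ]" this yields the single pair (0, 2), because
matching the ")" pops the unmatched "[" off the stack and the final "]"
then finds no opener.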
As an end result, we now pass all ~900k bidi tests from the Unicode
standard.
Signed-off-by: Laslo Hunhold <dev@frign.de>
8 files changed