[OTLayout] Accelerate lookups by batching

If we need to apply many many lookups, we can fasten that up by applying
them in batches.  For each batch we keep the union of the coverage of
the lookups participating.  We can then skip glyph ranges that do NOT
participate in any lookup in the batch.  The batch partition is
determined optimally by a mathematical probability model on the glyphs
and a dynamic-program to optimize the partition.

The net effect is 30% speedup on Amiri.  the downside is more memory
consuption as each batch will keep an hb_set_t of its coverage.

I'm not yet convinced that the tradeoff is worth pursuing.  I'm trying
to find out ways to optimized this more, with less memory overhead.

This work also ignores the number of subtables per lookup.  That may
prove to be very important for the performance numbers from here on.
4 files changed