RP2350 HSTX RGB111 text mode theory:
* The cached font is 13 pixels across at 2 bits per pixel, stored in the low 26 bits of each 32-bit word
* A 14th pixel in each character cell is always black
* The 2-bit pixel values in the cache are packed adjacently, with no gaps between them
* Each output pixel is 1 byte
* "R2G2B2" values are created by selecting a pixel's 2 bits from the font data and multiplying them by the color value (SWAR: SIMD within a register)
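A minimal sketch of the per-pixel multiply, in C. The bit layouts are assumptions, not taken from the note: an RGB111 color with red in bit 2, green in bit 1 and blue in bit 0, and an R2G2B2 byte with red in bits 5-4, green in bits 3-2 and blue in bits 1-0. The hypothetical helper `spread_rgb111` moves each color bit to the low bit of its 2-bit field, so multiplying by a 2-bit intensity copies that intensity into every channel that is switched on.

```c
#include <stdint.h>

/* Assumed layouts (hypothetical):
   RGB111: bit 2 = red, bit 1 = green, bit 0 = blue.
   R2G2B2: bits 5-4 = red, bits 3-2 = green, bits 1-0 = blue. */

/* Move each 1-bit channel to the low bit of its 2-bit R2G2B2 field. */
static uint32_t spread_rgb111(uint32_t c)
{
    return (((c >> 2) & 1u) << 4) | (((c >> 1) & 1u) << 2) | (c & 1u);
}

/* Shade one pixel: the multiply writes the 2-bit intensity into every
   enabled channel at once. The largest product is 3 * 0x15 = 0x3F, so
   no field carries into its neighbor. */
static uint8_t shade_pixel(uint32_t font_bits, uint32_t rgb111)
{
    return (uint8_t)((font_bits & 3u) * spread_rgb111(rgb111));
}

/* Example: yellow (red + green, RGB111 value 6) at intensity 2 gives
   0x28, i.e. red = 2, green = 2, blue = 0. */
```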
The old implementation performed one multiply per output pixel, or 13
8-bit multiplies per character. The new implementation carefully
re-orders the data in the font cache so that a single 32-bit multiply
produces 4 output pixels at once, which brings the cost down to 4
multiplies per character.
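The same multiply extends to four pixels when each 2-bit value sits in its own byte of a 32-bit word, because each byte of the product stays at or below 0x3F and never carries into the next. The sketch below assumes a hypothetical cache layout in which shifting a word right by `shift` and masking with 0x03030303 leaves four pixels in bits 0-1, 8-9, 16-17 and 24-25; the note does not spell out the real re-ordering. The RGB111 and R2G2B2 bit layouts are also assumptions. `shade4_slow` mimics the old one-multiply-per-pixel approach for comparison. On a little-endian CPU such as the RP2350, byte 0 of the result is the leftmost pixel once the word is stored.

```c
#include <stdint.h>

/* Hypothetical RGB111 (R=bit 2, G=bit 1, B=bit 0) to R2G2B2 multiplier
   (R=bits 5-4, G=bits 3-2, B=bits 1-0). */
static uint32_t spread_rgb111(uint32_t c)
{
    return (((c >> 2) & 1u) << 4) | (((c >> 1) & 1u) << 2) | (c & 1u);
}

/* New style: one 32-bit multiply shades four pixels. */
static uint32_t shade4(uint32_t font, unsigned shift, uint32_t rgb111)
{
    uint32_t pixels = (font >> shift) & 0x03030303u; /* one pixel per byte */
    return pixels * spread_rgb111(rgb111);           /* no byte overflows */
}

/* Old style: one small multiply per pixel, assembled byte by byte.
   Valid for shift <= 6 so that every shift amount stays below 32. */
static uint32_t shade4_slow(uint32_t font, unsigned shift, uint32_t rgb111)
{
    uint32_t out = 0;
    for (unsigned k = 0; k < 4; k++) {
        uint32_t v = (font >> (shift + 8u * k)) & 3u;
        out |= (v * spread_rgb111(rgb111)) << (8u * k);
    }
    return out;
}
```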
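Every two characters produce 28 bytes (2 * 14 pixels, or seven 32-bit
words) in the output buffer, while a single character ends halfway
through a word. The character generator is therefore unrolled manually
to handle two characters per iteration, so that every store to the
output buffer is a full 32-bit store.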
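Because one character is 14 bytes, or three and a half words, two characters are the smallest unit that fills whole words. A sketch of the packing step, assuming (hypothetically) that each character has already been shaded into four words: three full words for pixels 0-11, plus a fourth word holding pixel 12 in byte 0 and the always-black pixel 13 as a zero in byte 1. This shift-and-merge version is only an illustration; the note does not say how the real generator lines up the second character. The function name `store_pair` is made up for this example.

```c
#include <stdint.h>

/* Pack two 14-pixel characters (1 byte per pixel) into seven 32-bit words
   using only whole-word stores. Hypothetical input: a[0..2] and b[0..2]
   each hold four shaded pixels; a[3] and b[3] hold pixel 12 in byte 0 and
   the black pixel 13 (zero) in byte 1, and their upper bytes are ignored.
   Assumes little-endian byte order, so byte 0 is the leftmost pixel. */
static void store_pair(uint32_t *out, const uint32_t a[4], const uint32_t b[4])
{
    out[0] = a[0];
    out[1] = a[1];
    out[2] = a[2];
    /* Character A ends halfway through word 3; character B starts there. */
    out[3] = (a[3] & 0xFFFFu) | (b[0] << 16);
    out[4] = (b[0] >> 16) | (b[1] << 16);
    out[5] = (b[1] >> 16) | (b[2] << 16);
    out[6] = (b[2] >> 16) | (b[3] << 16); /* upper half of b[3] drops out */
}
```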
These changes gain enough efficiency that the loop can be written in C
instead of assembler, with enough time left over to add a background
color. The background color is XOR'd into each output pixel.
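The note only says that the background is XOR'd in. One arrangement that fits the multiply, given here as an assumption rather than the actual code, is to multiply by foreground XOR background and then XOR the result with the background at full intensity. Because v ^ 3 == 3 - v for any 2-bit value v, a channel lit in the background but not the foreground fades from 3 to 0 as v rises, a channel lit in both stays at 3, and a channel lit only in the foreground follows v. The RGB111 and R2G2B2 bit layouts and the name `shade4_bg` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical RGB111 (R=bit 2, G=bit 1, B=bit 0) to R2G2B2 multiplier
   (R=bits 5-4, G=bits 3-2, B=bits 1-0). */
static uint32_t spread_rgb111(uint32_t c)
{
    return (((c >> 2) & 1u) << 4) | (((c >> 1) & 1u) << 2) | (c & 1u);
}

/* Hypothetical: shade four pixels (one 2-bit value per byte) with a
   foreground and a background color. */
static uint32_t shade4_bg(uint32_t pixels, uint32_t fg, uint32_t bg)
{
    uint32_t mul = spread_rgb111(fg ^ bg);
    /* Background at intensity 3 in each channel it uses, copied into all
       four bytes; each byte is at most 0x3F, so nothing carries. */
    uint32_t bg_full = spread_rgb111(bg) * 3u * 0x01010101u;
    return ((pixels & 0x03030303u) * mul) ^ bg_full;
}
```

With fg equal to bg the multiplier is zero, so every pixel comes out as the plain background color.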
The final new trick is reduced intensity: when reduced intensity is
selected, the low bit of each 2-bit font value is masked away, so that
instead of intensities 0/1/2/3, the possible intensities are 0/0/2/2.
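With one pixel per byte, reduced intensity only changes the mask applied before the multiply: 0x02020202 keeps just the high bit of each 2-bit value. The function below is a hypothetical illustration that assumes four pixels sit in bits 0-1, 8-9, 16-17 and 24-25, plus the same assumed RGB111 (red/green/blue in bits 2/1/0) and R2G2B2 (fields at bits 5-4/3-2/1-0) layouts.

```c
#include <stdint.h>

/* Hypothetical RGB111 to R2G2B2 multiplier. */
static uint32_t spread_rgb111(uint32_t c)
{
    return (((c >> 2) & 1u) << 4) | (((c >> 1) & 1u) << 2) | (c & 1u);
}

/* Reduced intensity drops the low bit of each 2-bit value, turning
   0/1/2/3 into 0/0/2/2 before the shared multiply. */
static uint32_t shade4_intensity(uint32_t pixels, uint32_t rgb111, int reduced)
{
    uint32_t mask = reduced ? 0x02020202u : 0x03030303u;
    return (pixels & mask) * spread_rgb111(rgb111);
}
```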
Because neither regular- nor reduced-intensity text is visible when the
foreground color matches the background color, there are effectively
8 * 14 = 112 useful combinations: 8 background colors, each with 7
other foreground colors at 2 intensities.
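The count can be checked by brute force. This sketch assumes the same hypothetical XOR background scheme as a shading example (multiply by foreground XOR background, then XOR with the background at intensity 3) and the assumed bit layouts; it counts the (foreground, background, intensity) triples in which a fully lit font pixel differs from an unlit one.

```c
#include <stdint.h>

/* Hypothetical RGB111 (R=bit 2, G=bit 1, B=bit 0) to R2G2B2 multiplier. */
static uint32_t spread_rgb111(uint32_t c)
{
    return (((c >> 2) & 1u) << 4) | (((c >> 1) & 1u) << 2) | (c & 1u);
}

/* Shade one pixel with the assumed XOR background scheme. */
static uint32_t shade_one(uint32_t v, uint32_t fg, uint32_t bg, int reduced)
{
    if (reduced)
        v &= 2u; /* 0/1/2/3 becomes 0/0/2/2 */
    return (v * spread_rgb111(fg ^ bg)) ^ (3u * spread_rgb111(bg));
}

/* Count combinations where text is distinguishable from the background. */
static int useful_combinations(void)
{
    int count = 0;
    for (uint32_t bg = 0; bg < 8; bg++)
        for (uint32_t fg = 0; fg < 8; fg++)
            for (int reduced = 0; reduced < 2; reduced++)
                if (shade_one(3, fg, bg, reduced) != shade_one(0, fg, bg, reduced))
                    count++;
    return count;
}
```

Only the 16 cases with fg equal to bg drop out, leaving 128 - 16 = 112.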