[HarfBuzz] Clusters chapter

Behdad Esfahbod behdad at behdad.org
Fri Nov 2 21:00:04 UTC 2018


On Fri, Nov 2, 2018 at 4:48 PM Nathan Willis <nwillis at glyphography.com>
wrote:

> Hiyo. I'm revisiting the 'clusters' chapter in the User Manual, to make it
> more consistent with the rest and hopefully easier to understand. Rereading
> it has raised some questions....
>
> 1) The opening sentence says "a cluster is a sequence of code points [...]"
>  ...which might be true for the initial buffer contents, but all the
> interesting stuff happens after replacing them with glyphs. So that will
> certainly need some changing. I know it's a can of worms to open, but what
> if we said "characters" here? Explaining the relationship between code
> points, characters, and glyphs can be tricky, but then again explaining
> clusters to new readers is already difficult....
>

Right.  "code points" by itself doesn't mean anything.  They might be
Unicode code points, aka characters, or glyph code points, aka glyphs.

A cluster refers to a sequence of characters and their corresponding
glyphs.  Or the other way around, depending on your taste.

> 2) "Most clients will use UTF-8, UTF-16, or UTF-32 indices, but the actual
> number does not matter" ... is "indices" here referring to the buffer
> contents (code points)?
>

Refers to position in the text that was passed to hb_buffer_add_utf8/16/32.
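
To make that concrete, here is a rough sketch in plain C of the values a
UTF-8 client ends up with by default: one index per character, equal to the
byte offset where that character starts in the text.  This is not HarfBuzz
code; utf8_cluster_indices is a made-up helper for illustration, and it
assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of HarfBuzz): for each character in a
 * valid UTF-8 string, store the byte offset at which it starts.
 * Returns the number of offsets written (at most `max`). */
size_t utf8_cluster_indices(const char *text, size_t len,
                            unsigned int *clusters, size_t max)
{
    assert(text != NULL && clusters != NULL);

    size_t n = 0;
    for (size_t i = 0; i < len && n < max; i++) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * begins a new character. */
        if (((unsigned char) text[i] & 0xC0) != 0x80)
            clusters[n++] = (unsigned int) i;
    }
    return n;
}
```

For the four-byte string "a", U+00E9, "b" this gives the indices 0, 1, 3,
because U+00E9 occupies two bytes.  With UTF-16 or UTF-32 input the same
idea applies, counting code units of that encoding instead of bytes.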


> 3) "Moreover, it is not required for the cluster values to be
> monotonically increasing. Most of HarfBuzz's tests are performed on
> monotonically increasing cluster numbers but, there is no such assumption
> in the code itself."
>

Some of that is implementation detail and can go.  The first
sentence is enough.

> ... This is the big one. The examples that follow in subsequent subsections
> hinge on the fact that the cluster values need to be monotonically
> increasing. Keeping them monotonic & increasing is given as the reason that
> clusters get merged when reordering (levels 0 and 1). So this sentence
> sticks out. I'm not sure how to resolve that discrepancy; can anyone
> explain how both of those pieces are supposed to fit together?
>

There are two things:

  1. Whether or not the input clusters are monotonic,

  2. Whether buffer cluster-level is set to any of the monotonic enum
values.

The promise is that *if* both of those are true, then the output cluster
values are monotonic.  If either one is false, there's no guarantee.
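
A client that wants to check the first condition on its own input, or the
promise on the output, could use something like the sketch below.  It is
plain C, not HarfBuzz code: clusters_are_monotonic is a hypothetical name,
and it assumes "monotonic" means the values never change direction, so
either non-decreasing or non-increasing (the latter being what one would
expect for right-to-left output).  The array would come from wherever the
client keeps its cluster values, for example the cluster field of each
hb_glyph_info_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of HarfBuzz): returns true if the
 * cluster values are entirely non-decreasing or entirely
 * non-increasing.  Equal neighbours are allowed, since several glyphs
 * can share one cluster. */
bool clusters_are_monotonic(const unsigned int *clusters, size_t n)
{
    assert(clusters != NULL || n == 0);

    bool non_decreasing = true;
    bool non_increasing = true;
    for (size_t i = 1; i < n; i++) {
        if (clusters[i] < clusters[i - 1])
            non_decreasing = false;
        if (clusters[i] > clusters[i - 1])
            non_increasing = false;
    }
    return non_decreasing || non_increasing;
}
```

Empty and single-element arrays count as monotonic here, which is a choice
of this sketch rather than anything the library defines.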


> Finally, I am adding a short "why your software cares about clusters"
> paragraph to the beginning. I've got cursor positioning, coloring
> diacritics, and line breaking in mind; anything else worth mentioning?
>

Text selection.  It's much the same as cursor positioning, but still worth
listing.

> Thanks,
> Nate
> --
> nathan.p.willis
> nwillis at glyphography.com <http://identi.ca/n8>
>


-- 
behdad
http://behdad.org/