[HarfBuzz] Clusters chapter

Behdad Esfahbod behdad at behdad.org
Fri Nov 2 21:00:33 UTC 2018


Also things like generating PDFs that have better extractable text.

On Fri, Nov 2, 2018 at 5:00 PM Behdad Esfahbod <behdad at behdad.org> wrote:

> On Fri, Nov 2, 2018 at 4:48 PM Nathan Willis <nwillis at glyphography.com>
> wrote:
>
>> Hiyo. I'm revisiting the 'clusters' chapter in the User Manual, to make
>> it more consistent with the rest and hopefully easier to understand.
>> Rereading it has raised some questions....
>>
>> 1) The opening sentence says "a cluster is a sequence of code points
>> [...]"
>>  ...which might be true for the initial buffer contents, but all the
>> interesting stuff happens after replacing them with glyphs. So that will
>> certainly need some changing. I know it's a can of worms to open, but what
>> if we said "characters" here? Explaining the relationship between code
>> points, characters, and glyphs can be tricky, but then again explaining
>> clusters to new readers is already difficult....
>>
>
> Right.  "code points" by itself doesn't mean anything.  They might be
> Unicode code points, aka characters, or glyph code points, aka glyphs.
>
> A cluster refers to a sequence of characters and their corresponding
> glyphs.  Or the other way around, depending on your taste.
>
>> 2) "Most clients will use UTF-8, UTF-16, or UTF-32 indices, but the actual
>> number does not matter" ... is "indices" here referring to the buffer
>> contents (code points)?
>>
>
> It refers to the position in the text that was passed to
> hb_buffer_add_utf8/16/32.
>
>
>> 3) "Moreover, it is not required for the cluster values to be
>> monotonically increasing. Most of HarfBuzz's tests are performed on
>> monotonically increasing cluster numbers but, there is no such assumption
>> in the code itself."
>>
>
> Some of that sentence is implementation details and can go.  The first
> sentence is enough.
>
>> ... This is the big one. The examples that follow in subsequent
>> subsections hinge on the fact that the cluster values need to be
>> monotonically increasing. Keeping them monotonic & increasing is given as
>> the reason that clusters get merged when reordering (levels 0 and 1). So
>> this sentence sticks out. I'm not sure how to resolve that discrepancy; can
>> anyone explain how both of those pieces are supposed to fit together?
>>
>
> There are two things:
>
>   1. Whether or not the input clusters are monotonic,
>
>   2. Whether buffer cluster-level is set to any of the monotonic enum
> values.
>
> The promise is that *if* both of those are true, then the output cluster
> values are monotonic.  If either of the above is false, there's no
> guarantee.
>
>
>> Finally, I am adding a short "why your software cares about clusters"
>> paragraph to the beginning. I've got cursor positioning, coloring
>> diacritics, and line breaking in mind; anything else worth mentioning?
>>
>
> Text selection.  It's the same as cursor positioning, but still worth
> mentioning.
>
>> Thanks,
>> Nate
>> --
>> nathan.p.willis
>> nwillis at glyphography.com <http://identi.ca/n8>
>> _______________________________________________
>> HarfBuzz mailing list
>> HarfBuzz at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/harfbuzz
>>
>
>
> --
> behdad
> http://behdad.org/
>

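To make the answer to question 2 concrete: for text passed to
hb_buffer_add_utf8, the "indices" are byte offsets into the UTF-8 string.
Here is a minimal sketch of which offsets those would be, in plain C without
calling HarfBuzz. The helper name utf8_start_offsets is hypothetical, and
the sketch assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of HarfBuzz: record the byte offset at
 * which each code point starts in a UTF-8 string.  Writes at most `cap`
 * offsets to `out` and returns the number of code points found.
 * Assumes well-formed UTF-8. */
static size_t utf8_start_offsets(const char *text, size_t len,
                                 uint32_t *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL || cap == 0);

    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * begins a new code point. */
        if (((unsigned char) text[i] & 0xC0) != 0x80) {
            if (n < cap)
                out[n] = (uint32_t) i;
            n++;
        }
    }
    return n;
}
```

For the four-byte string "a\xC3\xA9z" ('a', U+00E9, 'z') this gives the
offsets 0, 1, 3: the second character occupies two bytes, so the third
starts at byte 3 rather than 2.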

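The two-part promise about monotonic clusters can also be sketched in code.
Both helper names below are hypothetical, and the sketch assumes the cluster
values have already been copied out of the buffer (for example from the
cluster field of each hb_glyph_info_t) into a plain array. "Monotonic" is
taken here to mean never decreasing or never increasing, since right-to-left
output runs in the opposite direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the cluster values never decrease or never
 * increase across the array.  Equal neighbours are allowed, since several
 * glyphs can share one cluster. */
static bool clusters_are_monotonic(const uint32_t *clusters, size_t count)
{
    bool non_decreasing = true;
    bool non_increasing = true;

    assert(clusters != NULL || count == 0);

    for (size_t i = 1; i < count; i++) {
        if (clusters[i] < clusters[i - 1])
            non_decreasing = false;
        if (clusters[i] > clusters[i - 1])
            non_increasing = false;
    }
    return non_decreasing || non_increasing;
}

/* Hypothetical helper restating the promise: monotonic output is only
 * guaranteed when the input clusters are monotonic AND the buffer uses one
 * of the monotonic cluster levels. */
static bool output_monotonic_guaranteed(bool input_monotonic,
                                        bool level_is_monotonic)
{
    return input_monotonic && level_is_monotonic;
}
```

In HarfBuzz terms, level_is_monotonic would correspond to the buffer's
cluster level being HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES or
HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS. When the second helper returns
false, the output may still happen to be monotonic; there is just no
guarantee.
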
-- 
behdad
http://behdad.org/