[HarfBuzz] Improvement for hb_buffer_guess_segment_properties()

Mon Mar 4 20:50:26 PST 2013

That was an attempt to put this data into the .rodata section.
The implementation could be optimized by using bsearch instead of
a for-loop.
But if you think a switch-case lookup is more efficient, I can rewrite it.
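
To make the bsearch variant concrete, here is a minimal sketch. It is
not the code from the patch: SCRIPT_TAG, script_lang_entry and
default_language_for_script are hypothetical names, the script values
are plain ISO 15924 tags packed into 32 bits rather than hb_script_t,
and the handful of script-to-language pairs is only illustrative.
Storing the language as a small char array instead of a pointer is an
assumption made here so that the table needs no relocations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pack a four-letter ISO 15924 tag into 32 bits. */
#define SCRIPT_TAG(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

typedef struct {
    uint32_t script;   /* packed script tag, used as the search key */
    char language[4];  /* short language code, stored inline */
} script_lang_entry;

/* Must stay sorted by ascending tag value, because bsearch relies on it.
 * The entries are illustrative only; Han is left out on purpose. */
static const script_lang_entry script_lang_table[] = {
    { SCRIPT_TAG('A', 'r', 'a', 'b'), "ar" },
    { SCRIPT_TAG('C', 'y', 'r', 'l'), "ru" },
    { SCRIPT_TAG('G', 'r', 'e', 'k'), "el" },
    { SCRIPT_TAG('H', 'a', 'n', 'g'), "ko" },
    { SCRIPT_TAG('H', 'e', 'b', 'r'), "he" },
    { SCRIPT_TAG('T', 'h', 'a', 'i'), "th" },
};

/* Compare the searched tag with the tag of one table entry. */
static int compare_script(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t s = ((const script_lang_entry *)elem)->script;
    return (k > s) - (k < s);
}

/* Returns the default language code for a script, or NULL if the
 * script has no entry in the table. */
const char *default_language_for_script(uint32_t script)
{
    const script_lang_entry *entry = (const script_lang_entry *)
        bsearch(&script, script_lang_table,
                sizeof script_lang_table / sizeof script_lang_table[0],
                sizeof script_lang_table[0], compare_script);
    return entry ? entry->language : NULL;
}
```

With this table, a lookup for the Cyrl tag would give "ru", and a
lookup for Hani would give NULL, so the caller can simply leave the
language unset for scripts that are not listed.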

What about the Han script? I mean, what can we do here if Unicode defines
only one script, whereas CLDR subdivides it into Hani/Hans/Hant?

regards,
Konstantin

2013/3/5 Grigori Goronzy <greg at chown.ath.cx>:
> On 12/24/2012 03:29 PM, Konstantin Ritt wrote:
>> Here is an implementation of
>> hb_language_get_default_for_script(hb_script_t) that could be used in
>> hb_buffer_guess_segment_properties() by default, when there is no
>> language set for a segment.
>>
>
> Good idea, but this is quite different from the implementation in Pango
> (pango-language.c) that Behdad pointed me to some time ago. For some
> ambiguous scripts that I know, this maps to specific languages - for
> instance consider the HAN script. It is used equally in Japanese,
> Chinese and in various other languages. So there is no representative
> language.
>
> Also, shouldn't it be more efficient to use a big switch-case lookup?
> This leaves compilers more room for optimization. What do you think
> about the implementation in [1]?
>
> I'd really like to see an implementation of such a script-to-language
> mapping in Harfbuzz - not only in guess_segment_properties, it should be
> offered as a separate function.
>
> Best regards
> Grigori
>
> [1]
> http://code.google.com/p/libass/source/detail?r=fc3b05f3178a88e5af1e994d91e43fdb0fda1059