[FriBidi] FriBidi and emoji modifiers

Dov Grobgeld dov.grobgeld at gmail.com
Thu Sep 17 06:08:42 PDT 2015


This is correct according to the Bidi Algorithm: both the emoji and its
modifier are neutral characters, so they inherit their direction from the
context, which in this case is RTL. Off the top of my head, I can think of
a couple of options for resolving this:

1. Before doing the directional rearrangement, surround all runs of
emoji-like characters with LRI/PDI (requires non-released fribidi).
2. Check the run level of the emoji after the bidi algorithm and, if it is
odd, raise it to the next higher even run level before calling
fribidi_reorder_line().
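
A rough sketch of both options follows. It is only an illustration, not
tested against FriBidi itself: CodePoint and Level are stand-ins for
FriBidiChar and FriBidiLevel so that the snippet compiles on its own, and
is_emoji_like() is a hypothetical helper with a deliberately crude code
point range. A real implementation would want a proper emoji property
lookup.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using CodePoint = std::uint32_t;  // stand-in for FriBidiChar (assumption)
using Level = signed char;        // stand-in for FriBidiLevel (assumption)

constexpr CodePoint kLRI = 0x2066;  // LEFT-TO-RIGHT ISOLATE
constexpr CodePoint kPDI = 0x2069;  // POP DIRECTIONAL ISOLATE

// Hypothetical helper: a very rough "emoji-like" test that covers
// U+1F476 and U+1F3FB but is not a complete emoji definition.
bool is_emoji_like(CodePoint c) { return c >= 0x1F300 && c <= 0x1FAFF; }

// Option 1: copy the logical string, wrapping every run of emoji-like
// characters in LRI ... PDI before the bidi types are computed.
std::vector<CodePoint> isolate_emoji_runs(const std::vector<CodePoint>& in) {
  std::vector<CodePoint> out;
  out.reserve(in.size() + 2);
  bool in_run = false;
  for (CodePoint c : in) {
    const bool emoji = is_emoji_like(c);
    if (emoji && !in_run) out.push_back(kLRI);  // run starts
    if (!emoji && in_run) out.push_back(kPDI);  // run ended
    out.push_back(c);
    in_run = emoji;
  }
  if (in_run) out.push_back(kPDI);  // close a run at end of string
  return out;
}

// Option 2: after the embedding levels have been computed, bump every
// emoji-like character that sits on an odd (RTL) level to the next even
// (LTR) level. Even levels are left untouched.
void raise_emoji_levels(const CodePoint* str, Level* levels, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    if (is_emoji_like(str[i]) && (levels[i] & 1)) {
      ++levels[i];
    }
  }
}
```

With option 2, raise_emoji_levels() would be called between
fribidi_get_par_embedding_levels() and fribidi_reorder_line(), so that the
emoji and its modifier form an LTR run and should keep their logical order.
With option 1, the inserted isolates change the string length and would
need to be stripped again from the visual string.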

I'm sure there are other options as well. Behdad will probably come up with
a simple solution that I haven't thought of. :-)

Regards,
Dov




On Thu, Sep 17, 2015 at 3:54 PM, Romain Ouabdelkader <
romain.ouabdelkader at gmail.com> wrote:

> Hi,
>
> I'm having some trouble getting emojis to work with FriBidi.
> Basic emojis work fine, but when using emoji modifiers in an RTL language,
> the modifiers end up before the emoji.
>
> Here I have a string with Arabic text, the emoji U+1f476 (a baby) and a
> tone modifier U+1f3fb:
>
>
> const char utf8_input[] = u8"اختبار \U0001f476\U0001f3fb";
> int utf8_len = sizeof(utf8_input) - 1;
>
> std::unique_ptr<FriBidiChar[]> unicode_str(new FriBidiChar[utf8_len]);
>
> FriBidiCharSet utf8_charset = fribidi_parse_charset("UTF-8");
> int len_unicode = fribidi_charset_to_unicode(utf8_charset, utf8_input,
>                                              utf8_len, unicode_str.get());
>
> std::unique_ptr<FriBidiCharType[]> bidi_types(
>     new FriBidiCharType[len_unicode]);
> std::unique_ptr<FriBidiLevel[]> levels(new FriBidiLevel[len_unicode]);
> FriBidiParType base = FRIBIDI_PAR_ON;
>
> fribidi_get_bidi_types(unicode_str.get(), len_unicode, bidi_types.get());
> fribidi_get_par_embedding_levels(bidi_types.get(), len_unicode, &base,
>                                  levels.get());
>
> fribidi_reorder_line(0, bidi_types.get(), len_unicode, 0, base,
>                      levels.get(), unicode_str.get(), NULL);
>
> std::cout << std::hex;
> for (int i = 0; i < len_unicode; ++i)
>   {
>     std::cout << "\\u" << unicode_str[i];
>   }
> std::cout << std::endl;
>
>
> Output:
> \u1f3fb\u1f476\u20\u631\u627\u628\u62a\u62e\u627
>
> As you can see the tone modifier U+1f3fb is first and then the emoji
> U+1f476 is next.
> I've also tested this with log2vis().
>
> Is this a bug? If not, what is the correct way to handle emojis?
>
> Regards,
> Romain Ouabdelkader.