[RESOLVED] ICU 4.9 causes a bug with writer (words aren't split correctly)

David Tardon dtardon at redhat.com
Mon May 14 00:43:55 PDT 2012


On Sun, May 13, 2012 at 11:14:56PM +0100, Caolán McNamara wrote:
> On Sun, 2012-05-13 at 14:35 +0300, Lior Kaplan wrote:
> > Hi,
> > 
> > See #49849.
> > 
> > Same LibO version (3.5.3) on Debian/Ubuntu is built with ICU 4.8 and
> > doesn't have the problem.
> > 
> > Would be nice to get another confirmation and to understand if that's
> > a pure problem with ICU or also something in LibO.
> 
> Yeah, this began to affect master now that the internal icu was bumped
> to icu 49 so I've fixed it with
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=20c24114143d6d38774b56a142fd4ae05094308e
> 
> The latest line-breaking UAX brought in some special Hebrew rules, so a
> new Hebrew letter class got added. Seeing as we have customized rules,
> this ended up meaning that Hebrew characters got totally ignored by our
> rules because they didn't know about the new character class :-)
> 
> I merged in the changes for these specific new rules to our existing
> rules, which makes it work like before. Added a regression test for it,
> and a README to document where our rules got derived from, and an easy
> hack (https://bugs.freedesktop.org/show_bug.cgi?id=49885) to review the
> changes that have accumulated to those rules to see if we even still
> need customized line-break rules. And, if we do, to sync them up with
> the latest icu ones. We're stale since at least 2006 on line-breaking,
> it appears.

Unfortunately this does not work with genbrk from older ICUs (I have ICU
4.6 here). Commit dd49c193de9c4515335ad4a29778ceff225e3c38 attempts to
avoid the problem by filtering out references to the new character class.
Maybe we should require a newer ICU (I hope this does work with 4.8)?
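
To illustrate the filtering idea only: the sketch below is not the code
from that commit. It assumes the new class shows up in the rules as a
variable named $HL (with derived names such as $HLcm); the sample rules
and the helper name strip_hebrew_letter_class are made up for the
example and are not taken from our actual rule files.

```python
import re

# Assumption: the new class is referenced as $HL, plus derived names like $HLcm.
HL_REF = re.compile(r"\$HL\w*")


def strip_hebrew_letter_class(rules: str) -> str:
    """Return the rule text with references to the assumed $HL class removed."""
    kept = []
    for line in rules.splitlines():
        if not HL_REF.search(line):
            # Line never mentions the class: keep it untouched.
            kept.append(line)
            continue
        if "[" in line and not line.lstrip().startswith("$HL"):
            # The class is one member of a set expression such as
            # [$AL $HL $AI]; remove only that token.
            kept.append(re.sub(r"\s*\$HL\w*", "", line))
        # Otherwise the line defines the class or is a rule built on it,
        # so it is dropped entirely.
    return "\n".join(kept)


# Hypothetical sample input, not a real ICU rule file.
SAMPLE = """\
$AL = [:LineBreak = Alphabetic:];
$HL = [:LineBreak = Hebrew_Letter:];
$ALPlus = [$AL $HL $AI];
$HLcm = $HL $CM*;
$HLcm ($HY | $BA) $CM*;"""

if __name__ == "__main__":
    print(strip_hebrew_letter_class(SAMPLE))
```

The obvious cost of this approach is that, with an older ICU, Hebrew
letters lose the new special-case rules and are handled by whatever
generic rules remain.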

D.

