[Libreoffice-bugs] [Bug 104597] Text runs of RTL scripts (e.g. Arabic, Hebrew, Persian) from imported PDF are reversed, PDFIProcessor::mirrorString not behaving

bugzilla-daemon at bugs.documentfoundation.org
Thu Jul 15 10:58:15 UTC 2021


https://bugs.documentfoundation.org/show_bug.cgi?id=104597

--- Comment #47 from Kevin Suo <suokunlong at 126.com> ---
Further info:

If xpdf generated the following output:
drawChar 462.400000 770.989000 466.900000 770.989000 1.000000 0.000000 0.000000 1.000000 12.000000 ة

then sdext pdfimport produces the following Transformation values, listed by
matrix cell (assume this is rCurGC):

            (0,0)   (0,1)   (0,2)       (1,0)   (1,1)       (1,2)
---------------------------------------------------------------------
rCurGC:     1200    0       46240       0       1200        5674.08

If xpdf generated the following: 
drawChar 466.900000 770.989000 469.828000 770.989000 1.000000 0.000000 0.000000 1.000000 12.000000 ي

then sdext pdfimport produces these Transformation values (assume this is
rNextGC):

            (0,0)   (0,1)   (0,2)       (1,0)   (1,1)       (1,2)
---------------------------------------------------------------------
rNextGC:    1200    0       46690       0       1200        5674.08

Apparently rCurGC.Transformation != rNextGC.Transformation. The difference is in
(0,2): one is 46240, the other is 46690. What are these two values? The
positions of the characters on the page?

Below is the full output of rCurGC.Transformation and rNextGC.Transformation
(generated by adding the following debug print above the if block in
DrawXmlOptimizer::optimizeTextElements in drawtreevisiting.cxx):
                std::cout << "rCurGC: " << rCurGC.Transformation.get(0,0) << " "
                          << rCurGC.Transformation.get(0,1) << " "
                          << rCurGC.Transformation.get(0,2) << " ";
                std::cout << rCurGC.Transformation.get(1,0) << " "
                          << rCurGC.Transformation.get(1,1) << " "
                          << rCurGC.Transformation.get(1,2) << std::endl;

rCurGC: 1200 0 46240 0 1200 5674.08
rCurGC: 1200 0 46690 0 1200 5674.08
rCurGC: 1200 0 46980.4 0 1200 5674.08
rCurGC: 1200 0 47070.4 0 1200 5674.08
rCurGC: 1200 0 47659.6 0 1200 5674.08
rCurGC: 1200 0 48130 0 1200 5674.08
rCurGC: 1200 0 48370 0 1200 5674.08
rCurGC: 1200 0 48618.4 0 1200 5674.08
rCurGC: 1200 0 49007.2 0 1200 5674.08
rCurGC: 1200 0 49637.2 0 1200 5674.08
rCurGC: 1200 0 49927.6 0 1200 5674.08
rCurGC: 1200 0 50218 0 1200 5674.08
rCurGC: 1200 0 50496.4 0 1200 5674.08
rCurGC: 1200 0 50806 0 1200 5674.08
rCurGC: 1200 0 51276.4 0 1200 5674.08
rCurGC: 1200 0 51524.8 0 1200 5674.08
rCurGC: 1200 0 51773.2 0 1200 5674.08
rCurGC: 1200 0 52153.6 0 1200 5674.08
rCurGC: 1200 0 52603.6 0 1200 5674.08
rCurGC: 1200 0 53093.2 0 1200 5674.08

As you can see, all other values are the same, but the value at position (0,2)
of the matrix increases with each character. I think this value should not be
used to determine whether these characters should be combined into a string.

This is beyond my knowledge, as it involves the basegfx::B2DHomMatrix stuff,
which I know nothing about, so it needs an expert to investigate.
