[poppler] Vertical or horizontal writing?
mpsuzuki at hiroshima-u.ac.jp
Wed Jul 28 00:32:20 PDT 2010
Hi,
On Wed, 28 Jul 2010 15:04:53 +0800 (CST)
"cobra.yu" <cobra.yu at hyweb.com.tw> wrote:
> Of course, such fake vertical writing mode is unacceptable.
Thanks.
>So, it shows that we can't count only on the WMode in the font
>information, but must also take the real arrangement of the text
>on pages into consideration?
Yes, WMode is insufficient. As Deri analyzed, the MS Office add-in
draws vertical text by repeating "draw a glyph, move the current
point vertically, draw a glyph...". So, it might be possible
to detect the text flow direction by tracking the movement of
the current point. But if our interest is only text search,
tracking the current point won't be essential, I think. Maybe
collecting all glyphs in drawing order is sufficient for text
search. I will check the poppler-qt4 binding in more detail.
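
A minimal sketch of the current-point tracking idea, assuming the
glyph origins have already been collected in drawing order. The
function name and the input list are hypothetical, not part of the
poppler API; it is only a majority vote over the steps between
consecutive glyphs, not a complete layout analysis.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def guess_flow_direction(origins: List[Point]) -> str:
    """Guess the text flow from glyph origins given in drawing order.

    Hypothetical helper: each step between two consecutive glyphs
    votes "vertical" if it moves more in y than in x, otherwise
    "horizontal". Line or column breaks produce a few outlier votes,
    which the majority is assumed to outweigh.
    """
    vertical = 0
    horizontal = 0
    for (x0, y0), (x1, y1) in zip(origins, origins[1:]):
        dx = abs(x1 - x0)
        dy = abs(y1 - y0)
        if dx == dy:
            continue  # no movement or an exact diagonal: no vote
        if dy > dx:
            vertical += 1
        else:
            horizontal += 1
    if vertical == horizontal:
        return "unknown"
    return "vertical" if vertical > horizontal else "horizontal"


# The two Tm origins quoted from msword2010-vert2.pdf in this thread:
# same x, y moving down the page by one glyph.
sample = [(495.29, 673.7), (495.29, 663.14)]
direction = guess_flow_direction(sample)
```

Such a vote would classify the MS Office output as vertical even
though its font objects say horizontal, which is the case WMode
alone cannot catch.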
Regards,
mpsuzuki
>-----Original message-----
>From:suzuki toshiya <mpsuzuki at hiroshima-u.ac.jp>
>To:cobra.yu at hyweb.com.tw
>Cc:poppler <poppler at lists.freedesktop.org>
>Date:Wed, 28 Jul 2010 15:18:58 +0900
>Subject:Re: [poppler] Vertical or horizontal writing?
>
>
>Hi,
>
>Please find attached fake vertical text produced by MS Excel
>2007. Is it acceptable for you to exclude such fake vertical
>text from your target?
>
>If you try to select the text in Adobe Reader, you can see
>that the glyphs are drawn in horizontal order; from the viewpoint
>of the page rendering language, it is a crude fake.
>
>Regards,
>mpsuzuki
>
>cobra.yu wrote:
>> Hi,
>>
>> The original requirement to detect the direction of text flow is for "searching". The present "search" function of Poppler::Page searches horizontally only. So, for CJK users, I must add a vertical search function for the vertical writing mode.
>> I could sort all the textboxes on every page by the (x,y) of their bounding boxes to make a vertical-like textbox list, but I encountered a fundamental problem: if I don't know the exact direction of text flow first, how can I know when to use vertical or horizontal search?
>> BTW, I've just implemented vertical text selection in the same way as my vertical search, but it is much simpler than searching.
>>
>> Cobra
>>
>>
>> -----Original message-----
>> From:mpsuzuki at hiroshima-u.ac.jp
>> To:Deri James <deri at chuzzlewit.demon.co.uk>
>> Cc:poppler at lists.freedesktop.org,cobra.yu at hyweb.com.tw
>> Date:Wed, 28 Jul 2010 01:59:40 +0900
>> Subject:Re: [poppler] Vertical or horizontal writing?
>>
>> Dear Deri,
>>
>> On Tue, 27 Jul 2010 17:22:14 +0100
>> Deri James <deri at chuzzlewit.demon.co.uk> wrote:
>>
>>> When looking at the two PDFs you are using with acroread using the text
>>> selection tool:-
>>>
>>> P1 of 'vert-horiz-ipa-std.pdf' selection caret is drawn horizontally.
>>> 'msword2010-vert2.pdf' selection caret is drawn vertically.
>>>
>>> So, it seems acroread can't detect the vertical text in this file, i.e. it is
>>> actually horizontal text placed one glyph at a time (apart from 'MS Word 2010'
>>> which is horizontal text rotated 90 degrees).
>>>
>>> The contents of the stream confirms this:-
>>>
>>> stream
>>> /P <</MCID 0/Lang (en-US)>> BDC BT
>>> /F1 10.56 Tf
>>> 0.000000001 -1 1 0.000000001 496.54 756.84 Tm
>>> 0 g
>>> 0 G
>>> [(MS)6( )5(W)61(ord)-4( )5(20)10(10)] TJ
>>> ET
>>> EMC /P <</MCID 1>> BDC BT
>>> /F2 10.56 Tf
>>> 1 0.000000017 -0.000000017 1 495.29 673.7 Tm
>>> <085B>Tj
>>> ET
>>> EMC /P <</MCID 2>> BDC BT
>>> 1 0.000000017 -0.000000017 1 495.29 663.14 Tm
>>> <29AA>Tj
>>>
>>
>>
>>> ...
>>>
>>> So this PDF does not have any true vertical text.
>>>
>>
>> Yes, yes, I've just reached exactly the same conclusion.
>> Thank you for checking the content of the PDF.
>>
>> The PDF generated by the MS Office add-in uses a font object
>> for horizontal writing mode, at least as far as the PDF design
>> goes. So text flow detection at the PDF font level does not
>> work with such PDFs. Higher-level recognition is needed.
>>
>> It raises a philosophical question: what is vertical text?
>> Some people make a vertical series of CJK glyphs by using
>> a very, very narrow text box; is this wrong vertical text?
>> If it is not vertical text, why should we distinguish it?
>> The invalid shapes of the punctuation & arrows? Or...
>>
>> I have to ask Cobra what the original requirement is, and
>> why the text direction should be detected. Cobra, could
>> you describe why you need to detect the direction of
>> text flow?
>>
>> Regards,
>> mpsuzuki
>>
>
>