[poppler] Vertical or horizontal writing?

mpsuzuki at hiroshima-u.ac.jp
Tue Jul 27 09:59:40 PDT 2010


Dear Deri,

On Tue, 27 Jul 2010 17:22:14 +0100
Deri James <deri at chuzzlewit.demon.co.uk> wrote:
>When looking at the two PDFs you are using with acroread using the text 
>selection tool:-
>
>P1 of 'vert-horiz-ipa-std.pdf' selection caret is drawn horizontally.
>'msword2010-vert2.pdf' selection caret is drawn vertically.
>
>So, it seems acroread can't detect the vertical text in this file, i.e. it is 
>actually horizontal text placed one glyph at a time (apart from 'MS Word 2010' 
>which is horizontal text rotated 90 degrees).
>
>The contents of the stream confirms this:-
>
>stream
> /P <</MCID 0/Lang (en-US)>> BDC BT
>/F1 10.56 Tf
>0.000000001 -1 1 0.000000001 496.54 756.84 Tm
>0 g
>0 G
>[(MS)6( )5(W)61(ord)-4( )5(20)10(10)] TJ
>ET
> EMC  /P <</MCID 1>> BDC BT
>/F2 10.56 Tf
>1 0.000000017 -0.000000017 1 495.29 673.7 Tm
><085B>Tj
>ET
> EMC  /P <</MCID 2>> BDC BT
>1 0.000000017 -0.000000017 1 495.29 663.14 Tm
><29AA>Tj

>...
>
>So this PDF does not have any true vertical text.

Yes, I have just reached exactly the same conclusion.
Thank you for checking the contents of the PDF.

The PDF generated by the MS Office add-in uses a font object
for horizontal writing mode, at least as far as the PDF
structure is concerned. So text flow detection at the PDF
font level does not work with such a PDF. Higher-level
recognition is needed.
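
As a rough sketch of what such higher-level recognition could
look like (this is not poppler code; the function names, the
regular expression and the 1-unit tolerance are all my own
assumptions for illustration), one can read the Tm operators
from the decoded content stream, derive the glyph rotation
from each text matrix, and check whether successive upright
glyph origins step downwards in the same column:

```python
import math
import re

# Matches "a b c d e f Tm" text-matrix operators in a decoded content stream.
TM_RE = re.compile(r"(?<![\w.])((?:[-+]?\d*\.?\d+\s+){6})Tm")

# The three Tm operators from the stream quoted above.
SAMPLE = """
0.000000001 -1 1 0.000000001 496.54 756.84 Tm
1 0.000000017 -0.000000017 1 495.29 673.7 Tm
1 0.000000017 -0.000000017 1 495.29 663.14 Tm
"""

def text_matrices(stream):
    """Return the six numbers (a, b, c, d, e, f) of each Tm operator, in order."""
    return [tuple(float(n) for n in m.group(1).split())
            for m in TM_RE.finditer(stream)]

def rotation_degrees(tm):
    """Angle of the text x-axis (a, b) relative to the page x-axis."""
    a, b = tm[0], tm[1]
    return math.degrees(math.atan2(b, a))

def guess_flow(matrices, tolerance=1.0):
    """Guess the flow from the origins (e, f) of unrotated glyph runs.

    A step counts as 'vertical' when x stays within `tolerance` and y
    decreases, i.e. the next glyph sits below the previous one.
    """
    upright = [m for m in matrices if abs(rotation_degrees(m)) < 1.0]
    steps = list(zip(upright, upright[1:]))
    if not steps:
        return "unknown"
    downward = sum(1 for p, q in steps
                   if abs(q[4] - p[4]) <= tolerance and q[5] < p[5])
    return "vertical" if 2 * downward > len(steps) else "horizontal"

if __name__ == "__main__":
    mats = text_matrices(SAMPLE)
    print([round(rotation_degrees(m)) for m in mats])
    print(guess_flow(mats))
```

With the three matrices above, the 'MS Word 2010' run comes
out as rotated by about 90 degrees, and the two CJK glyphs as
upright glyphs stacked in one column. A real implementation
would also have to handle Td/TD/T* positioning and the CTM,
which this sketch ignores.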

This raises a philosophical question: what is vertical text?
Some people make a vertical series of CJK glyphs by using a
very, very narrow text box. Is that "wrong" vertical text?
If it is not vertical text, why should we distinguish it?
Because of the wrong shapes of punctuation marks and arrows?
Or...

I have to ask Cobra what the original requirement was, that
is, why the text direction should be detected. Cobra, could
you describe why you needed to detect the direction of
text flow?

Regards,
mpsuzuki

