[poppler] poppler Digest, Vol 65, Issue 48

srinivas adicherla srinivas.adicherla at gmail.com
Tue Jul 27 22:32:38 PDT 2010


Finding a way to sort the PDF text blocks, find the number of columns in a page.


@Albert: the Qt methods don't expose the selections, but if we do the block
sorting in the poppler backend code itself, we can expose it to glib or Qt
whenever we need to. How about it?
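
A minimal sketch of the column detection and block sorting described in the
quoted message below. It is not the code from getcol.c. It assumes the blocks
have already been built from the line rectangles (for example those obtained
with poppler_page_get_selection_region()), that y grows downward, and that
columns do not overlap horizontally. The Block type and the function name are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block type: a bounding box plus the column assigned to it. */
typedef struct {
    double x1, y1, x2, y2; /* assumption: y grows downward */
    int column;            /* filled in by sort_blocks_by_column() */
} Block;

/* Orders blocks by their left edge. */
static int cmp_by_x1(const void *a, const void *b)
{
    const Block *p = a, *q = b;
    return (p->x1 > q->x1) - (p->x1 < q->x1);
}

/* Orders blocks by column first, then top to bottom inside a column. */
static int cmp_reading_order(const void *a, const void *b)
{
    const Block *p = a, *q = b;
    if (p->column != q->column)
        return (p->column > q->column) - (p->column < q->column);
    return (p->y1 > q->y1) - (p->y1 < q->y1);
}

/* Assigns a column index to every block, sorts the array into reading
 * order, and returns the number of columns found (0 for an empty array). */
int sort_blocks_by_column(Block *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    if (n == 0)
        return 0;

    /* Sweep from left to right: a block whose left edge lies beyond the
     * right edge of the current column starts a new column. */
    qsort(blocks, n, sizeof *blocks, cmp_by_x1);

    int column = 0;
    double right_edge = blocks[0].x2;
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].x1 > right_edge) {
            column++;
            right_edge = blocks[i].x2;
        } else if (blocks[i].x2 > right_edge) {
            right_edge = blocks[i].x2;
        }
        blocks[i].column = column;
    }

    qsort(blocks, n, sizeof *blocks, cmp_reading_order);
    return column + 1;
}
```

One limitation of this sketch: a block that spans the full page width, such
as a title, merges all the columns it overlaps into one, so such blocks would
have to be handled separately.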

On Wed, Jul 28, 2010 at 9:00 AM, <poppler-request at lists.freedesktop.org> wrote:

> Send poppler mailing list submissions to
>        poppler at lists.freedesktop.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://lists.freedesktop.org/mailman/listinfo/poppler
> or, via email, send a message with subject or body 'help' to
>        poppler-request at lists.freedesktop.org
>
> You can reach the person managing the list at
>        poppler-owner at lists.freedesktop.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of poppler digest..."
>
>
> Today's Topics:
>
>   1. Re: Finding a way to sort the Pdf Text Blocks, find the
>      number of columns in a page. (Albert Astals Cid)
>   2. Re: Vertical or horizontal writing? (Albert Astals Cid)
>   3. FYI: embedded fonts for vertical text in PDF by MS Office
>      2007/2010 (suzuki toshiya)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Jul 2010 20:36:56 +0100
> From: Albert Astals Cid <aacid at kde.org>
> Subject: Re: [poppler] Finding a way to sort the Pdf Text Blocks, find
>        the number of columns in a page.
> To: poppler at lists.freedesktop.org
> Message-ID: <201007272036.57262.aacid at kde.org>
> Content-Type: Text/Plain;  charset="us-ascii"
>
> On Tuesday, 27 July 2010, srinivas adicherla wrote:
> > Hi all,
> >
> > I used poppler_page_get_selection_region() to find the rectangle of
> > every line in a page. From those I find the blocks, and then the columns
> > of the page. Knowing the number of columns, I am able to sort the
> > blocks, so the selection is very good.
> >
> > Right now selection in poppler is a bit of a problem. After doing all
> > this, it looks almost like Adobe Reader's selection.
> >
> > Please give me suggestions on improving this.
>
> Carlos? The Qt frontends don't expose the selection method, so I think it's
> up to you for the moment.
>
> >
> > I attached two files with this mail.
> >
> > getcol.c sorts the blocks in single-column and multi-column PDFs.
> > getcolumn.c uses that sorting to do the selection.
> >
> >
> > I sent a patch for getting the PDF ID from the document before. Albert
> > said it was OK, but he asked Carlos.
> >
> > Please give me the status of it.
>
> Carlos?
>
> Albert
>
> >
> >
> > Thanks
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Jul 2010 20:41:52 +0100
> From: Albert Astals Cid <aacid at kde.org>
> Subject: Re: [poppler] Vertical or horizontal writing?
> To: poppler at lists.freedesktop.org
> Message-ID: <201007272041.55309.aacid at kde.org>
> Content-Type: Text/Plain;  charset="us-ascii"
>
> On Tuesday, 27 July 2010, mpsuzuki at hiroshima-u.ac.jp wrote:
> > Dear Albert,
> >
> > On Tue, 27 Jul 2010 10:32:45 +0900
> >
> > mpsuzuki at hiroshima-u.ac.jp wrote:
> > >>But I'd prefer you to use an enum instead of an int, at least on the
> > >>poppler-qt4 level. Can you make the appropriate changes?
> > >
> > >OK, I will improve it, of course. But please let me ask
> > >for your comment about the appropriate design.
> > >
> > >When CMap->parse() parses a CMap resource, it can load any
> > >integer value into CMap->wMode, and the type of the return
> > >value from CMap->getWMode() (and GfxFont->getWMode()) is
> > >int.
> > >
> > >In the FontInfo class, should I restrict the writing mode
> > >enumeration to the 2 correct values: 0/horizontal or
> > >1/vertical?
> > >
> > >Or is it better to have 3 values: 0/horizontal, 1/vertical
> > >and -1 (or 2, or anything else) for broken writing mode
> > >info?
>
> Well, the specification says that 0 is the default, so I understand
> that if the value is anything other than 0 or 1, 0 should be used.
>
> Albert
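
A minimal sketch of the fallback Albert describes above: an integer writing
mode read from a CMap is mapped to a two-value enum, and anything other than
1 is treated as horizontal. The enum and function names are hypothetical and
are not the ones used in the patch or in any poppler binding.

```c
/* Hypothetical enum for the two valid writing modes. */
typedef enum {
    WRITING_MODE_HORIZONTAL = 0,
    WRITING_MODE_VERTICAL = 1
} WritingMode;

/* Maps a raw integer writing mode to the enum. Any value other than 1,
 * including broken values such as -1 or 2, falls back to horizontal,
 * which is the default. */
WritingMode writing_mode_from_int(int raw_wmode)
{
    return raw_wmode == 1 ? WRITING_MODE_VERTICAL : WRITING_MODE_HORIZONTAL;
}
```

With this approach no third "broken" enum value is needed, at the cost of
hiding from the caller that the stored value was invalid.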
>
> >
> > I've just drafted a patch using an enum type in Poppler::FontInfo::wMode
> > and its copies in the Qt4/GLib/cpp bindings. Please find the patch
> > attached.
> >
> > --
> >
> > But Cobra has found that font-level writing mode detection
> > is insufficient even if we restrict the scope to PDFs
> > generated by popular applications. I attached a PDF
> > containing vertical text, generated by the MS Office
> > 2010 PDF generator add-in. The embedded font is associated
> > with Identity-H, so my patch treats the font as
> > horizontal. I am trying to detect the expected result by using
> > text-level information. So please don't hurry to evaluate
> > this patch. I must work on it more.
> >
> >
> > Regards,
> > mpsuzuki
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 28 Jul 2010 12:29:29 +0900
> From: suzuki toshiya <mpsuzuki at hiroshima-u.ac.jp>
> Subject: [poppler] FYI: embedded fonts for vertical text in PDF by MS
>        Office  2007/2010
> To: poppler at lists.freedesktop.org
> Message-ID: <4C4FA419.5000502 at hiroshima-u.ac.jp>
> Content-Type: text/plain; charset="iso-2022-jp"
>
> Hi,
>
> When I checked the PDFs generated by the MS Office 2007 and 2010
> add-ins, I found a difference in how they embed fonts.
>
> * MS Office 2007
> The embedded font is named with the prefix "@". If I use
> MS Mincho, the font name is "@MS Mincho". Such @-prefixed
> names are legacy style. If the source document uses
> both horizontal and vertical text, both non-prefixed and
> @-prefixed font objects are embedded in the PDF.
>
> * MS Office 2010
> The embedded font is always non-prefixed. If the source
> document uses both horizontal and vertical text, a
> single non-prefixed font object covering the glyphs of both
> texts is embedded in the PDF.
>
> For concrete examples, please find attached PDFs.
> I had thought @-prefixed font names were only used by
> legacy applications, from when the Win32 GUI framework didn't support
> vertical text editing. Seeing such names in 21st-century
> applications was an interesting experience for me.
>
> Regards,
> mpsuzuki
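
A small sketch of the naming check implied by the message above: testing
whether an embedded font name carries the legacy "@" prefix. The function
name is hypothetical. As the message notes, Office 2010 output uses
non-prefixed names for vertical text as well, so a missing prefix says
nothing about the writing direction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the font name starts with '@', the legacy prefix
 * seen on vertical fonts such as "@MS Mincho". A NULL or empty name
 * is reported as not prefixed. */
bool font_name_has_at_prefix(const char *font_name)
{
    return font_name != NULL && font_name[0] == '@';
}
```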
>
>
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: msword2010-vert4.pdf
> Type: application/pdf
> Size: 38863 bytes
> Desc: not available
> URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100728/d13e9f5f/attachment.pdf>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: msword2007-vert.pdf
> Type: application/pdf
> Size: 50509 bytes
> Desc: not available
> URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100728/d13e9f5f/attachment-0001.pdf>
>
> ------------------------------
>
>
> End of poppler Digest, Vol 65, Issue 48
> ***************************************
>