[poppler] poppler Digest, Vol 65, Issue 48

Albert Astals Cid aacid at kde.org
Thu Jul 29 13:36:48 PDT 2010


On Wednesday, 28 July 2010, srinivas adicherla wrote:
> Finding a way to sort the Pdf Text Blocks, find the number of columns
> in a page.
> 
> 
> @Albert: the qt methods don't expose the selections, but if we do the
> block sorting in the backend poppler code itself, we can expose it to
> glib or qt whenever we need. How about it?

I'm always open to improvements :-)

Albert

> 
> On Wed, Jul 28, 2010 at 9:00 AM, <poppler-request at lists.freedesktop.org> wrote:
> > 
> > Today's Topics:
> >   1. Re: Finding a way to sort the Pdf Text Blocks, find the number of
> >      columns in a page. (Albert Astals Cid)
> >   2. Re: Vertical or horizontal writing? (Albert Astals Cid)
> >   3. FYI: embedded fonts for vertical text in PDF by MS Office
> >      2007/2010 (suzuki toshiya)
> > 
> > ----------------------------------------------------------------------
> > 
> > Message: 1
> > Date: Tue, 27 Jul 2010 20:36:56 +0100
> > From: Albert Astals Cid <aacid at kde.org>
> > Subject: Re: [poppler] Finding a way to sort the Pdf Text Blocks, find
> >        the number of columns in a page.
> > To: poppler at lists.freedesktop.org
> > Message-ID: <201007272036.57262.aacid at kde.org>
> > Content-Type: Text/Plain;  charset="us-ascii"
> > 
> > On Tuesday, 27 July 2010, srinivas adicherla wrote:
> > > Hi all,
> > > 
> > > I used poppler_page_get_selection_region() to find the rectangle of
> > > every line in a page. From those I find the blocks, and from the
> > > blocks the columns of the page. Knowing the number of columns, I am
> > > able to sort the blocks, so the selection is very good.
> > > 
> > > Right now selection in poppler is a bit of a problem. After doing all
> > > this, it looks almost like Adobe Reader's selection.
> > > 
> > > Please give me suggestions on improving this.
> > 
> > Carlos? The qt frontends don't expose the selection method, so I think
> > it's up to you for the moment.
> > 
> > > I attached two files with this mail.
> > > 
> > > getcol.c is able to sort the blocks in single/multicolumn pdfs.
> > > getcolumn.c is based on the above sorting used to do the selection.
> > > 
> > > 
> > > I sent a patch for getting the PDF ID from the document before. Albert
> > > said it was OK, but he asked Carlos.
> > > 
> > > Please let me know its status.
> > 
> > Carlos?
> > 
> > Albert
> > 
> > > Thanks
> > 
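The column detection and block sorting described in the message above could look roughly like the following C++ sketch. It is not the code from the attached getcol.c, which is not shown here. `Rect` and `sortBlocksByColumn` are hypothetical names, and the sketch assumes the blocks have already been built from the line rectangles and that y grows downward.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical rectangle type for this sketch (not a poppler type).
// Assumes x1 <= x2, y1 <= y2, and y growing downward on the page.
struct Rect {
    double x1, y1, x2, y2;
};

// Groups blocks into columns by merging overlapping x-ranges, then reorders
// the blocks in place: column by column, top to bottom within each column.
// Returns the number of columns found.
int sortBlocksByColumn(std::vector<Rect> &blocks)
{
    if (blocks.empty())
        return 0;

    // Sweep from left to right over the blocks' left edges.
    std::sort(blocks.begin(), blocks.end(),
              [](const Rect &a, const Rect &b) { return a.x1 < b.x1; });

    struct Placed {
        int column;
        Rect rect;
    };
    std::vector<Placed> placed;
    placed.reserve(blocks.size());

    int columns = 1;
    double right = blocks[0].x2; // right edge of the current column
    for (const Rect &r : blocks) {
        if (r.x1 > right) {
            // A horizontal gap: this block starts a new column.
            ++columns;
            right = r.x2;
        } else {
            right = std::max(right, r.x2);
        }
        placed.push_back({columns - 1, r});
    }

    // Reading order: by column first, then by top edge.
    std::stable_sort(placed.begin(), placed.end(),
                     [](const Placed &a, const Placed &b) {
                         if (a.column != b.column)
                             return a.column < b.column;
                         return a.rect.y1 < b.rect.y1;
                     });

    for (std::size_t i = 0; i < blocks.size(); ++i)
        blocks[i] = placed[i].rect;
    return columns;
}
```

One known limit of this simple sweep: a block that spans the full page width (a title, for example) bridges the gap and merges all columns into one, so a real implementation would need to treat such blocks separately.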
> > ------------------------------
> > 
> > Message: 2
> > Date: Tue, 27 Jul 2010 20:41:52 +0100
> > From: Albert Astals Cid <aacid at kde.org>
> > Subject: Re: [poppler] Vertical or horizontal writing?
> > To: poppler at lists.freedesktop.org
> > Message-ID: <201007272041.55309.aacid at kde.org>
> > Content-Type: Text/Plain;  charset="us-ascii"
> > 
> > On Tuesday, 27 July 2010, mpsuzuki at hiroshima-u.ac.jp wrote:
> > > Dear Albert,
> > > 
> > > On Tue, 27 Jul 2010 10:32:45 +0900
> > > 
> > > mpsuzuki at hiroshima-u.ac.jp wrote:
> > > >>But I'd prefer you to use an enum instead of an int, at least on the
> > > >>poppler-qt4 level. Can you make the appropriate changes?
> > > >
> > > >OK, I will improve it, of course. But please let me ask for
> > > >your comment on the appropriate design.
> > > >
> > > >When CMap->parse() parses a CMap resource, it can load any
> > > >integer value into CMap->wMode, and the type of the return
> > > >value of CMap->getWMode() (and GfxFont->getWMode()) is
> > > >int.
> > > >
> > > >In the FontInfo class, should I restrict the writing mode
> > > >enumeration to the 2 correct values: 0/horizontal and
> > > >1/vertical?
> > > >
> > > >Or is it better to have 3 values: 0/horizontal, 1/vertical
> > > >and -1 (or 2, or anything else) for broken writing mode
> > > >info?
> > 
> > Well, reading the specification, it says that 0 is the default, so I
> > understand that if the value is anything other than 0 or 1, 0 should be
> > used.
> > 
> > Albert
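
The rule in the reply above (treat anything other than 0 or 1 as 0) could be sketched in C++ as follows. `WritingMode` and `writingModeFromInt` are hypothetical names for illustration, not the names used in the attached patch or in the poppler API; the raw integer stands in for the int that CMap->getWMode() returns.

```cpp
// Hypothetical two-value enum: no third "broken" value is needed if
// out-of-range input falls back to the default.
enum class WritingMode {
    Horizontal = 0, // the default according to the specification
    Vertical = 1
};

// Maps a raw writing-mode integer to the enum. Only 1 selects vertical;
// every other value, including negative or otherwise invalid ones, falls
// back to horizontal.
constexpr WritingMode writingModeFromInt(int raw)
{
    return raw == 1 ? WritingMode::Vertical : WritingMode::Horizontal;
}
```

With this mapping, callers never see an invalid mode, at the cost of not being able to tell a broken value apart from a genuine 0.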
> > 
> > > I've just drafted a patch using an enum type in Poppler::FontInfo::wMode
> > > and its copies in the Qt4/GLib/cpp bindings. Please find the attached
> > > patch.
> > > 
> > > --
> > > 
> > > But Cobra has found that font-level writing mode detection
> > > is insufficient even if we restrict the scope to PDFs
> > > generated by popular applications. I attached a PDF
> > > including vertical text which was generated by the MS Office
> > > 2010 PDF generator add-in. The embedded font is associated
> > > with Identity-H, so my patch recognizes the font as
> > > horizontal. I am trying to detect the expected result by using
> > > text-level information. So please don't hurry to evaluate
> > > this patch. I must work more.
> > > 
> > > 
> > > Regards,
> > > mpsuzuki
> > 
> > ------------------------------
> > 
> > Message: 3
> > Date: Wed, 28 Jul 2010 12:29:29 +0900
> > From: suzuki toshiya <mpsuzuki at hiroshima-u.ac.jp>
> > Subject: [poppler] FYI: embedded fonts for vertical text in PDF by MS
> >        Office 2007/2010
> > To: poppler at lists.freedesktop.org
> > Message-ID: <4C4FA419.5000502 at hiroshima-u.ac.jp>
> > Content-Type: text/plain; charset="iso-2022-jp"
> > 
> > Hi,
> > 
> > When I checked the PDFs generated by the MS Office 2007 and 2010
> > add-ins, I found a difference in their font embedding.
> > 
> > * MS Office 2007
> > The embedded font is named with the prefix "@". If I use
> > MS Mincho, the font name is "@MS Mincho". Such @-prefixed
> > names are a legacy style. If the source document uses
> > both horizontal and vertical text, both non-prefixed and
> > @-prefixed font objects are embedded in the PDF.
> > 
> > * MS Office 2010
> > The embedded font is always non-prefixed. If the source
> > document uses both horizontal and vertical text, a
> > single non-prefixed font object covering the glyphs of both
> > texts is embedded in the PDF.
> > 
> > For concrete examples, please find the attached PDFs.
> > I had thought @-prefixed font names were only used by
> > legacy applications from the time when the Win32 GUI framework
> > didn't support vertical text editing. Seeing such names in
> > 21st-century applications was an interesting experience for me.
> > 
> > Regards,
> > mpsuzuki
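
As a rough illustration of the naming difference reported above, a check for the legacy "@" prefix might look like this in C++. `stripSubsetTag` and `hasLegacyVerticalPrefix` are hypothetical helpers, not poppler functions. The sketch also assumes the name may carry a PDF subset tag (six uppercase letters and "+"); whether the Office-generated PDFs use one is not stated in the message.

```cpp
#include <string>

// Removes a leading PDF subset tag such as "ABCDEF+" if one is present.
std::string stripSubsetTag(const std::string &name)
{
    if (name.size() > 7 && name[6] == '+') {
        bool isTag = true;
        for (int i = 0; i < 6; ++i) {
            if (name[i] < 'A' || name[i] > 'Z')
                isTag = false;
        }
        if (isTag)
            return name.substr(7);
    }
    return name;
}

// True if the font name uses the legacy "@" prefix that marks a font
// meant for vertical text, as in "@MS Mincho".
bool hasLegacyVerticalPrefix(const std::string &fontName)
{
    const std::string base = stripSubsetTag(fontName);
    return !base.empty() && base[0] == '@';
}
```

Note that the check only works in one direction: a missing "@" says nothing about the writing mode, since the Office 2010 output described above uses a single non-prefixed font for both horizontal and vertical text.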
> > 
> > 
> > 
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: msword2010-vert4.pdf
> > Type: application/pdf
> > Size: 38863 bytes
> > Desc: not available
> > URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100728/d13e9f5f/attachment.pdf>
> > 
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: msword2007-vert.pdf
> > Type: application/pdf
> > Size: 50509 bytes
> > Desc: not available
> > URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100728/d13e9f5f/attachment-0001.pdf>
> > 
> > 
> > ------------------------------
> > 

