[poppler] usage of TextOutputDev in poppler

Albert Astals Cid aacid at kde.org
Thu Apr 19 09:26:23 PDT 2012


On Thursday, 19 April 2012, at 17:11:50, Ihar `Philips` Filipau
wrote:
> Hi All!
> 
> Today, by chance, I looked into the poppler sources and the usage of
> TextOutputDev, and noticed a few things:
> 
> (1) TextOutputDev is used inside page::search(), but it is created
> anew every time page::search() is called. Wouldn't it be better
> to keep a cached instance of TextOutputDev for searches? This looks
> like an explanation of why search in Okular is slow (and the speed is
> constant) on large documents (800-1200 pages; think CPU instruction
> manual), even if one searches for the same thing a second time.

I don't think so, since Okular does not use the search() function.

> 
> The same pattern appears in the Qt4 Page::search(), with the difference
> that the TextOutputDev parameters are not constant. But that is also
> (theoretically) not a problem: one can remember the constructor
> parameters of the cached TextOutputDev and, if they need to change,
> discard the old copy and create a new cached copy with the new parameters.
> 
> That would be a great performance enhancement, if of course it is
> possible to implement.
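A minimal C++ sketch of the caching idea from point (1). It does not touch poppler: `FakeTextDev` and `TextDevParams` are hypothetical stand-ins for TextOutputDev and its constructor arguments, and the two flags are assumed purely for illustration. The cache keeps one device and rebuilds it only when the requested parameters differ from the remembered ones.

```cpp
#include <cassert>
#include <memory>

// Hypothetical constructor parameters; the real TextOutputDev arguments
// are not reproduced here.
struct TextDevParams {
    bool physLayout = false;
    bool rawOrder = false;
    bool operator==(const TextDevParams &o) const {
        return physLayout == o.physLayout && rawOrder == o.rawOrder;
    }
};

// Hypothetical stand-in for TextOutputDev.
struct FakeTextDev {
    explicit FakeTextDev(const TextDevParams &p) : params(p) { ++instances; }
    TextDevParams params;
    static int instances; // counts constructions so caching is observable
};
int FakeTextDev::instances = 0;

// Holds one cached device and remembers the parameters it was built with.
class CachedTextDev {
public:
    FakeTextDev &get(const TextDevParams &p) {
        if (!dev || !(cachedParams == p)) {
            // Parameters changed (or nothing cached yet): discard the old
            // copy and create a new one with the new parameters.
            dev = std::make_unique<FakeTextDev>(p);
            cachedParams = p;
        }
        assert(dev && cachedParams == p); // postcondition: cache is current
        return *dev;
    }

private:
    std::unique_ptr<FakeTextDev> dev;
    TextDevParams cachedParams;
};
```

A real cache would also have to keep the text already extracted from the page, since that is the expensive part, and would only help callers that actually go through search(); per the reply above, Okular does not.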
> 
> (2) More of a question. page::text()/Page::textList() both use
> TextOutputDev to extract text as plain text. Do I understand
> correctly that this is the reason why poppler-based viewers are not
> able to "Copy" text with styles like bold or italic into the
> clipboard? Is that on any TODO list? Is there any open-source PDF
> viewer which can copy formatted text into the clipboard?

TextWord has a TextFontInfo; it is probably not 100% accurate, but it
could be used.
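A hedged C++ sketch of the styled-copy idea: given words that already carry bold/italic flags, emit an HTML fragment from which a clipboard consumer could keep the formatting. `StyledWord`, `escapeHtml` and `wordsToHtml` are hypothetical; deriving the flags from a word's TextFontInfo is assumed and not shown, and, as noted above, that font information may not be fully accurate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-word record; the style bits would have to be derived
// from the word's TextFontInfo, which this sketch does not call.
struct StyledWord {
    std::string text;
    bool bold = false;
    bool italic = false;
};

// Escape the characters that are special in HTML text content.
std::string escapeHtml(const std::string &s) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default: out += c;
        }
    }
    return out;
}

// Join words with single spaces, wrapping each in <i> and/or <b>
// according to its flags.
std::string wordsToHtml(const std::vector<StyledWord> &words) {
    std::string html;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) html += ' ';
        const StyledWord &w = words[i];
        std::string piece = escapeHtml(w.text);
        if (w.italic) piece = "<i>" + piece + "</i>";
        if (w.bold) piece = "<b>" + piece + "</b>";
        html += piece;
    }
    return html;
}
```

Emitting HTML alongside the plain text is one common way to put formatted text on a clipboard; whether a given viewer does this is up to its own clipboard code, not poppler.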

Cheers,
  Albert

> 
> Thanks.
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

