[poppler] usage of TextOutputDev in poppler
Albert Astals Cid
aacid at kde.org
Thu Apr 19 09:26:23 PDT 2012
On Thursday, 19 April 2012, at 17:11:50, Ihar `Philips` Filipau wrote:
> Hi All!
>
> Today I happened to look into the poppler sources and the usage of
> TextOutputDev, and noticed a few things:
>
> (1) TextOutputDev is used inside page::search(), but a new instance is
> created every time page::search() is called. Wouldn't it be better
> to keep a cached instance of TextOutputDev for searches? This looks
> like an explanation for why search in Okular is slow (the speed is
> constant) on large documents (800-1200 pages; think CPU instruction
> manual), even when one searches for the same thing a second time.
I don't think so, since Okular does not use the search() function.
>
> The same pattern appears in Qt4's Page::search(), with the difference
> that the TextOutputDev parameters are not constant. But that is also
> (theoretically) not a problem: one can remember the constructor
> parameters of the cached TextOutputDev and, if they need to change,
> discard the old copy and create a new cached copy with the new
> parameters.
>
> That would be a great performance enhancement, if it is of course
> possible to implement.
>
> (2) More of a question. page::text()/Page::textList() both use
> TextOutputDev to extract text as plain text. Do I understand
> correctly that this is the reason why poppler-based viewers are not
> able to "Copy" text with styles like bold or italic into the
> clipboard? Is that on any TODO? Is there any open-source PDF viewer
> that can copy text with formatting into the clipboard?
TextWord has a TextFontInfo; it is probably not 100% accurate, but it could
be used.
Cheers,
Albert
>
> Thanks.
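
The caching scheme proposed in (1) could be sketched roughly as follows.
This is only an illustration, not poppler code: FakeTextDevice and
CachedTextDevice are hypothetical names, and the two bool parameters merely
stand in for whatever constructor arguments the real TextOutputDev takes. A
real cache would presumably also have to be invalidated when the page being
searched changes, which the sketch does not cover.

```cpp
#include <memory>
#include <tuple>

// Hypothetical stand-in for a device that is expensive to construct.
// It is NOT poppler's TextOutputDev; the parameters are assumptions.
struct FakeTextDevice {
    bool physLayout;
    bool rawOrder;
    FakeTextDevice(bool phys, bool raw) : physLayout(phys), rawOrder(raw) {}
};

// Keeps one device alive and rebuilds it only when the requested
// constructor parameters differ from the ones it was built with.
class CachedTextDevice {
public:
    FakeTextDevice &get(bool physLayout, bool rawOrder) {
        const std::tuple<bool, bool> key(physLayout, rawOrder);
        if (!dev_ || key != key_) {
            // Parameters changed (or first use): discard the old copy
            // and create a new cached copy with the new parameters.
            dev_.reset(new FakeTextDevice(physLayout, rawOrder));
            key_ = key;
            ++rebuilds_;
        }
        return *dev_;
    }

    // Number of times a device has been constructed so far.
    int rebuilds() const { return rebuilds_; }

private:
    std::unique_ptr<FakeTextDevice> dev_;
    std::tuple<bool, bool> key_;
    int rebuilds_ = 0;
};
```

Calling get() repeatedly with the same parameters returns the same
instance; only a call with different parameters triggers a rebuild.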