[poppler] usage of TextOutputDev in poppler
Ihar `Philips` Filipau
thephilips at gmail.com
Thu Apr 19 08:11:50 PDT 2012
Today, by chance, I looked into the poppler sources and at how
TextOutputDev is used, and noticed a few things:
(1) TextOutputDev is used inside page::search(), but a new instance is
created every time page::search() is called. Wouldn't it be better to
keep a cached instance of TextOutputDev for searches? This looks like
it could explain why search in Okular is slow on large documents
(800-1200 pages; think of a CPU instruction manual): it takes just as
long even when one searches for the same thing a second time.
The same pattern appears in the Qt4 Page::search(), with the difference
that the TextOutputDev parameters are not constant. But that is also
(theoretically) not a problem: one can remember the constructor
parameters of the cached TextOutputDev and, if they need to change,
discard the old copy and create a new cached copy with the new
parameters.
That would be a great performance enhancement, if of course it is
possible to implement.
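The cache-and-rebuild idea from point (1) could look roughly like the minimal C++ sketch below. It is not poppler code: `TextDevParams`, `FakeTextDev` and `CachedTextDev` are hypothetical names, `FakeTextDev` merely stands in for TextOutputDev plus the per-page text extraction it performs, and the two parameters shown are illustrative, not the real constructor signature. The point is only the policy: reuse the cached device while the parameters match, and discard and rebuild it when they change.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the constructor parameters of TextOutputDev.
// The real class takes different/more arguments; these two are illustrative.
struct TextDevParams {
    bool physLayout = false;
    bool rawOrder = false;
    bool operator==(const TextDevParams &o) const {
        return physLayout == o.physLayout && rawOrder == o.rawOrder;
    }
};

// Hypothetical expensive object standing in for TextOutputDev and the
// text extraction done for one page.
struct FakeTextDev {
    static int constructed;  // counts how often the construction cost is paid
    TextDevParams params;
    explicit FakeTextDev(const TextDevParams &p) : params(p) { ++constructed; }
};
int FakeTextDev::constructed = 0;

// Keeps one cached device (e.g. per page) and rebuilds it only when the
// requested parameters differ from the ones it was built with.
class CachedTextDev {
public:
    FakeTextDev &get(const TextDevParams &p) {
        if (!dev_ || !(dev_->params == p)) {
            // Discard the old copy and create a new one with the new parameters.
            dev_.reset(new FakeTextDev(p));
        }
        assert(dev_ && dev_->params == p);
        return *dev_;
    }
    // Drop the cached device, e.g. when the page is unloaded.
    void invalidate() { dev_.reset(); }

private:
    std::unique_ptr<FakeTextDev> dev_;
};
```

With this policy, repeated searches on the same page with unchanged parameters pay the construction cost once; only a parameter change (or an explicit `invalidate()`) triggers a rebuild. A real implementation would also have to bound memory, since keeping one device per page for a 1200-page document is not free.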
(2) More of a question. page::text()/Page::textList() both use
TextOutputDev to extract text as plain text. Do I understand correctly
that this is why poppler-based viewers cannot "Copy" text with styles
such as bold or italic to the clipboard? Is that on any TODO list? Is
there any open-source PDF viewer that can copy formatted text to the
clipboard?