[poppler] Searching for text in PDF files is wrong

Радомир Хаџић radomirhadzic46 at gmail.com
Mon Dec 3 15:57:41 UTC 2018


From: Радомир Хаџић <radomirhadzic46 at gmail.com>
Date: Fri, 30 Nov 2018 21:56:12 +0100
Subject: Searching for text in PDF files is wrong
To: gtk-app-devel-list <gtk-app-devel-list at gnome.org>

Hi.

I use poppler_page_find_text() to find text in PDF files. It returns a
GList of pointers to PopplerRectangles. I then pass those rectangles to
poppler_page_render_selection() to mark the found text.

The problem is that the PopplerRectangles returned by
poppler_page_find_text() are incompatible with the ones that
poppler_page_render_selection() expects, so the wrong text is
selected.

I have found that to make those two compatible, I have to do the
following to PopplerRectangles returned by poppler_page_find_text():
1) SWAP(rectangle.x1, rectangle.x2);
2) SWAP(rectangle.y1, rectangle.y2);
3) rectangle.y1 = page_height - rectangle.y1;
4) rectangle.y2 = page_height - rectangle.y2;
However, this does not solve the problem: while the window is being
resized, the marked text alternates between right and wrong.

I have written a small program that illustrates the problem:
https://pastebin.com/h3F56Yv7 (I have also attached it, but the
attachment did not reach you last time, so the paste is a fallback in
case it gets lost again.)
The program takes two arguments, in this order: the absolute path to a
PDF file and the text to search for. Compare the selected text with
and without lines 54-57 of the program.

How can I get the found text marked properly? This "workaround" does
not work reliably, and it is an ugly solution anyway.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: text/x-csrc
Size: 3056 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20181203/a90b8d3e/attachment.c>

