[poppler] poppler/gilib: poppler_page_find_text() returns wrong PopplerRectangle
Kenji Okimoto
okimoto at clear-code.com
Wed Dec 28 06:36:27 UTC 2016
Hi,
I'm parsing some PDF files by following code:
Use this PDF, for example.
https://www.sedl.org/afterschool/toolkits/science/pdf/ast_sci_data_tables_sample.pdf
#include <stdio.h>
#include <stdlib.h>
#include <glib/poppler.h>
int main(void)
{
GError *error = NULL;
PopplerDocument *doc =
poppler_document_new_from_file("file:///path/to/ast_sci_data_tables_sample.pdf",
NULL, &error);
if (!doc) {
printf("%s\n", error->message);
return 0;
}
PopplerPage *page = poppler_document_get_page(doc, 1);
// char *content = poppler_page_get_text(page);
char *content;
double height;
poppler_page_get_size(page, NULL, &height);
GList *list = poppler_page_find_text(page, "Wingfoot Express");
for (GList *node = list; node != NULL; node = node->next) {
PopplerRectangle *rec = (PopplerRectangle *)node->data;
content = poppler_page_get_text_for_area(page, rec);
printf("%s\n", content); // Displays unexpected wrong text
printf("x1=%f, y1=%f, x2=%f, y2=%f\n", rec->x1, rec->y1, rec->x2,
rec->y2);
rec->y1 = height - rec->y1;
rec->y2 = height - rec->y2;
content = poppler_page_get_text_for_area(page, rec);
printf("%s\n", content); // Displays expected text
printf("x1=%f, y1=%f, x2=%f, y2=%f\n", rec->x1, rec->y1, rec->x2,
rec->y2);
}
return 0;
}
I'm confusing that I cannot use PopplerRectangle returned by
poppler_page_find_text() with poppler_page_get_text_for_area().
I think that poppler_page_get_text_for_area() should return proper text
without editing poppler_page_find_text() result.
In C++ version,
I can use poppler::rectangle returned by page->search() with page->text().
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <iostream>
#include <sstream>
#include <poppler-document.h>
#include <poppler-page.h>
#include <poppler-rectangle.h>
int main(int argx, char **argv)
{
const std::string path("/path/to/ast_sci_data_tables_sample.pdf");
poppler::document *doc = poppler::document::load_from_file(path);
poppler::page *page = doc->create_page(1);
poppler::rectangle <double>r;
poppler::ustring text = poppler::ustring::from_latin1("Wingfoot Express");
bool matched = page->search(text, r,
poppler::page::search_direction_enum::search_from_top,
poppler::case_sensitivity_enum::case_sensitive);
std::cout << text.to_latin1() << std::endl
<< matched << std::endl
<< r.x() << "," << r.y() << std::endl
<< r.left() << "," << r.right() << std::endl
<< r.top() << "," << r.bottom() << std::endl
<< r.width() << "," << r.height() << std::endl
<< r << std::endl;
poppler::ustring t = page->text(r);
std::cout << t.to_latin1() << std::endl;
return 0;
}
Thanks
--
Kenji Okimoto <okimoto at clear-code.com>
More information about the poppler
mailing list