[poppler] -bbox option in pdftotext
Tom Gleason
tom at buildadam.com
Fri Feb 18 09:48:29 PST 2011
Hi,
The -bbox option is great, but I've noticed two problems.
First, the <body> and <html> tags aren't closed, which causes a problem
when parsing the output as XML:
--- a/utils/pdftotext.cc
+++ b/utils/pdftotext.cc
@@ -361,6 +361,8 @@ int main(int argc, char *argv[]) {
}
fprintf(f, "</doc>\n");
}
+ fprintf(f, "</body>\n");
+ fprintf(f, "</html>\n");
fclose(f);
delete textOut;
} else {
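To illustrate why the missing closing tags matter, here is a minimal sketch of a tag-balance check of the kind a strict XML parser effectively performs. The function name tagsBalanced is hypothetical and not part of poppler, and the check is deliberately simplified: it ignores comments, CDATA sections, and '>' characters inside attribute values.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper, not part of poppler: returns true if every opening
// tag in `xml` is closed by a matching closing tag in the correct order.
// Simplified on purpose: comments, CDATA, and '>' inside attribute values
// are not handled.
bool tagsBalanced(const std::string &xml) {
    std::stack<std::string> open;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) {
            return false;  // tag is never terminated
        }
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty() || tag[0] == '?' || tag[0] == '!') {
            continue;  // XML declaration or DOCTYPE: nothing to balance
        }
        if (tag.back() == '/') {
            continue;  // self-closing element such as <br/>
        }
        if (tag[0] == '/') {
            // Closing tag: must match the most recently opened element.
            if (open.empty() || open.top() != tag.substr(1)) {
                return false;
            }
            open.pop();
        } else {
            // Opening tag: keep only the element name, drop attributes.
            open.push(tag.substr(0, tag.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();  // leftover entries are unclosed elements
}
```

Given a simplified stand-in for the -bbox output such as "<html><body><doc></doc>", the check fails because <body> and <html> are still open at the end of the input; appending "</body></html>", as the patch does, makes it pass.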
Second, although the program outputs the data fine, it then crashes with
a segmentation fault. I'm not a C programmer, so I'm not sure how to debug this:
tom@tom:~/Desktop$ pdftotext -bbox thrift-20070401.pdf
Segmentation fault
Hope that helps.
--
Tom Gleason, PHP Developer
Exploring ResourceSpace at:
http://resourcespace.blogspot.com
ResourceSpace Support Services
https://www.buildadam.com/muse2