[poppler] -bbox option in pdftotext

Tom Gleason tom at buildadam.com
Fri Feb 18 09:48:29 PST 2011


Hi,

The -bbox option is great, but there are two problems that I've noticed.

First, the <body> and <html> tags aren't closed, which causes problems
when parsing the output as XML:

--- a/utils/pdftotext.cc
+++ b/utils/pdftotext.cc
@@ -361,6 +361,8 @@ int main(int argc, char *argv[]) {
       }
       fprintf(f, "</doc>\n");
     }
+    fprintf(f, "</body>\n");
+    fprintf(f, "</html>\n");
     fclose(f);
     delete textOut;
   } else {
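
To illustrate why the missing closing tags matter, here is a minimal
Python sketch of what a strict XML parser does with the output. The
sample markup is a hypothetical, trimmed-down stand-in for the -bbox
output (not actual pdftotext output), and is_well_formed is just an
illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for "pdftotext -bbox" output.
# The closing </body> and </html> tags are deliberately left off,
# as in the unpatched program.
UNPATCHED = (
    "<html>\n"
    "<head><title>example</title></head>\n"
    "<body>\n"
    "<doc>\n"
    '  <page width="612.0" height="792.0">\n'
    '    <word xMin="72.0" yMin="72.0" xMax="100.0" yMax="84.0">Hello</word>\n'
    "  </page>\n"
    "</doc>\n"
)

# What the patch adds at the end of the output.
PATCHED = UNPATCHED + "</body>\n</html>\n"


def is_well_formed(text):
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("unpatched parses:", is_well_formed(UNPATCHED))
    print("patched parses:  ", is_well_formed(PATCHED))
```

With the two closing tags appended, the same document parses; without
them, the parser rejects it as incomplete.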


Second, although the program outputs the data fine, I get a segmentation
fault. I'm not a C programmer, so I'm not sure how to debug this:

tom at tom:~/Desktop$ pdftotext -bbox thrift-20070401.pdf
Segmentation fault


Hope that helps.

-- 
Tom Gleason, PHP Developer

Exploring ResourceSpace at:
http://resourcespace.blogspot.com

ResourceSpace Support Services
https://www.buildadam.com/muse2

