[poppler] What Triggers PDFtoHtml to convert pdf page to image?
Alfredo Jr. Go
frederick0291 at gmail.com
Wed Aug 4 14:18:12 UTC 2021
Hi,
I am trying to convert PDF files to HTML. Running pdftohtml -c -s
input output works fine on simple PDFs: pdftohtml converts the file
properly into the intended HTML file with paragraph tags.
But when I tested it on other PDF files (court documents), pdftohtml just
renders each page to a PNG file and links it in the output HTML file.
So I end up with an HTML file that only links images.
Sample:
<!-- Page 5 -->
<a name="5"></a>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft519{font-size:27px;font-family:Helvetica;
color:#000000;}
-->
</style>
<div id="page5-div" style="position:relative;width:918px;height:1188px;">
<img
width="918" height="1188" src="SEP272019_02A6245005.png"
alt="background image"/>
</div>
<!-- Page 6 -->
<a name="6"></a>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft620{font-size:13px;font-family:Times;
color:#000000;}
.ft621{font-size:10px;font-family:Helvetica;color:#000000;}
-->
</style>
<div id="page6-div" style="position:relative;width:918px;height:1188px;">
<img
width="918" height="1188" src="SEP272019_02A6245006.png"
alt="background image"/>
</div>
<!-- Page 7 -->
<a name="7"></a>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft722{font-size:15px;font-family:Helvetica;
color:#000000;}
-->
</style>
<div id="page7-div" style="position:relative;width:918px;height:1188px;">
<img
width="918" height="1188" src="SEP272019_02A6245007.png"
alt="background image"/>
</div>
<!-- Page 8 -->
<a name="8"></a>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft823{font-size:28px;font-family:Helvetica;
color:#000000;}
.ft824{font-size:72px;font-family:Helvetica;color:#000000;}
-->
</style>
<div id="page8-div" style="position:relative;width:918px;height:1188px;">
<img
width="918" height="1188" src="SEP272019_02A6245008.png"
alt="background image"/>
</div>
<!-- Page 9 -->
<a name="9"></a>
<style type="text/css">
<!--
p {margin: 0; padding: 0;}-->
</style>
<div id="page9-div" style="position:relative;width:918px;height:1188px;">
<img
width="918" height="1188" src="SEP272019_02A6245009.png"
alt="background image"/>
</div>
What triggers this behavior? I was hoping it would convert the PDFs to an
HTML file with the text in tags, but it just converts the pages into images
and links them in the output HTML file.
I am not allowed to share the PDF files since they are legal documents.
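Since the files cannot be shared, a rough way to report which pages are affected is to scan the generated HTML for pages that contain no paragraph tags. The following is only a minimal sketch in Python: the function name image_only_pages is hypothetical, and it assumes the output uses the "<!-- Page N -->" comment markers seen in the sample above and puts page text in <p> elements. It only lists the affected pages; it does not explain why they have no text.

```python
import re
from typing import List

# Assumption: pdftohtml output marks each page with a comment like "<!-- Page 5 -->",
# as in the sample pasted above.
PAGE_MARKER = re.compile(r"<!-- Page (\d+) -->")
# Per-page <style> blocks contain a CSS rule for "p"; drop them before looking for tags.
STYLE_BLOCK = re.compile(r"<style.*?</style>", re.IGNORECASE | re.DOTALL)
# An opening <p> tag, with or without attributes.
P_TAG = re.compile(r"<p[\s>]", re.IGNORECASE)


def image_only_pages(html: str) -> List[int]:
    """Return the page numbers whose markup contains no <p> element."""
    # split() with one capture group yields [preamble, num1, body1, num2, body2, ...]
    parts = PAGE_MARKER.split(html)
    pages = []
    for number, body in zip(parts[1::2], parts[2::2]):
        body = STYLE_BLOCK.sub("", body)
        if not P_TAG.search(body):
            pages.append(int(number))
    return pages


if __name__ == "__main__":
    # Tiny hand-written example in the shape of the sample output (not real data).
    example = (
        '<!-- Page 1 -->\n<div id="page1-div"><img src="a001.png"/>\n'
        '<p style="position:absolute;" class="ft10">Some text</p></div>\n'
        '<!-- Page 2 -->\n<style type="text/css">\n<!--\np {margin: 0;}\n-->\n</style>\n'
        '<div id="page2-div"><img src="a002.png"/></div>\n'
    )
    print(image_only_pages(example))
```

Run against the real output file, this would give a list of page numbers that could be posted instead of the documents themselves.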
Regards,
Fred.