<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Is there any way to prevent pdftops from subsetting fonts? I want to convert the PS back to a PDF and still be able to extract text with pdftotext.</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I have a large single-page PDF. Selecting and copying text in atril or okular works, and pdftotext extracts the text as well.</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
pdffonts shows about 40 fonts. They are all similar:</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<pre style="font-family: 'Courier New', monospace; margin: 0;">name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
HelveticaNeueLTStd-Roman--Identity-H CID Type 0C       Identity-H       yes no  yes    214  0
HelveticaNeueLTStd-BdIt--Identity-H  CID Type 0C       Identity-H       yes no  yes    236  0
...</pre>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<pre style="font-family: 'Courier New', monospace; margin: 0;">HelveticaLTStd-Bold--Identity-H      CID Type 0C       Identity-H       yes no  yes     70  0
Berkeley-Bold--Identity-H            CID Type 0C       Identity-H       yes no  yes     60  0</pre>
<div><br>
</div>
<div>pdfinfo shows:</div>
<pre style="font-family: 'Courier New', monospace; margin: 0;">ModDate:         Fri Jun 26 21:27:37 2020 WEST
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           1
Encrypted:       no
Page size:       702 x 1296 pts
Page rot:        0
File size:       13501736 bytes
Optimized:       no
PDF version:     1.6</pre>
<div><br>
</div>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
When I run the PDF through pdftops, it subsets the fonts. When I then convert the result back into a PDF with ghostscript's ps2pdf, the text still renders, but copying it or running pdftotext no longer works.</div>
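<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
For reference, the round trip can be sketched in Python roughly as follows. This is only an illustration: the function names and the file names in the usage comment are placeholders of mine, and it uses nothing beyond the plain "pdftops in out", "ps2pdf in out" and "pdftotext in -" invocations, with no extra options.</div>
<pre style="font-family: 'Courier New', monospace; margin: 0;">
```python
import shutil
import subprocess


def roundtrip_commands(pdf, ps, pdf_out):
    """Return the three commands of the PDF -> PS -> PDF -> text round trip."""
    return [
        ["pdftops", pdf, ps],         # poppler: PDF to PostScript
        ["ps2pdf", ps, pdf_out],      # ghostscript: PostScript back to PDF
        ["pdftotext", pdf_out, "-"],  # poppler: extract text to stdout
    ]


def run_roundtrip(pdf, ps, pdf_out):
    """Run the round trip and return whatever text pdftotext extracts."""
    text = ""
    for cmd in roundtrip_commands(pdf, ps, pdf_out):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(cmd[0] + " not found on PATH")
        result = subprocess.run(cmd, check=True, capture_output=True,
                                encoding="utf-8", errors="replace")
        # After the loop this holds the stdout of the last command (pdftotext).
        text = result.stdout
    return text


# Usage (hypothetical file names); empty or whitespace-only output
# means the text could not be extracted after the round trip:
#   print(repr(run_roundtrip("x.pdf", "x.ps", "roundtrip.pdf")))
```
</pre>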
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The end of the generated PS file is:</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span>%%+ font BHQHNF+MinionPro-Regular<br>
</span>
<div>%%+ font BHQHNG+Berkeley-Book<br>
</div>
<div>%%+ font BHQHNH+HelveticaLTStd-Bold<br>
</div>
<div>%%+ font BHQHNI+Berkeley-Bold<br>
</div>
<div>%%EOF<br>
</div>
<span></span>The six-letter prefixes (BHQHNF+ and so on) are the usual tag for subset fonts, so it looks like pdftops is subsetting the fonts.</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span>"grep Berkeley-Bold", for example, shows<br>
</span>
<div>%%BeginResource: font BHQHNI+Berkeley-Bold<br>
</div>
<div>/CIDFontName /BHQHNI+Berkeley-Bold def<br>
</div>
<div>/F60_0 /BHQHNI+Berkeley-Bold 0 pdfMakeFont16L3<br>
</div>
<div>%%+ font BHQHNI+Berkeley-Bold<br>
</div>
<div><br>
</div>
<div><span>"grep -A 1 ' Tc$' x.ps | grep '(' | head" also appears to show that the fonts have been subsetted.<br>
</span>
<div>(\000\025\000\014)<br>
</div>
<div>(\000\015\000\024)<br>
</div>
<div>(\000\001\000*)<br>
</div>
<div>(\000\002\000\003\000\012)<br>
</div>
<div>(\000\006\000\015)<br>
</div>
<div>(\000\014\000\017\000\005\000\007)<br>
</div>
<div>(\000\033\000\031)<br>
</div>
<div>(\000\013\000"\000"\000\026\000\022)<br>
</div>
<div>(\000\012\000\004)<br>
</div>
<div>(\000\024\000\023\000\017\000\001)</div>
</div>
<div><br>
</div>
<span></span></div>
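<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To look at the actual code values in those strings, a small Python sketch like the one below can decode them. It assumes the strings hold 16-bit codes (two bytes per glyph), which is what the pdfMakeFont16L3 line suggests but which I have not confirmed; the function names are my own and are not part of poppler.</div>
<pre style="font-family: 'Courier New', monospace; margin: 0;">
```python
import re

# PostScript single-character escapes inside ( ... ) string literals.
_SIMPLE = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12,
           "\\": 92, "(": 40, ")": 41}


def ps_string_bytes(literal):
    """Decode a PostScript (...) string literal into a list of byte values."""
    body = literal.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    out = []
    i = 0
    while i != len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))  # ordinary character, e.g. * or "
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        octal = re.match(r"[0-7]{1,3}", body[i + 1:])
        if octal:
            # \ddd: one to three octal digits
            out.append(int(octal.group(), 8) % 256)
            i += 1 + len(octal.group())
        elif nxt in _SIMPLE:
            out.append(_SIMPLE[nxt])
            i += 2
        else:
            # Unknown escape: drop the backslash, keep the next character.
            i += 1
    return out


def two_byte_codes(literal):
    """Assumption: the string is a sequence of big-endian 16-bit codes."""
    data = ps_string_bytes(literal)
    if len(data) % 2:
        raise ValueError("odd number of bytes, not a 16-bit string")
    return [data[k] * 256 + data[k + 1] for k in range(0, len(data), 2)]
```
</pre>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
With this, (\000\025\000\014) decodes to [21, 12] and (\000\001\000*) to [1, 42].</div>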
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
In testing, I also noticed that some pdftops options like -level3 generate ps files that crash ghostscript, but for now I think that is a ghostscript issue. <a href="https://bugs.ghostscript.com/show_bug.cgi?id=702526" id="LPlnk453486">https://bugs.ghostscript.com/show_bug.cgi?id=702526</a></div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The ghostscript bug report has a copy of the PDF.</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I can post this as a poppler bug report, but I wanted to check first whether I missed a pdftops option, or whether there is an internal flag that I could expose as an option in pdftops.</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
William</div>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
</body>
</html>