[poppler] Bug 69485

Adrian Johnson ajohnson at redneon.com
Tue Jan 7 04:58:29 PST 2014


On 07/01/14 17:34, Ross Moore wrote:
> Hi Alex,
> 
> On 07/01/2014, at 4:35 PM, Alex Korobkin wrote:
> 
>>> Hi Ross, 
>>>
>>> 2014/1/5 Ross Moore <ross.moore at mq.edu.au>
>>>
>>>>> While we're on this subject, maybe you could have a look at the PS output produced by pdftops, when processing the same file?
>>>>> The resulting level 3 PostScript cannot be parsed by Distiller either, the error is
>>>>>
>>>>> %%[ Error: undefined; OffendingCommand: xyshow ]%%
>>>
>>> OK. I can reproduce this.
>>>
>>> Again  ps2pdf  has no problem with it, but Apple's  pstopdf
>>> also fails to do the conversion.
>>>
>>>
>>> This is most perplexing as the  xyshow  command is handled
>>> correctly 10 times, but fails on the 11th usage.
>>>
>>>
>> Just to be sure I understand this correctly: I only see xyshow being used once in the document, when defining the Tj macro. 
> 
> That's correct.
> 
>> Do you mean the 11th invocation of the Tj macro? 
> 
> Yes.
> I think this is called for each syllable or group of letters in each word.
> In particular I think it is called for each individual Chinese character,
> or a group of characters.
> 
>>>  
>>> It seems that the difficulty is first encountered
>>> when handling the Chinese characters in the heading of
>>> the Form.
>>> (Preceding this are the characters of:  "Form V.2013"
>>> at top-right of page 1. These are done OK. )
>>>
>>> Here's how I can check this:
>>>
>>>>> % text string operators
>>>>> /xyshow where {
>>>>>   pop
>>>>>   /xyshow2 {
>>>>>     dup length array
>>>>>     0 2 2 index length 1 sub {
>>>>>       2 index 1 index 2 copy get 3 1 roll 1 add get
>>>>>       pdfTextMat dtransform
>>>>>       4 2 roll 2 copy 6 5 roll put 1 add 3 1 roll dup 4 2 roll put
>>>>>     } for
>>>>>     exch pop
>>>   mark /xyshow where pstack cleartomark  %<--- insert this line
>>>>>     xyshow
>>>>>   } def
>>>>> }{
>>>>>   /xyshow2 {
>>>>>     currentfont /FontType get 0 eq {
>>>     ... etc ...
>>>
>>> This causes the contents of the stack to be written to the Log
>>> immediately before  xyshow  is called.
>>> The indication is that  xyshow  is indeed well-defined,
>>> yet something still goes wrong.
> 
>>
>> Thank you for the hint, I will try to do more debugging using this technique. 
> 
> PostScript has a few neat constructions that help with debugging.
> Another, which might come in handy here, is the 'stopped'
> operator.
> With this, you can cause messages to be printed to the log,
> only when an error has been encountered. This kind of debugging
> would most likely be using 'pstack', 'mark' and 'cleartomark'
> if you want to see what was on the stack when the error occurs.
> 
> 
>>> By commenting out groups of lines like this, I can process
>>> further and further into the file.
>>>
>>>
>>> Does it mean that the error is not caused by any particular call to xyshow, but more likely by the number of such calls made consecutively? 
>> Maybe it is some kind of nesting issue, or an issue with the stack not being freed properly?
> 
> Not the number of calls.
> There can be a large number of instances of  xyshow  between those
> which fail, or they can be more or less adjacent.
> 
> I suspect that the character strings that  xyshow  is trying to set
> are faulty in some way; that is, correspond to non-existent code-points
> or glyphs within the subsetted font.
> But I'm no expert on this, so it is pretty much guesswork. 

I agree it is probably the character string. You could confirm this by
replacing the offsets array and Tj with "show", e.g.

(A\361+cB'>\230) show

Does it also fail if you convert to level 2 PS? The level 2 output uses
composite fonts instead of CID fonts for 16-bit fonts. If level 2 fails
the problem is more likely to be in the sfnt data. At this point you
need to convert the sfnt data to binary and examine it using tools like
showttf, ttx, and fontforge.

Since the file works with ghostscript and some printers, the problem
won't be something that is completely wrong, like using a nonexistent
glyph. It will be something more subtle, like incorrectly aligned data or
an invalid value in an optional field.
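
As a rough way to look for the alignment kind of problem once the sfnt
data has been converted to binary, something like the Python sketch below
could be used. It only reads the sfnt table directory (a 12-byte header,
then one 16-byte record per table) and reports tables whose offset is not
a multiple of 4 or which run past the end of the data. The function name
check_sfnt_directory is made up for this example, TrueType collections
are not handled, and it is no substitute for showttf, ttx or fontforge.
Whether alignment is really what upsets Distiller here is still a guess.

```python
import struct

def check_sfnt_directory(data):
    """Return a list of problems found in an sfnt table directory.

    Hypothetical helper: checks only 4-byte alignment of table offsets
    and that each table lies inside the data.
    """
    if len(data) < 12:
        return ["data is shorter than the 12-byte sfnt header"]
    # Header: 4-byte version tag, then a big-endian uint16 table count.
    _version, num_tables = struct.unpack(">4sH", data[:6])
    if 12 + 16 * num_tables > len(data):
        return ["table directory runs past the end of the data"]
    problems = []
    for i in range(num_tables):
        # Each record: tag, checksum, offset, length (all big-endian).
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sLLL", data, 12 + 16 * i)
        name = tag.decode("latin-1")
        if offset % 4:
            problems.append(f"{name}: offset {offset} is not 4-byte aligned")
        if offset + length > len(data):
            problems.append(f"{name}: extends past the end of the data")
    return problems
```

Reading the extracted font with open(path, "rb").read() and passing the
bytes to check_sfnt_directory() gives an empty list if nothing looks odd
at this level.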


>>>  
>>> It seems that the errors occur with Chinese characters,
>>> always when coming from the font referenced as:  /F243_0
>>> which is:
>>>
>>> /F243_0 /ZJWNJQ+SimSun 0 pdfMakeFont16L3
>>
>>
>>> Thus it would seem that there could be something badly wrong
>>> with this font, or with the way it is being used in this document.
>>>
>>> Note carefully what I am saying here.
>>> Not every instance of this font's usage causes an error,
>>> but all the errors that I have found are associated with
>>> an instance of this font's use.
>>>
>>>
>>>
>>>>>
>>>>> The PS file can be retrieved from here; it is 18 MB in size. (Unlike pdftocairo, pdftops generates huge PS files. This particular one gets 10x larger when I provide licensed fonts to pdftops.)
>>>
>>> Yes.
>>> Almost all of the first 87% of the file is devoted to the fonts.
>>>
>>
>> I kind of wonder why pdftops embeds so many instances of the font, while both pdftocairo and pdf2ps somehow avoid this problem and create smaller PS documents. But, that's a subject for another discussion. 
> 
> 
> A lot of large fonts are being included.
> Some are not even used, I'd guess.
> viz.
> 
> [GlenMorangie:] rossmoor% grep -n font china-visa-application-without-fonts.ps | grep BeginResource
> 509:%%BeginResource: font ZJWNJQ+SimSun
> 11688:%%BeginResource: font AASELS+TimesNewRoman,Bold
> 14028:%%BeginResource: font JEIVZQ+SimSun
> 25207:%%BeginResource: font HRUUFF+SimSun
> 36454:%%BeginResource: font AdobeSongStd-Light
> 61390:%%BeginResource: font SimHei
> 86326:%%BeginResource: font SimSun
> 111296:%%BeginResource: font MicrosoftYaHei
> 136232:%%BeginResource: font MicrosoftYaHei,Bold
> 160033:%%BeginResource: font NSimSun
> 185037:%%BeginResource: font AdobeSongStd#20Light
> 209973:%%BeginResource: font FF487_0_ZJWNJQ+SimSun
> 221186:%%BeginResource: font FF589_0_ZJWNJQ+SimSun
> 232365:%%BeginResource: font QGJLNI+CambriaMath
> 234968:%%BeginResource: font FF520_0_AdobeSongStd-Light
> 259938:%%BeginResource: font KozMinPr6N-Regular
> 
> The subset:  ZJWNJQ+SimSun  isn't too large, at roughly 11000 lines. 
> Whereas the un-subset  SimSun  is roughly 25000 lines.
> But then there seem to be 2 more subsets:  
>   FF487_0_ZJWNJQ+SimSun   and   FF589_0_ZJWNJQ+SimSun .
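
The line counts quoted above can be tabulated for every font with a
short Python sketch along these lines. It assumes each font resource
starts with a "%%BeginResource: font NAME" line and normally ends with
"%%EndResource"; if the end comment is missing it falls back to the next
BeginResource or the end of the file. The name font_resource_spans is
invented for this example.

```python
import re

_BEGIN = re.compile(r"^%%BeginResource:\s*font\s+(\S+)")

def font_resource_spans(lines):
    """Return (start_line, font_name, line_count) for each font resource."""
    spans = []
    current = None  # (start_line, name) of the resource being read
    lineno = 0
    for lineno, line in enumerate(lines, start=1):
        m = _BEGIN.match(line)
        if m:
            if current is not None:
                # No %%EndResource was seen; close at the next resource.
                spans.append((current[0], current[1], lineno - current[0]))
            current = (lineno, m.group(1))
        elif line.startswith("%%EndResource") and current is not None:
            spans.append((current[0], current[1], lineno - current[0] + 1))
            current = None
    if current is not None:
        # Resource still open at end of input.
        spans.append((current[0], current[1], lineno - current[0] + 1))
    return spans
```

For the real file, passing open(path, encoding="latin-1") as the lines
argument avoids decoding errors on any binary data in the PS output.
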
> 
>>
>> I only point to this location because it's the first mention of the [1.447 0 1.447 0 1.447 0 1.447 0] sequence, referred to by the Distiller error message. Perhaps I misinterpret Distiller's message. 
> 
> Arrays with these numbers occur quite a lot.
> It determines the spacing between successive characters or glyphs.
> 
> e.g. at line 284996 we get a failing block:
> 
> (A\361+cB'>\230)
>  [1.447
>  0
>  1.447
>  0
>  1.447
>  0
>  1.447
>  0] Tj
> 
> There are 4 instances of 16-bit character or glyph-ids here:
>   "A\361", "+c", "B'", ">\230"
> where each character or octal code (\xxx) represents 8 binary bits.
> If I'm converting these into hex correctly, they should correspond
> to the Unicode characters:
> 
> Ux0041F1 : 䇱
> Ux002B63 :  ???
> Ux004227 : 䈧
> Ux003E98 : 㺘
> 
> That 2nd one looks suspicious.
> So maybe these codes do not map directly to Unicode.
> We would have to look more closely at the font itself,
> which is not so easy to do --- at least not for me.
> 
> Also in this vein, this string works OK, from line 284976 :
>  (\004]\011~\004\352"A)
>  Ux00045D : ѝ
>  Ux00097E : ॾ
>  Ux0004EA : Ӫ
>  Ux002241 : ≁
> but the symbols are not all Chinese.
> 
> So this probably isn't the correct interpretation.
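
Doing the octal-to-hex conversion by hand is error prone, so here is a
small Python sketch that does it mechanically. It handles the usual
PostScript string escapes (\ddd octal, \n, \r, \t, \b, \f, \\, \( and
\)) and then pairs the bytes big-endian into 16-bit codes. The function
names are made up for this example. It only recovers the raw numbers;
whether they are CIDs, glyph ids or Unicode values is a separate
question that depends on the font.

```python
import re

_OCTAL = re.compile(r"[0-7]{1,3}")
# Single-character escapes defined for PostScript string literals.
_SIMPLE = {"n": 0x0A, "r": 0x0D, "t": 0x09, "b": 0x08, "f": 0x0C,
           "\\": 0x5C, "(": 0x28, ")": 0x29}

def ps_string_bytes(body):
    """Convert the text between the outer ( ) of a PS string to bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ord(ch))        # ordinary character, one byte
            continue
        if i >= len(body):
            break                      # trailing backslash, nothing follows
        m = _OCTAL.match(body, i)
        if m:
            out.append(int(m.group(), 8) & 0xFF)   # \ddd octal escape
            i = m.end()
        elif body[i] in _SIMPLE:
            out.append(_SIMPLE[body[i]])
            i += 1
        elif body[i] == "\n":
            i += 1                     # backslash-newline: line continuation
        else:
            out.append(ord(body[i]))   # unknown escape: backslash is dropped
            i += 1
    return bytes(out)

def codes16(body):
    """Pair the bytes of a PS string into big-endian 16-bit codes."""
    data = ps_string_bytes(body)
    if len(data) % 2:
        raise ValueError("odd number of bytes in a 16-bit string")
    return [(data[k] << 8) | data[k + 1] for k in range(0, len(data), 2)]

print([f"{c:04X}" for c in codes16(r"A\361+cB'>\230")])
# ['41F1', '2B63', '4227', '3E98']
```
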
> 
>>
>> Hope this helps,
>>
>> It helps greatly, thanks again. 
>>
>>
>> -Alex
> 
> I don't think that there is anything more that I can do.
> Let me know if you get anywhere further with this.
> 
> 
> Cheers,
> 
> 	Ross
> 
> ------------------------------------------------------------------------
> Ross Moore                                       ross.moore at mq.edu.au 
> Mathematics Department                           office: E7A-206      
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------


