[poppler] Bug 69485

Tue Jan 7 11:02:18 PST 2014

2014/1/7 Adrian Johnson <ajohnson at redneon.com>

> On 07/01/14 17:34, Ross Moore wrote:
> > Hi Alex,
> >
> > On 07/01/2014, at 4:35 PM, Alex Korobkin wrote:
> >
> >>> Hi Ross,
> >>>
> >>> 2014/1/5 Ross Moore <ross.moore at mq.edu.au>
> >>>
> >>>>> While we're on this subject, maybe you could have a look at the PS
> >>>>> output produced by pdftops, when processing the same file?
> >>>>> The resulting level 3 PostScript cannot be parsed by Distiller
> >>>>> either, the error is
> >>>>>
> >>>>> %%[ Error: undefined; OffendingCommand: xyshow ]%%
> >>>
> >>> OK. I can reproduce this.
> >>>
> >>> Again  ps2pdf  has no problem with it, but Apple's  pstopdf
> >>> also fails to do the conversion.
> >>>
> >>>
> >>> This is most perplexing as the  xyshow  command is handled
> >>> correctly 10 times, but fails on the 11th usage.
> >>>
> >>>
> >> Just to be sure I understand this correctly: I only see xyshow being
> >> used once in the document, when defining the Tj macro.
> >
> > That's correct.
> >
> >> Do you mean the 11th invocation of the Tj macro?
> >
> > Yes.
> > I think this is called for each syllable or group of letters in each
> > word.
> > In particular I think it is called for each individual Chinese character,
> > or a group of characters.
> >
> >>>
> >>> It seems that the difficulty is first encountered
> >>> when handling the Chinese characters in the heading of
> >>> the Form.
> >>> (Preceding this are the characters of:  "Form V.2013"
> >>> at top-right of page 1. These are done OK. )
> >>>
> >>> Here's how I can check this:
> >>>
> >>>>> % text string operators
> >>>>> /xyshow where {
> >>>>>   pop
> >>>>>   /xyshow2 {
> >>>>>     dup length array
> >>>>>     0 2 2 index length 1 sub {
> >>>>>       2 index 1 index 2 copy get 3 1 roll 1 add get
> >>>>>       pdfTextMat dtransform
> >>>>>       4 2 roll 2 copy 6 5 roll put 1 add 3 1 roll dup 4 2 roll put
> >>>>>     } for
> >>>>>     exch pop
> >>>   mark /xyshow where pstack cleartomark  %<--- insert this line
> >>>>>     xyshow
> >>>>>   } def
> >>>>> }{
> >>>>>   /xyshow2 {
> >>>>>     currentfont /FontType get 0 eq {
> >>>     ... etc ...
> >>>
> >>> This causes the contents of the stack to be written to the Log
> >>> immediately before  xyshow  is called.
> >>> The indication is that  xyshow  is indeed well-defined,
> >>> yet something still goes wrong.
> >
> >>
> >> Thank you for the hint, I will try to do more debugging using this
> >> technique.
> >
> > PostScript has a few neat constructions that help with debugging.
> > Another, which might come in handy here, is to use the 'stopped'
> > operator.
> > With this, you can cause messages to be printed to the log,
> > only when an error has been encountered. This kind of debugging
> > would most likely be using 'pstack', 'mark' and 'cleartomark'
> > if you want to see what was on the stack when the error occurs.
> >
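As a rough sketch of the 'stopped' approach described above (my own
illustration, not taken from the pdftops prolog): the xyshow call inside
xyshow2 could be wrapped so that a report is printed only when it fails.
It relies on the standard $error dictionary; the message text is made up
for illustration.

```postscript
% Assumption: this replaces the bare "xyshow" line inside /xyshow2.
% Nothing is printed unless the wrapped call raises an error.
{ xyshow } stopped {
  (xyshow failed\n) print
  $error /errorname get ==    % name of the error, e.g. /undefined
  $error /command get ==      % the offending command
  (operand stack when the error was caught:\n) print
  pstack                      % prints the stack without changing it
  $error /newerror false put  % mark the error as handled
} if
```

This is for debugging only: whatever the failed call left on the operand
stack is not cleaned up, so later code may see extra objects.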
> >
> >>> By commenting out groups of lines like this, I can process
> >>> further and further into the file.
> >>>
> >>>
> >> Does it mean that the error is not caused by any particular call to
> >> xyshow, but more likely by the number of such calls made consecutively?
> >> Maybe it is some kind of nesting issue, or the stack not being freed
> >> properly?
> >
> > Not the number of calls.
> > There can be a large number of instances of  xyshow  between those
> > which fail, or they can be more or less adjacent.
> >
> > I suspect that the character strings that  xyshow  is trying to set
> > are faulty in some way; that is, correspond to non-existent code-points
> > or glyphs within the subsetted font.
> > But I'm no expert on this, so it is pretty much guesswork.
>
> I agree it is probably the character string. You could confirm this by
> replacing the offsets array and Tj with "show", e.g.
>
> (A\361+cB'>\230) show
>
> Does it also fail if you convert to level 2 PS? The level 2 output uses
> composite fonts instead of CID fonts for 16-bit fonts. If level 2 fails
> the problem is more likely to be in the sfnt data. At this point you
> need to convert the sfnt data to binary and examine it using tools like
> showttf, ttx, and fontforge.
>
>
Yes, it fails with level 2 and produces exactly the same error.

> Since the file works with ghostscript and some printers the problem
> won't be something that is completely wrong like using a nonexistent
> glyph. It will be something more subtle like incorrectly aligned data or
> an invalid value in an optional field.

To summarize:
When we convert china-visa-application.pdf to level 3 PostScript,
pdftocairo (with your patch), Ghostscript, and Distiller all produce valid
PostScript that can be read by Distiller and printed by Ricoh printers.
pdftops, by contrast, produces PostScript that is huge and cannot be read
by Distiller, although it appears to be valid.

I'm curious: what does poppler's pdftops do with fonts that makes such a
noticeable difference in the resulting file? Could anything be done to make
the resulting PostScript more compatible with Adobe products?

-Alex