[Poppler-bugs] [Bug 107303] "8" shown instead of "x" inside checkbox when converting LibreOffice-generated form to PostScript

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Aug 3 07:17:57 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107303

--- Comment #9 from Michael Weghorn <m.weghorn at posteo.de> ---
I have had a closer look at some aspects now.

(In reply to Tobias Deiminger from comment #5)
> We're having a similar discussion atm. here [0], and also here [1], because
> of our current GSoC project. Maybe you have a look, esp. at the UML diagram
> [2] that shows the relationship of the different font objects and give your
> two cents if we're on the right track.
Thanks for mentioning these, there's lots of helpful information.
The UML diagram looks good to me. (I just realized that not all members are
shown for all types, e.g. the 'CapHeight' member of the 'Font descriptor' is
not shown, nor are some of the font dictionary members listed in section
9.6.2 of the PDF spec, but that may be intentional.)


> Btw., your attached PDF document is actually strange because it has a /DR
> entry in the Annot dictionary, which is not specified for Widget Annotation
> Dictionaries. At least not in PDF 1.7 32000-1:8. The /DR entry is meant to
> be in the global AcroForm dictionary. Has this changed in PDF 2.0?
I also can't find a specification of the '/DR' entry for the Annot dictionary in
the PDF 1.7 spec. I don't know about PDF 2.0, but at a quick glance, the
corresponding code in LibreOffice has been there for a long time, so I doubt
it's related to any newer PDF standard.
However (as far as I can see), the behaviour is still the same after manually
removing the '/DR' entry from the Annot dictionary (object '3 0 obj'). (The
AcroForm dictionary also specifies the font in its '/DR' entry.)

(In reply to Tobias Deiminger from comment #8)
> I could not yet discover the place where the [(8)-77]TJ gets formed. An
> obvious location to generate the stream is AnnotAppearanceBuilder::drawText,
> but I debugged it and it produces slightly different content
> q
> BT
> 0.13725 0.14901 0.15294 rg /ZaDb 11.00 Tf 1 0 0 1 2.43 1.55 Tm
> (8) Tj
> ET
> Q
> 
> Maybe AnnotAppearanceBuilder::drawText is used, and there is some post
> processing that I'm not aware of? Michael, do you know?

Just to be sure: Are you using the "Print to File (PDF)" option from Okular to
print to PDF? (I can reproduce the behaviour when doing so.) In this case,
Okular first generates a PostScript file using Poppler's PSConverter, and then
runs `ps2pdf` on that file (see method `FilePrinter::doPrintFiles` in
`core/fileprinter.cpp`). The PDF code in question should therefore be produced
by the conversion done by Ghostscript (`ps2pdf` being a Ghostscript tool).
So two conversions are actually involved (PDF -> PS -> PDF).
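
To reproduce the two-step conversion outside of Okular, something like the
following might work. It is only a rough sketch: it uses the `pdftops`
command-line tool as a stand-in for Okular's in-process use of Poppler's
PSConverter (an assumption, the two may not behave identically), and the
function and file names are made up for illustration.

```python
import shutil
import subprocess


def build_roundtrip_commands(src_pdf, tmp_ps, dst_pdf):
    """Return the two command lines for the PDF -> PS -> PDF round trip."""
    return [
        # Step 1: Poppler writes PostScript (stand-in for PSConverter).
        ["pdftops", src_pdf, tmp_ps],
        # Step 2: Ghostscript turns the PostScript back into a PDF.
        ["ps2pdf", tmp_ps, dst_pdf],
    ]


def run_roundtrip(src_pdf, tmp_ps, dst_pdf):
    """Run both steps in order.

    Returns False without running anything if either tool is not installed.
    """
    commands = build_roundtrip_commands(src_pdf, tmp_ps, dst_pdf)
    if any(shutil.which(cmd[0]) is None for cmd in commands):
        return False
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return True
```

Comparing the intermediate PostScript file with the final PDF should show in
which of the two steps the '8' ends up being drawn with the wrong font.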



> 
> Anyway there seems to be a fundamental problem. All the Annotation classes
> dynamically generate in-memory appearance streams and may depend on
> in-memory resources. If we simply take this generated appearance streams and
> write them into a PDF file for printing, then dependent in-memory resources
> like the fake font are missing. We would have to write the resource objects
> to the PDF too, but that's not yet done.
> 
> In your patch you prefer an existing zapf dingbats font over the in-memory
> fake font which works then. If we had a document with no zapf dingbat font
> and no CA defined, then GooString checkMark("3") will be used (see
> AnnotAppearanceBuilder::drawFormFieldButton) and we get the same bug again,
> is it?

Yes, I think the problem reappears then. So if I understand correctly,
what should be done is to write the objects currently only created
in-memory to the PDF document and this would solve the problem for both
cases (the original document and the case you describe here).
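
The missing-resource situation can be illustrated with a small check. This is
a simplified sketch, not Poppler code: it assumes the appearance stream is
available as plain text and models the '/Font' resource dictionary as a plain
Python dict. The function names are hypothetical.

```python
import re

# Matches "/Name size Tf", the content stream operator that selects a font.
# Naive on purpose: it does not skip over string literals or comments.
TF_PATTERN = re.compile(r"/([^\s/\[\]()<>{}%]+)\s+[-+]?\d*\.?\d+\s+Tf\b")


def font_tags_used(appearance_stream):
    """Return the set of font resource names selected with Tf in the stream."""
    return set(TF_PATTERN.findall(appearance_stream))


def missing_font_tags(appearance_stream, font_resources):
    """Return font tags the stream uses but the resource dict does not define."""
    return font_tags_used(appearance_stream) - set(font_resources)


# The appearance stream quoted in comment #8.
stream = """q
BT
0.13725 0.14901 0.15294 rg /ZaDb 11.00 Tf 1 0 0 1 2.43 1.55 Tm
(8) Tj
ET
Q"""
```

With an empty resource dict, `missing_font_tags(stream, {})` reports 'ZaDb',
which corresponds to the case where the in-memory font never gets written to
the output file.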

Still, one aspect that I currently haven't understood is why
`forceZapfDingbats` is always set to 'true' whenever a checkbox is drawn
via `AnnotAppearanceBuilder::drawFormFieldButton [case formButtonCheck]`.
Do you know why?

My (maybe naive) expectation, without further examination, would have been
that an explicitly specified font is used if there is one (looked up via the
interactive form dict's `DR` entry as specified in Section 12.7.2 of the
PDF 1.7 spec, Table 218), rather than always forcing ZapfDingbats.

In that case, I'd currently see two cases that could be distinguished:

1) If the document supplies proper information and resources for the font,
those should be used (e.g. as with the given sample document here).

2) Otherwise ZapfDingbats is used and all required resources are saved in
the document as well.
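
The two cases could be sketched roughly as follows. This is only an
illustration of the proposed decision, not how Poppler currently works: the
function name, its parameters and the dict standing in for the '/DR' font
resources are all hypothetical.

```python
ZAPF_DINGBATS = "ZapfDingbats"


def choose_checkbox_font(da_font_tag, dr_fonts):
    """Pick the font for a checkbox appearance.

    da_font_tag: font tag requested for the field, or None if there is none.
    dr_fonts: dict mapping font tags to base font names (models /DR /Font).
    Returns a tuple (tag, base_font, must_write_resource).
    """
    if da_font_tag is not None and da_font_tag in dr_fonts:
        # Case 1: the document supplies the font, so use it as is.
        return da_font_tag, dr_fonts[da_font_tag], False
    # Case 2: fall back to ZapfDingbats and flag that the font resource
    # still has to be saved in the document.
    return "ZaDb", ZAPF_DINGBATS, True
```

For the sample document, where 'ZaDb' is defined in the AcroForm '/DR' entry,
this would take the first branch and no extra resource would need to be
written.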

Does this make sense, or did I miss a reason for using ZapfDingbats
unconditionally? (e.g. that one would never want anything other than
ZapfDingbats's '8' (check mark) in a checkbox anyway)

As far as I understand, the visual result for the given sample document would
be the same whether 1) or 2) is implemented (since ZapfDingbats is used in
both cases), but other documents might behave differently.

Please also let me know if I failed to reply to any other question or
aspect you mentioned.

Another interesting thing I realized is that using Poppler's 'pdftocairo'
results in a PDF file that has the check mark shown properly (even though the
same warning about the unknown font tag is being shown); command:

    $ pdftocairo -pdf simple_form_CHECKBOX_TICKED_CLEANED.pdf fromCairo.pdf
    Syntax Error: Unknown font tag 'ZaDb'

I haven't had a closer look at this so far.
