[poppler] poppler util pdftohtml

Leonard Rosenthol lrosenth at adobe.com
Thu Sep 22 18:04:50 PDT 2011

Boy, your lawyer needs to read up on IP law :).

Since you do NOT have a license for the font data contained in the PDF,
your software has NO RIGHTS to use that information for anything other
than rendering the glyphs in the PDF.  You certainly have NO rights to
convert the format - in fact, doing so is a clear and distinct violation
of the font licenses.

As such, if your patches to pdftohtml extract the font data for use in the
HTML, I STRONGLY recommend that the code NOT be accepted into master.


On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:

>I'm not a lawyer, but I did check with one.  I don't think software can
>violate your IP/licenses, at least as long as that software doesn't
>contain unauthorized copyrighted material -- which pdftohtml does not
>AFAIK -- I certainly didn't add any to it.
>Best, --josh
>On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
>>I can't recall what you said about this in the past, but I was just
>>dealing with it today.
>>What do you do about embedded fonts?
>>As my company (Adobe) sells/creates fonts, I want to make sure that
>>pdftohtml won't be violating our IP/licenses.
>>Thanks in advance,
>>On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
>>>On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
>>>>More generally, it is not possible to recreate useful XHTML (or
>>>>documents from arbitrary PDF files with anything like 100% reliability,
>>>>because many PDF files do not contain adequate information to
>>>>map the rendered glyphs back to correct Unicode text, or to reliably
>>>>reconstruct the proper flow of text. Constructs such as ActualText may
>>>>help, but are often lacking from real-world PDF documents.
>>>W.r.t. rendering glyphs, we get around the problem of missing Unicode
>>>mappings by taking any glyph without a Unicode mapping and assigning it
>>>an offset in Unicode's Private Use Area.  This produces the correct
>>>result in the XHTML, but not a full semantic representation.  Anyone
>>>interested could also get the semantics right by pattern-matching each
>>>glyph against an appropriate Unicode font.
>>>W.r.t. the flow of text, there have been other threads on this topic.
>>>pdftohtml does make some attempt, and I believe it's possible to do this
>>>to a high degree of accuracy, maybe >99%.  That said, no one has done it
>>>yet, so either it's harder than I think, or no one has cared enough to
>>>really try (and I still fall into that camp).
>>>Best, --josh
>>>poppler mailing list
>>>poppler at lists.freedesktop.org
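A minimal Python sketch of the Private Use Area remapping Josh describes, offered only as an illustration and not as pdftohtml's actual code. It assumes glyphs are identified by hypothetical `(font_name, glyph_id)` pairs and that existing mappings (e.g. from a ToUnicode CMap) arrive as a plain dict. The function name `assign_pua_codepoints` is likewise hypothetical. It covers only the text side: a PUA code point has no standard glyph, so displaying it still requires a font whose cmap maps that code point, which this sketch does not address.

```python
# Hypothetical sketch, not pdftohtml's implementation: give every glyph that
# lacks a Unicode mapping its own code point in the BMP Private Use Area.
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_pua_codepoints(glyphs, to_unicode):
    """Return a dict mapping each glyph key to its text.

    glyphs: iterable of hashable glyph keys; (font_name, glyph_id) tuples
            are an assumption made for this sketch.
    to_unicode: dict of glyph key -> Unicode string for glyphs that already
                have a mapping.
    """
    mapping = {}
    next_cp = PUA_START
    for glyph in glyphs:
        if glyph in mapping:
            continue  # seen before: reuse the first assignment
        if glyph in to_unicode:
            mapping[glyph] = to_unicode[glyph]  # real mapping wins
            continue
        if next_cp > PUA_END:
            raise ValueError("ran out of BMP private-use code points")
        mapping[glyph] = chr(next_cp)  # unmapped glyph gets the next PUA slot
        next_cp += 1
    return mapping


if __name__ == "__main__":
    glyphs = [("F1", 3), ("F1", 7), ("F2", 3), ("F1", 7)]
    known = {("F1", 3): "A"}
    for g, text in assign_pua_codepoints(glyphs, known).items():
        print(g, " ".join(f"U+{ord(c):04X}" for c in text))
```

Running the demo assigns U+E000 to `("F1", 7)` and U+E001 to `("F2", 3)`, while `("F1", 3)` keeps its real mapping U+0041. Keying on the font as well as the glyph ID keeps identical glyph IDs in different fonts from colliding.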