[poppler] poppler util pdftohtml

Leonard Rosenthol lrosenth at adobe.com
Fri Sep 23 04:08:39 PDT 2011

Marc – I don't understand your argument about Fonts vs. PDF.

Fonts are licensed in their own way, with both technical and legal restrictions. They clearly indicate whether the font can be embedded in other formats (not just PDF, but also Word/Office, XPS, etc.) and for what purposes. In addition, the license usually also restricts conversion of the font into other formats (e.g. TTF->WOFF). All of this has NOTHING to do with PDF.

As a long-time (and "active") member of this project, I am simply raising a concern to protect this project from potential legal action. I am not suggesting that Adobe is coming after anyone concerning fonts, but as you note below, I do NOT speak for Monotype or any other foundry…


From: "Marc J. Driftmeyer" <mjd at reanimality.com>
Reply-To: "Marc J. Driftmeyer" <mjd at reanimality.com>
Date: Thu, 22 Sep 2011 22:04:20 -0700
To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
Subject: Re: [poppler] poppler util pdftohtml

Just my two cents from my days at NeXT and Apple:

If Adobe were concerned about copyright violations involving fonts, they never would have released the ISO standards for PDF to the general public.

You can't have your cake and eat it too.

You want PDF to be ubiquitous, but then you want to henpeck every Tom, Dick and Harry on licensing third-party fonts?

Are you authorized to speak for Monotype? Or any other Font Author?

Licensing font portfolios to corporations is a lucrative business. Attempting to police this against an individual who has a document they want to convert to HTML is asking for a Xerox ruling on photocopying, and for your IP to become so common that it becomes worthless.

By all means, go after millions of users who extract fonts from PDF documents having embedded fonts in them. Put excessive DRM schemes on your solution.

Watch your entire intent of making PDF ubiquitous go down in flames.

Adobe better find some other way to leverage their Font technology for profit that is a value added service in n-tier markets or watch your customer base completely erode.

This finger wagging about fonts reminds me of the finger wagging about Display PostScript, after which we decided to create Display PDF instead.

- Marc

On 09/22/2011 03:08 PM, Leonard Rosenthol wrote:

I can't recall what you said about this in the past, but I was just
dealing with it today.

What do you do about embedded fonts?

As my company (Adobe) sells/creates fonts, I want to make sure that
pdftohtml won't be violating our IP/licenses.

Thanks in advance,

On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:

On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:

More generally, it is not possible to recreate useful XHTML (or similar)
documents from arbitrary PDF files with anything like 100% reliability,
because many PDF files do not contain adequate information to accurately
map the rendered glyphs back to correct Unicode text, or to reliably
reconstruct the proper flow of text. Constructs such as ActualText may
help, but are often lacking from real-world PDF documents.

W.r.t. rendering glyphs, we get around the problem of missing Unicode
mappings by taking any glyph without a Unicode mapping and assigning it an
offset in the Private Use Area of Unicode.  This produces the correct visual
result in the XHTML, but not a full semantic representation.  If someone's
interested, they could get the semantics right too by pattern-matching the
glyph against an appropriate Unicode font.
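
The Private Use Area idea above can be illustrated with a minimal sketch.
This is not pdftohtml's actual code: the function name assign_code_points,
the glyph names, and the to_unicode dictionary are hypothetical, and the
sketch assumes the Basic Multilingual Plane range U+E000 to U+F8FF is used.

```python
# Hypothetical sketch: give every glyph a character, using the Unicode
# Private Use Area (U+E000..U+F8FF) for glyphs that have no known mapping.
PUA_START = 0xE000
PUA_END = 0xF8FF


def assign_code_points(glyph_names, to_unicode):
    """Return a dict mapping each glyph name to a one-character string.

    glyph_names: iterable of glyph identifiers (assumed to be strings).
    to_unicode:  dict of the glyph -> character mappings that are known.
    """
    mapping = {}
    next_pua = PUA_START
    for name in glyph_names:
        if name in mapping:
            continue  # already handled; reuse the same code point
        if name in to_unicode:
            mapping[name] = to_unicode[name]
        else:
            if next_pua > PUA_END:
                raise ValueError("ran out of Private Use Area code points")
            mapping[name] = chr(next_pua)
            next_pua += 1
    return mapping


if __name__ == "__main__":
    known = {"A": "A", "B": "B"}
    result = assign_code_points(["A", "g123", "B", "g124", "g123"], known)
    for glyph, char in result.items():
        print(glyph, "U+%04X" % ord(char))
```

A glyph mapped this way renders correctly only when the output uses the
same font with the same private assignments, which is why the result is
visually right but carries no meaning for search or copy and paste.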

W.r.t. the flow of text, there have been other threads on this topic, but
pdftohtml does make some attempt, and I believe it's possible to do this
to a high degree of accuracy, maybe >99% -- that said, no one has done it
yet, so either it's harder than I think, or no one has cared enough to
really try (and I still fall into that camp).

Best, --josh

poppler mailing list
poppler at lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Marc J. Driftmeyer
Email :: mjd at reanimality.com
Web :: http://www.reanimality.com
Cell :: (509) 435-5212