[poppler] poppler util pdftohtml

Marc J. Driftmeyer mjd at reanimality.com
Thu Sep 22 22:04:20 PDT 2011


Just my two cents from my days at NeXT and Apple:

If Adobe were concerned about copyright violations involving fonts, they 
never would have released the ISO specification for PDF to the general public.

You can't have your cake and eat it too.

You want PDF to be ubiquitous but then you want to henpeck every Tom, 
Dick and Harry on licensing third-party fonts?

Are you authorized to speak for Monotype? Or any other Font Author?

Licensing font portfolios to corporations is a lucrative business. 
Attempting to police this against an individual who has a document they 
want to convert to HTML is asking for a Xerox-style ruling on photocopying, 
and for your IP to become so common it becomes worthless.

By all means, go after millions of users who extract fonts from PDF 
documents with embedded fonts. Put excessive DRM schemes on your 
solution.

Watch your entire intent of making PDF ubiquitous go down in flames.

Adobe had better find some other way to leverage its font technology for 
profit, as a value-added service in n-tier markets, or watch its 
customer base completely erode.

This finger wagging about fonts reminds me of the finger wagging about 
Display PostScript, when we decided to create Display PDF instead.

- Marc

On 09/22/2011 03:08 PM, Leonard Rosenthol wrote:
> I can't recall what you said about this in the past, but since I was just
> dealing with it today.
>
> What do you do about embedded fonts?
>
> As my company (Adobe) sells/creates fonts, I want to make sure that
> pdftohtml won't be violating our IP/licenses.
>
> Thanks in advance,
> Leonard
>
> On 9/22/11 5:51 PM, "Josh Richardson"<jric at chegg.com>  wrote:
>
>> On 9/22/11 12:20 PM, "Jonathan Kew"<jfkthame at googlemail.com>  wrote:
>>> More generally, it is not possible to recreate useful XHTML (or similar)
>>> documents from arbitrary PDF files with anything like 100% reliability,
>>> because many PDF files do not contain adequate information to accurately
>>> map the rendered glyphs back to correct Unicode text, or to reliably
>>> reconstruct the proper flow of text. Constructs such as ActualText may
>>> help, but are often lacking from real-world PDF documents.
>> W.r.t. rendering glyphs, we get around the problem of missing Unicode
>> mappings by taking any glyph without a Unicode mapping and assigning it a
>> code point in the Private Use Area of Unicode.  This produces the correct visual
>> result in the XHTML, but not a full semantic representation.  If someone's
>> interested, they could get the semantics right too by pattern-matching the
>> glyph against an appropriate Unicode font.
>>
>> W.r.t. the flow of text, there have been other threads on this topic, but
>> pdftohtml does make some attempt, and I believe it's possible to do this
>> to a high degree of accuracy, maybe >99% -- that said, no one has done it
>> yet, so either it's harder than I think, or no-one has cared enough to
>> really try (and I still fall into that camp.)
>>
>> Best, --josh
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
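
For reference, the fallback Josh describes in the quoted thread (giving
each glyph that lacks a Unicode mapping a code point in the private space
of Unicode) can be sketched roughly as below. This is a minimal Python
illustration, not pdftohtml's actual code: the function name, the
dict-shaped glyph table, and the choice to start at U+E000 are all
assumptions made for the example.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_code_points(glyph_to_unicode):
    """Return a glyph -> character map, filling gaps from the Private Use Area.

    glyph_to_unicode maps a glyph name to its Unicode character, or to None
    when the PDF carries no usable mapping for that glyph. This input shape
    is hypothetical and chosen only for the sketch.
    """
    result = {}
    next_pua = PUA_START
    for glyph, char in glyph_to_unicode.items():
        if char is not None:
            # A real mapping exists, so keep it untouched.
            result[glyph] = char
            continue
        if next_pua > PUA_END:
            raise ValueError("ran out of private-use code points")
        # No mapping: hand out the next unused private-use code point.
        result[glyph] = chr(next_pua)
        next_pua += 1
    return result


# Two glyphs ("g17" and "g42") have no Unicode mapping in this toy table.
mapped = assign_code_points({"A": "A", "g17": None, "fi": "\ufb01", "g42": None})
```

The output renders correctly only with a font that places those glyphs at
the same private-use code points, which is why this gives the right visual
result but not the semantics. The sketch also ignores the case where a
font already uses private-use code points for other glyphs.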

-- 
Marc J. Driftmeyer
Email :: mjd at reanimality.com
Web :: http://www.reanimality.com
Cell :: (509) 435-5212