[poppler] poppler util pdftohtml
jric at chegg.com
Thu Sep 22 15:40:18 PDT 2011
I'm not a lawyer, but I did check with one. I don't think software can
violate your IP/licenses, at least as long as that software doesn't
contain unauthorized copyrighted material. As far as I know, pdftohtml
contains none; I certainly didn't add any to it.
On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
>I can't recall what you said about this in the past, but I'm asking
>since I was just dealing with it today.
>What do you do about embedded fonts?
>As my company (Adobe) sells/creates fonts, I want to make sure that
>pdftohtml won't be violating our IP/licenses.
>Thanks in advance,
>On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
>>On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
>>>More generally, it is not possible to recreate useful XHTML (or similar)
>>>documents from arbitrary PDF files with anything like 100% reliability,
>>>because many PDF files do not contain adequate information to accurately
>>>map the rendered glyphs back to correct Unicode text, or to reliably
>>>reconstruct the proper flow of text. Constructs such as ActualText may
>>>help, but are often lacking from real-world PDF documents.
>>W.r.t. rendering glyphs, we get around the problem of missing Unicode
>>mappings by taking any glyph without a Unicode mapping and assigning it
>>an offset in Unicode's Private Use Area. This produces the correct visual
>>result in the XHTML, but not a full semantic representation. If anyone is
>>interested, they could get the semantics right too by pattern-matching
>>each glyph against an appropriate Unicode font.
>>W.r.t. the flow of text, there have been other threads on this topic, but
>>pdftohtml does make some attempt, and I believe it's possible to do this
>>to a high degree of accuracy, maybe >99%. That said, no one has done it
>>yet, so either it's harder than I think, or no one has cared enough to
>>really try (and I still fall into that camp).
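To make "some attempt" at text flow concrete, here is a deliberately naive reading-order heuristic for a single-column page. It is not pdftohtml's algorithm, and it shows why the problem is hard: multi-column layouts, rotated text, and sidebars all defeat it. `TextBox`, `reading_order`, and the `line_tolerance` parameter are hypothetical, and the coordinate convention (y growing downward from the top of the page) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    x: float       # left edge
    y: float       # top edge, assumed to grow downward from the page top
    height: float
    text: str


def reading_order(boxes, line_tolerance=0.5):
    """Naive single-column ordering: boxes whose tops lie within
    line_tolerance * height of a line's first box join that line;
    lines read top-to-bottom, boxes within a line left-to-right."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if lines:
            ref = lines[-1][0]
            if abs(box.y - ref.y) <= line_tolerance * ref.height:
                lines[-1].append(box)
                continue
        lines.append([box])
    return "\n".join(
        " ".join(b.text for b in sorted(line, key=lambda b: b.x))
        for line in lines
    )


if __name__ == "__main__":
    page = [
        TextBox(100, 10.0, 12, "world"),
        TextBox(10, 10.5, 12, "Hello"),  # slight baseline jitter
        TextBox(10, 30.0, 12, "Next"),
    ]
    print(reading_order(page))
```

A real implementation would also need column detection and de-hyphenation before anything close to the accuracy claimed above is plausible.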