[poppler] poppler util pdftohtml

Albert Astals Cid aacid at kde.org
Fri Sep 23 01:46:02 PDT 2011

On Thursday, 22 September 2011, Leonard Rosenthol wrote:
> Boy, your lawyer needs to read up on IP law :).
> Since you do NOT have a license for the font data contained in the PDF,
> your software has NO RIGHTS to use that information for anything other
> than rendering the glyphs in the PDF.  You certainly have NO rights to
> convert the format - in fact, doing so is a clear and distinct violation
> of the font licenses.
> As such, if your patches to pdftohtml extract the font data for use in the
> HTML, I STRONGLY recommend that the code NOT be accepted into the master
> repository.

We already have code that extracts the font stream. Saying this is illegal
is insane; after all, it is just a series of bits in a given file. Are you
saying cat or less or vi are illegal?


> Leonard
> On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >I'm not a lawyer, but I did check with one.  I don't think software can
> >violate your IP/licenses, at least as long as that software doesn't
> >contain unauthorized copyrighted material -- which pdftohtml does not
> >AFAIK -- I certainly didn't add any to it.
> >
> >Best, --josh
> >
> >On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>I can't recall what you said about this in the past, but since I was
> >>just dealing with it today.
> >>
> >>What do you do about embedded fonts?
> >>
> >>As my company (Adobe) sells/creates fonts, I want to make sure that
> >>pdftohtml won't be violating our IP/licenses.
> >>
> >>Thanks in advance,
> >>Leonard
> >>
> >>On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
> >>>>More generally, it is not possible to recreate useful XHTML (or
> >>>>similar) documents from arbitrary PDF files with anything like 100%
> >>>>reliability, because many PDF files do not contain adequate
> >>>>information to accurately map the rendered glyphs back to correct
> >>>>Unicode text, or to reliably reconstruct the proper flow of text.
> >>>>Constructs such as ActualText may help, but are often lacking from
> >>>>real-world PDF documents.
> >>>
> >>>W.r.t. rendering glyphs, we get around the problem of missing Unicode
> >>>mappings by taking any glyph without a Unicode mapping and assigning it
> >>>an offset in the Private Use Area of Unicode.  This produces the correct
> >>>visual result in the XHTML, but not a full semantic representation.  If
> >>>someone's interested, they could get the semantics right too by
> >>>pattern-matching the glyph against an appropriate Unicode font.
> >>>
> >>>W.r.t. the flow of text, there have been other threads on this topic,
> >>>but pdftohtml does make some attempt, and I believe it's possible to do
> >>>this to a high degree of accuracy, maybe >99% -- that said, no one has
> >>>done it yet, so either it's harder than I think, or no one has cared
> >>>enough to really try (and I still fall into that camp.)
> >>>
> >>>Best, --josh
> >>>
> >>>_______________________________________________
> >>>poppler mailing list
> >>>poppler at lists.freedesktop.org
> >>>http://lists.freedesktop.org/mailman/listinfo/poppler
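
For readers following the thread, here is a minimal sketch of the
Private Use Area fallback Josh describes: glyphs that already have a
Unicode mapping keep it, and glyphs without one get the next free code
point starting at U+E000. This is not the pdftohtml code. The function
name, the (glyph_name, code_point) input format and the sample glyph
names are all assumptions made up for illustration.

```python
# Basic Multilingual Plane Private Use Area, as defined by Unicode.
PUA_START = 0xE000
PUA_END = 0xF8FF


def assign_code_points(glyphs):
    """Return a dict mapping glyph name -> code point.

    `glyphs` is a list of (glyph_name, code_point_or_None) pairs. This
    input shape is hypothetical; a real converter would read the
    mappings from the font and the PDF instead.
    """
    mapping = {}
    next_pua = PUA_START
    for name, code_point in glyphs:
        if code_point is None:
            # No Unicode mapping known: hand out the next unused PUA slot.
            if next_pua > PUA_END:
                raise ValueError("ran out of Private Use Area code points")
            code_point = next_pua
            next_pua += 1
        mapping[name] = code_point
    return mapping


# Hypothetical glyph list: two mapped glyphs and two unmapped ones.
glyphs = [("A", 0x41), ("g123", None), ("B", 0x42), ("g124", None)]
mapping = assign_code_points(glyphs)
```

As the thread notes, this only preserves the visual result: text built
from the PUA code points renders correctly with the matching font but
carries no meaning for search or copy and paste. The sketch also does
not check whether a glyph's existing mapping already sits inside the
Private Use Area, which a real implementation would need to handle.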
