[poppler] poppler util pdftohtml
Albert Astals Cid
aacid at kde.org
Fri Sep 23 01:46:02 PDT 2011
On Thursday, 22 September 2011, Leonard Rosenthol wrote:
> Boy, your lawyer needs to read up on IP law :).
> Since you do NOT have a license for the font data contained in the PDF,
> your software has NO RIGHTS to use that information for anything other
> than rendering the glyphs in the PDF. You certainly have NO rights to
> convert the format - in fact, doing so is a clear and distinct violation
> of the font licenses.
> As such, if your patches to pdf2html extract the font data for use in the
> HTML - I STRONGLY recommend that the code NOT be accepted into the master
We already have code that extracts the font stream. Saying this is illegal
is insane; after all, it is just a series of bits in a given file. Are you
saying cat, less, or vi are illegal?
> On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >I'm not a lawyer, but I did check with one. I don't think software can
> >violate your IP/licenses, at least as long as that software doesn't
> >contain unauthorized copyrighted material -- which pdftohtml does not
> >AFAIK -- I certainly didn't add any to it.
> >Best, --josh
> >On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>I can't recall what you said about this in the past, but since I was
> >>dealing with it today.
> >>What do you do about embedded fonts?
> >>As my company (Adobe) sells/creates fonts, I want to make sure that
> >>pdftohtml won't be violating our IP/licenses.
> >>Thanks in advance,
> >>On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
> >>>>More generally, it is not possible to recreate useful XHTML (or
> >>>>documents from arbitrary PDF files with anything like 100%
> >>>>because many PDF files do not contain adequate information to
> >>>>map the rendered glyphs back to correct Unicode text, or to reliably
> >>>>reconstruct the proper flow of text. Constructs such as ActualText
> >>>>help, but are often lacking from real-world PDF documents.
> >>>W.r.t. rendering glyphs, we get around the problem of missing Unicode
> >>>mappings by taking any glyph without a Unicode mapping and assigning
> >>>it an offset in the private space of Unicode. This produces the correct
> >>>result in the XHTML, but not a full semantic representation. If anyone is
> >>>interested, they could get the semantics right too by pattern-matching
> >>>each glyph against an appropriate Unicode font.
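
A minimal sketch of the private-space approach described above, assuming "the
private space of Unicode" means the BMP Private Use Area (U+E000 to U+F8FF).
This is not pdftohtml's actual code: the function name, the input shape, and
the collision handling are all hypothetical and only illustrate the idea of
giving each unmapped glyph its own private code point.

```python
# Hypothetical sketch; names are illustrative and not taken from pdftohtml.
PUA_FIRST = 0xE000  # first code point of the BMP Private Use Area
PUA_LAST = 0xF8FF   # last code point of the BMP Private Use Area


def assign_code_points(glyphs):
    """Map glyph ids to code points, using the PUA for unmapped glyphs.

    `glyphs` is a list of (glyph_id, code_point_or_None) pairs.
    """
    # Code points already claimed by glyphs that do have a mapping.
    used = {cp for _, cp in glyphs if cp is not None}
    mapping = {}
    next_pua = PUA_FIRST
    for glyph_id, code_point in glyphs:
        if glyph_id in mapping:
            continue  # keep the first assignment for a repeated glyph
        if code_point is None:
            # Skip PUA slots the font already uses for something else.
            while next_pua in used:
                next_pua += 1
            if next_pua > PUA_LAST:
                raise ValueError("Private Use Area exhausted")
            code_point = next_pua
            used.add(code_point)
        mapping[glyph_id] = code_point
    return mapping


example = assign_code_points([(1, 0x41), (2, None), (3, 0xE000), (4, None)])
print({gid: hex(cp) for gid, cp in example.items()})
```

The glyphs render correctly as long as the HTML uses the same font with the
same assignments, but the private code points carry no meaning for search or
copy and paste, which is the "not a full semantic representation" caveat.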
> >>>W.r.t. the flow of text, there have been other threads on this topic.
> >>>pdftohtml does make some attempt, and I believe it's possible to do so
> >>>to a high degree of accuracy, maybe >99% -- that said, no one has done it
> >>>yet, so either it's harder than I think, or no one has cared enough to
> >>>really try (and I still fall into that camp.)
> >>>Best, --josh
> >>>poppler mailing list
> >>>poppler at lists.freedesktop.org