[poppler] poppler util pdftohtml

Albert Astals Cid aacid at kde.org
Fri Sep 23 07:00:01 PDT 2011


On Friday, 23 September 2011, Leonard Rosenthol wrote:
> They may, you are right.
> 
> If you wanted to maintain a list of known "restriction free" fonts and
> only extract those - that would probably be OK.

No way. Unless there is something in the PDF/font file to indicate it has
a restriction, I do not see why we should assume it has one.

Albert
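
For illustration, one place where a font file can carry such an indication is
the fsType field of the OS/2 table in TrueType/OpenType fonts. The sketch below
only decodes an fsType value that has already been read; the function name and
labels are hypothetical, and fonts embedded in a PDF (Type 1, bare CFF) may not
have an OS/2 table at all, in which case there is nothing to decode.

```python
# Hypothetical helper: decodes the embedding-permission bits of an
# OS/2 fsType value. Bit values follow the OpenType OS/2 table definition;
# the function name and label strings are assumptions for illustration.
FLAGS = [
    (0x0002, "restricted license embedding"),
    (0x0004, "preview and print embedding"),
    (0x0008, "editable embedding"),
    (0x0100, "no subsetting"),
    (0x0200, "bitmap embedding only"),
]


def describe_fs_type(fs_type):
    """Return human-readable labels for the bits set in fs_type."""
    labels = []
    # No usage-permission bit set means installable embedding.
    if fs_type & 0x000E == 0:
        labels.append("installable embedding")
    for bit, label in FLAGS:
        if fs_type & bit:
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(describe_fs_type(0x0000))
    print(describe_fs_type(0x0002))
```

A caller could skip extraction only when "restricted license embedding" is
reported, and treat a missing OS/2 table as "no restriction indicated".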

> 
> Leonard
> 
> On 9/22/11 9:17 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >The fonts that are embedded in a PDF may come from any source, and be
> >completely restriction-free.  It's really up to the user of the software
> >to decide.  Note that there are many many many other open source programs
> >that extract fonts from PDFs.
> >
> >--josh
> >
> >On 9/22/11 6:04 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>Boy, your lawyer needs to read up on IP law :).
> >>
> >>Since you do NOT have a license for the font data contained in the PDF,
> >>your software has NO RIGHTS to use that information for anything other
> >>than rendering the glyphs in the PDF.  You certainly have NO rights to
> >>convert the format - in fact, doing so is a clear and distinct violation
> >>of the font licenses.
> >>
> >>As such, if your patches to pdf2html extract the font data for use in
> >>the HTML - I STRONGLY recommend that the code NOT be accepted into the
> >>master repository.
> >>
> >>Leonard
> >>
> >>On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>I'm not a lawyer, but I did check with one.  I don't think software
> >>>can violate your IP/licenses, at least as long as that software doesn't
> >>>contain unauthorized copyrighted material -- which pdftohtml does not
> >>>AFAIK -- I certainly didn't add any to it.
> >>>
> >>>Best, --josh
> >>>
> >>>On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>>>I can't recall what you said about this in the past, but I was just
> >>>>dealing with it today.
> >>>>
> >>>>What do you do about embedded fonts?
> >>>>
> >>>>As my company (Adobe) sells/creates fonts, I want to make sure that
> >>>>pdftohtml won't be violating our IP/licenses.
> >>>>
> >>>>Thanks in advance,
> >>>>Leonard
> >>>>
> >>>>On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>>>On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
> >>>>>>More generally, it is not possible to recreate useful XHTML (or
> >>>>>>similar) documents from arbitrary PDF files with anything like 100%
> >>>>>>reliability, because many PDF files do not contain adequate
> >>>>>>information to accurately map the rendered glyphs back to correct
> >>>>>>Unicode text, or to reliably reconstruct the proper flow of text.
> >>>>>>Constructs such as ActualText may help, but are often lacking from
> >>>>>>real-world PDF documents.
> >>>>>
> >>>>>W.r.t. rendering glyphs, we get around the problem of missing Unicode
> >>>>>mappings by taking any glyph without a Unicode mapping and assigning
> >>>>>it a code point in the Unicode Private Use Area.  This produces the
> >>>>>correct visual result in the XHTML, but not a full semantic
> >>>>>representation.  If someone's interested, they could get the
> >>>>>semantics right too by pattern-matching the glyph against an
> >>>>>appropriate Unicode font.
> >>>>>
> >>>>>W.r.t. the flow of text, there have been other threads on this topic,
> >>>>>but pdftohtml does make some attempt, and I believe it's possible to
> >>>>>do this to a high degree of accuracy, maybe >99% -- that said, no one
> >>>>>has done it yet, so either it's harder than I think, or no one has
> >>>>>cared enough to really try (and I still fall into that camp).
> >>>>>
> >>>>>Best, --josh
> >>>>>
> >>>>>_______________________________________________
> >>>>>poppler mailing list
> >>>>>poppler at lists.freedesktop.org
> >>>>>http://lists.freedesktop.org/mailman/listinfo/poppler
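
For reference, the workaround Josh describes in the quoted thread (giving
glyphs that lack a Unicode mapping a private-use code point) can be sketched
roughly as follows. This is not the pdftohtml code; the function name, the
input shape and the choice of U+E000 as the starting point are assumptions
made for illustration.

```python
# Hypothetical sketch, not taken from pdftohtml: glyphs with no known Unicode
# mapping are given consecutive code points in the BMP Private Use Area.
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_code_points(glyphs):
    """Map glyph names to characters.

    `glyphs` is an iterable of (glyph_name, char_or_None) pairs, where None
    means the PDF gave no Unicode mapping for that glyph.
    """
    next_pua = PUA_START
    mapping = {}
    for name, char in glyphs:
        if name in mapping:
            continue  # keep the first assignment for a repeated glyph
        if char is None:
            if next_pua > PUA_END:
                raise ValueError("ran out of Private Use Area code points")
            char = chr(next_pua)
            next_pua += 1
        mapping[name] = char
    return mapping


if __name__ == "__main__":
    result = assign_code_points([("A", "A"), ("g17", None), ("g18", None)])
    for name, char in result.items():
        print(name, hex(ord(char)))
```

As noted in the thread, text produced this way displays correctly only with
the matching font and carries no meaning for search or copy and paste.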

