[poppler] poppler util pdftohtml

Albert Astals Cid aacid at kde.org
Fri Sep 23 07:01:06 PDT 2011


On Friday, 23 September 2011, Leonard Rosenthol wrote:
> Right - but in the case of FontSquirrel, they are giving the user clear and
> explicit guidance on the issue and FORCING THEM to make a decision.
> 
> Pdftohtml does no such thing, currently.
> 
> However, if you made the font extraction feature OPTIONAL and, in order to
> use it, the user had to specify a specific command line option (a la the
> checkbox in Squirrel), then I think you have established the removal of YOUR
> legal concerns and put them squarely on the user who made the choice.

This is ridiculous. Are you saying gun manufacturers have legal concerns over
people killed by a gun?

Albert

> 
> Leonard
> 
> On 9/22/11 11:23 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >At the end of the day it's just a tool....  What if there are more
> >restrictive flags in the font, but the user has a license for the font?
> >Then he cannot use the tool?  It might be impractical to get a new version
> >of the font from the originator that has the bits you're looking for --
> >that would probably just create more confusion.  See how Font Squirrel
> >handles this:
> >http://www.fontsquirrel.com/fontface/generator
> >
> >--josh
> >
> >On 9/22/11 8:05 PM, "suzuki toshiya" <mpsuzuki at hiroshima-u.ac.jp> wrote:
> >>Would it be acceptable to enable font extraction from PDF when the
> >>embedded font includes an OS/2 table and its fsType permits permanent
> >>installation on a remote system (fsType == 0x0000)?
> >>
> >>Although it would be important to ask the developers of PDF-generating
> >>software (cairo, ghostscript, etc.) to embed fonts with the OS/2 table
> >>to make the idea practical, I think such a restriction prevents the
> >>trouble caused by conflicting understandings of font permissions.
> >>
> >>Regards,
> >>mpsuzuki
> >>
> >>Josh Richardson wrote:
> >>> The fonts that are embedded in a PDF may come from any source, and be
> >>> completely restriction-free.  It's really up to the user of the
> >>> software to decide.  Note that there are many many many other open
> >>> source programs that extract fonts from PDFs.
> >>> 
> >>> --josh
> >>> 
> >>> On 9/22/11 6:04 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>>> Boy, your lawyer needs to read up on IP law :).
> >>>> 
> >>>> Since you do NOT have a license for the font data contained in the
> >>>> PDF, your software has NO RIGHTS to use that information for anything
> >>>> other than rendering the glyphs in the PDF.  You certainly have NO
> >>>> rights to convert the format - in fact, doing so is a clear and
> >>>> distinct violation of the font licenses.
> >>>> 
> >>>> As such, if your patches to pdf2html extract the font data for use in
> >>>> the HTML - I STRONGLY recommend that the code NOT be accepted into the
> >>>> master repository.
> >>>> 
> >>>> Leonard
> >>>> 
> >>>> On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>>> I'm not a lawyer, but I did check with one.  I don't think software
> >>>>> can violate your IP/licenses, at least as long as that software
> >>>>> doesn't contain unauthorized copyrighted material -- which pdftohtml
> >>>>> does not AFAIK -- I certainly didn't add any to it.
> >>>>> 
> >>>>> Best, --josh
> >>>>> 
> >>>>> On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
> >>>>>> I can't recall what you said about this in the past, but since I
> >>>>>> was just dealing with it today.
> >>>>>> 
> >>>>>> What do you do about embedded fonts?
> >>>>>> 
> >>>>>> As my company (Adobe) sells/creates fonts, I want to make sure that
> >>>>>> pdftohtml won't be violating our IP/licenses.
> >>>>>> 
> >>>>>> Thanks in advance,
> >>>>>> Leonard
> >>>>>> 
> >>>>>> On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
> >>>>>>> On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
> >>>>>>>> More generally, it is not possible to recreate useful XHTML (or
> >>>>>>>> similar) documents from arbitrary PDF files with anything like
> >>>>>>>> 100% reliability, because many PDF files do not contain adequate
> >>>>>>>> information to accurately map the rendered glyphs back to correct
> >>>>>>>> Unicode text, or to reliably reconstruct the proper flow of text.
> >>>>>>>> Constructs such as ActualText may help, but are often lacking
> >>>>>>>> from real-world PDF documents.
> >>>>>>> 
> >>>>>>> W.r.t. rendering glyphs, we get around the problem of missing
> >>>>>>> Unicode mappings by taking any glyph without a Unicode mapping and
> >>>>>>> assigning it an offset in the private space of Unicode.  This
> >>>>>>> produces the correct visual result in the XHTML, but not a full
> >>>>>>> semantic representation.  If someone's interested, they could get
> >>>>>>> the semantics right too by pattern-matching the glyph against an
> >>>>>>> appropriate Unicode font.
> >>>>>>> 
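A rough Python sketch of the private-space assignment described above. This
is an assumption about how such a mapping could look, not the actual patch:
the function name assign_code_points, the sequential allocation order, and
the restriction to the BMP Private Use Area (U+E000..U+F8FF) are all
illustrative choices.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_code_points(glyph_ids, to_unicode):
    """Map each glyph id to a code point.

    Glyphs listed in `to_unicode` keep their real code point; every other
    glyph gets the next free Private Use Area code point, in input order.
    """
    # Never hand out a PUA slot that the font already uses legitimately.
    used = set(to_unicode.values())
    mapping = {}
    next_pua = PUA_START
    for gid in glyph_ids:
        if gid in to_unicode:
            mapping[gid] = to_unicode[gid]
            continue
        while next_pua in used:
            next_pua += 1
        if next_pua > PUA_END:
            raise ValueError("ran out of private-use code points")
        mapping[gid] = next_pua
        next_pua += 1
    return mapping
```

For example, assign_code_points([1, 2, 3], {1: 0x41}) keeps glyph 1 at U+0041
and places glyphs 2 and 3 at U+E000 and U+E001. As noted above, this only
preserves the visual result; the private-use code points carry no meaning for
search or copy and paste.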
> >>>>>>> W.r.t. the flow of text, there have been other threads on this
> >>>>>>> topic, but pdftohtml does make some attempt, and I believe it's
> >>>>>>> possible to do this to a high degree of accuracy, maybe >99% --
> >>>>>>> that said, no one has done it yet, so either it's harder than I
> >>>>>>> think, or no one has cared enough to really try (and I still fall
> >>>>>>> into that camp.)
> >>>>>>> 
> >>>>>>> Best, --josh
> >>>>>>> 
> >>>>>>> _______________________________________________
> >>>>>>> poppler mailing list
> >>>>>>> poppler at lists.freedesktop.org
> >>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler

