[poppler] poppler util pdftohtml

Leonard Rosenthol lrosenth at adobe.com
Fri Sep 23 04:00:31 PDT 2011

Right - but in the case of FontSquirrel, they are giving the user clear and
explicit guidance on the issue and FORCING THEM to make a decision.

Pdftohtml does no such thing, currently.

However, if you made the font extraction feature OPTIONAL, so that in order to
use it the user had to specify a particular command-line option (à la the
checkbox in Font Squirrel), then I think you would have removed YOUR legal
concerns and put them squarely on the user who made the choice.
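Leonard's suggestion amounts to making extraction strictly opt-in. A minimal Python sketch of that idea, assuming an argparse-style front end: the `--extract-fonts` flag and the `pdftohtml-sketch` program name are hypothetical illustrations, not real pdftohtml options. Because `store_true` defaults to `False`, no fonts are extracted unless the user explicitly asks for it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which font extraction is off unless requested.

    NOTE: '--extract-fonts' is a hypothetical flag for illustration only;
    it is not an existing pdftohtml option.
    """
    parser = argparse.ArgumentParser(prog="pdftohtml-sketch")
    parser.add_argument("input_pdf", nargs="?", help="PDF file to convert")
    parser.add_argument(
        "--extract-fonts",
        action="store_true",  # defaults to False: extraction is opt-in
        help=(
            "extract embedded fonts for use in the HTML output; by passing "
            "this flag you confirm you hold a license permitting this use"
        ),
    )
    return parser
```

A caller would then check `args.extract_fonts` before writing any font files, so the decision (and the responsibility) visibly rests with whoever passed the flag.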


On 9/22/11 11:23 PM, "Josh Richardson" <jric at chegg.com> wrote:

>At the end of the day it's just a tool....  What if there are more
>restrictive flags in the font, but the user has a license for the font?
>Then he cannot use the tool?  It might be impractical to get a new version
>of the font from the originator that has the bits you're looking for --
>that would probably just create more confusion.  See how Font Squirrel
>handles this:
>On 9/22/11 8:05 PM, "suzuki toshiya" <mpsuzuki at hiroshima-u.ac.jp> wrote:
>>Is it acceptable for font extraction from a PDF to be enabled only when the
>>embedded font includes an OS/2 table and its fsType permits permanent
>>installation on a remote system (fsType == 0x0000)?
>>Although developers of PDF-generating software (like cairo, ghostscript,
>>etc.) would have to be asked to embed the OS/2 table to make the idea
>>practical, I think such a restriction prevents the trouble caused by
>>conflicting understandings of font permissions.
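The fsType test proposed here can be sketched as follows, assuming the embedded font program is a raw sfnt (TrueType or OpenType) font. The helper names are hypothetical; the layout relied on is the sfnt table directory (a 12-byte header followed by 16-byte table records) and the OS/2 table, in which fsType sits at byte offset 8. A missing OS/2 table, which some PDF producers omit, is treated as "do not extract", matching the proposal. A real tool would more likely read the value through an existing font library such as fontTools (`TTFont(...)["OS/2"].fsType`).

```python
import struct
from typing import Optional

# sfnt version tags for TrueType and CFF-flavoured OpenType fonts.
SFNT_VERSIONS = (b"\x00\x01\x00\x00", b"OTTO", b"true")
FSTYPE_INSTALLABLE = 0x0000  # "installable embedding": no restrictions


def read_fstype(font_data: bytes) -> Optional[int]:
    """Return the OS/2 fsType of an sfnt font program, or None if the data
    is not an sfnt font or has no readable OS/2 table."""
    if len(font_data) < 12 or font_data[:4] not in SFNT_VERSIONS:
        return None
    (num_tables,) = struct.unpack_from(">H", font_data, 4)
    for i in range(num_tables):
        record = 12 + 16 * i  # table records follow the 12-byte header
        if record + 16 > len(font_data):
            return None
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font_data, record
        )
        if tag == b"OS/2":
            # fsType is the fifth field (byte offset 8) of the OS/2 table.
            if length < 10 or offset + 10 > len(font_data):
                return None
            (fstype,) = struct.unpack_from(">H", font_data, offset + 8)
            return fstype
    return None


def extraction_allowed(font_data: bytes) -> bool:
    """Proposed policy: extract only when an OS/2 table is present and
    fsType == 0x0000; a missing OS/2 table means 'do not extract'."""
    return read_fstype(font_data) == FSTYPE_INSTALLABLE
```

Note that this policy is exactly what Josh objects to above: it cannot know whether the user separately holds a license that would permit extraction.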
>>Josh Richardson wrote:
>>> The fonts that are embedded in a PDF may come from any source, and be
>>> completely restriction-free.  It's really up to the user of the software
>>> to decide.  Note that there are many many many other open source tools
>>> that extract fonts from PDFs.
>>> --josh
>>> On 9/22/11 6:04 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
>>>> Boy, your lawyer needs to read up on IP law :).
>>>> Since you do NOT have a license for the font data contained in the PDF,
>>>> your software has NO RIGHTS to use that information for anything other
>>>> than rendering the glyphs in the PDF.  You certainly have NO rights to
>>>> convert the format - in fact, doing so is a clear and distinct violation
>>>> of the font licenses.
>>>> As such, if your patches to pdftohtml extract the font data for use in
>>>> HTML - I STRONGLY recommend that the code NOT be accepted into the
>>>> repository.
>>>> Leonard
>>>> On 9/22/11 6:40 PM, "Josh Richardson" <jric at chegg.com> wrote:
>>>>> I'm not a lawyer, but I did check with one.  I don't think software can
>>>>> violate your IP/licenses, at least as long as that software doesn't
>>>>> contain unauthorized copyrighted material -- which pdftohtml does not
>>>>> AFAIK -- I certainly didn't add any to it.
>>>>> Best, --josh
>>>>> On 9/22/11 3:08 PM, "Leonard Rosenthol" <lrosenth at adobe.com> wrote:
>>>>>> I can't recall what you said about this in the past, but I was just
>>>>>> dealing with it today.
>>>>>> What do you do about embedded fonts?
>>>>>> As my company (Adobe) sells/creates fonts, I want to make sure that
>>>>>> pdftohtml won't be violating our IP/licenses.
>>>>>> Thanks in advance,
>>>>>> Leonard
>>>>>> On 9/22/11 5:51 PM, "Josh Richardson" <jric at chegg.com> wrote:
>>>>>>> On 9/22/11 12:20 PM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:
>>>>>>>> More generally, it is not possible to recreate useful XHTML (or
>>>>>>>> similar) documents from arbitrary PDF files with anything like 100%
>>>>>>>> reliability, because many PDF files do not contain adequate
>>>>>>>> information to accurately map the rendered glyphs back to correct
>>>>>>>> Unicode text, or to reconstruct the proper flow of text. Constructs
>>>>>>>> such as ActualText help, but are often lacking from real-world PDF
>>>>>>>> documents.
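To make the ActualText point concrete: a PDF producer can wrap glyphs in a marked-content sequence such as `/Span <</ActualText (fi)>> BDC ... EMC`, and a text extractor should then emit the ActualText rather than whatever the individual glyphs map to. The sketch below uses a simplified, hypothetical `TextSpan` model (not poppler's API) and falls back to U+FFFD for glyphs that have no Unicode mapping.

```python
from dataclasses import dataclass
from typing import List, Optional

REPLACEMENT_CHAR = "\ufffd"  # emitted when a glyph has no known Unicode value


@dataclass
class TextSpan:
    """A run of glyphs from a content stream (hypothetical, simplified).

    glyph_unicode has one entry per glyph: its Unicode text as derived from
    the font's encoding/ToUnicode data, or None when no mapping exists.
    actual_text is the /ActualText of an enclosing marked-content sequence,
    if the producer supplied one.
    """
    glyph_unicode: List[Optional[str]]
    actual_text: Optional[str] = None


def extract_text(spans: List[TextSpan]) -> str:
    parts = []
    for span in spans:
        if span.actual_text is not None:
            # ActualText replaces the glyph-derived text for the whole span.
            parts.append(span.actual_text)
        else:
            parts.append("".join(
                u if u is not None else REPLACEMENT_CHAR
                for u in span.glyph_unicode
            ))
    return "".join(parts)
```

When the producer omitted ActualText, the unmapped glyph surfaces as U+FFFD, which is the information loss described above.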
>>>>>>> W.r.t. rendering glyphs, we get around the problem of missing
>>>>>>> mappings by taking any glyph without a Unicode mapping and assigning
>>>>>>> it an offset in the Unicode Private Use Area.  This produces the
>>>>>>> correct visual result in the XHTML, but not a full semantic
>>>>>>> representation.  If someone's interested, they could get the
>>>>>>> semantics right too by matching the glyph against an appropriate
>>>>>>> Unicode font.
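The Private Use Area workaround can be sketched like this. The allocation scheme (sequential from U+E000, one code point per distinct font/glyph pair) is an assumption; the actual patch may compute the offset differently, for example directly from the glyph ID. For the visual result to be correct, the web font written alongside the HTML must also map each assigned code point to the same glyph in its cmap, which this sketch does not show.

```python
from typing import Dict, Iterable, List, Optional, Tuple

PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area

# (font id, glyph id, Unicode value or None when the PDF gives no mapping)
Glyph = Tuple[str, int, Optional[int]]


def assign_codepoints(glyphs: Iterable[Glyph]) -> List[int]:
    """Return one code point per glyph, giving each distinct unmapped
    (font, glyph) pair a stable Private Use Area code point."""
    assigned: Dict[Tuple[str, int], int] = {}
    next_free = PUA_START
    result: List[int] = []
    for font_id, glyph_id, unicode_value in glyphs:
        if unicode_value is not None:
            result.append(unicode_value)  # real mapping: keep it
            continue
        key = (font_id, glyph_id)
        if key not in assigned:
            if next_free > PUA_END:
                raise OverflowError("BMP Private Use Area exhausted")
            assigned[key] = next_free
            next_free += 1
        result.append(assigned[key])  # same glyph -> same code point
    return result
```

Reusing the same code point for repeated glyphs keeps the HTML text consistent, but as noted above the result carries no semantics: a search for "a" will not find a PUA-mapped "a".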
>>>>>>> W.r.t. the flow of text, there have been other threads on this topic,
>>>>>>> but pdftohtml does make some attempt, and I believe it's possible to
>>>>>>> do this to a high degree of accuracy, maybe >99% -- that said, no one
>>>>>>> has done it yet, so either it's harder than I think, or no one has
>>>>>>> cared enough to really try (and I still fall into that camp.)
>>>>>>> Best, --josh
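For a sense of what reconstructing the flow of text involves, here is a deliberately naive Python sketch: group words into lines by baseline, then read lines top to bottom and words left to right. The `Word` type and the 2-unit tolerance are hypothetical, and this is not the heuristic pdftohtml uses. It already fails on multi-column pages, where words from adjacent columns share a baseline and get merged into one line, which hints at why the problem is harder than it looks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x: float         # left edge, in page units
    baseline: float  # baseline y, growing downward (top of page = 0)


def naive_reading_order(words: List[Word], tolerance: float = 2.0) -> List[str]:
    """Group words into lines by baseline, then read lines top to bottom
    and words left to right. Returns one string per reconstructed line."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.baseline, w.x)):
        # Join the current line if the baseline is within `tolerance` of
        # the line's first word; otherwise start a new line.
        if lines and abs(lines[-1][0].baseline - word.baseline) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]
```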
>>>>>>> _______________________________________________
>>>>>>> poppler mailing list
>>>>>>> poppler at lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
