[Poppler-bugs] [Bug 107318] New: Emit more font information when pdftohtml is run with -xml

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Jul 21 03:49:03 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107318

            Bug ID: 107318
           Summary: Emit more font information when pdftohtml is run with
                    -xml
           Product: poppler
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: utils
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: ulatekh at yahoo.com

Created attachment 140750
  --> https://bugs.freedesktop.org/attachment.cgi?id=140750&action=edit
Patch to add functionality

I'm about to use pdftohtml to extract information from PDFs and organize the
results into a database, so I had a chance to dig through the code.

The patch merely emits more information in the <fontspec> elements when
pdftohtml is run with -xml. The PDFs I'm trying to analyze are consistent
enough in their font usage that I can use the fonts to infer the text's
meaning. But I needed more information in the <fontspec> elements to do that,
and this patch provides it.
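
To illustrate the use case, here is a minimal Python sketch of grouping
text by font from `pdftohtml -xml` output. The sample document is made up, and
it only uses the attributes that <fontspec> already carries (id, size, family,
color). The extra attributes added by the patch are not shown here, and the
helper name group_text_by_font is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in the shape of `pdftohtml -xml` output (assumption:
# real files also carry a DOCTYPE and more attributes than shown here).
SAMPLE_XML = """<pdf2xml>
<page number="1" height="792" width="612">
<fontspec id="0" size="18" family="Times" color="#000000"/>
<fontspec id="1" size="10" family="Times" color="#000000"/>
<text top="50" left="72" width="200" height="20" font="0">Chapter One</text>
<text top="80" left="72" width="400" height="12" font="1">Body text here.</text>
<text top="95" left="72" width="400" height="12" font="1">More body text.</text>
</page>
</pdf2xml>"""


def group_text_by_font(xml_string):
    """Return (fonts, grouped): fontspec attributes keyed by font id, and
    the text strings that reference each font id, in document order."""
    root = ET.fromstring(xml_string)
    fonts = {}
    grouped = defaultdict(list)
    for page in root.iter("page"):
        for spec in page.iter("fontspec"):
            # Keep every attribute, so additional ones are picked up as-is.
            fonts[spec.get("id")] = dict(spec.attrib)
        for text in page.iter("text"):
            # itertext() also collects text nested in inline tags like <b>.
            grouped[text.get("font")].append("".join(text.itertext()))
    return fonts, dict(grouped)


if __name__ == "__main__":
    fonts, grouped = group_text_by_font(SAMPLE_XML)
    for font_id, strings in grouped.items():
        print(font_id, fonts[font_id], strings)
```

With consistent font usage, each font id can then be mapped to a role (for
example heading versus body) before loading the strings into a database.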


