[Poppler-bugs] [Bug 39385] pdftohtml: add image and font extraction

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Aug 25 16:34:25 PDT 2011


https://bugs.freedesktop.org/show_bug.cgi?id=39385

Joshua Richardson <joshuarbox-junk1 at yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49314|0                           |1
        is obsolete|                            |
  Attachment #49682|0                           |1
        is obsolete|                            |
  Attachment #50288|0                           |1
        is obsolete|                            |

--- Comment #5 from Joshua Richardson <joshuarbox-junk1 at yahoo.com> 2011-08-25 16:34:21 PDT ---
Created an attachment (id=50579)
 View: https://bugs.freedesktop.org/attachment.cgi?id=50579
 Review: https://bugs.freedesktop.org/review?bug=39385&attachment=50579

New patches, per Albert's request

Some comments:
 * Please attach only one set of consecutive patches, having two sets of
patches that have "conflicting" order (e.g. there is a patch with number 10 in
both tars) is difficult to follow

>> Done

* Please rebase the patches against master since the first set of patches
include parts of the text rotation feature and other patches that are already
commited to master

>> Done.  When I run "git format-patch origin", I still get some patches that were in those earlier bugs.  I think it's because you "cleaned up" my earlier patches when committing, so it shouldn't be a problem, and I just removed them.

 * Please squash commits 0007-created-SplashOutputDevHtmlImages-class.patch,
0008-Fixed-spacing-line-length-issues.patch and
0009-Moved-SplashOutputDevHtmlImages-into-the-utils-direc.patch since there is
no need for us to see you moved the code around

>> Done.  Heads-up that automatic rebase worked flawlessly, with no merge conflicts.  When I did a subsequent interactive rebase, git got totally confused, and I spent an entire day resolving merge conflicts, and still arrived at a slightly different result, which I went back and manually corrected.  But, if I hadn't done the automatic rebase first, there's a good chance I would have never found out about those errors.  Do you have any pointers on how to make this smoother?

 * Please remove the .gitignore and README.contributors and similar changes
from 0009, they have nothing to do with this feature, we might or might not
want them but cramming everything into a single bugreport makes it almost
impossible to review in a timely fashion

>> Done.

 * 0010-Add-background-color-to-main-div-so-it-doesn-t-rely-.patch Seems like a
non related feature and i'd prefer it to be sent separately, and giving the
user the option to set the background color (i can easily imagine situations
where the background color is not white)

>> 0010 is integral to the image extraction feature, because without it, parts of the page without graphics on them will not have a background color set, and may default to gray on some browsers.  This change ensures that the appearance of the document is the same as before image-extraction was implemented, and that the document will appear as it does in any PDF reader, since AFAIK, they all assume a white background.  If a colored background is added to the PDF, it will still be extracted and appear correctly in the HTML.  Adding an option to override the background color in the HTML might be interesting for someone, but it holds no allure for me, and I don't know who would use it.  It would involve making changes to the guts of Splash (so that the background of areas that are painted will match the broader background.)

 * Please do not break the encoding of the files, e.g.
-// Copyright (C) 2010 Christian Feuersänger <cfeuersaenger at googlemail.com>
+// Copyright (C) 2010 Christian Feuersï¿œnger <cfeuersaenger at googlemail.com>
is not good

>> Yes, sorry.  Modern tools assume UTF-8 -- not sure which one broke it.  Unfortunately that character is encoded like the leading byte of a three-byte UTF-8 character.  We should probably have a UTF-8 convention.

* Please also squash unneded commits like
0015-Modified-Splash.cc-to-store-coordinates-of-most-rece.patch that you seem
to revert in 0019-Images-now-spliced-out-of-splashed-and-output-to-fil.patch it
is not fun to review a patch just to discover the code just disappears in the
next commit

>> Done.

 * 0039-Speed-up-the-extraction-of-images-by-40.patch includes adding a debug
option to GlobalParams, that is unwanted and has nothing to do with the
definition of the patch "Subject: [PATCH 39/52] Speed up the extraction of
images by 40%"I have removed it.

>> Done.  I hope you may reconsider, because I think that having a debug switch is a useful diagnostic tool, and it common for a complex system.  It was useful to me in diagnosing the speedup.

 * 0043-Clean-up-debug-output.-Remove-a-few-compiler-warning.patch contains a
fetch-user-archives.pl that is unwanted and again has nothing to do with the
definition of the patch "[PATCH 43/52] Clean up debug output.  Remove a few
compiler warnings."

>> Removed it.

Please fix these issues and i'll have a look once you provide an new set of
patches.

>> Thanks Albert!!!

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the Poppler-bugs mailing list