[poppler] Accessibility of PDF documents (corrected patch attached)

leena chourey leenagour at gmail.com
Tue Sep 7 05:00:05 PDT 2010


Thanks Albert to review this patch again.
Comments are inline:

On Sun, Jul 25, 2010 at 9:27 PM, Albert Astals Cid <aacid at kde.org> wrote:

> A Dimarts, 6 de juliol de 2010, leena chourey va escriure:
> > Dear Albert,
>
> Hi
>
> > Thanks for your response.
> >
> > As discussed in the last mail, we have modified the patch so that:
> >
> >    - There is no behavioural change in pdftohtml -c <filename> means it
> >    produces exactly the same output it did before.
> >    - Defined new option as pdftohtml -s <filename> to generate a single
> > html file corresponding to a pdf file.
> >
> > Please check and give your feedback if any further change is required.
>
> You are using a variable you deleted (tmp) in this chunk of code
>
> ***********************
>       delete tmp;


This (delete tmp) was from the original development code only, we didn't
made changes regarding tmp. I have checked, it is in the recent development
version also.

>  -      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n",
> -             DOCTYPE, page);
> +      if (!singleHtml)
> +          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page
> %d</TITLE>\n\n",
> DOCTYPE, page);
> +      else
> +          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE> %s</TITLE>\n\n",
> DOCTYPE, tmp->getCString());////file name
> ***********************
>
> I'm also concerned about you adding various <HTML> to the same .html page,
> my
> limited HTML knowledge says you can only have one of those.


For the above: I would like to say that every page has different heading
details as well as title.  This should not be changed for pages.


>
> Also it would be necessary that you update the pdftohtml.1 file (the man
> page)
> adding the new option.

pdftohtml.1 is updated.

Please find the latest patch for "pdftohtml -s <file.pdf> " and give
feedback.


>
> Albert
>
> >
> > With best regard
> > Leena C
> >
> > On Wed, Jun 23, 2010 at 1:19 AM, Albert Astals Cid <aacid at kde.org>
> wrote:
> > > A Dimarts, 22 de juny de 2010, leena chourey va escriure:
> > > > Dear Albert,
> > > >
> > > > Thanks for giving detail comment to patch.
> > >
> > > > Please check updates given inline:
> > > Please do not forget to CC the poppler mailing list.
> > >
> > > > On Thu, Jun 17, 2010 at 4:14 AM, Albert Astals Cid <aacid at kde.org>
> > >
> > > wrote:
> > > > > A Dimecres, 16 de juny de 2010, omkar va escriure:
> > > > > > Dear Albert,
> > > > > >
> > > > > > Please find the corrected patch for "accessibility of pdf
> document
> > > > > > " and give your feedback.
> > > > >
> > > > > Hi, some comments:
> > > > >  * The comments like
> > > > >  // One more parameter(int j) is added in the getCSStyle function
> by
> > >
> > > CDAC
> > >
> > > > > developer Team
> > > > >
> > > > >   need to be removed, if each line had near it who coded it, the
> code
> > > > >   will
> > > > >
> > > > > be
> > > > > twice as big and much more unreadable
> > > >
> > > > Done, deleted all unwanted comments
> > > >
> > > > >   * The spacing of your patches could be better, that is
> > > > >
> > > > > GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int
> > > > > j){ should be
> > > > > +GooString* HtmlFontAccu::getCSStyle(int i, GooString* content, int
> > > > > j){ but that's nothing huge, i can fix it
> > > >
> > > > Updated accordingly.
> > > >
> > > > >   * You are leaking (i.e. not deleting) jStr in both
> > > > >
> > > > > HtmlFontAccu::getCSStyle
> > > > > and HtmlFontAccu::CSStyle
> > > >
> > > > Deleted jStr
> > > >
> > > > >  * I see that the new HtmlPage::complexHtml and the old
> > > > >
> > > > > HtmlPage::dumpComplex
> > > > > are very simple, i if you reused the code instead of copying it
> > > > >
> > > > >  * This introduces a behavioural change that is unaccetable, i
> > >
> > > understand
> > >
> > > > > you
> > > > > want pdftohtml to produce a different (in your opinion better)
> > > > > output, for that you'll have to introduce a new comandline option
> to
> > > > > pdftohtml (something
> > > > > like --singlehtml) or something like that
> > > >
> > > > For last 2 point we want some clarification.
> > > > As you said behavioural change is unacceptable and also suggested to
> > > > introduce a new command line option to generate single html. So if we
> > > > do
> > >
> > > as
> > >
> > > > following, will it be acceptable?
> > > >
> > > >    - *Existing is:*
> > > >    Command line option: pdftohtml -c  <filename>
> > > >
> > > >    Function called:
> > > >                       dumpComplex
> > > >
> > > >    ()
> > > >    {
> > > >
> > > >        Read from input file
> > > >        Write into file to Generates pagewise html format
> > > >
> > > >    }
> > > >
> > > >
> > > >    - *Proposed changes:*
> > > >    New Command line option : pdftohtml -s <filename>
> //Checked,
> > > >    nothing  is already defined for -s            (pdftohtml -c
> > >
> > >  <filename>
> > >
> > > > will exists as it is)
> > > >
> > > >    - Function called:
> > > >                          dumpSingle()  //new function similar to
> > > >
> > > > dumpComplex {
> > > >
> > > >        Read from input file
> > > >        Write into file to append single html format
> > > >
> > > >    }
> > > >
> > > >    - A function to “Read from input file” can be defined and call it
> in
> > > >
> > > > both dumpComplex() and dumpSingle(), So that code duplication can be
> > > > removed (for second last point of your mail).
> > > >
> > > >    - And with -s option (for --single Html) behavioural change will
> be
> > > >    defined separately. (-c will not be affected)
> > >
> > > To be clear, pdftohtml -c should produce exactly the same output it did
> > > before
> > > your patch, pdftohtml -s you can output your version.
> > >
> > > So yes, i think i kind of agree with your proposal.
> > >
> > > Albert
> > >
> > > > For your opinion
> > > >
> > > > With Regards
> > > > Leena C & Onkar P
> > > > (for CDAC Accessibility Team)
>
With best regards
Leena C
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100907/25cdf0d8/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-accesspdf_1Sept.patch
Type: text/x-diff
Size: 9472 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100907/25cdf0d8/attachment-0001.patch>


More information about the poppler mailing list