[poppler] Accessibility of PDF documents (corrected patch attached)

Albert Astals Cid aacid at kde.org
Sun Jul 25 08:57:42 PDT 2010


A Dimarts, 6 de juliol de 2010, leena chourey va escriure:
> Dear Albert,

Hi

> Thanks for your response.
> 
> As discussed in the last mail, we have modified the patch so that:
> 
>    - There is no behavioural change in pdftohtml -c <filename> means it
>    produces exactly the same output it did before.
>    - Defined new option as pdftohtml -s <filename> to generate a single
> html file corresponding to a pdf file.
> 
> Please check and give your feedback if any further change is required.

You are using a variable you deleted (tmp) in this chunk of code

***********************
       delete tmp;
 
-      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n",
-             DOCTYPE, page);
+      if (!singleHtml)
+          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n", 
DOCTYPE, page);
+      else
+          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE> %s</TITLE>\n\n", 
DOCTYPE, tmp->getCString());////file name
***********************

I'm also concerned about you adding various <HTML> to the same .html page, my 
limited HTML knowledge says you can only have one of those.

Also it would be necessary that you update the pdftohtml.1 file (the man page) 
adding the new option.

Albert

> 
> With best regard
> Leena C
> 
> On Wed, Jun 23, 2010 at 1:19 AM, Albert Astals Cid <aacid at kde.org> wrote:
> > A Dimarts, 22 de juny de 2010, leena chourey va escriure:
> > > Dear Albert,
> > > 
> > > Thanks for giving detail comment to patch.
> > 
> > > Please check updates given inline:
> > Please do not forget to CC the poppler mailing list.
> > 
> > > On Thu, Jun 17, 2010 at 4:14 AM, Albert Astals Cid <aacid at kde.org>
> > 
> > wrote:
> > > > A Dimecres, 16 de juny de 2010, omkar va escriure:
> > > > > Dear Albert,
> > > > > 
> > > > > Please find the corrected patch for "accessibility of pdf document
> > > > > " and give your feedback.
> > > > 
> > > > Hi, some comments:
> > > >  * The comments like
> > > >  // One more parameter(int j) is added in the getCSStyle function by
> > 
> > CDAC
> > 
> > > > developer Team
> > > > 
> > > >   need to be removed, if each line had near it who coded it, the code
> > > >   will
> > > > 
> > > > be
> > > > twice as big and much more unreadable
> > > 
> > > Done, deleted all unwanted comments
> > > 
> > > >   * The spacing of your patches could be better, that is
> > > > 
> > > > GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int
> > > > j){ should be
> > > > +GooString* HtmlFontAccu::getCSStyle(int i, GooString* content, int
> > > > j){ but that's nothing huge, i can fix it
> > > 
> > > Updated accordingly.
> > > 
> > > >   * You are leaking (i.e. not deleting) jStr in both
> > > > 
> > > > HtmlFontAccu::getCSStyle
> > > > and HtmlFontAccu::CSStyle
> > > 
> > > Deleted jStr
> > > 
> > > >  * I see that the new HtmlPage::complexHtml and the old
> > > > 
> > > > HtmlPage::dumpComplex
> > > > are very simple, i if you reused the code instead of copying it
> > > > 
> > > >  * This introduces a behavioural change that is unaccetable, i
> > 
> > understand
> > 
> > > > you
> > > > want pdftohtml to produce a different (in your opinion better)
> > > > output, for that you'll have to introduce a new comandline option to
> > > > pdftohtml (something
> > > > like --singlehtml) or something like that
> > > 
> > > For last 2 point we want some clarification.
> > > As you said behavioural change is unacceptable and also suggested to
> > > introduce a new command line option to generate single html. So if we
> > > do
> > 
> > as
> > 
> > > following, will it be acceptable?
> > > 
> > >    - *Existing is:*
> > >    Command line option: pdftohtml -c  <filename>
> > >    
> > >    Function called:
> > >                       dumpComplex
> > >    
> > >    ()
> > >    {
> > >    
> > >        Read from input file
> > >        Write into file to Generates pagewise html format
> > >    
> > >    }
> > >    
> > >    
> > >    - *Proposed changes:*
> > >    New Command line option : pdftohtml -s <filename>         //Checked,
> > >    nothing  is already defined for -s            (pdftohtml -c
> >  
> >  <filename>
> >  
> > > will exists as it is)
> > > 
> > >    - Function called:
> > >                          dumpSingle()  //new function similar to
> > > 
> > > dumpComplex {
> > > 
> > >        Read from input file
> > >        Write into file to append single html format
> > >    
> > >    }
> > >    
> > >    - A function to “Read from input file” can be defined and call it in
> > > 
> > > both dumpComplex() and dumpSingle(), So that code duplication can be
> > > removed (for second last point of your mail).
> > > 
> > >    - And with -s option (for --single Html) behavioural change will be
> > >    defined separately. (-c will not be affected)
> > 
> > To be clear, pdftohtml -c should produce exactly the same output it did
> > before
> > your patch, pdftohtml -s you can output your version.
> > 
> > So yes, i think i kind of agree with your proposal.
> > 
> > Albert
> > 
> > > For your opinion
> > > 
> > > With Regards
> > > Leena C & Onkar P
> > > (for CDAC Accessibility Team)


More information about the poppler mailing list