[poppler] Accessibility of PDF documents (corrected patch attached)

leena chourey leenagour at gmail.com
Thu Sep 16 21:11:59 PDT 2010


Hello Albert ,

Thanks for committing the patch. Hope fully this will help to improve the
accessibility and usability of pdftohtml option.


With regards
Leena Chourey
For Accessibility Team
CDAC Mumbai


On Fri, Sep 17, 2010 at 3:54 AM, Albert Astals Cid <aacid at kde.org> wrote:

> A Dijous, 16 de setembre de 2010, vàreu escriure:
> > Dear Poppler Developers,
> >
> > Thanks for response.
> >
> > For copyright information:
> >
> > *Copyright OSSD CDAC Mumbai
> >
> > Changed 2010 by  Leena Chourey (leenac at cdacmumbai.in) & Onkar Potdar (
> > onkar at cdacmumbai.in)
> > *
>
> I've commited your code, it'll be in poppler >= 0.15.1
>
> Albert
>
> >
> > With regards
> > Leena Chourey
> > For Accessibility Team
> > CDAC Mumbai
> >
> > On Thu, Sep 16, 2010 at 12:55 AM, Albert Astals Cid <aacid at kde.org>
> wrote:
> > > A Dimarts, 7 de setembre de 2010, leena chourey va escriure:
> > > > Thanks Albert to review this patch again.
> > >
> > > The patch seems ok.
> > >
> > > Before commiting i need the names and e-mail addresses of the people
> the
> > > copyright of this patch belongs to.
> > >
> > > Albert
> > >
> > > > Comments are inline:
> > > >
> > > > On Sun, Jul 25, 2010 at 9:27 PM, Albert Astals Cid <aacid at kde.org>
> > >
> > > wrote:
> > > > > A Dimarts, 6 de juliol de 2010, leena chourey va escriure:
> > > > > > Dear Albert,
> > > > >
> > > > > Hi
> > > > >
> > > > > > Thanks for your response.
> > > > > >
> > > > > > As discussed in the last mail, we have modified the patch so
> that:
> > > > > >    - There is no behavioural change in pdftohtml -c <filename>
> > > > > >    means
> > >
> > > it
> > >
> > > > > >    produces exactly the same output it did before.
> > > > > >    - Defined new option as pdftohtml -s <filename> to generate a
> > >
> > > single
> > >
> > > > > > html file corresponding to a pdf file.
> > > > > >
> > > > > > Please check and give your feedback if any further change is
> > >
> > > required.
> > >
> > > > > You are using a variable you deleted (tmp) in this chunk of code
> > > > >
> > > > > ***********************
> > > > >
> > > > >       delete tmp;
> > > >
> > > > This (delete tmp) was from the original development code only, we
> > > > didn't made changes regarding tmp. I have checked, it is in the
> recent
> > >
> > > development
> > >
> > > > version also.
> > > >
> > > > >  -      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page
> > > > >  %d</TITLE>\n\n",
> > > > >
> > > > > -             DOCTYPE, page);
> > > > > +      if (!singleHtml)
> > > > > +          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page
> > > > > %d</TITLE>\n\n",
> > > > > DOCTYPE, page);
> > > > > +      else
> > > > > +          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>
> > >
> > > %s</TITLE>\n\n",
> > >
> > > > > DOCTYPE, tmp->getCString());////file name
> > > > > ***********************
> > > > >
> > > > > I'm also concerned about you adding various <HTML> to the same
> .html
> > > > > page, my
> > > > > limited HTML knowledge says you can only have one of those.
> > > >
> > > > For the above: I would like to say that every page has different
> > > > heading details as well as title.  This should not be changed for
> > > > pages.
> > > >
> > > > > Also it would be necessary that you update the pdftohtml.1 file
> (the
> > >
> > > man
> > >
> > > > > page)
> > > > > adding the new option.
> > > >
> > > > pdftohtml.1 is updated.
> > > >
> > > > Please find the latest patch for "pdftohtml -s <file.pdf> " and give
> > > > feedback.
> > > >
> > > > > Albert
> > > > >
> > > > > > With best regard
> > > > > > Leena C
> > > > > >
> > > > > > On Wed, Jun 23, 2010 at 1:19 AM, Albert Astals Cid <
> aacid at kde.org>
> > > > >
> > > > > wrote:
> > > > > > > A Dimarts, 22 de juny de 2010, leena chourey va escriure:
> > > > > > > > Dear Albert,
> > > > > > > >
> > > > > > > > Thanks for giving detail comment to patch.
> > > > > > >
> > > > > > > > Please check updates given inline:
> > > > > > > Please do not forget to CC the poppler mailing list.
> > > > > > >
> > > > > > > > On Thu, Jun 17, 2010 at 4:14 AM, Albert Astals Cid <
> > >
> > > aacid at kde.org>
> > >
> > > > > > > wrote:
> > > > > > > > > A Dimecres, 16 de juny de 2010, omkar va escriure:
> > > > > > > > > > Dear Albert,
> > > > > > > > > >
> > > > > > > > > > Please find the corrected patch for "accessibility of pdf
> > > > >
> > > > > document
> > > > >
> > > > > > > > > > " and give your feedback.
> > > > > > > > >
> > > > > > > > > Hi, some comments:
> > > > > > > > >  * The comments like
> > > > > > > > >  // One more parameter(int j) is added in the getCSStyle
> > >
> > > function
> > >
> > > > > by
> > > > >
> > > > > > > CDAC
> > > > > > >
> > > > > > > > > developer Team
> > > > > > > > >
> > > > > > > > >   need to be removed, if each line had near it who coded
> it,
> > >
> > > the
> > >
> > > > > code
> > > > >
> > > > > > > > >   will
> > > > > > > > >
> > > > > > > > > be
> > > > > > > > > twice as big and much more unreadable
> > > > > > > >
> > > > > > > > Done, deleted all unwanted comments
> > > > > > > >
> > > > > > > > >   * The spacing of your patches could be better, that is
> > > > > > > > >
> > > > > > > > > GooString* HtmlFontAccu::getCSStyle(int i, GooString*
> content
> > > > > > > > > ,int j){ should be
> > > > > > > > > +GooString* HtmlFontAccu::getCSStyle(int i, GooString*
> > > > > > > > > content, int j){ but that's nothing huge, i can fix it
> > > > > > > >
> > > > > > > > Updated accordingly.
> > > > > > > >
> > > > > > > > >   * You are leaking (i.e. not deleting) jStr in both
> > > > > > > > >
> > > > > > > > > HtmlFontAccu::getCSStyle
> > > > > > > > > and HtmlFontAccu::CSStyle
> > > > > > > >
> > > > > > > > Deleted jStr
> > > > > > > >
> > > > > > > > >  * I see that the new HtmlPage::complexHtml and the old
> > > > > > > > >
> > > > > > > > > HtmlPage::dumpComplex
> > > > > > > > > are very simple, i if you reused the code instead of
> copying
> > > > > > > > > it
> > > > > > > > >
> > > > > > > > >  * This introduces a behavioural change that is
> unaccetable,
> > > > > > > > >  i
> > > > > > >
> > > > > > > understand
> > > > > > >
> > > > > > > > > you
> > > > > > > > > want pdftohtml to produce a different (in your opinion
> > > > > > > > > better) output, for that you'll have to introduce a new
> > > > > > > > > comandline
> > >
> > > option
> > >
> > > > > to
> > > > >
> > > > > > > > > pdftohtml (something
> > > > > > > > > like --singlehtml) or something like that
> > > > > > > >
> > > > > > > > For last 2 point we want some clarification.
> > > > > > > > As you said behavioural change is unacceptable and also
> > > > > > > > suggested to introduce a new command line option to generate
> > > > > > > > single html.
> > >
> > > So
> > >
> > > > > > > > if we do
> > > > > > >
> > > > > > > as
> > > > > > >
> > > > > > > > following, will it be acceptable?
> > > > > > > >
> > > > > > > >    - *Existing is:*
> > > > > > > >    Command line option: pdftohtml -c  <filename>
> > > > > > > >
> > > > > > > >    Function called:
> > > > > > > >                       dumpComplex
> > > > > > > >
> > > > > > > >    ()
> > > > > > > >    {
> > > > > > > >
> > > > > > > >        Read from input file
> > > > > > > >        Write into file to Generates pagewise html format
> > > > > > > >
> > > > > > > >    }
> > > > > > > >
> > > > > > > >
> > > > > > > >    - *Proposed changes:*
> > > > > > > >    New Command line option : pdftohtml -s <filename>
> > > > >
> > > > > //Checked,
> > > > >
> > > > > > > >    nothing  is already defined for -s            (pdftohtml
> -c
> > > > > > >
> > > > > > >  <filename>
> > > > > > >
> > > > > > > > will exists as it is)
> > > > > > > >
> > > > > > > >    - Function called:
> > > > > > > >                          dumpSingle()  //new function similar
> > > > > > > >                          to
> > > > > > > >
> > > > > > > > dumpComplex {
> > > > > > > >
> > > > > > > >        Read from input file
> > > > > > > >        Write into file to append single html format
> > > > > > > >
> > > > > > > >    }
> > > > > > > >
> > > > > > > >    - A function to “Read from input file” can be defined and
> > > > > > > >    call it
> > > > >
> > > > > in
> > > > >
> > > > > > > > both dumpComplex() and dumpSingle(), So that code duplication
> > > > > > > > can be removed (for second last point of your mail).
> > > > > > > >
> > > > > > > >    - And with -s option (for --single Html) behavioural
> change
> > >
> > > will
> > >
> > > > > be
> > > > >
> > > > > > > >    defined separately. (-c will not be affected)
> > > > > > >
> > > > > > > To be clear, pdftohtml -c should produce exactly the same
> output
> > > > > > > it did before
> > > > > > > your patch, pdftohtml -s you can output your version.
> > > > > > >
> > > > > > > So yes, i think i kind of agree with your proposal.
> > > > > > >
> > > > > > > Albert
> > > > > > >
> > > > > > > > For your opinion
> > > > > > > >
> > > > > > > > With Regards
> > > > > > > > Leena C & Onkar P
> > > > > > > > (for CDAC Accessibility Team)
> > > >
> > > > With best regards
> > > > Leena C
> > >
> > > _______________________________________________
> > > poppler mailing list
> > > poppler at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/poppler
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100917/88b25704/attachment-0001.html>


More information about the poppler mailing list