[poppler] For Accessibility of pdf document: changes required in pdftohtml complex output

leena chourey leenagour at gmail.com
Wed Jun 9 03:05:03 PDT 2010

Dear poppler developers,

I am new to this list and am working on GNOME accessibility. Visually
impaired people use a screen reader to read PDF documents, but the open
source communities provide very little support for this. We are working in
this area and trying to make PDF documents accessible with the Orca screen
reader. We have analysed various options in this regard, including the
Evince document viewer, Orca's accessibility features for PDF documents, and
more. As a first step, we have decided to use the pdftohtml utility to
provide the PDF content in HTML format, so that Orca can read it.

Observations while exploring poppler-0.12.4 (utils):

   - Poppler-utils has a pdftohtml utility that generates an HTML file for a
   PDF document. With the -c option it generates formatted (complex) HTML
   for the PDF: file_ind.html, file_outline.html, file.html, and one .html
   and one .png file for each page of the PDF. (please confirm)
   - While working with file.html in Firefox, we observed that it
   links/contains only the index file (file_ind.html) and the first page
   (file1.html). To move to another page, I have to click on that page in
   the index, which opens the corresponding page in a new Firefox tab. So a
   new tab opens for every page. (please confirm)
   - I did not find a way to return to the previous page or jump to some particular

A person with normal vision has no difficulty reading PDF content in the
complex HTML format. But to make the complex HTML output as similar as
possible to the PDF document as displayed in a document viewer, and to make
it more accessible and usable for a blind person, we found that the
following issues need to be resolved. With the behaviour described above,
a blind person reading file.html runs into these issues:

   1. Because file.html uses frameset/frame, Orca cannot move control from
   one frame to another at will. Control moves on (with Tab) only after the
   full content of one frame has been read. A sighted person can switch
   from frame to frame with the mouse, but with Tab it is not possible to
   skip over a number of tab stops.
   2. If a blind person wants to read/move to another page, it opens in a
   new tab, and it will be confusing for her/him to handle a number of tabs
   (1 for each
   3. There are some more issues related to the content format, which can
   be discussed in further communication.

To resolve the first two issues, changes are required in the pdftohtml -c
utility that will make the HTML output more accessible and usable for a
visually impaired person.
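
As a rough illustration of the kind of output we have in mind, the Python
sketch below post-processes per-page HTML so that each page carries plain
Previous/Index/Next links and can be read without the frameset. It is only a
sketch and not a patch for pdftohtml. The helper names (build_nav,
insert_nav, strip_targets) are hypothetical, the page file names must be
supplied by the caller, and the idea that the new tabs are caused by target
attributes on the index links is an assumption that still has to be checked.

```python
import re


def build_nav(page_files, i, index_file):
    """Return an HTML navigation bar for page i (0-based) of page_files.

    page_files and index_file are supplied by the caller; this sketch does
    not assume any particular pdftohtml naming scheme.
    """
    links = []
    if i > 0:
        links.append('<a href="%s">Previous page</a>' % page_files[i - 1])
    links.append('<a href="%s">Index</a>' % index_file)
    if i < len(page_files) - 1:
        links.append('<a href="%s">Next page</a>' % page_files[i + 1])
    return '<div class="nav">' + " | ".join(links) + "</div>"


def insert_nav(html, nav):
    """Insert nav right after the opening <body> tag (prepend if none)."""
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return nav + html
    return html[:match.end()] + nav + html[match.end():]


def strip_targets(html):
    """Remove double-quoted target="..." attributes so that links open in
    the current tab. Assumption: the extra tabs come from such attributes."""
    return re.sub(r'\s+target\s*=\s*"[^"]*"', "", html, flags=re.IGNORECASE)
```

With helpers like these, each page file could be read, passed through
strip_targets and insert_nav, and written back, so that a screen reader user
can move between pages with ordinary links in a single tab.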

With regards