[Poppler-bugs] [Bug 47022] pdftohtml: control over word breaks

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Mar 11 07:11:10 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=47022

--- Comment #1 from Ihar Filipau <thephilips at gmail.com> 2012-03-11 07:11:10 PDT ---
Created attachment 58283
  --> https://bugs.freedesktop.org/attachment.cgi?id=58283
the patch, v1

Add control over the word break threshold (the best name I could think of).

1. Add a new global variable `double wordBreakThreshold` in pdftohtml.cc.
   The default value is 10 percent.
   It is later converted to an internal coefficient by dividing by 100.

2. Add a new command line parameter: -wbt <fp>
   The value is stored in the wordBreakThreshold variable.

3. After the command line is parsed, convert the percentage into a coefficient.

4. HtmlOutputDev.cc, HtmlPage::addChar(): replace the hardcoded `0.1` with
   the variable.

5. HtmlOutputDev.cc, HtmlPage::coalesce(): replace the hardcoded `0.1` with
   the variable.

6. Document the parameter in the man page.
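
To illustrate steps 1-5, here is a minimal sketch of the idea, not the patch
itself. The names percentToCoefficient, isWordBreak and
kDefaultWordBreakPercent are hypothetical, and the assumption that the gap
between characters is compared against a fraction of the line height is mine;
the exact expressions in addChar() and coalesce() may differ.

```cpp
#include <cmath>

// Hypothetical default, matching the 10 percent mentioned in step 1.
constexpr double kDefaultWordBreakPercent = 10.0;

// Step 3: convert the user-facing percentage (the -wbt value) into the
// internal coefficient, so 10 percent becomes 0.1.
double percentToCoefficient(double percent) {
    return percent / 100.0;
}

// Steps 4-5 (assumed shape of the check): a horizontal gap larger than
// coefficient * lineHeight is treated as a word break. The coefficient
// takes the place of the previously hardcoded 0.1.
bool isWordBreak(double gap, double lineHeight, double coefficient) {
    return std::fabs(gap) > coefficient * lineHeight;
}
```

With the default of 10 percent and a 12-unit line height, a gap of 2 units
counts as a word break and a gap of 1 unit does not; lowering the threshold
to 5 percent makes the 1-unit gap a break as well.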

I was tempted to introduce a new bool function for the word break check, yet:

- the functionality is duplicated (as I understand it, the results of
word-breaking in addChar() are post-processed and largely overridden by the
::coalesce() method)

- there is a TODO in ::addChar() whose validity and applicability I'm not
sure of.


