Eric S. Raymond
esr at thyrsus.com
Fri Jun 7 09:53:51 PDT 2013
Michael Meeks <michael.meeks at suse.com>:
> I was curious about what you'd like to hack on here :-)
I wrote and maintain a tool called 'doclifter' that lifts manual pages
(and most other kinds of documents written in troff-based markups) into
DocBook-XML. This is a useful tool for several reasons; one is that the
XML can be used to generate higher-quality HTML than you get from a
presentation-level troff to HTML translation. If all manual pages lifted
cleanly, generating a nice web view of all the world's documentation
would be easy.
Unfortunately, troff markup is such a badly structured tag soup that
automatic lifting doesn't always work. By dint of a bunch of compiler
technology and a couple hundred cliche-recognition rules, doclifter
does a pretty good job; on the 12K pages shipped with a stock Linux
distribution it lifts about 94% of the eligible targets cleanly.
Most of the remaining 6% of troff pages contain markup that is
outright broken even in troff terms. Your pages, which had an
incorrect \fb where a \fB was needed, are good examples.
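To illustrate the kind of breakage meant here, the following is a minimal sketch of a check for lowercase single-letter font escapes such as \fb. It is not how doclifter itself detects the problem; the function name, the regular expression, and the sample text are all made up for illustration, and the rule "a lowercase one-letter font name is suspect" is an assumption that will not hold for every macro set.

```python
import re

# Matches \fx where x is one lowercase letter. The lookbehind skips an
# escaped backslash (\\f...). Assumption: conventional man-page font
# names are uppercase (B, I, R, P), so a lowercase letter is suspect.
SUSPECT_FONT = re.compile(r"(?<!\\)\\f([a-z])")


def find_suspect_font_escapes(text):
    """Return (line_number, escape) pairs for lowercase font escapes."""
    suspects = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT_FONT.finditer(line):
            suspects.append((lineno, match.group(0)))
    return suspects


# Hypothetical man-page fragment with one bad escape on the second line.
sample = "\\fBsoffice\\fP starts the suite.\nUse \\fb--help\\fR for options.\n"

for lineno, escape in find_suspect_font_escapes(sample):
    print(f"line {lineno}: suspicious font escape {escape}")
```

A check like this only flags candidates; whether \fB, \fI, or something else was intended still needs a human look at the page.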
One of my longer-term projects is cleaning up the Linux/Unix manual-page
corpus so that the remaining 6% gets fixed and becomes automatically liftable.
I've been working on this since 2002, and have shipped about 2000 patches
upstream to several hundred projects.
Recently I fixed up all the X man pages. Current statistics:
    11923  100%     Total pages in stock Ubuntu 13.04
      917    7.69%  Already made from XML-DocBook or Doxygen, not eligible.
    10270   86.14%  Clean lift from troff, no problems
      721    6.02%  Clean lift with a fix patch.
        8    0.07%  Internal error in doclifter
        7    0.08%  Incorrect (non-validating) XML generated.
You just got your patches. The LibreOffice pages now lift clean.
Very occasionally (once every year or two) I run a validation pass on
as much of the manual-page universe as I can easily get my hands on.
In the future, if your pages develop any problems due to careless
changes, I'll ship you another fix. Otherwise I have no specially
concentrated interest in LibreOffice, sorry. I think it's a good thing
that the suite exists, but I don't use it myself.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>