Bugs in doclifting the Xorg man pages

Carlo Salinari csali at tiscali.it
Tue Jan 23 00:32:59 PST 2007

Alan Coopersmith wrote:
> For those who don't follow the bugzilla e-mail, but want to help
> in the great Xorg Doc XMLification, Eric Raymond has filed a
> number of bugs against man pages that give doclifter fits - checking
> in his fixes would bring us one step closer to moving the docs to XML:
> https://bugs.freedesktop.org/buglist.cgi?query_format=advanced&emailreporter1=1&emailtype1=substring&email1=esr%40thyrsus.com 

By the way, I reworked my script to convert SGML files to XML. I didn't 
post it earlier because I wanted to integrate the changes into the build 
scripts, but I got stuck in the autotools. Kind of frustrating.

Also, it doesn't handle the Japanese files, since I can't check whether 
the transformation is correct for that language.

So, for the record, here is the script:


#!/bin/sh
# Utility script to convert xorg's DocBook articles from SGML to XML
sgml_file=$1

# check if the specified input file exists, otherwise print usage message
if [ -z "$sgml_file" ] || [ ! -e "$sgml_file" ]
then cat << USAGE

Utility script to convert xorg's DocBook articles from SGML to XML.

    $0 file.sgml

Produces file.xml in the same directory as file.sgml.

USAGE
    exit 1
fi

# quick check for programs we will use in the conversion
if ! { command -v perl && command -v sgml2xml && command -v tidy; } > /dev/null
then cat << REQUIRED

This script requires the following programs:
    perl, sgml2xml, tidy

REQUIRED
    exit 2
fi

WORKING_DIR=$(dirname "$sgml_file")
basename=$(basename "$sgml_file")
output_file=$(basename "$sgml_file" .sgml).xml

echo "### Processing: $basename"

# cut everything before the opening <article> tag
    perl -we '$found = 0; while (<>) { $found = 1 if /<article[\s>]/i; print if $found }' "$sgml_file" |\
# re-insert old doctype declaration, but without the entity declaration
    sed '1i <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" >' |\
# hide entity references from the parsers: '&nbsp;' becomes '|||nbsp;'
    sed 's/&/|||/g' |\
# comment-out conditional sections delimiters
    sed -e 's/<!\[.*%.*\[/<!--BEGIN & END-->/g' -e 's/\]\]>/<!--BEGIN & END-->/g' |\
# invoke sgml2xml: convert tags to lower case, match closing tags,
# remove doctype declaration etc.
    sgml2xml -x lower -x cdata -x comment |\
# append new doctype declaration; comment-out the entity declaration
    sed '1a <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"' |\
    sed '2a [<!--' |\
    sed '3a <!ENTITY % defs SYSTEM "X11/defs.ent"> %defs;' |\
    sed '4a -->]>' |\
# make the markup more human-readable
    tidy -q -i -wrap 78 -xml |\
# unhide the entity declaration
    sed -e '4 s/<!--//' -e '6 s/-->//' |\
# uncomment the conditional sections
    sed -e 's/<!--BEGIN //' -e 's/ END-->//' |\
# unhide entity references ('|||nbsp;' returns to '&nbsp;') and save to file
    sed 's/|||/\&/g' > "$WORKING_DIR/$output_file"
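
To see why the hiding tricks are safe, here is a small sketch that runs only the reversible sed steps (hiding '&' and the conditional-section delimiters, then restoring them in reverse order) on a made-up fragment, leaving out sgml2xml and tidy. The sample text, including the %Draft; parameter entity, is purely illustrative and not taken from the xorg sources; the sed expressions are essentially the ones the script uses.

```shell
#!/bin/sh
# Round-trip check for the reversible sed steps of the conversion script,
# run WITHOUT sgml2xml and tidy in between. The sample input is hypothetical.
sample='<![ %Draft; [
<para>Use&nbsp;&XFree86; here.</para>
]]>'

# hide: entity references first, then conditional-section delimiters
hidden=$(printf '%s\n' "$sample" |
    sed 's/&/|||/g' |
    sed -e 's/<!\[.*%.*\[/<!--BEGIN & END-->/g' -e 's/]]>/<!--BEGIN & END-->/g')

# unhide in the reverse order: uncomment the delimiters, then restore '&'
restored=$(printf '%s\n' "$hidden" |
    sed -e 's/<!--BEGIN //' -e 's/ END-->//' |
    sed 's/|||/\&/g')

printf 'hidden:\n%s\n\nrestored:\n%s\n' "$hidden" "$restored"
```

The round trip only holds if the source never contains '|||' or the literal markers '<!--BEGIN ' and ' END-->' itself; running grep -l '|||' on the .sgml files before converting is a cheap way to check the first condition.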

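The four 'sed Na' appends build the new DOCTYPE with its internal subset wrapped in a comment, so that tidy does not touch the entity declaration. The sketch below applies only those appends to a two-line stand-in for sgml2xml's output (an assumption made for illustration) and then removes the comment markers by matching the marker lines rather than by line number. Matching by pattern is an alternative to the script's '4 s/<!--//' and '6 s/-->//'; it would only help if tidy leaves the '[<!--' and '-->]>' lines intact, which is an assumption. Like the script, it relies on GNU sed's one-line 'a text' syntax.

```shell
#!/bin/sh
# Hypothetical stand-in for sgml2xml output: XML declaration plus a body.
input='<?xml version="1.0"?>
<article></article>'

# Same four appends as the script: the DOCTYPE, then a commented-out internal subset.
header=$(printf '%s\n' "$input" |
    sed '1a <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"' |
    sed '2a [<!--' |
    sed '3a <!ENTITY % defs SYSTEM "X11/defs.ent"> %defs;' |
    sed '4a -->]>')

# Uncomment by matching the marker lines instead of fixed line numbers
# (assumes the markers sit alone on their lines, as they do here without tidy).
final=$(printf '%s\n' "$header" | sed -e '/^\[<!--$/ s/<!--//' -e '/^-->]>$/ s/-->//')

printf '%s\n' "$final"
```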

