Bugs in doclifting the Xorg man pages
Carlo Salinari
csali at tiscali.it
Tue Jan 23 00:32:59 PST 2007
Alan Coopersmith wrote:
> For those who don't follow the bugzilla e-mail, but want to help
> in the great Xorg Doc XMLification, Eric Raymond has filed a
> number of bugs against man pages that give doclifter fits - checking
> in his fixes would bring us one step closer to moving the docs to XML:
>
> https://bugs.freedesktop.org/buglist.cgi?query_format=advanced&emailreporter1=1&emailtype1=substring&email1=esr%40thyrsus.com
>
>
By the way, I reworked my script to convert sgml files to xml. I didn't
post it earlier because I wanted to integrate the changes in the build
scripts, but got stuck in the autotools. Kind of frustrating.
Also, it doesn't handle the Japanese files, since I'm not able to check
the correctness of the transformation for this language.
So, for the record, here is the script:
---[go_docbookx.sh]---------------------------------
#!/bin/sh
#####
# Utility script to convert xorg's DocBook articles from SGML to XML
#####
# check if the specified input file exists, otherwise print usage message
sgml_file=$1
if [ -z "$sgml_file" ] || [ ! -e "$sgml_file" ]
then cat << USAGE
Utility script to convert xorg's DocBook articles from SGML to XML.
Usage:
$0 file.sgml
Produces file.xml in the same directory as file.sgml
USAGE
exit 1
fi
# quick check for programs we will use in the conversion
if ! { command -v perl && command -v sgml2xml && command -v tidy; } >/dev/null 2>&1
then cat << REQUIRED
This script requires the following programs:
perl
sgml2xml
tidy
REQUIRED
exit 2
fi
WORKING_DIR=$(dirname "$sgml_file")
basename=$(basename "$sgml_file")
output_file=$(basename "$sgml_file" .sgml).xml
echo "### Processing: $basename"
# cut everything before the opening <article> tag
perl -we '$found=0; while(<>) {if (/<article>/i) {$found=1}; print if $found}' "$sgml_file" |\
# re-insert old doctype declaration, but without the entity declaration
sed '1i <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" >' |\
# hide entity references from the parsers: '&nbsp;' becomes '|||nbsp;'
sed 's/\&/|||/g' |\
# comment-out conditional sections delimiters
sed -e 's/<!\[.*%.*\[/<!--BEGIN & END-->/g' -e 's/\]\]>/<!--BEGIN & END-->/g' |\
# invoke sgml2xml: convert tags to lower case, match closing tags,
# remove doctype declaration etc.
sgml2xml -x lower -x cdata -x comment |\
# append new doctype declaration; comment-out the entity declaration
sed '1a <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"' |\
sed '2a [<!--' |\
sed '3a <!ENTITY % defs SYSTEM "X11/defs.ent"> %defs;' |\
sed '4a -->]>' |\
# make the markup more human-readable
tidy -q -i -wrap 78 -xml |\
# unhide the entity declaration
sed -e '4 s/<!--//' -e '6 s/-->//' |\
# uncomment the conditional sections
sed -e 's/<!--BEGIN //' -e 's/ END-->//' |\
# unhide entity references ('|||nbsp;' becomes '&nbsp;' again) and save to file
sed 's/|||/\&/g' > "$WORKING_DIR/$output_file"
------------------------------------
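
The two hide/restore steps in the pipeline can be tried on their own,
without sgml2xml or tidy installed. The following is only a minimal sketch:
the sample text and the function names hide and restore are made up for
illustration, and it assumes a sed that accepts the same expressions the
script uses. It checks that hiding and then restoring gives back the
original text.

```shell
#!/bin/sh
# Hypothetical input: one marked-section pair and one entity reference.
sample='<![ %html; [
Use&nbsp;this
]]>'

# Same substitutions as in the script: '&' becomes '|||', and the
# conditional-section delimiters are wrapped in comments.
hide() {
    sed 's/&/|||/g' |
    sed -e 's/<!\[.*%.*\[/<!--BEGIN & END-->/g' \
        -e 's/\]\]>/<!--BEGIN & END-->/g'
}

# Reverse order: unwrap the delimiters, then turn '|||' back into '&'.
restore() {
    sed -e 's/<!--BEGIN //' -e 's/ END-->//' |
    sed 's/|||/\&/g'
}

hidden=$(printf '%s\n' "$sample" | hide)
restored=$(printf '%s\n' "$hidden" | restore)

printf '%s\n' "$hidden"
if [ "$restored" = "$sample" ]
then echo "round trip OK"
else echo "round trip FAILED"
fi
```

Note that the round trip is only safe if the input never contains the
placeholder strings ('|||', '<!--BEGIN ', ' END-->') to begin with, since
they would be rewritten on the way back.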
ciao,
Carlo