[poppler] utils manpages in mdoc(7)

Jan Stary hans at stare.cz
Sun Nov 9 08:29:11 PST 2014


On Nov 09 16:46:37, aacid at kde.org wrote:
> El Diumenge, 9 de novembre de 2014, a les 13:21:45, Jan Stary va escriure:
> > Currently, the manpages of the poppler utils
> > are written in the legacy man(7) markup language.
> > Below please find a proposed rewrite of pdfunite.1
> > into the _semantic_ mdoc(7) language.
> > 
> > Both languages are well supported for decades,
> > by groff (on most linuxes) and by mandoc (the BSDs).
> > The advantage of the semantic markup is that it allows
> > for constructions like "There is an optional -h flag"
> > 
> > 	.Op Fl h
> > 
> > as opposed to the physical markup of
> > "type a bracket, switch to italics, type -h,
> > switch back to roman, type the closing bracket"
> > and similarly for other manpage constructions.
> > See http://manpages.bsd.lv/ for an elaborate discussion
> > on why this is a good thing.
> 
> Can we have a short summary?

The man(7) markup language uses _physical_ markup,
such as "put this in boldface", "type a bracket here", etc.
The mdoc(7) language is a _semantic_ markup: it describes
the meaning and purpose, as opposed to details of presentation.

For example, this is how pdfimages.1 mentions the -f option:

	.BI \-f " number"

This means: "switch to boldface, type a dash-ef, then switch
to italics and type 'number', preceded by a space".

This is how mdoc(7) describes the same:

	.Fl f Ar number

This means: "There is an 'f' flag, which takes a 'number' argument".
That's the _meaning_ of it. Details such as
"prepend the option with a dash" or "make it italics"
or "separate the option flag and the argument name with a space"
are purely presentational and are left to the macros.
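
The same contrast shows up for an optional flag that takes an argument.
As a rough sketch, take the -f option of pdfseparate. The man(7) line
below is only one possible spelling, assumed here for illustration;
it is not quoted from the existing page:

	[\fB\-f\fR \fIfirst\fR]

The mdoc(7) equivalent would be:

	.Op Fl f Ar first

The first line spells out the brackets and every font change by hand;
the second only states that the 'f' flag is optional and takes
a 'first' argument.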

In a not-too-far-fetched analogy, this is like the difference
between having "<b>stuff</b>" in your html code and properly tagging
various classes of information as such, to be assigned a given
presentation in CSS.


Both languages have been well supported for over a decade.

On most Linux systems, the rendering is done by groff(1),
which understands "groff -man" and "groff -mdoc"
(among other things). Most Linux distributions
have their system manpages written in man(7).

On the *BSD family of systems, the manpages are gradually being
rewritten into mdoc(7), and the rendering is gradually being taken over
by mandoc(1), a replacement for groff, which also supports both man(7)
and mdoc(7) (among other things). On OpenBSD, for example, the system manpages
have all been rewritten into mdoc(7). A lot of third-party software
uses man(7), so part of the porting/packaging process is to either
check that the upstream manpages render well with mandoc,
or to use groff.

My main motivation here is to ease this, and I think the cleanest way
is to have the semantic markup. The port of poppler,
by the way, renders very well with mandoc -man, so there is no
need to use groff. Still, I think the rewrite is an improvement
of the poppler-utils manpages in its own right.

Below please find another example,
pdfseparate.1 rewritten into mdoc(7).


> > Please let me know if there is any interest in this,
> > I am willing to do the work.
> > 
> > 	Your happy user
> > 
 		Jan


.Dd November 10, 2014
.Dt PDFSEPARATE 1
.Os
.Sh NAME
.Nm pdfseparate
.Nd extract pages from a PDF document
.Sh SYNOPSIS
.Nm pdfseparate
.Op Fl h
.Op Fl v
.Op Fl f Ar first
.Op Fl l Ar last
.Ar input
.Ar name-pattern
.Sh DESCRIPTION
.Nm
extracts individual pages from a PDF document.
The input document must not be encrypted.
.Pp
The pages extracted from
.Ar input
are saved in individual output files named like
.Ar name-pattern .
The
.Ar name-pattern
must contain a
.Dq %d
placeholder if more than one page is to be extracted.
The
.Dq %d
will be replaced by the original page number.
.Pp
The options are as follows:
.Pp
.Bl -tag -width 8n -compact
.It Fl f Ar first
The first page to extract (start of input by default).
.It Fl l Ar last
The last page to extract (end of input by default).
.It Fl h
Print usage information.
.It Fl v
Print copyright and version information.
.El
.Sh EXAMPLES
.Dl $ pdfseparate file.pdf file-%d.pdf
.Pp
extracts all pages from
.Pa file.pdf .
If
.Pa file.pdf
has 3 pages, the resulting files will be named
.Pa file-1.pdf ,
.Pa file-2.pdf
and
.Pa file-3.pdf .
.Sh SEE ALSO
.Xr pdfunite 1
.Pp
.Lk http://poppler.freedesktop.org

