<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title></title>
<link rel="stylesheet"
        href="http://www.oasis-open.org/spectools/css/oasis-wd.css"
        type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084"
        alink="#0000FF">
<div class="article" lang="en">
<div class="toc">
<h2>Table of Contents</h2>
<dl>
        <dt><span class="section"><a href="#s.po_overview">1. Overview of the
        PO file format</a></span></dt>
        <dd>
        <dl>
                <dt><span class="section"><a href="#s.po_overview.po_pot">1.1. PO and
                POT</a></span></dt>
                <dt><span class="section"><a href="#s.po_overview.general_structure">1.2.
                General Structure</a></span></dt>
                <dt><span class="section"><a href="#s.po_overview.header">1.3. Header</a></span></dt>
                <dt><span class="section"><a href="#s.po_overview.tu">1.4. Translation
                Units</a></span></dt>
                <dt><span class="section"><a href="#s.po_overview.domains">1.5.
                Domains</a></span></dt>
        </dl>
        </dd>
</dl>
</div>
<hr>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h2 class="title" style="clear: both"><a name="s.po_overview"></a>1. Overview
of the PO file format</h2>
</div>
</div>
<p>Because the Gettext PO format is neither a formal standard nor
thoroughly documented, this section presents an overview of the
features and design of the PO file format.</p>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h3 class="title"><a name="s.po_overview.po_pot"></a>1.1. PO and POT</h3>
</div>
</div>
<p>There are two types of PO files: PO Template files (POTs) and
language-specific PO files (POs). A POT contains a skeleton header,
followed by the extracted translation units. POTs are generated by the <span
        class="application">xgettext</span> extraction tool and are not meant
to be edited by humans. POTs are converted into language-specific POs by
the <span class="application">msginit</span> tool, and these files are
then edited by translators.</p>
<p>When source code is updated, a new POT is generated for the project,
and the changes from previous versions are incorporated into the
existing translations by using the <span class="application">msgmerge</span>
tool. This tool inserts new translation units into the existing PO
files, marks translation units no longer in use as obsolete, and updates
any references and extracted comments.</p>
<p>Translated PO files are converted to binary resource files, known as
MO (Machine Object) files, by the <span class="application">msgfmt</span>
tool. The Gettext library uses MO files at runtime; PO files are
therefore only used during development and localisation.</p>
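<p>As a rough illustration of the runtime side, the following C sketch looks up a message through the Gettext library functions (<code class="code">bindtextdomain</code>, <code class="code">textdomain</code> and <code class="code">gettext</code>, as provided by glibc's <code class="code">libintl.h</code>). The domain name <code class="code">myapp</code>, the <code class="code">./locale</code> directory and the <code class="code">translate</code> helper are hypothetical; with them, Gettext would search for a compiled catalog such as <code class="code">./locale/fr/LC_MESSAGES/myapp.mo</code>.</p>

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Hypothetical text domain and catalog directory, for illustration only.
 * With these values, gettext() searches for files such as
 * ./locale/fr/LC_MESSAGES/myapp.mo, depending on the user's locale. */
#define APP_DOMAIN    "myapp"
#define APP_LOCALEDIR "./locale"

/* Return the translation of msgid from the application's MO file.
 * The locale and text domain are configured on the first call.
 * If no MO file or no matching entry is found, gettext() returns msgid. */
const char *translate(const char *msgid)
{
    static int initialised = 0;

    assert(msgid != NULL);
    if (!initialised) {
        setlocale(LC_ALL, "");                     /* use the user's locale settings */
        bindtextdomain(APP_DOMAIN, APP_LOCALEDIR); /* directory holding the MO files */
        textdomain(APP_DOMAIN);                    /* default domain for gettext() */
        initialised = 1;
    }
    return gettext(msgid);
}
```

<p>Calling <code class="code">translate("Hello World")</code> returns the French <code class="code">msgstr</code> when such an MO file exists and the user's locale is French, and the original string otherwise. Programs commonly wrap <code class="code">gettext</code> in a macro named <code class="code">_</code>, as in the <code class="code">_("Hello World")</code> call shown later in this document.</p>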
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h3 class="title"><a name="s.po_overview.general_structure"></a>1.2. General
Structure</h3>
</div>
</div>
<p>A PO file starts with a header, followed by a number of translation
units.</p>
<pre class="programlisting"># SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE-NAME VERSION\n"
"Report-Msgid-Bugs-To: BUG-EMAIL-ADDR <EMAIL@ADDRESS>\n"
"POT-Creation-Date: YEAR-MO-DA HO:MI+ZONE\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <EMAIL@ADDRESS>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n!=1;\n"
"X-User-Defined-Var: VALUE\n"
# Translator Comment
#. Extracted Comment
#: myfile.c:12
#, flag
msgid "Original String 1"
msgstr "Translated String 1"
# Translator Comment
#. Extracted Comment
#: myfile.c:23
#, flag
msgid "Original String 2"
msgstr "Translated String 2"
</pre>
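<p>The line prefixes in this example are enough to tell the parts of a PO file apart. A minimal C sketch of such a line classifier follows; the <code class="code">po_line_kind</code> type and the <code class="code">classify_po_line</code> function are hypothetical names, and the sketch ignores leading whitespace and any keyword not described in this document.</p>

```c
#include <assert.h>
#include <string.h>

/* Line types of the PO format, as described in this document. */
typedef enum {
    PO_BLANK,
    PO_TRANSLATOR_COMMENT, /* "# " (also a bare "#") */
    PO_EXTRACTED_COMMENT,  /* "#." */
    PO_REFERENCE,          /* "#:" */
    PO_FLAGS,              /* "#," */
    PO_OBSOLETE,           /* "#~" */
    PO_DOMAIN,             /* domain "name" */
    PO_MSGID,              /* msgid "..." */
    PO_MSGID_PLURAL,       /* msgid_plural "..." */
    PO_MSGSTR,             /* msgstr "..." or msgstr[N] "..." */
    PO_CONTINUATION,       /* "..." continuing the previous keyword */
    PO_UNKNOWN
} po_line_kind;

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a single PO line (without its trailing newline). */
po_line_kind classify_po_line(const char *line)
{
    assert(line != NULL);
    if (line[0] == '\0')
        return PO_BLANK;
    if (line[0] == '#') {
        switch (line[1]) {
        case '.': return PO_EXTRACTED_COMMENT;
        case ':': return PO_REFERENCE;
        case ',': return PO_FLAGS;
        case '~': return PO_OBSOLETE;
        default:  return PO_TRANSLATOR_COMMENT; /* any other '#' line in this sketch */
        }
    }
    if (starts_with(line, "domain "))
        return PO_DOMAIN;
    if (starts_with(line, "msgid_plural "))
        return PO_MSGID_PLURAL;
    if (starts_with(line, "msgid "))
        return PO_MSGID;
    if (starts_with(line, "msgstr"))   /* matches msgstr and msgstr[N] */
        return PO_MSGSTR;
    if (line[0] == '"')
        return PO_CONTINUATION;
    return PO_UNKNOWN;
}
```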
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h3 class="title"><a name="s.po_overview.header"></a>1.3. Header</h3>
</div>
</div>
<pre class="programlisting"># French Translation for MyApplication.
# Copyright (C) 2005 John Developer
# This file is distributed under the same license as the MyApp package.
# John Developer <john@example.com>, 2005.
# Joe Translator <joe@example.com>, 2005.
#
msgid ""
msgstr ""
"Project-Id-Version: MyApp 1.0\n"
"Report-Msgid-Bugs-To: MyApp List <myapp-list@example.com>\n"
"POT-Creation-Date: 2005-04-27 13:15+0900\n"
"PO-Revision-Date: 2005-04-27 13:45+0900\n"
"Last-Translator: Joe Translator <joe@example.com>\n"
"Language-Team: French Team <fr-list@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"
"X-Generator: KBabel 1.9\n"
</pre>
<p>The PO header follows a similar structure to PO translation units,
but is distinguished by its empty source element (<code class="code">msgid</code>).
The header variables are contained in the header's target (<code
        class="code">msgstr</code>) element, with each variable terminated by
the newline escape sequence (<code class="code">"\n"</code>).</p>
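<p>Because the header variables are newline-terminated <code class="code">Name: value</code> pairs, reading one is a simple line scan. The following C sketch assumes the header <code class="code">msgstr</code> has already been decoded into a single string containing real newline characters; <code class="code">po_header_get</code> is a hypothetical helper, and its name comparison is case-sensitive.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up header variable `key` (e.g. "Content-Type") in the decoded
 * header msgstr, where every "Name: value" pair ends with a newline.
 * On success, copy the value (truncated to fit) into out and return 1;
 * return 0 if the variable is not present. Names are case-sensitive. */
int po_header_get(const char *header, const char *key, char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *line = header;

    assert(header != NULL && out != NULL && out_size > 0);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t line_len = end != NULL ? (size_t)(end - line) : strlen(line);

        /* A match is "key" followed directly by ':' at the start of a line. */
        if (line_len > key_len && strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            const char *value = line + key_len + 1;
            size_t value_len = line_len - key_len - 1;

            while (value_len > 0 && *value == ' ') { /* skip spaces after ':' */
                value++;
                value_len--;
            }
            if (value_len >= out_size)
                value_len = out_size - 1;
            memcpy(out, value, value_len);
            out[value_len] = '\0';
            return 1;
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return 0;
}
```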
<p>The initial comment lines (comments are lines starting with <code
        class="code">"# "</code>) usually contain a copyright notice and
licensing information, followed by a list of all translators who have
been involved in translating the specific PO file.</p>
<p>The header skeleton in a POT file is initially marked with the <code
        class="code">fuzzy</code> flag (flags are comma-separated entries on
lines starting with <code class="code">"#, "</code>). This flag is
removed when the header variables are filled in and the POT file is
initialized as a language-specific PO file.</p>
<div class="table"><a name="id2581684"></a>
<p class="title"><b>Table 1. Predefined PO Header variables</b></p>
<table summary="Predefined PO Header variables" border="1">
        <colgroup>
                <col>
                <col>
        </colgroup>
        <thead>
                <tr>
                        <th>Variable Name</th>
                        <th>Description</th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td><code class="code">Project-Id-Version</code></td>
                        <td>Application name and version</td>
                </tr>
                <tr>
                        <td><code class="code">Report-Msgid-Bugs-To</code></td>
                        <td>Mailing list or contact person for reporting errors in the
                        source strings (<code class="code">msgid</code>).</td>
                </tr>
                <tr>
                        <td><code class="code">POT-Creation-Date</code></td>
                        <td>Date and time the POT file was generated. Filled in automatically by <span class="application">xgettext</span>.</td>
                </tr>
                <tr>
                        <td><code class="code">PO-Revision-Date</code></td>
                        <td>Date and time the PO file was last edited by a translator.</td>
                </tr>
                <tr>
                        <td><code class="code">Last-Translator</code></td>
                        <td>Contact information for last translator editing the file.</td>
                </tr>
                <tr>
                        <td><code class="code">Language-Team</code></td>
                        <td>Name of language team that translated this file</td>
                </tr>
                <tr>
                        <td><code class="code">MIME-Version</code></td>
                        <td>MIME version used for specifying Content-Type</td>
                </tr>
                <tr>
                        <td><code class="code">Content-Type</code></td>
                        <td>MIME content type for this file, including its character encoding
                        (<code class="code">charset</code>).</td>
                </tr>
                <tr>
                        <td><code class="code">Content-Transfer-Encoding</code></td>
                        <td>MIME transfer encoding</td>
                </tr>
                <tr>
                        <td><code class="code">Plural-Forms</code></td>
                        <td>Number of plural forms in the target language, and a C expression
                        that selects which plural form to use for a given number <code class="code">n</code>.</td>
                </tr>
        </tbody>
</table>
</div>
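<p>In addition to these predefined variables, the PO header can contain
custom user-defined variables of the same format. By convention, their
names start with <code class="code">X-</code>, as in the <code class="code">X-Generator</code>
variable of the example above.</p>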
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h3 class="title"><a name="s.po_overview.tu"></a>1.4. Translation Units</h3>
</div>
</div>
<pre class="programlisting"># Translator Comment
#. Extracted Comment
#: myfile.c:12 myfile.c:32
#, flag
msgid "Original String"
msgstr "Translated String"</pre>
<p>PO translation units use the source string (<code class="code">msgid</code>)
as their primary identifier, and contain the translation in the <code class="code">msgstr</code>
field. In addition, PO translation units can contain other
metadata, explained in further detail in the following sections.</p>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.msg"></a>1.4.1. Source and
Target</h4>
</div>
</div>
<p>The <code class="code">msgid</code> and <code class="code">msgstr</code>
fields contain the source and target strings of a translation unit.</p>
<p>The actual content of <code class="code">msgid</code> and <code
        class="code">msgstr</code> is the concatenation of the strings between the
double quotes (<code class="code">"</code>) on each line,
meaning that the following two examples are identical:</p>
<pre class="programlisting">msgid ""
"My name is "
""
"%s. \n"
"What is"
" "
"your name?"
</pre>
<p>and</p>
<pre class="programlisting">msgid "My name is %s. \nWhat is your name?"</pre>
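<p>The concatenation rule can be sketched in C as follows. The hypothetical <code class="code">po_concat</code> function assumes each line has already had its keyword (<code class="code">msgid</code> or <code class="code">msgstr</code>) removed, and decodes only the escape sequences <code class="code">\n</code>, <code class="code">\t</code>, <code class="code">\"</code> and <code class="code">\\</code>; any other escape is rejected rather than interpreted.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Concatenate the quoted strings of a multi-line msgid/msgstr into out.
 * Each element of lines is the raw quoted text of one line, keyword
 * already removed, e.g. "\"My name is \"". Returns 0 on success, or -1
 * if a line is not a quoted string, uses an unsupported escape, or out
 * (of size cap) is too small. */
int po_concat(const char *const lines[], size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        const char *p = lines[i];

        if (*p++ != '"')
            return -1;                  /* line must start with a quote */
        while (*p != '"') {
            char c = *p++;

            if (c == '\0')
                return -1;              /* unterminated string */
            if (c == '\\') {
                switch (*p++) {
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case '"':  c = '"';  break;
                case '\\': c = '\\'; break;
                default:   return -1;   /* escape not handled by this sketch */
                }
            }
            if (len + 1 >= cap)
                return -1;              /* keep room for the terminator */
            out[len++] = c;
        }
    }
    out[len] = '\0';
    return 0;
}
```

<p>Applied to the seven quoted segments of the first example, it yields <code class="code">My name is %s. </code> followed by a newline character and <code class="code">What is your name?</code>.</p>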
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.transcomment"></a>1.4.2. Translator
Comments</h4>
</div>
</div>
<pre class="programlisting"># This is a comment line
# This is another comment line</pre>
<p>Translator comments are lines starting with <code class="code">"# "</code>
(hash character + whitespace character). These comments are added by
translators, and are not present in POT files.</p>
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.autocomment"></a>1.4.3. Extracted
Comments</h4>
</div>
</div>
<pre class="programlisting">#. This is an extracted comment
#. This is another extracted comment</pre>
<p>Extracted comments are lines starting with <code class="code">"#."</code>
(hash character + dot character). These comments are extracted from the
source code. When <span class="application">xgettext</span> is run with the
<code class="code">--add-comments</code> option, source-code comments are normally
extracted if they are on the same line as the source string, or on the line
immediately preceding it, as in the following C example:</p>
<pre class="programlisting">/* This comment will be extracted */
gettext("Hello World");</pre>
<p>This would become:</p>
<pre class="programlisting">#. This comment will be extracted
msgid "Hello World"
msgstr ""</pre>
<p>When a PO file is updated from a new POT file, existing extracted
comments in the language-specific PO file are discarded, and the
extracted comments present in the POT file are inserted in their place.</p>
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.references"></a>1.4.4. References</h4>
</div>
</div>
<pre class="programlisting">#: myfile.c:1 myfile.c:23 otherfile.c:1
#: otherfile.c:34</pre>
<p>References are identified by lines starting with <code class="code">"#:"</code>
(hash character + colon character). References are space-separated lists
of locations (<code class="code">sourcefile:linenumber</code>)
specifying where the translation unit is found in the source code.</p>
<p>As each <code class="code">msgid</code> has to be unique within a PO
domain, a single translation unit can contain multiple references: one
for each location where the string is found in the source code.</p>
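<p>A minimal C sketch of reading a reference line is shown below. The <code class="code">po_ref</code> type and the <code class="code">po_parse_references</code> function are hypothetical; the sketch assumes entries are separated by spaces, and takes the text after the last colon in each entry as the line number.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One "sourcefile:linenumber" entry of a reference line. */
typedef struct {
    char file[256];
    long line;
} po_ref;

/* Parse a "#:" reference line into at most max_refs entries.
 * Returns the number of entries stored, or -1 if text is not a reference line. */
int po_parse_references(const char *text, po_ref refs[], int max_refs)
{
    int count = 0;

    assert(text != NULL && refs != NULL);
    if (strncmp(text, "#:", 2) != 0)
        return -1;
    text += 2;
    while (count < max_refs) {
        while (*text == ' ')              /* skip separating spaces */
            text++;
        if (*text == '\0')
            break;

        size_t tok_len = strcspn(text, " ");
        const char *colon = NULL;
        for (const char *p = text; p < text + tok_len; p++)
            if (*p == ':')
                colon = p;                /* remember the last colon */

        if (colon != NULL) {              /* entries without a colon are skipped */
            size_t name_len = (size_t)(colon - text);
            if (name_len >= sizeof refs[count].file)
                name_len = sizeof refs[count].file - 1;
            memcpy(refs[count].file, text, name_len);
            refs[count].file[name_len] = '\0';
            refs[count].line = strtol(colon + 1, NULL, 10);
            count++;
        }
        text += tok_len;
    }
    return count;
}
```

<p>Each <code class="code">#:</code> line of a translation unit is parsed separately; in the example above, the second line contributes the single entry <code class="code">otherfile.c:34</code>.</p>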
<p>As with extracted comments, when a PO file is updated from a new POT
file, existing references in the language-specific PO file are
discarded, and the references present in the POT file are inserted in
their place.</p>
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.flags"></a>1.4.5. Flags</h4>
</div>
</div>
<p>Flags are identified by lines starting with <code class="code">"#,"</code>
(hash character + comma character). Multiple flags are separated by
commas.</p>
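<p>Checking for a flag amounts to splitting the <code class="code">#,</code> line on commas. The following C sketch uses a hypothetical <code class="code">po_has_flag</code> helper and compares whole entries, so that, for example, <code class="code">c-format</code> does not match <code class="code">no-c-format</code>.</p>

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the "#," line contains flag as one of its comma-separated
 * entries, 0 otherwise. Whole entries are compared, so "c-format" does not
 * match "no-c-format". */
int po_has_flag(const char *line, const char *flag)
{
    size_t flag_len = strlen(flag);
    const char *p;

    assert(line != NULL && flag_len > 0);
    if (strncmp(line, "#,", 2) != 0)
        return 0;
    p = line + 2;
    while (*p != '\0') {
        while (*p == ' ' || *p == ',')    /* skip separators */
            p++;
        size_t len = strcspn(p, ", ");    /* length of the next entry */
        if (len == flag_len && strncmp(p, flag, flag_len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```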
<p>Flags are used both as processing instructions by the Gettext tools,
and by translators to indicate that a translation unit is unfinished or
"fuzzy".</p>
<div class="table"><a name="id2581054"></a>
<p class="title"><b>Table 2. Flag values and descriptions</b></p>
<table summary="Flag values and descriptions" border="1">
        <colgroup>
                <col>
                <col>
        </colgroup>
        <thead>
                <tr>
                        <th>Flag Name</th>
                        <th>Description</th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td><code class="code">fuzzy</code></td>
                        <td>
                        <p>Indicates that a translation unit needs review by a translator.</p>
                        <p>This flag is inserted by the Gettext tools when a translation unit
                        changes, or when the translation unit does not pass the format check.
                        </p>
                        <p>The flag is also commonly used by translators to mark a
                        translation unit as unfinished.</p>
                        <p>Note that entries marked as <code class="code">fuzzy</code> are
                        not included when PO files are compiled to binary MO files, unless
                        <span class="application">msgfmt</span> is run with the
                        <code class="code">--use-fuzzy</code> option.</p>
                        </td>
                </tr>
                <tr>
                        <td><code class="code">no-wrap</code></td>
                        <td>
                        <p>Indicates that the text in the <code class="code">msgid</code>
                        field is not to be wrapped at the page width (usually 80 characters),
                        as it otherwise would be. Note that this does not affect the wrapping of the
                        actual source string, only its representation in the PO file.</p>
                        <p>This flag is set by developers in the source code, or by adding a
                        command-line flag when invoking the Gettext tools.</p>
                        </td>
                </tr>
                <tr>
                        <td><code class="code"><em class="replaceable"><code>X</code></em>-format</code>,
                        where <em class="replaceable"><code>X</code></em> is any of the
                        following:
                        <div class="itemizedlist">
                        <ul type="bullet">
                                <li style="list-style-type: disc">awk</li>
                                <li style="list-style-type: disc">c</li>
                                <li style="list-style-type: disc">csharp</li>
                                <li style="list-style-type: disc">elisp</li>
                                <li style="list-style-type: disc">gcc-internal</li>
                                <li style="list-style-type: disc">java</li>
                                <li style="list-style-type: disc">librep</li>
                                <li style="list-style-type: disc">lisp</li>
                                <li style="list-style-type: disc">objc</li>
                                <li style="list-style-type: disc">object-pascal</li>
                                <li style="list-style-type: disc">perl</li>
                                <li style="list-style-type: disc">perl-brace</li>
                                <li style="list-style-type: disc">php</li>
                                <li style="list-style-type: disc">python</li>
                                <li style="list-style-type: disc">qt</li>
                                <li style="list-style-type: disc">scheme</li>
                                <li style="list-style-type: disc">sh</li>
                                <li style="list-style-type: disc">smalltalk</li>
                                <li style="list-style-type: disc">tcl</li>
                                <li style="list-style-type: disc">ycp</li>
                        </ul>
                        </div>
                        </td>
                        <td>
                        <p>Indicates that the Gettext tools are to perform a format check on the
                        translation unit, validating that both <code class="code">msgid</code> and <code
                                class="code">msgstr</code> contain valid format directives according
                        to the rules of the source format <em class="replaceable"><code>X</code></em>.</p>
                        <p>This flag is automatically inserted by the Gettext extraction
                        tool.</p>
                        </td>
                </tr>
                <tr>
                        <td><code class="code">no-<em class="replaceable"><code>X</code></em>-format</code>,
                        where <em class="replaceable"><code>X</code></em> is any of the items
                        in the list above.</td>
                        <td>
                        <p>Indicates that Gettext is to skip the format check for this
                        translation unit.</p>
                        <p>This flag has to be set by developers in the source code.</p>
                        </td>
                </tr>
        </tbody>
</table>
</div>
<p>Developers can insert or override flags in the source code by
adding them to a comment immediately preceding the call to gettext, as
in the following example:</p>
<pre class="programlisting">/* xgettext:no-c-format */
printf(_("Hello World"));
</pre>
<p>Since the Gettext call here is inside a <code class="code">printf</code>
function call, the Gettext tools will automatically assume this is a <code
        class="code">c-format</code> string. In this example, however, the developer
overrides that assumption and declares that it is not, which generates the
following PO translation unit:</p>
<pre class="programlisting">#, no-c-format
msgid "Hello World"
msgstr ""
</pre>
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.plural"></a>1.4.6. Plural
Forms</h4>
</div>
</div>
<p>In addition to normal translation units with a
single <code class="code">msgid</code> and <code class="code">msgstr</code>,
Gettext supports <span class="emphasis"><em>plural form</em></span> translation
units. These translation units contain the <span class="emphasis"><em>singular</em></span>
English form in the <code class="code">msgid</code> field, and the <span
        class="emphasis"><em>plural</em></span> form in the <code class="code">msgid_plural</code> field.
As the target, these translation units have an array of <code
        class="code">msgstr</code> entries, one for each plural form of the
target language:</p>
<pre class="programlisting">msgid "You have %d file"
msgid_plural "You have %d files"
msgstr[0] "Du har %d fil"
msgstr[1] "Du har %d filer"
</pre>
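<p>At runtime, exactly one entry of the <code class="code">msgstr</code> array is chosen by index. The C sketch below assumes a two-form language such as the Norwegian example above, where the index is 0 for exactly one item and 1 otherwise; <code class="code">select_plural</code> and <code class="code">plural_two_forms</code> are hypothetical helpers, and in a real program the index comes from the <code class="code">Plural-Forms</code> expression described next.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Plural rule for a two-form language: index 0 for exactly one item,
 * index 1 otherwise (Plural-Forms: nplurals=2; plural=(n != 1);). */
static unsigned long plural_two_forms(unsigned long n)
{
    return n != 1;
}

/* Pick the msgstr[] entry for count n. This sketch falls back to
 * msgstr[0] if the rule yields an index outside the array. */
const char *select_plural(const char *const msgstr[], size_t nplurals, unsigned long n)
{
    unsigned long index = plural_two_forms(n);

    assert(msgstr != NULL && nplurals > 0);
    return index < nplurals ? msgstr[index] : msgstr[0];
}
```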
<p>The target language may have one or more forms (Japanese has one
form, while Polish has three), and the logic for selecting which form
to use for a given number is defined in the <code class="code">Plural-Forms</code>
header variable, where <code class="code">nplurals</code> defines the number of forms and <code
        class="code">plural</code> contains a C expression for evaluating which
item in the <code class="code">msgstr</code> array to use at runtime:</p>
<pre class="programlisting">"Plural-Forms: nplurals=2; plural=(n != 1);\n"
</pre>
<p>This is a typical example for a Germanic language, which treats
<code class="code">n</code> equal to 1 as a special case. A more complex
example is Polish, which has special cases for <code class="code">n</code>
equal to 1, and for numbers ending in 2, 3 or 4 (except those ending in 12, 13 or 14):</p>
<pre class="programlisting">"Plural-Forms: nplurals=3; "
"plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
</pre>
<p>The expression uses the C conditional operator, <code class="code">condition ?
true_value : false_value</code>, where <code class="code">condition</code>
is an expression evaluating to true or false. In the above example, the
first condition is <code class="code">n==1</code>, which if true gives
the result <code class="code">0</code>, and if false gives the result of
a second conditional expression. For the second expression, the condition is <code
        class="code">n%10>=2 && n%10<=4 && (n%100<10
|| n%100>=20)</code>, which if true gives the result <code
        class="code">1</code>, and if false gives the result <code class="code">2</code>.
At runtime, Gettext will use the <code class="code">msgstr</code> with
the index returned from this expression.</p>
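<p>Because the <code class="code">plural</code> expression uses C syntax, the Polish rule can be pasted almost verbatim into a C function. The sketch below hard-codes that rule for illustration only; Gettext itself parses the expression from the header at runtime. The <code class="code">plural_polish</code> and <code class="code">polish_select</code> names are hypothetical.</p>

```c
#include <assert.h>

/* Polish rule, copied from the Plural-Forms header:
 *   nplurals=3;
 *   plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; */
unsigned long plural_polish(unsigned long n)
{
    return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Select one of the three Polish msgstr[] entries for count n. */
const char *polish_select(const char *const msgstr[3], unsigned long n)
{
    unsigned long index = plural_polish(n);

    assert(index < 3); /* the rule only yields 0, 1 or 2 */
    return msgstr[index];
}
```

<p>For example, <code class="code">plural_polish</code> gives 0 for 1; 1 for 2, 3, 4, 22 and 104; and 2 for 0, 5, 12 and 21.</p>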
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h4 class="title"><a name="s.po_overview.tu.obsolete_entries"></a>1.4.7. Obsolete
Translation Units</h4>
</div>
</div>
<p>Obsolete entries are translation units that are no longer present in
the source files, and are therefore commented out when a PO file is
updated. These entries are reused only if the translation unit reappears
in the project, and are also used for fuzzy matching by the <span
        class="application">msgmerge</span> tool. Obsolete entries are marked with <code
        class="code">#~</code>, as in the following example:</p>
<pre class="programlisting"># This is a translator comment
#~ msgid ""
#~ "Please enter the following details:\n"
#~ " - First Name\n"
#~ " - Last Name\n"
#~ msgstr ""
#~ "Venligst fyll inn følgende data:\n"
#~ " - Fornavn\n"
#~ " - Etternavn\n"
</pre>
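<p>Since an obsolete entry is an ordinary entry behind a <code class="code">#~</code> marker, restoring it comes down to stripping that marker from each line. A minimal C sketch, using a hypothetical <code class="code">po_unobsolete</code> helper that removes the marker and one following space:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If line is marked obsolete, return a pointer to the text after the
 * "#~" marker and one following space; otherwise return NULL. */
const char *po_unobsolete(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "#~", 2) != 0)
        return NULL;
    line += 2;
    if (*line == ' ')
        line++;
    return line;
}
```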
</div>
</div>
<div class="section" lang="en">
<div class="titlepage">
<div>
<h3 class="title"><a name="s.po_overview.domains"></a>1.5. Domains</h3>
</div>
</div>
<p>One single PO file normally represents one MO file, known as a
Gettext <span class="emphasis"><em>domain</em></span>, but the PO format
also allows for representing multiple domains in a single PO file. This
is done by adding the <code class="code">domain</code> keyword followed
by the domain name, as in the following example:</p>
<pre class="programlisting">domain "domain_1"
msgid "hello world"
msgstr "hei verden"
domain "domain_2"
msgid "hello world"
msgstr "hei verden"
</pre>
<p>The above example would produce two MO files, <code class="code">domain_1.mo</code>
and <code class="code">domain_2.mo</code>. If no domain is specified,
translation units belong to the default domain <code class="code">messages</code>.</p>
<p>A PO header is bound to a domain, so each domain has its own header.
</p>
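<p>A minimal C sketch of how a tool might map a <code class="code">domain</code> line to an MO file name is shown below. The <code class="code">po_domain_mo_name</code> helper is hypothetical and assumes the domain name contains no escaped quotes; entries that appear before any <code class="code">domain</code> line would go to <code class="code">messages.mo</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* If line is a domain directive such as  domain "domain_1", write the
 * matching MO file name ("domain_1.mo") into out and return 1.
 * Return 0 for any other line, or if the name does not fit. */
int po_domain_mo_name(const char *line, char *out, size_t cap)
{
    const char *name, *end;
    int written;

    assert(line != NULL && out != NULL && cap > 0);
    if (strncmp(line, "domain \"", 8) != 0)
        return 0;
    name = line + 8;
    end = strchr(name, '"');             /* closing quote of the name */
    if (end == NULL)
        return 0;
    written = snprintf(out, cap, "%.*s.mo", (int)(end - name), name);
    return written >= 0 && (size_t)written < cap;
}
```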
<p>Having multiple domains in a single PO file is very rare; in fact,
the authors have never seen this in use.</p>
</div>
</div>
</div>
</body>
</html>