<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> <title></title> <link rel="stylesheet" href="http://www.oasis-open.org/spectools/css/oasis-wd.css" type="text/css"> <meta name="generator" content="DocBook XSL Stylesheets V1.68.1"> </head> <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> <div class="article" lang="en"> <div class="toc"> <h2>Table of Contents</h2> <dl> <dt><span class="section"><a href="#s.po_overview">1. Overview of the PO file format</a></span></dt> <dd> <dl> <dt><span class="section"><a href="#s.po_overview.po_pot">1.1. PO and POT</a></span></dt> <dt><span class="section"><a href="#s.po_overview.general_structure">1.2. General Structure</a></span></dt> <dt><span class="section"><a href="#s.po_overview.header">1.3. Header</a></span></dt> <dt><span class="section"><a href="#s.po_overview.tu">1.4. Translation Units</a></span></dt> <dt><span class="section"><a href="#s.po_overview.domains">1.5. Domains</a></span></dt> </dl> </dd> </dl> </div> <hr> <div class="section" lang="en"> <div class="titlepage"> <div> <h2 class="title" style="clear: both"><a name="s.po_overview"></a>1. Overview of the PO file format</h2> </div> </div> <p>Because the Gettext PO format is neither a defined standard nor well documented, this section presents an overview of the features and design of the PO file format.</p> <div class="section" lang="en"> <div class="titlepage"> <div> <h3 class="title"><a name="s.po_overview.po_pot"></a>1.1. PO and POT</h3> </div> </div> <p>There are two types of PO files: PO Template files (POTs) and language-specific PO files (POs). A POT contains a skeleton header, followed by the extracted translation units. POTs are generated by the <span class="application">xgettext</span> extraction tool and are not meant to be edited by humans. 
POTs are converted into language-specific POs by the <span class="application">msginit</span> tool, and these files are then edited by translators.</p> <p>When source code is updated, a new POT is generated for the project, and the changes from previous versions are incorporated into the existing translations by using the <span class="application">msgmerge</span> tool. This tool inserts new translation units into the existing PO files, marks translation units no longer in use as obsolete, and updates any references and extracted comments.</p> <p>Translated PO files are converted to binary resource files, known as MO (Machine Object) files, by the <span class="application">msgfmt</span> tool. The Gettext library uses MO files at runtime; hence PO files are only used in the development and localisation process.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h3 class="title"><a name="s.po_overview.general_structure"></a>1.2. General Structure</h3> </div> </div> <p>A PO file starts with a header, followed by a number of translation units.</p> <pre class="programlisting"># SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE-NAME VERSION\n"
"Report-Msgid-Bugs-To: BUG-EMAIL-ADDR <EMAIL@ADDRESS>\n"
"POT-Creation-Date: YEAR-MO-DA HO:MI+ZONE\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <EMAIL@ADDRESS>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n!=1;\n"
"X-User-Defined-Var: VALUE\n"

# Translator Comment
#. Extracted Comment
#: myfile.c:12
#, flag
msgid "Original String 1"
msgstr "Translated String 1"

# Translator Comment
#. Extracted Comment
#: myfile.c:23
#, flag
msgid "Original String 2"
msgstr "Translated String 2"
</pre> <p></p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h3 class="title"><a name="s.po_overview.header"></a>1.3. Header</h3> </div> </div> <p></p> <pre class="programlisting"># French Translation for MyApplication.
# Copyright (C) 2005 John Developer
# This file is distributed under the same license as the MyApp package.
# John Developer <john@example.com>, 2005.
# Joe Translator <joe@example.com>, 2005.
#
msgid ""
msgstr ""
"Project-Id-Version: MyApp 1.0\n"
"Report-Msgid-Bugs-To: MyApp List <myapp-list@example.com>\n"
"POT-Creation-Date: 2005-04-27 13:15+0900\n"
"PO-Revision-Date: 2005-04-27 13:45+0900\n"
"Last-Translator: Joe Translator <joe@example.com>\n"
"Language-Team: French Team <fr-list@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"
"X-Generator: KBabel 1.9\n"
</pre> <p></p> <p>The PO header follows a similar structure to PO translation units, but is distinguished by its empty source element (<code class="code">msgid</code>). The header variables are contained in the header's target (<code class="code">msgstr</code>) element, with newline character representations (<code class="code">"\n"</code>) separating each variable.</p> <p>The initial comment lines (comments are lines starting with <code class="code">"# "</code>) usually contain a copyright notice as well as licensing information, followed by a list of all translators who have been involved in translating the specific PO file.</p> <p>The header skeleton in a POT file is initially marked with the <code class="code">fuzzy</code> flag (flags are comma-separated entries on lines starting with <code class="code">"#, "</code>). 
This flag is removed when the header variables are filled in and the POT file is initialized to a language-specific PO file.</p> <p></p> <div class="table"><a name="id2581684"></a> <p class="title"><b>Table 1. Predefined PO Header variables</b></p> <table summary="Predefined PO Header variables" border="1"> <colgroup> <col> <col> </colgroup> <thead> <tr> <th>Variable Name</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code class="code">Project-Id-Version</code></td> <td>Application name and version</td> </tr> <tr> <td><code class="code">Report-Msgid-Bugs-To</code></td> <td>Mailing list or contact person for reporting errors in translation units.</td> </tr> <tr> <td><code class="code">POT-Creation-Date</code></td> <td>Date POT file was generated. Automatically filled in by Gettext</td> </tr> <tr> <td><code class="code">PO-Revision-Date</code></td> <td>Timestamp when PO file was last edited by a translator</td> </tr> <tr> <td><code class="code">Last-Translator</code></td> <td>Contact information for last translator editing the file.</td> </tr> <tr> <td><code class="code">Language-Team</code></td> <td>Name of language team that translated this file</td> </tr> <tr> <td><code class="code">MIME-Version</code></td> <td>MIME version used for specifying Content-Type</td> </tr> <tr> <td><code class="code">Content-Type</code></td> <td>MIME content type for this file</td> </tr> <tr> <td><code class="code">Content-Transfer-Encoding</code></td> <td>MIME transfer encoding</td> </tr> <tr> <td><code class="code">Plural-Forms</code></td> <td>Number of plural forms in target language, and c-expression for evaluating which plural form to use for a parameter.</td> </tr> </tbody> </table> </div> <p>In addition to these predefined variables, the PO header can contain custom user-defined variables of the same format.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h3 class="title"><a name="s.po_overview.tu"></a>1.4. 
Translation Units</h3> </div> </div> <p></p> <pre class="programlisting"># Translator Comment
#. Extracted Comment
#: myfile.c:12 myfile.c:32
#, flag
msgid "Original String"
msgstr "Translated String"</pre> <p></p> <p>PO translation units use the source string (<code class="code">msgid</code>) as primary id, and contain the translation in the <code class="code">msgstr</code> field. In addition to this, PO translation units contain other meta-data, explained in further detail in the following sections.</p> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.msg"></a>1.4.1. Source and Target</h4> </div> </div> <p>The <code class="code">msgid</code> and <code class="code">msgstr</code> fields contain the source and target string of a translation unit.</p> <p>The actual content of <code class="code">msgid</code> and <code class="code">msgstr</code> is a concatenation of the strings inside the double quotes (<code class="code">"</code> characters) on each line, meaning that the following two examples are equivalent:</p> <pre class="programlisting">msgid ""
"My name is "
""
"%s. \n"
"What is"
" "
"your name?"
</pre> <p>and</p> <pre class="programlisting">msgid "My name is %s. \nWhat is your name?"</pre> <p></p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.transcomment"></a>1.4.2. Translator Comments</h4> </div> </div> <p></p> <pre class="programlisting"># This is a comment line
# This is another comment line</pre> <p>Translator comments are lines starting with <code class="code">"# "</code> (hash character + whitespace character). These comments are added by translators, and are not present in POT files.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.autocomment"></a>1.4.3. Extracted Comments</h4> </div> </div> <p></p> <pre class="programlisting">#. This is an extracted comment
#. This is another extracted comment</pre> <p>Extracted comments are lines starting with <code class="code">"#."</code> (hash character + dot character). These comments are extracted from the source code. Source-code comments are normally extracted if they are on the same line as the source string, or on the line immediately preceding it, as in the following C example:</p> <pre class="programlisting">/* This comment will be extracted */
gettext("Hello World");</pre> <p>This would become:</p> <pre class="programlisting">#. This comment will be extracted
msgid "Hello World"
msgstr ""</pre> <p></p> <p>When updating a PO file from a new POT file, existing extracted comments in the language-specific PO file are discarded, and the extracted comments present in the POT file are inserted in the existing PO file.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.references"></a>1.4.4. References</h4> </div> </div> <p></p> <pre class="programlisting">#: myfile.c:1 myfile.c:23 otherfile.c:1
#: otherfile.c:34</pre> <p>References are identified by lines starting with <code class="code">"#:"</code> (hash character + colon character). References are space-separated lists of locations (<code class="code">sourcefile:linenumber</code>) specifying where the translation unit is found in a source file.</p> <p>As each <code class="code">msgid</code> has to be unique within a PO domain, a single translation unit can contain multiple references; one for each location where the string is found in the source code.</p> <p>Similar to extracted comments, when updating a PO file from a new POT file, existing references in the language-specific PO file are discarded, and the references present in the POT file are inserted in the existing PO file.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.flags"></a>1.4.5. 
Flags</h4> </div> </div> <p>Flags are identified by lines starting with <code class="code">"#,"</code> (hash character + comma character). Multiple flags are separated by commas.</p> <p>Flags are used both as processing instructions by the Gettext tools, and by translators to indicate that a translation unit is unfinished or "fuzzy".</p> <div class="table"><a name="id2581054"></a> <p class="title"><b>Table 2. Flag values and descriptions</b></p> <table summary="Flag values and descriptions" border="1"> <colgroup> <col> <col> </colgroup> <thead> <tr> <th>Flag Name</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code class="code">fuzzy</code></td> <td> <p>Indicates that a translation unit needs review by a translator.</p> <p>This flag is inserted by the Gettext tools when a translation unit changes, or when the translation unit does not pass the format check. </p> <p>The flag is also commonly used by translators to mark a translation unit as unfinished.</p> <p>Note that entries marked as <code class="code">fuzzy</code> are not included when PO files are compiled to binary MO files.</p> </td> </tr> <tr> <td><code class="code">no-wrap</code></td> <td> <p>Indicates that the text in the <code class="code">msgid</code> field is not to be wrapped at the page width (usually 80 characters), as it normally would be. 
Note that this does not affect the wrapping of the actual source string, only the representation of it in the PO file.</p> <p>This flag is set by developers in the source code, or by adding a command-line flag when invoking the Gettext tools.</p> </td> </tr> <tr> <td><code class="code"><em class="replaceable"><code>X</code></em>-format</code>, where <em class="replaceable"><code>X</code></em> is any of the following: <div class="itemizedlist"> <ul type="bullet"> <li style="list-style-type: disc">awk</li> <li style="list-style-type: disc">c</li> <li style="list-style-type: disc">csharp</li> <li style="list-style-type: disc">elisp</li> <li style="list-style-type: disc">gcc-internal</li> <li style="list-style-type: disc">java</li> <li style="list-style-type: disc">librep</li> <li style="list-style-type: disc">lisp</li> <li style="list-style-type: disc">objc</li> <li style="list-style-type: disc">object-pascal</li> <li style="list-style-type: disc">perl</li> <li style="list-style-type: disc">perl-brace</li> <li style="list-style-type: disc">php</li> <li style="list-style-type: disc">python</li> <li style="list-style-type: disc">qt</li> <li style="list-style-type: disc">scheme</li> <li style="list-style-type: disc">sh</li> <li style="list-style-type: disc">smalltalk</li> <li style="list-style-type: disc">tcl</li> <li style="list-style-type: disc">ycp</li> </ul> </div> </td> <td> <p>Indicates that Gettext is to do a format check on the translation unit to validate that both <code class="code">msgid</code> and <code class="code">msgstr</code> contain valid parameter values according to the source format.</p> <p>This flag is automatically inserted by the Gettext extraction tool.</p> </td> </tr> <tr> <td><code class="code">no-<em class="replaceable"><code>X</code></em>-format</code>, where <em class="replaceable"><code>X</code></em> is any of the items in the list above.</td> <td> <p>Indicates that Gettext is to skip the format check for this translation unit.</p> <p>This 
flag has to be set by developers in the source code.</p> </td> </tr> </tbody> </table> </div> <p></p> <p>Flags are inserted and overridden by developers in source code, by adding them to a comment immediately preceding the call to gettext, as in the following example:</p> <pre class="programlisting">/* xgettext:no-c-format */
printf(_("Hello World"));
</pre> <p>Since the Gettext call here is inside a <code class="code">printf</code> function call, the Gettext tools will automatically assume this is a <code class="code">c-format</code> string. In this example the developer overrides that assumption, which generates the following PO translation unit:</p> <pre class="programlisting">#, no-c-format
msgid "Hello World"
msgstr ""
</pre> <p></p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.plural"></a>1.4.6. Plural Forms</h4> </div> </div> <p>Gettext, in addition to supporting normal translation units with a single <code class="code">msgid</code> and <code class="code">msgstr</code>, supports <span class="emphasis"><em>plural form</em></span> translation units. These translation units contain the <span class="emphasis"><em>singular</em></span> English form in the <code class="code">msgid</code> field, and the <span class="emphasis"><em>plural</em></span> form in the <code class="code">msgid_plural</code> field. 
As the target, these translation units have an array of <code class="code">msgstr</code> entries, one for each plural form in the target language:</p> <pre class="programlisting">msgid "You have %d file"
msgid_plural "You have %d files"
msgstr[0] "Du har %d fil"
msgstr[1] "Du har %d filer"
</pre> <p></p> <p>The target language may have one or more forms (Japanese has one form, while Polish has three), and the logic for selecting which form to use for a parameter is defined in a PO header field, where <code class="code">nplurals</code> defines the number of forms and <code class="code">plural</code> contains a C expression for evaluating which item in the <code class="code">msgstr</code> array to use at runtime:</p> <pre class="programlisting">"Plural-Forms: nplurals=2; plural=(n != 1);\n"
</pre> <p>This is a typical example for a Germanic language, which has a special case when <code class="code">n</code> is 1. A more complex example is Polish, which has one form for when <code class="code">n</code> is 1, and another for numbers ending in 2, 3 or 4, except those ending in 12, 13 or 14:</p> <pre class="programlisting">"Plural-Forms: nplurals=3; "
"plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;"
</pre> <p></p> <p>The expression uses the C conditional operator, <code class="code">condition ? true_value : false_value</code>, where <code class="code">condition</code> is an expression evaluating to true or false. In the above example, the first condition is <code class="code">n==1</code>, which if true gives the result <code class="code">0</code>, and if false gives the result of a second conditional expression. For the second expression, the condition is <code class="code">n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)</code>, which if true gives the result <code class="code">1</code>, and if false gives the result <code class="code">2</code>. 
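</p> <p>As a worked check of the Polish rule, the expression can be transcribed into a small C function. This is only a sketch: the name <code class="code">polish_plural_index</code> is a hypothetical helper made up for illustration and is not part of Gettext, which evaluates the header expression itself.</p> <pre class="programlisting">/* Hypothetical helper (not a Gettext function): returns the msgstr index
 * that the Polish Plural-Forms expression selects for the count n. */
int polish_plural_index(unsigned long n)
{
    /* Form 0: exactly one item. */
    if (n == 1)
        return 0;
    /* Form 1: last digit is 2, 3 or 4, but the last two digits are not 12-14. */
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    /* Form 2: everything else, including 0 and 5 through 21. */
    return 2;
}
</pre> <p>Under this sketch, 1 selects index 0; 2, 22 and 104 select index 1; and 0, 5, 12 and 21 select index 2.</p> <p>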
At runtime, Gettext will use the <code class="code">msgstr</code> with the index returned from this expression.</p> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h4 class="title"><a name="s.po_overview.tu.obsolete_entries"></a>1.4.7. Obsolete Translation Units</h4> </div> </div> <p>Obsolete entries are translation units that are no longer present in the source files, and are therefore commented out when a PO file is updated. These entries are re-used by Gettext only if the translation unit re-appears in the project, and are also used for fuzzy matching by the <span class="application">msgmerge</span> tool. Obsolete entries are marked with <code class="code">#~</code>, as in the following example:</p> <pre class="programlisting"># This is a translator comment
#~ msgid ""
#~ "Please enter the following details:\n"
#~ " - First Name\n"
#~ " - Last Name\n"
#~ msgstr ""
#~ "Venligst fyll inn følgende data:\n"
#~ " - Fornavn\n"
#~ " - Etternavn\n"
</pre> <p></p> </div> </div> <div class="section" lang="en"> <div class="titlepage"> <div> <h3 class="title"><a name="s.po_overview.domains"></a>1.5. Domains</h3> </div> </div> <p>One single PO file normally represents one MO file, known as a Gettext <span class="emphasis"><em>domain</em></span>, but the PO format also allows for representing multiple domains in a single PO file. This is done by adding the <code class="code">domain</code> keyword followed by the domain name, as in the following example:</p> <pre class="programlisting">domain "domain_1"
msgid "hello world"
msgstr "hei verden"

domain "domain_2"
msgid "hello world"
msgstr "hei verden"
</pre> <p></p> <p>The above example would produce two MO files, <code class="code">domain_1.mo</code> and <code class="code">domain_2.mo</code>. If no domain is specified, translation units belong to the default domain <code class="code">messages</code>.</p> <p>A PO header is bound to a domain, so each domain has its own header. 
</p> <p>Having multiple domains in a single PO file is very rare; in fact, the authors have never seen this in use.</p> </div> </div> </div> </body> </html>