[xliff-tools] PO Representation Guide: The PO Header

Fri Feb 11 01:57:52 PST 2005

On Friday 11 February 2005 00:25, Rodolfo M. Raya wrote:
> The following is an extract from the guide:
>
>
>   <source xml:space="preserve">
>     # SOME DESCRIPTIVE TITLE.
>     # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
>     # This file is distributed under the same license as the PACKAGE package.
>     # FIRST AUTHOR &lt;EMAIL at ADDRESS&gt;, YEAR.
>     #
>     #, fuzzy
>     msgid ""
>     msgstr ""
>     "Project-Id-Version: PACKAGE VERSION\n"
>     "Report-Msgid-Bugs-To: \n"
>     "POT-Creation-Date: 2004-11-11 04:29+0900\n"
>     "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
>     "Last-Translator: FULL NAME &lt;EMAIL at ADDRESS&gt;\n"
>     "Language-Team: LANGUAGE &lt;LL at li.org&gt;\n"
>     "MIME-Version: 1.0\n"
>     "Content-Type: text/plain; charset=CHARSET\n"
>     "Content-Transfer-Encoding: 8bit\n"
>   </source>
>
> You included comments in the <source> element and that's wrong. Comments
> should be part of a <note> element attached to the <trans-unit> element.

Yep. My mistake. This option would handle the header just like a normal 
translation unit, as follows (similar to the po-entry specs in the guide):

(indenting done for readability, white-space should be preserved)

<trans-unit id="message_header" approved="no">
  <source xml:space="preserve">
    Project-Id-Version: PACKAGE VERSION
    Report-Msgid-Bugs-To: 
    POT-Creation-Date: 2004-11-11 04:29+0900
    PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE
    Last-Translator: FULL NAME &lt;EMAIL at ADDRESS&gt;
    Language-Team: LANGUAGE &lt;LL at li.org&gt;
    MIME-Version: 1.0
    Content-Type: text/plain; charset=CHARSET
    Content-Transfer-Encoding: 8bit
  </source>
  <target xml:space="preserve">
    Project-Id-Version: MyPackage 1.0
    Report-Msgid-Bugs-To: foo at example.com
    POT-Creation-Date: 2004-11-11 04:29+0900
    PO-Revision-Date: 2005-02-01 12:00+0900
    Last-Translator: Foo Bar &lt;foo at example.com&gt;
    Language-Team: My Language &lt;LL at li.org&gt;
    MIME-Version: 1.0
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit
    Plural-Forms: nplurals=2; plural=n>1;
  </target>
  <note from="po-file">
    SOME DESCRIPTIVE TITLE.
    Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
    This file is distributed under the same license as the PACKAGE package.
    FIRST AUTHOR &lt;EMAIL at ADDRESS&gt;, YEAR.
  </note>
</trans-unit>

I think this is the most practical solution in that it's easy for translators 
to modify the header (if they have knowledge of PO), and it also makes life 
easier for those developing filters. But I'm still not sure if it's a _legal_ 
way to do it according to the XLIFF Specification. The <trans-unit> element is 
meant for 'translatable data', and the PO header does not have any 
translatable data, only informative and technical meta-data. Further, it 
would cause some garbage in TMs, since PO headers would end up in the 
databases when approved TUs are imported automatically.
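
To make the idea a bit more concrete, here is a rough Python sketch of how a 
filter could build such a header <trans-unit>. Nothing in it comes from the 
guide: the function names unquote_po_header and header_trans_unit are made up 
for illustration, only the \n escape is handled, and leaving out 'approved' 
for non-fuzzy headers is just an assumption, since that question is still open.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the reserved xml: prefix as a namespace-qualified name.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def unquote_po_header(msgstr_lines):
    """Join the quoted msgstr lines of a PO header into plain text.

    Each line looks like "Field: value\\n" (with the quotes). Only the
    backslash-n escape is handled; other PO escapes are ignored here.
    """
    parts = []
    for line in msgstr_lines:
        line = line.strip()
        if line.startswith('"') and line.endswith('"'):
            line = line[1:-1]
        parts.append(line.replace("\\n", "\n"))
    return "".join(parts)


def header_trans_unit(source_text, target_text, comments, fuzzy):
    """Build a <trans-unit> for the PO header (hypothetical helper)."""
    unit = ET.Element("trans-unit", {"id": "message_header"})
    if fuzzy:
        # Assumption: only the fuzzy case sets 'approved'.
        unit.set("approved", "no")
    source = ET.SubElement(unit, "source", {XML_SPACE: "preserve"})
    source.text = source_text
    target = ET.SubElement(unit, "target", {XML_SPACE: "preserve"})
    target.text = target_text
    # Translator comments go into a <note>, not into <source>.
    note = ET.SubElement(unit, "note", {"from": "po-file"})
    note.text = "\n".join(comments)
    return unit


if __name__ == "__main__":
    lines = ['"Project-Id-Version: PACKAGE VERSION\\n"',
             '"MIME-Version: 1.0\\n"']
    text = unquote_po_header(lines)
    unit = header_trans_unit(text, text, ["SOME DESCRIPTIVE TITLE."], True)
    print(ET.tostring(unit, encoding="unicode"))
```

ElementTree escapes the angle brackets in the e-mail fields on output, so the 
filter does not have to write the &lt; and &gt; entities itself.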

> You also included a flag inside the source (the line that states that
> the entry is a fuzzy one) and that should be considered as the value of
> the "approved" attribute of the <trans-unit>. If the message is set to
> fuzzy, then the <trans-unit> element should have the "approved"
> attribute set to "no". If the fuzzy flag is not present, then the
> message should be considered translated.

Yep. For indicating 'fuzzy', should we use the 'state' attribute of the 
<target> element set to 'needs-review-translation', or the 'approved' 
attribute of the <trans-unit> element set to 'no'? Also, if we use the 
'approved' attribute, should this be set to 'yes' for non-fuzzy messages?

If we use the 'state' attribute, it would also be possible to set this to 
'new' for messages with fuzzy and blank msgstr, indicating a new trans-unit, 
esp. useful when merging POT's and translated PO's.  
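
Just to compare the two options side by side, a small Python sketch follows. 
The function fuzzy_attributes and its parameters are hypothetical, and two 
things in it are my assumptions rather than anything decided: any blank msgstr 
is treated as 'new' under the 'state' option, and approved="yes" for non-fuzzy 
messages is left as a switch because that question is still open.

```python
def fuzzy_attributes(fuzzy, msgstr, use_state=True, approve_non_fuzzy=False):
    """Return (trans_unit_attrs, target_attrs) for one PO entry.

    Both mappings are options under discussion, not settled rules.
    """
    if use_state:
        # Option 1: 'state' attribute on <target>.
        if not msgstr:
            # Assumption: a blank msgstr marks a new trans-unit.
            return {}, {"state": "new"}
        if fuzzy:
            return {}, {"state": "needs-review-translation"}
        return {}, {}

    # Option 2: 'approved' attribute on <trans-unit>.
    if fuzzy:
        return {"approved": "no"}, {}
    if approve_non_fuzzy and msgstr:
        return {"approved": "yes"}, {}
    return {}, {}


if __name__ == "__main__":
    print(fuzzy_attributes(True, "Bonjour"))
    print(fuzzy_attributes(True, "Bonjour", use_state=False))
```

One thing the sketch shows is that the 'state' option can tell a blank entry 
apart from a fuzzy one, while 'approved' alone cannot.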

> If you put the header in a translation unit, you don't need to treat it
> in a special way.

Yeah, by having the header in a <trans-unit>, all possible elements of a PO 
file are covered, and even the need to extract anything to a skeleton is 
eliminated (but skeletons should be guide-independent anyway, so it's up to 
implementers if skeletons are used).

> > Implementations would need to take care when converting PO files though, as
> > the character set is specified in a header field. But filters should be able
> > to deal with that by a) imposing restrictions on character sets (e.g. utf-8
> > only), or b) parsing the charset field of the header and do the conversion to
> > the XML character set, or c) manually specify the PO character set on
> > conversion.
>
> Heartsome's tools assume that PO files are always in UTF-8 and that's
> the preset encoding. Nevertheless, the user is able to override the
> default value (I once found a PO file encoded in ISO-8859-1 in Fedora).

Yep, sounds like a good way to me. Maybe we should include a paragraph on 
character set handling in the guide, without stating any requirements on how 
this is done?
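
As an illustration only (not a requirement), a filter could combine the three 
approaches roughly like this in Python: a user override wins, then the 
charset declared in the header, then a UTF-8 default. The function decode_po 
and the regular expression are made up for this sketch, and it assumes an 
ASCII-compatible encoding so that the Content-Type line can be found in the 
raw bytes.

```python
import re

# Looks for the charset on the Content-Type header line of the raw file.
CHARSET_RE = re.compile(rb"Content-Type:[^\n]*charset=([A-Za-z0-9_.:-]+)")


def decode_po(raw, override=None, default="utf-8"):
    """Decode PO file bytes to text (hypothetical helper).

    Encoding is chosen in this order: explicit override, charset field of
    the header, default.
    """
    encoding = override
    if encoding is None:
        match = CHARSET_RE.search(raw)
        if match:
            declared = match.group(1).decode("ascii")
            # Untouched templates still carry the CHARSET placeholder.
            if declared != "CHARSET":
                encoding = declared
    return raw.decode(encoding or default)


if __name__ == "__main__":
    sample = ('"Content-Type: text/plain; charset=ISO-8859-1\\n"\n'
              'msgid "caf\u00e9"\n').encode("iso-8859-1")
    print(decode_po(sample))
```

Once the text is decoded, the XLIFF file can be written in whatever encoding 
the XML declaration states.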

> > Any comments?
>
> Two comments:
>
> 1) I will review the guide during the weekend. You can expect more
> comments on Monday.

Thanks for that, looking forward to a lot of constructive comments. Feel free 
to propose 'radical' changes and don't be shy in criticising it, as it's more 
important for us to get a specification we all can agree on :) 

> 2) I did not send the required info because our headquarters in
> Singapore were closed for Chinese New Year and my local office in
> Uruguay was also on holidays (carnival). I'm waiting for a definition on
> the license, which should be ready by Friday.

That's fine. 

cheers,
asgeir