[xliff-tools] comments about Representation Guide - *-format

Asgeir Frimannsson asgeirf at redhat.com
Mon Oct 17 05:56:32 PDT 2005


Hi Bruno,

On Monday 10 October 2005 22:12, Bruno Haible wrote:
> In the process of putting a PO -> XLIFF converter into the GNU gettext
> tools, I have a few issues with the Representation Guide (labelled
> Working Draft 7 June 2005). Please tell me if they have already been
> resolved - I've been unable to follow all the discussions on this list.

This work has been moved over to the XLIFF Technical Committee. The latest
(final?) draft is available at
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
(Documents -> xliff-profile-po). The document is, however, not very different
from the one mentioned above.

> Here is the first, blocking one.
>
>
> Section 5.6.3. X-format
>
> The proposed approach of using <ph> elements to shield the translator
> from the complexities of format strings is interesting. However, there are
> issues:
>
> a) What is the expected labelling of a format directive that doesn't
> take an argument? Example:
>
>    #, c-format
>    msgid "Increased by 0.7%%."
>    msgstr "Gestiegen um 7 o/oo."
>
> Such a format directive can be omitted or added by the translator.

Ideally (IMO), the string should be normalized, and the "escape sequence" "%%"
should be represented as the Unicode character '%' in XLIFF (and converted
back to "%%" when writing the PO file). But I understand this solution is not
trivial.
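
A minimal sketch of that normalization, in Python, assuming a simplified
directive pattern (no numbered arguments, not every length modifier). The
names DIRECTIVE, split_c_format and join_c_format are hypothetical and not
part of gettext or any XLIFF tool:

```python
import re

# Simplified C format directive: flags, width, precision, length, conversion.
# Assumption: numbered arguments (%1$d) and rarer modifiers are not handled.
DIRECTIVE = re.compile(
    r"%(?:%|[-+ #0]*(?:\*|\d+)?(?:\.(?:\*|\d+))?"
    r"(?:hh|h|ll|l|L|j|z|t)?[diouxXeEfFgGaAcspn])"
)

def split_c_format(s):
    """Split a c-format string into ('text', ...) and ('directive', ...)
    tokens. The escape '%%' is normalized to a literal '%' text token."""
    tokens = []
    pos = 0
    for m in DIRECTIVE.finditer(s):
        if m.start() > pos:
            tokens.append(("text", s[pos:m.start()]))
        if m.group() == "%%":
            tokens.append(("text", "%"))
        else:
            tokens.append(("directive", m.group()))
        pos = m.end()
    if pos < len(s):
        tokens.append(("text", s[pos:]))
    return tokens

def join_c_format(tokens):
    """Back-convert: re-escape '%' in text tokens, keep directives as-is."""
    return "".join(
        value.replace("%", "%%") if kind == "text" else value
        for kind, value in tokens
    )

tokens = split_c_format("Increased by 0.7%%.")
print("".join(v for k, v in tokens if k == "text"))  # Increased by 0.7%.
print(join_c_format(tokens))                         # Increased by 0.7%%.
```

The back-conversion only works because text and directives are kept as
separate tokens. Once flattened to a plain string, a literal '%' can no longer
be told apart from the start of a directive: "100% done" would be read as
containing the directive "% d". That is part of why this is not trivial.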

There has been some discussion of format directives on the XLIFF TC list:
http://lists.oasis-open.org/archives/xliff/200509/msg00021.html
(see http://lists.oasis-open.org/archives/xliff/200509/threads.html for the
full thread). That thread is about the complications this causes in Java/.NET,
but PO is much worse: there are tens of possible programming languages, each
with its own way of doing it, so it is indeed complex!

> b) What is the expected labelling of a format directive that takes
> several arguments? Example:
>
>    #, c-format
>    msgid "File name is %.*s, number is %d."
>    msgstr "Le %d-ième fichier est %.*s."
>
> The "%.*s" directive consumes two arguments: an integer and a string
> argument. How should it be labelled?

It could still use the same labelling scheme as if it were a single-argument
directive, provided you're able to implement the logic in the XLIFF converter.
If not, I would suggest not abstracting the format directives at all. (In
fact, I don't even know how to reorder the arguments in the above example so
that the French translation doesn't segfault!)
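
One possible approach to the reordering, assuming the target printf implements
POSIX numbered arguments ("%n$" for values, "*m$" for width and precision), is
to have the converter number every argument explicitly, so that a directive
and the arguments it consumes can be moved as one unit. The Python sketch
below is illustrative only: number_arguments is a hypothetical helper, and the
directive pattern is simplified.

```python
import re

# Simplified C format directive with named parts; '%%' is matched separately.
DIRECTIVE = re.compile(
    r"%%|%(?P<flags>[-+ #0]*)(?P<width>\*|\d+)?(?:\.(?P<prec>\*|\d+))?"
    r"(?P<length>hh|h|ll|l|L|j|z|t)?(?P<conv>[diouxXeEfFgGaAcspn])"
)

def number_arguments(fmt):
    """Rewrite sequential directives into POSIX numbered form.
    Arguments are counted in the order printf consumes them:
    '*' width first, then '*' precision, then the value itself."""
    counter = 0

    def take():
        nonlocal counter
        counter += 1
        return counter

    def repl(m):
        if m.group() == "%%":
            return "%%"  # consumes no argument
        width = m.group("width") or ""
        prec = m.group("prec")
        if width == "*":
            width = f"*{take()}$"
        if prec == "*":
            prec = f"*{take()}$"
        value = take()
        out = f"%{value}${m.group('flags')}{width}"
        if prec is not None:
            out += "." + prec
        return out + (m.group("length") or "") + m.group("conv")

    return DIRECTIVE.sub(repl, fmt)

print(number_arguments("File name is %.*s, number is %d."))
# File name is %2$.*1$s, number is %3$d.
```

Under that assumption the French string could be written as
"Le %3$d-ième fichier est %2$.*1$s.", which keeps the precision argument
attached to its string. Whether every printf implementation the program
targets accepts this form is a separate question.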

> c) How to extend this principle to Lisp and Scheme format strings?
> Recall that these format strings can contain loops. For example,
> in order to output a list of strings, in C the programmer has to write
> code that dispatches among
>
>      "Written in %d by %s."
>      "Written in %d by %s and %s."
>      "Written in %d by %s, %s and %s."
>      "Written in %d by %s, %s, %s and %s."
>      "Written in %d by %s, %s, %s, %s and %s."
>
> whereas in Lisp a single format string does it all, for any number of
> arguments:
>
>      "Written in ~D by ~{~A~^~*~#[ and ~:;, ~]~:*~}."
>
> The format string represents a loop. The translator cannot simply
> translate the text pieces and permute the rest. In cases like this,
> the translator must also be able to change the control logic in the
> format string.
>
> And of course the ~D and the ~A are not permutable: the format string
> expects one number and a list of strings, and if a translator would
> write   "Ecrit par ~A en ~{~D~^~*~#[ and ~:;, ~]~:*~}."
> then it would expect a single string and a list of numbers.

Any more abstraction than the raw string is currently unsupported in XLIFF. 
This issue is similar to that of translation of Voice XML, which has been 
discussed in the past on the XLIFF TC list - and there is no good solution for 
handling this in the present XLIFF specification.

To summarise, abstraction of formatting directives is not trivial, except
maybe for sources from Qt/KDE (%1, %2 style) and Java. I think it would be
overkill to add a complete section to the Representation Guide for handling
format directives in all gettext-supported languages, at least until the
XLIFF TC decides on some conventions for this.
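
To illustrate why the Qt/KDE style is the easy case: each directive is a
self-contained, explicitly numbered token, so wrapping it in a <ph> element is
a plain substitution. The Python sketch below is only an illustration;
qt_to_xliff is a hypothetical helper, and using the argument number as the
<ph> id is an assumption, not something the Representation Guide prescribes.

```python
import re
from xml.sax.saxutils import escape

# Qt/KDE-style argument markers: %1 .. %99.
QT_ARG = re.compile(r"%(\d{1,2})")

def qt_to_xliff(s):
    """Wrap each %N marker in a <ph> element and XML-escape the text
    around it. Assumption: each marker occurs at most once, so the
    argument number can double as the <ph> id."""
    out = []
    pos = 0
    for m in QT_ARG.finditer(s):
        out.append(escape(s[pos:m.start()]))
        out.append(f'<ph id="{m.group(1)}">{m.group()}</ph>')
        pos = m.end()
    out.append(escape(s[pos:]))
    return "".join(out)

print(qt_to_xliff("Copied %1 of %2 files & folders"))
# Copied <ph id="1">%1</ph> of <ph id="2">%2</ph> files &amp; folders
```

The translator can then reorder the <ph> elements freely, since no marker
depends on its position in the string.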

cheers,
asgeir

