[xliff-tools] Another question on PO and XLIFF

Tue May 3 07:08:39 PDT 2005

>> I would expect the XLIFF content to have the text as close as possible 
>> to the real output
>
> Where do you draw the line between the text that was presented for 
> translation originally, and any rendering that gets done on the 
> translated text further downstream ? (that's the problem/thought-process 
> I'm wrestling with - and haven't quite worked out yet)
>
> eg. "This is \n text" vs.
>    "This is &my-own-defined-newline; text"
>    "This is ${0} text"
>    "This is $${${}$}}$} text" (using an absurd example, probably from OOo, eh David ;-)

I'm with you: we need to stop at some point. But there is a big difference between \n and ${0} (and others). The first is an escaped
character; the others are not escapes but a more complex kind of placeholder mechanism.

The \n notation happens to be there because the C syntax has to use an escape sequence for \u000A in that context, but that is the
only reason.

An example: In a properties file you would use \u0020 to indicate a leading space (because the string is not quoted), but when the
spaces are within the string or trailing you don't bother with the escape. Would you expect the XLIFF output for this:

Myid=\u0020Some text

To be:

<source xml:space='preserve'><ph id='1'>\u0020</ph>Some text</source>

Or this:

<source xml:space='preserve'> Some text</source> 

Probably the second, right?

I guess I'm trying to say that replacing escaped characters with their real value when possible seems reasonable. Dealing with the
${0} kind is different.
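
To make the distinction concrete, here is a rough Python sketch of what an extraction filter could do. It is only an illustration under my own assumptions: the function name to_xliff_source, the small table of one-letter escapes, and the ${n} placeholder pattern are all made up for the example and do not come from any existing tool. Escaped characters are replaced by their real value, while the ${0} kind is kept as inline code.

```python
import re
from xml.sax.saxutils import escape

# Assumed subset of one-letter escapes that stand for a single real character.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

# A token is a \uXXXX escape, a one-character escape, or a ${n} placeholder.
TOKEN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\(.)|(\$\{\d+\})")


def to_xliff_source(raw):
    """Build a <source> element from a raw, still-escaped string value.

    Hypothetical helper: escapes become real characters, ${n} becomes <ph>.
    """
    parts = []
    ph_id = 0
    pos = 0
    for m in TOKEN.finditer(raw):
        # Plain text before the token, made safe for XML.
        parts.append(escape(raw[pos:m.start()]))
        if m.group(1) is not None:
            # \uXXXX: emit the character itself.
            parts.append(escape(chr(int(m.group(1), 16))))
        elif m.group(2) is not None:
            # \n, \t, ...: emit the real character; unknown escapes keep the letter.
            ch = m.group(2)
            parts.append(escape(SIMPLE_ESCAPES.get(ch, ch)))
        else:
            # ${n}: not an escape, so protect it as an inline code.
            ph_id += 1
            parts.append("<ph id='%d'>%s</ph>" % (ph_id, escape(m.group(3))))
        pos = m.end()
    parts.append(escape(raw[pos:]))
    return "<source xml:space='preserve'>%s</source>" % "".join(parts)


print(to_xliff_source(r"\u0020Some text"))
print(to_xliff_source(r"This is ${0} text"))
```

A merge step would of course have to put the escapes back when writing the translated file; that side is not shown here.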

>>  to make it easier on the translators. They will be much happier with 
>> this:
>
> Right, but if we could do that in the editor (rendering the string 
> cleverly for translation), then we could still preserve the difference 
> between two different strings :

But we don't use XLIFF only with editors. By that I mean that the text stored in XLIFF can be used to do many other things where no
editor is involved: term extraction, spell-checking, machine translation, terminology check, tag verification, etc. Another thing is
that XLIFF files can be translated in XML (not XLIFF) editors too.

There is probably a need for some standardization before you get to the editor, when the text is extracted.

As Josep noted earlier: this is a three-tier system, XLIFF is the middle tier and part (if not most) of the 'standardization' could
probably be done at that level.

> I agree it's a tough one to call.

Me too :)

I do understand very well that most (if not all) tools (including mine :) keep the '\n' at present.

But maybe we need to question ourselves from time to time to adapt to the new environments available. XML is an environment where
line-breaks can be preserved without escaping them. Why escape them, then? Especially if not escaping is a benefit to the translator.
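
A quick way to check that claim, sketched with Python's standard xml.etree.ElementTree (the element content is just an invented sample): a real line-break stored in a <source> element survives serialization and parsing with no escape notation involved.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified name of the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Build a <source> element holding a real line-break, not the two characters \ and n.
source = ET.Element("source", {XML_SPACE: "preserve"})
source.text = "This is\ntext"

# Serialize to a string, then parse it back.
serialized = ET.tostring(source, encoding="unicode")
parsed = ET.fromstring(serialized)

# The line-break is written as-is and read back unchanged.
print(repr(parsed.text))
```

Note that this only holds for element content; a line-break inside an attribute value would be normalized by the parser.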

I had hoped XLIFF Representation Guides for software formats would start to address those issues.

Oh well, at least now the topic is being discussed and thought about. Maybe this will bear fruit in a few years. Standardization is
a long journey :)

Kenavo,
-yves