[xliff-tools] Another question on PO and XLIFF

Asgeir Frimannsson asgeirf at redhat.com
Mon May 2 20:36:03 PDT 2005


Hi Rodolfo,

(A bit of "fighting" over this makes my day a lot more interesting, as I'm at 
home with a flu at the moment :) )

On Tue, 3 May 2005 11:47, Rodolfo M. Raya wrote:
> On Tue, 2005-05-03 at 11:13 +1000, Asgeir Frimannsson wrote:
>
> Hi,
>
> > I agree with you that nr 3 is the most logical, and that is what I'm
> > currently using in our filters.
> >
> > So the following example:
> > > msgid ""
> > > "Line 1\n line 2\n"
> > > "Line 3.\n"
> >
> > (which would be identical to:
> > msgid "Line 1\n line 2\nLine 3.\n"
> > )
> >
> > becomes:
> >
> > <source xml:space='preserve'>Line 1
> >  line 2
> > Line 3.
> > </source>
>
> As said before, I think this is wrong, because you removed translatable
> text.
>
> The "\n" is part of the text. It is a sequence of two characters: '\'
> and 'n'.  It is not only an instruction for the program that will
> display the text on screen. The translator should be able to see these
> characters and move them wherever they fit.

"\n" is a sequence of two characters, yes, I agree so far. But it is still only 
a representation of an escape sequence, and gettext stores it internally as 
the character it stands for (a real newline). In addition, gettext completely 
ignores how the PO file is formatted (whether a string is split over multiple 
lines or written on a single line). Let's do a simple test:

I have the following PO file, "test.po":

msgid "hello \n world"
msgstr ""

msgid "hello my "
""
""
"\n"
" beautiful world"
msgstr ""

Let's now create a test program that tests what happens if Gettext loads this 
PO file into memory, and then saves it again as a copy "output.po" without 
modifying the file:

...
/* let's load the PO file into memory using the gettext api */
gettext_po_file = po_file_read("test.po", &gettext_error_handler);
/* let's simply save it again with a different name */
po_file_write (gettext_po_file, "output.po", &gettext_error_handler);
...

and then output.po:
msgid ""
"hello \n"
" world"
msgstr ""

msgid ""
"hello my \n"
" beautiful world"
msgstr ""

As you can see, gettext saves it using hardcoded formatting rules, and ignores 
the original formatting. Gettext simply concatenates what's between the quote 
characters, and on output starts a new line after each "\n" (for translator 
readability).
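
To make the concatenation rule concrete, here is a minimal sketch in C. It is 
not gettext's own code, and join_po_fragments is a name I made up for this 
example. It assumes the fragments have already been pulled out from between 
the quote characters, and it only shows that the line layout drops out of the 
result:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the gettext API): concatenate the quoted
 * fragments of one msgid into dst, ignoring how they were split across lines.
 * Returns the length written, or (size_t)-1 if dst is too small.
 */
size_t join_po_fragments(const char *const *fragments, size_t count,
                         char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(fragments != NULL && dst != NULL && dst_size > 0);
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(fragments[i]);

        if (len >= dst_size - used) {   /* keep room for the terminator */
            dst[used] = '\0';
            return (size_t)-1;
        }
        memcpy(dst + used, fragments[i], len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

Called with the five fragments of the second msgid in test.po ("hello my ", 
"", "", "\n", " beautiful world"), this should give the same string as a 
single fragment "hello my \n beautiful world".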

Representing this in XLIFF by replacing THE TWO CHARACTERS '\' and 'n' with a 
real newline character on conversion, and similarly replacing the real 
newline character with "\n" on back-conversion, would be just as valid an 
approach.
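
A rough sketch of that round trip in C, working on the raw text as it appears 
between the quotes in the PO file. The function names po_text_to_xliff and 
xliff_text_to_po are invented for this example, and only the "\n" escape is 
converted; every other escape pair (including "\\") is assumed to pass 
through unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * PO -> XLIFF direction: turn the two characters '\' 'n' into one real
 * newline. Other escape pairs such as "\\" or "\t" are copied as they are,
 * so a literal backslash followed by 'n' (written "\\n" in PO) is not
 * mistaken for a newline. dst needs at least strlen(src) + 1 bytes.
 */
void po_text_to_xliff(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0') {
            if (src[1] == 'n') {
                *dst++ = '\n';
            } else {
                *dst++ = src[0];
                *dst++ = src[1];
            }
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

/*
 * XLIFF -> PO direction: turn each real newline back into '\' 'n'.
 * dst needs at least 2 * strlen(src) + 1 bytes.
 */
void xliff_text_to_po(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '\n') {
            *dst++ = '\\';
            *dst++ = 'n';
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```

Running a PO string through po_text_to_xliff and then xliff_text_to_po should 
give back the original text, so nothing is lost on the way to XLIFF and back.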

In fact, if I were to use your approach here, I would have to manually 
replace all real newline characters with "\\n" before converting to XLIFF, as 
the gettext API handles "\n" as real newline characters internally (and yes, 
I'm using the gettext API for parsing/reading/writing PO files in my 
filters).

>
> > It would be interesting to see how this is solved in the Java .properties
> > guide, as it is basically the same problem:
> > string_1=This is line 1\nThis is line 2\n This is line 3\n
>
> The filters I wrote for Java .properties also treat "\n" as part of the
> text. Other translation tools that I've seen for .properties do the
> same.

Yes, this is also the case for the Okapi (COM based) filter.

> > Going back to the three approaches you mentioned: I don't see a problem
> > with using the 1st approach either:
> > <source xml:space='preserve'>
> > Line 1\n line 2\n
> > Line 3.\n</source>
> >
> >
> >
> > ..But I don't see any benefits of using this approach, other than the
> > windows/unix line-ending issue. In fact, I think it's less garbage in the
> > TM if we remove the \n characters.
>
> The sequence of two characters that form "\n" are important in the TM
> database. Their presence/absence mark the difference between exact and
> fuzzy matches.

Yes, but they will still confuse the TM:

msgid "hello \n world"

would produce the string "hello \n world":
<source xml:space='preserve'>hello \n world</source>

msgid ""
"hello \n "
"world"

would in XLIFF (and then TMX, if no further segmentation is used) produce 
the string "{\n}hello \n {\n}world" where {\n} is the real newline 
character:
<source xml:space='preserve'>
hello \n 
world</source>

(The developer intended the string "hello \n world" which would at runtime 
produce "hello {\n} world")

In practice, these two translation units should be identical and a 100% match, 
but not so using this approach.

> > For XLIFF editors, if a translator wants to
> > see the newline characters, he/she can always turn on the 'view
> > formatting' option, visually displaying newline characters and word-wrap
> > lines etc..
>
> I think that there is a confusion here. An XLIFF editor displays the
> content of <source> and <target> elements. Those elements contain text
> and in-line elements. Sometimes inline elements can be represented in a
> simpler or abbreviated form and when you enable "view formatting" or
> similar options,  what you see is the content of the inline elements.

Sorry, I should have been clearer. By 'view formatting' I meant something 
similar to what word processors or KBabel offer: a visual indicator for 
newline characters, tabs and spaces. Heartsome does not have this function 
in my version (about 6 months old).

> An XLIFF editor is not supposed to consider the sequence "\n" as
> "formatting". That is something that doesn't exist. In a <source> or
> <target> element you have translatable text or inline elements; that's
> it. If the pair of characters that form "\n" is part of the translatable
> then it will always be visible. if not, then nothing will make it appear
> from the air.
>
> And before you say that it depends on the editor, let me remind you that
> an XLIFF editor should handle XLIFF files originated anywhere and from
> any source format. So, you should not expect that an XLIFF editor
> magically display a "\n" because the file was generated with a filter
> that decided to remove two characters from the translatable text.

I don't want the XLIFF editor to display a '\n', I just want it to add a 
newline character where there is a newline in the source, so:
msgid "hello \n world"
becomes 
<source xml:space='preserve'>hello
world</source>

and would display in an editor:
hello
world

...maybe with a nifty nice <enter> arrow after 'hello' if 'view formatting' is 
turned on.
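
As a rough illustration of what I mean by 'view formatting', here is a small 
C sketch. It is not taken from any existing editor; show_formatting and the 
"<enter>" marker are made up for this example. It puts a visible marker in 
front of every real newline and leaves the text itself untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NEWLINE_MARKER "<enter>"   /* hypothetical on-screen indicator */

/*
 * Copy text into dst, inserting NEWLINE_MARKER before every real newline.
 * This only affects what is displayed; the stored text is not changed.
 * Returns the length written, or (size_t)-1 if dst is too small.
 */
size_t show_formatting(const char *text, char *dst, size_t dst_size)
{
    const size_t marker_len = strlen(NEWLINE_MARKER);
    size_t used = 0;

    assert(text != NULL && dst != NULL && dst_size > 0);
    for (; *text != '\0'; text++) {
        size_t need = (*text == '\n') ? marker_len + 1 : 1;

        if (need >= dst_size - used) {   /* keep room for the terminator */
            dst[used] = '\0';
            return (size_t)-1;
        }
        if (*text == '\n') {
            memcpy(dst + used, NEWLINE_MARKER, marker_len);
            used += marker_len;
        }
        dst[used++] = *text;
    }
    dst[used] = '\0';
    return used;
}
```

With 'view formatting' turned off, the editor would simply show the text as 
stored; with it turned on, it would show the output of something like the 
function above.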

cheers,
asgeir

