[xliff-tools] Another question on PO and XLIFF

Tue May 3 05:45:33 PDT 2005

Hi Rodolfo,

On Tue, 3 May 2005 20:45, Rodolfo M. Raya wrote:
> On Tue, 2005-05-03 at 13:36 +1000, Asgeir Frimannsson wrote:
>
> Hi,
>
> > > The "\n" is part of the text. It is a sequence of two characters: '\'
> > > and 'n'.  It is not only an instruction for the program that will
> > > display the text on screen. The translator should be able to see these
> > > characters and move them wherever they fit.
> >
> > "\n" is a sequence of two characters, yes, I agree so far. But it is still
> > only a representation of an escape sequence. And this is also how they
> > are represented internally in gettext. In addition, Gettext completely
> > ignores how the PO file is formatted (whether it's on multiple lines or a
> > single line). Let's do a simple test:
>
> I see that you base everything on the Gettext API. Isn't it too dangerous
> to assume that all files originated in C programs?

We're basing the guide on GNU Gettext and the tools it provides, not on any
3rd-party ad-hoc extraction tools out there (there are plenty). By the Gettext
API I mean the one gettext uses when extracting from C, C++, PHP, sh, Python
and the other formats the toolkit supports.

> Some PO files are generated from XML documents. If at reverse conversion
> you add the sequence "\n" whenever you find a linefeed, the result will
> be a mess.

PO files generated from XML documents are excluded - they are plainly and
simply evil. Not that I don't respect the authors of these extraction tools,
or deny that they serve a beneficial purpose in today's localisation, but when
converting to XLIFF these formats would benefit from custom XLIFF guides (e.g.
for DocBook). Let us keep to PO for software message catalogs in this
discussion.

> > Representing this in XLIFF by replacing THE TWO CHARACTERS '\' and 'n'
> > with a real newline character on conversion, and similarly replacing the
> > real newline character with "\n" on back-conversion would be a just as
> > valid approach.
>
> What about PO files originating from PHP? Is it still correct to replace
> a real newline character with "\n" on back conversion? And what about
> Python? XML? Any format?

Yep, any format that Gettext supports, as far as I'm aware.
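
To make the round trip concrete, here is a rough sketch in Python of the
conversion I have in mind. The names po_unescape and po_escape are made up for
illustration (they are not gettext functions), and only a handful of escapes
are handled, so treat it as a sketch rather than a complete PO parser:

```python
# Hypothetical helpers for illustration only; not part of GNU Gettext.
# Maps the character after a backslash to the character it stands for.
_UNESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def po_unescape(text):
    """PO -> XLIFF direction: turn escape sequences into real characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[text[i + 1]])
            i += 2  # consumed the backslash and the escaped character
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def po_escape(text):
    """XLIFF -> PO direction: turn real characters back into escapes."""
    # Backslashes must be escaped first, otherwise the backslashes added
    # by the later replacements would be doubled.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


po_text = r"hello \n world"          # as written in the msgid
xliff_text = po_unescape(po_text)    # contains a real newline character
assert po_escape(xliff_text) == po_text
```

A literal backslash followed by 'n' (written "\\n" in the PO file) survives
the round trip as well, because the backslash is unescaped and re-escaped
separately from the newline.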

> > In fact, if I were to use your approach here, I would have to manually
> > replace all real newline characters with "\\n" before converting to
> > XLIFF, as the gettext API handles "\n" as real newline characters
> > internally (and yes, I'm using the gettext API for
> > parsing/reading/writing PO files in my filters).
>
> You don't have to convert real newlines to "\\n". Simply write a newline
> character in the <source> or <target> element.

Gettext handles the string "hello \n world" internally as "hello \u000a
world", i.e. with a real line feed character. I would put this string in the
source or target element, but to follow your approach, I would here have to
replace \u000a with the two characters "\n".
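
As a quick sanity check of which character the escape stands for, here is a
small Python sketch. It only illustrates how a C-style string literal is
interpreted; it does not call gettext itself, and the variable names are mine:

```python
# The C-style literal "hello \n world" as a compiler or extractor sees it:
# the escape has already become one character.
msgid = "hello \n world"

newline = msgid[6]                       # the character "\n" stands for
code_point = f"U+{ord(newline):04X}"     # line feed

# The same text as it is spelled in the source or PO file:
# a backslash followed by the letter 'n', i.e. two characters.
raw = r"hello \n world"

print(len(msgid), len(raw), code_point)
```

The interpreted string is one character shorter than its spelling in the
file, which is exactly the difference we are arguing about.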

> > I don't want the XLIFF editor to display a '\n', I just want it to add a
> > newline character where there is a newline in the source, so:
> > msgid "hello \n world"
> > becomes
> > <source xml:space='preserve'>hello
> > world</source>
> >
> > and would display in an editor:
> > hello
> > world
>
> As the attribute xml:space is set to "preserve", XLIFF editors display
> the text as you sketched above.

Yes, and that's what I want, but using your approach, it would display:
hello\n
world

which is one '\' and 'n' too many for my liking :)

> BTW, it is better to set the xml:space attribute in the <trans-unit>
> element and let the scope rules cover the <target> and <alt-trans>
> children.

Ah, good point. Thanks for the tip :)
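
With xml:space moved up to the <trans-unit>, the unit could be built roughly
like this. It is only a sketch using Python's standard xml.etree.ElementTree;
the make_trans_unit helper and the id value are made up, and the XLIFF
namespace and the enclosing <file>/<body> elements are left out for brevity:

```python
import xml.etree.ElementTree as ET

# Namespace bound to the reserved "xml:" prefix (as in xml:space, xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_trans_unit(unit_id, source_text):
    """Build a <trans-unit> whose children inherit xml:space='preserve'."""
    unit = ET.Element(
        "trans-unit",
        {"id": unit_id, f"{{{XML_NS}}}space": "preserve"},
    )
    source = ET.SubElement(unit, "source")
    # The text holds a real newline character, not the two characters "\n".
    source.text = source_text
    return unit


unit = make_trans_unit("1", "hello \n world")
xml_text = ET.tostring(unit, encoding="unicode")
print(xml_text)
```

Because the attribute sits on the <trans-unit>, a <target> or <alt-trans>
added later falls under the same whitespace rule without repeating it.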

> > ...maybe with a nifty nice <enter> arrow after 'hello' if 'view
> > formatting' is turned on.
>
> Ahh, that's decoration.

:)

cheers,
asgeir