[xliff-tools] XLIFF representation

Fri Apr 29 08:26:43 PDT 2005

Hi Asgeir, Hi Rodolfo,

Thank you both for the answers.
I understand we can't change habits/behaviors easily :) and I can see some of the advantages of using the source string.

I guess I'll simply have our PO filter generate resname based on msgid only if an option is set, and have that option set to false
by default. I still want to provide a way to make use of msgid in a more localization-useful manner for people who want to. I can't
see a lot of issues with having _("HELLO_WORD") in the source code.
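
As a rough sketch of what such an option could look like (the function name and the use_msgid_as_resname flag are made up for illustration and are not taken from any existing filter):

```python
import xml.etree.ElementTree as ET


def build_trans_unit(unit_id, msgid, use_msgid_as_resname=False):
    """Build an XLIFF <trans-unit> element for one PO entry.

    use_msgid_as_resname is a hypothetical option name; it is off by
    default, so resname is only emitted when explicitly requested.
    """
    unit = ET.Element("trans-unit", {"id": str(unit_id)})
    if use_msgid_as_resname:
        # Only useful when msgid is a real identifier such as HELLO_WORD.
        unit.set("resname", msgid)
    source = ET.SubElement(unit, "source")
    source.text = msgid
    return unit


if __name__ == "__main__":
    unit = build_trans_unit(1, "HELLO_WORD", use_msgid_as_resname=True)
    print(ET.tostring(unit, encoding="unicode"))
```

With the option left at its default, the output has only the sequential id, as in the current draft.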

Thanks again,
-yves

-----Original Message-----
From: Asgeir Frimannsson [mailto:asgeirf at redhat.com] 
Sent: Tuesday, April 26, 2005 4:36 PM
To: Yves Savourel
Cc: xliff-tools at lists.freedesktop.org
Subject: Re: [xliff-tools] XLIFF representation

Hi Yves,

Yves Savourel wrote:

> Hi,
>
> I was looking at the XLIFF PO Guide Draft 2
> (http://xliff-tools.freedesktop.org/wiki/Projects_2fXliffPoGuideDraft2)
> which is, I think, the latest draft I can access, and I had a
> question:
>
> I noticed that <trans-unit> elements have an id but no resname. It seems
> it would be reasonable for a software file format to have a unique ID,
> and 'msgid' seems to be capable of providing this. I realize that msgid is
> really used for the source text, which in practice makes it
> not really usable for resname. Many localization tools rely on IDs to
> do things like leveraging, updates, or alignment. It would be nice to
> have a solution for resname. (One cannot use id as it's just a
> sequential number).
>
> I guess my question goes a little further and touches on the usage of
> msgid itself. Wouldn't it be more efficient from a localization viewpoint
> to recommend using unique IDs there instead of the source string? That
> would also follow the concept of treating the source language as "just
> another language".
>
The problem with radically changing Gettext (or rather how you use
gettext) is that we're changing the way (tens of) thousands of developers work. Developers want minimal effort when implementing
localisation support, hence all they really need to do at present is change strings from "hello world" to _("hello world"). This
approach is favourable because:
1) It's easier to read through code as you have the original string messages and not some more or less cryptic string ID.
2) No external resource files are needed to run the application in its original language (sadly, by GNU standards, American English;
it should have been Norwegian)
3) No tool-support is needed to manage string table ids.

The main disadvantages are:
1) No way of having the same message with different contexts within the same gettext domain (not without using 'hacks' anyway)
2) As you say, no way of really uniquely identifying a translation unit (especially hard when fixing spelling mistakes etc. in the
original string, as you need fuzzy matching to identify the old string in the string table)
3) Developers are locked in to using American English (or at least a Germanic language, as Gettext natively only supports Germanic
plural forms).
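
To illustrate the fuzzy matching mentioned in point 2, here is a minimal sketch using Python's difflib as a stand-in; it is not the algorithm gettext's own tools use, and the function name and the 0.8 cutoff are arbitrary choices made for this example:

```python
import difflib


def find_previous_msgid(new_msgid, old_msgids, cutoff=0.8):
    """Return the old msgid that most resembles new_msgid, or None.

    Stand-in for the fuzzy lookup needed after a source string has been
    edited; the 0.8 cutoff is an assumption picked for illustration.
    """
    matches = difflib.get_close_matches(new_msgid, old_msgids, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    old_strings = ["Helo world", "Open file"]
    # The developer fixed a spelling mistake in the source string.
    print(find_previous_msgid("Hello world", old_strings))
```

A stable ID would make this lookup unnecessary, which is the point Yves raises.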

What could be done is to use a hash of the original string as the resname attribute in XLIFF, and in this way uniquely identify
the string within the file (as gettext can't have two identical strings within the same domain).
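
A minimal sketch of the hash idea, assuming SHA-1 over the UTF-8 bytes of msgid (the choice of hash and the function name are assumptions; nothing here is prescribed by the draft guide):

```python
import hashlib


def resname_from_msgid(msgid):
    """Derive a resname value from the source string.

    SHA-1 over the UTF-8 encoded msgid is an assumption for this sketch;
    any stable hash would serve the same purpose.
    """
    return hashlib.sha1(msgid.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for text in ("hello world", "Open file"):
        print(resname_from_msgid(text), text)
```

Note that the value changes whenever the source string changes, so it identifies a string within one version of the file but does not by itself survive a spelling fix.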

As Rodolfo mentioned, we're not aiming at changing Gettext - or the way developers use gettext. But what's really interesting here
is that when we eventually start using XLIFF in favour of PO, we have eliminated the dependency on Gettext in the
development/localisation process. Hence, we can then start customizing the way gettext works - or even use other toolkits like ICU,
without breaking anything in the localisation process (keeping translators happy).

> Just a thought.
>
Really appreciate your input Yves!

cheers,
asgeir