[xliff-tools] XLIFF representation

Asgeir Frimannsson asgeirf at redhat.com
Tue Apr 26 15:35:46 PDT 2005


Hi Yves,

Yves Savourel wrote:

> Hi,
>
> I was looking at the XLIFF PO Guide Draft 2
> (http://xliff-tools.freedesktop.org/wiki/Projects_2fXliffPoGuideDraft2),
> which is, I think, the latest draft I can access, and I had a question:
>
> I noticed that <trans-unit> has an id but no resname. It seems
> reasonable for a software file format to have unique IDs, and 'msgid'
> seems capable of providing them. I realize that msgid is really used
> for the source text, which in practice makes it not really usable as a
> resname. Many localization tools rely on IDs for things like
> leveraging, updates, or alignment. It would be nice to have a solution
> for resname. (One cannot use id, as it's just a sequential number.)
>
> I guess my question goes a little further and touches on the usage of
> msgid itself. Wouldn't it be more efficient from a localization viewpoint
> to recommend using unique IDs there instead of the source string? That
> would also follow the concept of treating the source language as "just
> another language".
>
The problem with radically changing Gettext (or rather how you use
gettext) is that we'd be changing the way (tens of) thousands of developers
work. Developers want to spend minimal effort on implementing localisation
support, so at present all they really need to do is change strings
from "hello world" to _("hello world"). This approach is favourable because:
1) It's easier to read through code, as you see the original string
messages and not some more or less cryptic string ID.
2) No external resource files are needed to run the application in its
original language (sadly, by GNU standards, American English; it should
have been Norwegian).
3) No tool support is needed to manage string table IDs.

The main disadvantages are:
1) There is no way of having the same message with different contexts
within the same gettext domain (not without using 'hacks', anyway).
2) As you say, there is no way of really uniquely identifying a translation
unit (this is especially hard when fixing spelling mistakes etc. in the
original string, as you need fuzzy matching to identify the old string in
the string table).
3) Developers are locked in to using American English (or at least a
Germanic language, as Gettext natively only supports Germanic plural
forms in the source strings).
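Disadvantages 2 and 3 can be illustrated with a small sketch. The catalog contents, the `carry_over` helper and the 0.8 similarity cutoff are hypothetical; `difflib` is used only as a simple stand-in for fuzzy matching and is not what msgmerge actually does internally. The plural part shows that, without a catalog, `ngettext` chooses between exactly two source forms based on whether n == 1.

```python
import difflib
import gettext

# Hypothetical old string table, keyed by msgid as in a PO file.
old_catalog = {
    "Recieve new mail": "Motta ny e-post",
    "Delete message": "Slett melding",
}
# The developer fixes the typo "Recieve" -> "Receive" in the source code,
# so the old msgid no longer matches exactly.
new_msgids = ["Receive new mail", "Delete message"]


def carry_over(old, new_ids, cutoff=0.8):
    """Map each new msgid to (translation, is_fuzzy) from the old table."""
    result = {}
    for msgid in new_ids:
        if msgid in old:
            result[msgid] = (old[msgid], False)  # exact key match
            continue
        close = difflib.get_close_matches(msgid, list(old), n=1, cutoff=cutoff)
        if close:
            result[msgid] = (old[close[0]], True)  # fuzzy match, needs review
    return result


merged = carry_over(old_catalog, new_msgids)

# Source-side plurals: only a singular and a plural form, chosen by n == 1.
null = gettext.NullTranslations()


def files_label(n):
    return null.ngettext("%d file", "%d files", n) % n
```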

What could be done is to use a hash of the original string as the resname
attribute in XLIFF, thereby uniquely identifying the string within the
file (as gettext can't have two identical strings within the same domain).
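A minimal sketch of the hashed-resname idea, assuming a hypothetical scheme (SHA-1 of the UTF-8 msgid, truncated to 16 hex characters); neither the hash function nor the length comes from the guide, and only the XLIFF `<body>` element is built rather than a full `<xliff>`/`<file>` document.

```python
import hashlib
import xml.etree.ElementTree as ET


def resname_for(msgid: str) -> str:
    """Hypothetical scheme: truncated hex SHA-1 of the UTF-8 encoded msgid."""
    return hashlib.sha1(msgid.encode("utf-8")).hexdigest()[:16]


def build_body(msgids):
    """Build an XLIFF <body> whose trans-units carry a sequential id and a hashed resname."""
    body = ET.Element("body")
    for n, msgid in enumerate(msgids, start=1):
        unit = ET.SubElement(body, "trans-unit", id=str(n), resname=resname_for(msgid))
        source = ET.SubElement(unit, "source")
        source.text = msgid
    return body


body = build_body(["hello world", "Delete message"])
print(ET.tostring(body, encoding="unicode"))
```

Because the resname depends only on the msgid, it survives reordering or insertion of other strings (unlike the sequential id), but it still changes whenever the source text itself is edited, so it does not remove the need for the fuzzy matching mentioned under disadvantage 2.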

As Rodolfo mentioned, we're not aiming at changing Gettext or the way
developers use gettext. But what's really interesting here is that when
we eventually start using XLIFF instead of PO, we will have eliminated the
dependency on Gettext in the development/localisation process. Hence, we
can then start customizing the way gettext works, or even use other
toolkits like ICU, without breaking anything in the localisation process
(keeping translators happy).

> Just a thought.
>
Really appreciate your input Yves!

cheers,
asgeir

