[xliff-tools] XLIFF usage

Mon Apr 18 12:57:42 PDT 2005

Hey folks

Sorry for not chiming in on this thread until now; I was in Prague all
last week meeting up with the other translation tools developers at Sun,
chatting about (among other things) the upcoming open sourcing of our
translation editor and XLIFF filters, so I guess this email is timely ;-)

Most of the reasons "why xliff" have been fairly well covered so far, so
I'm not sure it's worth going over that stuff again. However, we do have
quite a bit of production experience with using XLIFF in our translation
workflow here at Sun, so I thought I'd mention some of that.

We don't see XLIFF as a replacement for any native file format;
instead, it's one that allows us to simplify our tools development : if
XLIFF didn't exist, we'd probably have come up with our own internal
format anyway, it just wouldn't have been a standard one.
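To make the "internal format" point concrete, here is a minimal sketch (Python, standard library only) of the kind of round-trip filter involved: simple gettext PO entries go into XLIFF 1.2 trans-units and come back out again. The function names (parse_simple_po, po_to_xliff, xliff_to_po) are hypothetical, and the PO reader is a deliberate simplification that assumes one-line msgid/msgstr pairs and ignores plurals, msgctxt, continuation lines and escapes, all of which a real filter has to handle.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch only; all function names here are hypothetical.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize without an "ns0:" prefix


def q(tag):
    """Qualify a local tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def parse_simple_po(text):
    """Read (msgid, msgstr) pairs from PO text.

    Simplification: only one-line, well-formed quoted strings are handled;
    plurals, msgctxt, continuation lines and escapes are ignored.
    """
    entries, msgid = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = line[len("msgid "):][1:-1]  # drop surrounding quotes
        elif line.startswith("msgstr ") and msgid is not None:
            if msgid:  # an empty msgid is the PO header entry; skip it
                entries.append((msgid, line[len("msgstr "):][1:-1]))
            msgid = None
    return entries


def po_to_xliff(entries, original, source_lang, target_lang):
    """Forward filter: one <trans-unit> per PO entry."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original, "datatype": "po",
        "source-language": source_lang, "target-language": target_lang})
    body = ET.SubElement(file_el, q("body"))
    for n, (msgid, msgstr) in enumerate(entries, start=1):
        unit = ET.SubElement(body, q("trans-unit"), id=str(n))
        ET.SubElement(unit, q("source")).text = msgid
        if msgstr:  # untranslated entries get no <target> yet
            ET.SubElement(unit, q("target")).text = msgstr
    return root


def xliff_to_po(root):
    """Reverse filter: rebuild (msgid, msgstr) pairs from the trans-units."""
    pairs = []
    for unit in root.iter(q("trans-unit")):
        target = unit.find(q("target"))
        msgstr = target.text if target is not None and target.text else ""
        pairs.append((unit.find(q("source")).text or "", msgstr))
    return pairs


if __name__ == "__main__":
    po = 'msgid "Open"\nmsgstr "Ouvrir"\n\nmsgid "Save"\nmsgstr ""\n'
    doc = po_to_xliff(parse_simple_po(po), "messages.po", "en", "fr")
    print(ET.tostring(doc, encoding="unicode"))
    print(xliff_to_po(doc))
```

Once a filter pair like this exists for a native format, the editor, TM and QA tools only need to understand the XLIFF side, which is where the simplification in tools development comes from.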

In particular, when dealing with new translators who didn't have
experience of some of the more complex formats (like DocBook), it would
take a while before they were familiar enough with them to translate
efficiently.

XLIFF was the great leveller : by hiding the complexity of the source
file format from the translator, we're getting two benefits :

* translators don't need to learn new file formats
* translators don't get the opportunity to screw up the file(!)
 (well, mostly)
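One place the "(well, mostly)" caveat can show up is inline codes: XLIFF 1.2 stands in for source-format markup with elements such as <g>, <x/>, <ph> and <bpt>/<ept>, and a translator can still drop or duplicate them in a target. A cheap safeguard is to compare the inline codes of <source> and <target>. The following is a hypothetical Python sketch of such a check; inline_codes and tag_mismatches are illustrative names, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}
# XLIFF 1.2 inline elements that stand in for source-format markup.
INLINE = {"g", "x", "bx", "ex", "ph", "bpt", "ept", "it"}


def inline_codes(elem):
    """Count (element name, id) pairs of inline codes below elem."""
    codes = Counter()
    for child in elem.iter():
        if child is elem:
            continue
        local = child.tag.split("}", 1)[-1]  # strip the namespace URI
        if local in INLINE:
            codes[(local, child.get("id"))] += 1
    return codes


def tag_mismatches(unit):
    """Compare inline codes in <source> and <target> of one trans-unit.

    Returns (missing, extra): codes the translation dropped, and codes it
    added or duplicated. Both are empty Counters when the markup survives.
    """
    source = unit.find("x:source", NS)
    target = unit.find("x:target", NS)
    if source is None or target is None:
        return Counter(), Counter()
    src, tgt = inline_codes(source), inline_codes(target)
    return src - tgt, tgt - src
```

Run over every trans-unit before a file goes back through the reverse filter, a check like this catches damaged markup while it is still cheap to fix.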

Bits of XLIFF that we haven't taken advantage of yet are the aspects
of using multiple tools to "add value" to the XLIFF file as it
progresses through the translation process. For example, we don't yet
do term-mining on the source segments, nor suggest glossary matches for
any terms found and marked with <mrk> tags. Also, right now, we only
have one translation system adding translation matches to the XLIFF
file, but other TM systems could add further <alt-trans> elements,
which is something that you just can't do in PO (or at least, you can't
do it with style).
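To illustrate the multi-tool idea, here is a rough Python sketch of a second TM system appending its own <alt-trans> match to a trans-unit that already carries one from another system. The sample document, the "tm-system-a"/"tm-system-b" origins and the add_alt_trans helper are invented for the example; origin and match-quality are standard XLIFF 1.2 attributes of <alt-trans>.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}  # prefix map for ElementTree path queries

# Made-up sample: one unit that already carries a match from "tm-system-a".
SAMPLE = f"""<xliff version="1.2" xmlns="{XLIFF_NS}">
  <file original="messages.po" datatype="po"
        source-language="en" target-language="de">
    <body>
      <trans-unit id="save">
        <source>Save the file</source>
        <alt-trans origin="tm-system-a" match-quality="85">
          <source>Save the files</source>
          <target>Dateien speichern</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def add_alt_trans(root, unit_id, source, target, origin, quality):
    """Append one <alt-trans> match to the trans-unit with id=unit_id.

    Matches already added by other tools are left untouched, so several
    TM systems can each contribute suggestions to the same unit.
    """
    unit = root.find(f".//x:trans-unit[@id='{unit_id}']", NS)
    if unit is None:
        raise KeyError(unit_id)
    alt = ET.SubElement(unit, f"{{{XLIFF_NS}}}alt-trans",
                        {"origin": origin, "match-quality": str(quality)})
    ET.SubElement(alt, f"{{{XLIFF_NS}}}source").text = source
    ET.SubElement(alt, f"{{{XLIFF_NS}}}target").text = target
    return alt


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    add_alt_trans(root, "save", "Save the file", "Datei speichern",
                  origin="tm-system-b", quality=100)
    for alt in root.iterfind(".//x:alt-trans", NS):
        print(alt.get("origin"), alt.get("match-quality"),
              alt.find("x:target", NS).text)
```

Because each match records its own origin, an editor can show all the suggestions side by side, whereas a PO entry has a single msgstr.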

XLIFF isn't a silver bullet, but it seems to be working pretty well
for us so far. The "one format to rule them all" approach is working
quite nicely: whenever a product development team that perhaps isn't
familiar with standard i18n formats (and has ended up rolling its own)
comes to us, we're able to point to a standards document from a
respected standards body and say to them : "Give us that, please". Of
course, if the base team isn't up to it, most of the time we can roll
our own filters without too much trouble, but it's always nice when
others do the hard work ;-)

Now, as regards the cost-saving and speed-up figures that Steve was
asking for, I don't have those at the moment. Since we really started
using TM systems internally at the same time as adopting XLIFF, I'd
imagine they're pretty large. Of course our translation vendors were
always using TM systems, just that we weren't holding the reins, so to
speak.

Hope this is of some interest (from a production point of view)

	cheers,
			tim

On Tue, 2005-04-12 at 15:13, Stephen Holmes wrote:
> Thanks Josep,
> 
> << Apologies for not in-lining my response, the mail is getting quite
> large!>> 
> 
> So would it be fair of me to distill this down to say that the thrust of
> the XLIFF deployment in the context of this Freedesktop.org effort is to
> push it as a native resource file format to replace/consolidate 1..n
> existing formats?  In this specific domain to migrate from PO to XLIFF?
> 
> Is this sensible?  
> 
> What about XAML, XUL, Glade and others that have been specifically
> crafted for this purpose?  Microsoft, for example, have gone to great
> lengths to enhance XML markup for application-specific purposes in their
> Longhorn technologies (Avalon, etc.).  This will impact Linux through the
> Mono technology.  I know from my own experience in the localisation
> business that even something like a "standard" Windows RC specification
> has a large number of interpretations - this is why we moved to binary
> localisation, because RC content is largely unmanageable - custom
> resource formats etc. (binary simplifies but doesn't remove all of the
> complexities).
> 
> If you think about it, from a tools perspective we already have stable
> and proven parsers for most of the common formats in use today - these
> formats are largely stable.  New formats are being intro'd all of the
> time but couldn't one simply parse them into a common framework (based
> on a standard API) rather than complicate the process with an XLIFF
> interim?  I say this because I simply can't see XLIFF adopting the
> characteristics of an Impress slide, a Macromedia SWF, or a streaming
> audio clip - let alone media-rich application markup languages.
> 
> In my opinion, native XLIFF is way too generalised and too heavy for
> native application resourcing purposes.  At my company, we were
> compelled (architecturally) to move to a variant that could be used as
> a native resource markup (something I'd describe as XliffLite), but even
> this requires a transformer from Xliff to XliffLite and back within the
> localisation cycle (not to mention the tailoring of the OASIS DTD - very
> bad behaviour, I know!).
> 
> The problem, as I see it, is less about the file format and more about
> the API that binds a given specification (getNextTransUnit etc.).  I do
> of course see the advantages of the "One format to rule them all", but
> even within GNU/Linux we have .desktop, .po, .glade, .jpg, .svg, .png,
> .resx (through Mono) - all formats designed to perform a specific
> purpose (sometimes legal, sometimes technological).
> 
> I fear that in treating XLIFF as a silver bullet that consolidates
> resource formats, we will ultimately commit it to an untimely
> demise.
> 
> I would still love to hear about practical applications and case studies
> involving XLIFF. I'm extremely keen to understand from a deployment
> perspective how the transformation to XLIFF impacted on quality, speed,
> flexibility, cost and dependency issues before and after the effort. 
> 
> Oh dear, I do sound terribly cynical don't I? It's not really
> intentional!
> 
> 
> On Tue, 2005-04-12 at 15:14 +0200, Josep Condal wrote:
> > Hi Steve,
> > 
> > > Steve is fine (only my mum calls me Stephen!)
> > 
> > Sorry about my strange message, due to - hopefully temporary - mental
> > turbulence. For some reason I had arrived at the conclusion that Steve
> > and Stephen were two completely different names and all of a sudden I
> > had a strong urge to apologize. :)
> > 
> > >>>> START OF QUOTE
> > So here's my dilemma.  We currently have tools that extract
> > translatable content from whichever file formats we're interested in.
> > Take a PO, for example: it gets parsed (LEX/YACC) and then
> > translated within a tools environment.  The complexities of the
> > meta-data schema are hidden (as they should be) and the parser also
> > comes with a generator to create the translated output format - an
> > end-to-end process.  The database format might be proprietary or
> > represented in XLIFF, but the interface to this data would be open.
> > 
> > If I've understood your comments correctly, then in the "new world
> > order", I'd do this....
> > 
> > [ Develop ]-->Src PO-->XLIFF-->[ Translation ]--> Tgt PO-->[ Test ]
> > 
> > But in the current model it would be...
> > 
> >    [ Develop ]-->Src PO-->[ Translation ]-->Tgt PO-->[ Test ]
> > 
> > So now, I have to create a PO parser AND forward and reverse
> > PO-to-XLIFF transformers.
> > 
> > Seems like I have to do more work.  I do however see the benefit in
> > having, say, an Alchemy Catalyst dump its TTK format to XLIFF so I
> > could use the same translated content with gTranslator/KBabel or
> > whatever other GUI tool made use of XLIFF repository information.
> > 
> > >>>> END OF QUOTE
> > 
> > The idea would be that XLIFF is the preferred representation of any
> > given home format, and the closer it appears to the real source (for
> > example, if gettext generated XLIFF instead of PO and back), the more
> > sense it makes (as your diagram above reflects).
> > 
> > >>>> START OF QUOTE
> > 
> > Understood, absolutely, but isn't the value in XLIFF really to do with
> > helping me move my translated assets (with their meta data) between
> > systems rather than create yet another file format?  The benefit for me
> > is a lossless transfer when tooling technologies become available that
> > afford me a higher % and quality of leverage.
> > 
> > >>>> END OF QUOTE
> > 
> > XLIFF is by definition an exchange format, but in practical scenarios I
> > personally see most of the opportunity in making the subject format
> > more stable as a starting point for tool development, and therefore in
> > being able to adjust the chosen tool to the characteristics of the
> > project.
> > 
> > In a scenario where the roundtrip filters are essentially available and
> > the best ones prevail (with the help of representation guides as
> > official guidance), the tool developers can focus on the features their
> > software offers the user rather than on solving, yet again, the problem
> > of interpreting the new format in town.
> > 
> > I see less opportunity, but still an option at Alchemy's choice, in the
> > Alchemy Catalyst scenario that you mention, because the TTK format is a
> > little bit away from the home format.
> > 
> > In other words, an .RC file (a home format) is meant to be translated,
> > so XLIFF can help build a process around it, while a TTK file is
> > probably meant to be translated with Catalyst, so XLIFF may not be a
> > practical option there. Let's say it depends largely on Alchemy's
> > willingness to do so, even if you decide to reverse-engineer the TTK
> > file and build a filter for it.
> > 
> > Also, XLIFF allows for exchange. While I see a relatively high risk of
> > trouble in exchanging without extensive interoperability tests, what is
> > clear is that many quality-oriented features (LINT-like stuff, but
> > linguistic, for example) can be built into a more stable format
> > such as XLIFF.  This way, sometimes you buy, sometimes you make.
> > 
> > >>>>
> > My view would be that a good tools environment abstracts this complexity
> > anyway.  I certainly wouldn't send raw XLIFF out because I'd then have
> > to add LINT checks to the inbound materials.
> > >>>>
> > 
> > Yes, what I actually meant is that making the format more stable
> > allows interesting tools to be developed on it more easily than if the
> > project involves complex home formats. Some of the tools on the
> > market are very good at what they do and may fulfill all your needs.
> > (Or your apparent needs, as you perceive them. For example, for a few
> > years (until 1998, more or less) I hadn't needed the "concordance"
> > function, because I had always used CAT tools without it and life went
> > on without trouble, but when I discovered it, I saw that the value it
> > brought was enormous, and I cannot imagine a single translation process
> > without it playing an important role.  That's why we developed ApSIC
> > Xbench, to bring concordance to the system level rather than the
> > application level.)
> > 
> > Regards,
> > 
> > Josep.
> 
> ______________________________________________________________________
> _______________________________________________
> xliff-tools mailing list
> xliff-tools at lists.freedesktop.org
> http://lists.freedesktop.org/cgi-bin/mailman/listinfo/xliff-tools
-- 
Tim Foster - Tools Engineer, Software Globalisation
http://sunweb.ireland/~timf http://blogs.sun.com/timf
http://www.netsoc.ucd.ie/~timf