[xliff-tools] xml to xliff
Josep Condal
pcondal at apsic.com
Mon Apr 4 04:31:59 PDT 2005
Hi Tim,
In a Translation Memory system, it is essential that the segments are
complete units of meaning. When you increase the granularity, you improve
the leverage, but if the segmenter (accidentally) makes a mistake
(possibly because of miscoded source) at a high granularity, some
segments are no longer complete in meaning by themselves.
Paragraph level segmentation is more conservative, sentence level is
more risky.
To get an idea of what I mean: if you segment the Bible at character
level, you get a nominal word count of 26 words and a few tens of
millions of nominal repetitions. If you negotiate the value of the
repetitions aggressively, you may get a good price, as there are only 26
new words ;)
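
As a rough illustration of the point above, here is a minimal Python sketch
that counts "new" segments versus repetitions at two granularities. The
function name leverage_stats and the sample sentence are my own assumptions
for the example, not part of any CAT tool.

```python
from collections import Counter

def leverage_stats(segments):
    """Return (unique_segments, repetitions) for a list of segments."""
    counts = Counter(segments)
    unique = len(counts)
    # Every occurrence beyond the first of a segment counts as a repetition.
    repetitions = len(segments) - unique
    return unique, repetitions

# Hypothetical sample text, lowercased and without punctuation.
text = "in the beginning god created the heaven and the earth"

# Character-level "segmentation": every letter is its own segment.
char_segments = [c for c in text if c.isalpha()]
# Word-level segmentation, for comparison.
word_segments = text.split()

print("character level:", leverage_stats(char_segments))
print("word level:     ", leverage_stats(word_segments))
```

The finer the granularity, the fewer unique segments and the more
repetitions you get, even though the individual segments carry no meaning.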
Even though it looks like a kludge, it is still technically feasible to
translate the Bible at character level, as you can select the next
character from a list of zero or more proposed characters and add a new
segment if nothing in the list is acceptable. You can even align an
already translated Bible.
The best segmentation level for most target languages with structures
close to English is probably sentence level, but since a sentence sits
right at the boundary of a unit of meaning, it is more dangerous for
machine-only handling. For example, if the writer of the original text
uses an unknown abbreviation, the segmenter may break the sentence in
the middle, and the segment is no longer a single unit of meaning.
Something similar happens if a typo changes a comma to a full stop.
Context-aware leverage can overcome this, but it is not always
implemented, or not well implemented, by the CAT tool publishers.
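
The abbreviation problem can be sketched in a few lines of Python. This is
only a toy segmenter under my own assumptions (the splitting rule, the
function names and the sample sentence are all hypothetical); real
segmenters use richer rule sets.

```python
import re

def naive_segment(text):
    """Split after '.', '!' or '?' followed by whitespace (deliberately naive)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def segment_with_exceptions(text, abbreviations):
    """Same rule, but re-join a split that follows a known abbreviation."""
    segments = []
    for piece in naive_segment(text):
        # If the previous segment ends in a known abbreviation, the split
        # was a mistake: glue this piece back onto it.
        if segments and segments[-1].split()[-1].lower() in abbreviations:
            segments[-1] += " " + piece
        else:
            segments.append(piece)
    return segments

# Hypothetical source text containing an abbreviation the naive rule does not know.
source = "Set the max. value first. Then restart the service."

print(naive_segment(source))
print(segment_with_exceptions(source, {"max."}))
```

The naive rule cuts the first sentence after "max.", producing two
fragments that mean nothing on their own; the exception list only helps
for abbreviations somebody has already anticipated.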
That said, in most projects (English->Spanish) I would select
sentence-level segmentation as the default.
Josep.
Josep Condal
Managing Director
ApSIC S.L.
---
Caballero, 76 4-3
08029 Barcelona
Spain
T: +34 93 405 11 00
F: +34 93 430 81 77
@: pcondal at apsic.com
---
-----Original Message-----
From: xliff-tools-bounces at lists.freedesktop.org
[mailto:xliff-tools-bounces at lists.freedesktop.org] On behalf of Tim
Foster
Sent: Monday, 04 April 2005 13:08
To: cobaco (aka Bart Cornelis)
CC: xliff-tools at lists.freedesktop.org
Subject: Re: [xliff-tools] xml to xliff
Hi cobaco,
On Mon, 2005-04-04 at 11:32, cobaco (aka Bart Cornelis) wrote:
> On Monday 04 April 2005 09:59, Tim Foster wrote:
> > a segment/msgid out of each paragraph, vs. ours that create a
> > segment/msgid out of every sentence.
> hm, I'm not at all sure that's a good idea:
> the smaller the granularity of the to-be-translated bits, the harder
> non-literal translation becomes, and especially for documents that can
> make a large difference in the quality of the translation.
Define "quality" (only joking!)
- but seriously, we've been using sentence-level segmentation at Sun
for all of our docs material for the past 3 years (with our internal
tools, no idea what the translation vendors we were using before were
using wrt. paragraph vs. sentence segmentation) and have found that it's
really not a problem. Linguistic reviewers have been generally happy
with the quality of Sun documentation.
Now, I suspect that part of this could be due to the excellent technical
writers we have and some style-checking tools which are used in the
authoring process to catch sentence-structures that may be difficult to
translate.
Along with that, since the translators are always shown sentences in
their correct context wrt. other sentences in the paragraph, and can
choose multiple different translations for the same sentence (based on
the book name, product name, part number and other attributes) this
seems to work okay. Of course, only allowing one possible translation
per source sentence would result in a very poor quality translation: we
don't do that.
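
A minimal Python sketch of the idea of keeping several translations per
source sentence and ranking them by matching attributes. The class, method
and attribute names, and the sample translations, are hypothetical and are
not taken from Sun's internal tools.

```python
from collections import defaultdict

class TranslationMemory:
    """Toy TM: one source sentence may carry several translations."""

    def __init__(self):
        # source sentence -> list of (attributes dict, translation)
        self._entries = defaultdict(list)

    def add(self, source, translation, **attributes):
        self._entries[source].append((attributes, translation))

    def lookup(self, source, **context):
        """Return candidate translations, best attribute match first."""
        def score(entry):
            attributes, _ = entry
            # Count how many context attributes this entry agrees with.
            return sum(1 for k, v in context.items() if attributes.get(k) == v)
        ranked = sorted(self._entries.get(source, []), key=score, reverse=True)
        return [translation for _, translation in ranked]

tm = TranslationMemory()
tm.add("Click Save.", "Haga clic en Guardar.", product="ProductA")
tm.add("Click Save.", "Pulse Guardar.", product="ProductB")

print(tm.lookup("Click Save.", product="ProductB"))
```

The translator still sees every candidate; the attributes only decide
which one is proposed first.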
I'm not a translator or a linguist, so I can't argue the finer points of
this, except to say that we haven't found it to be a limiting factor at
all and customers haven't been complaining about our translation
quality.
(docs.sun.com I think has some translated books, if you want to check
them out )
cheers,
tim
--
Tim Foster - Tools Engineer, Software Globalisation
http://sunweb.ireland/~timf http://blogs.sun.com/timf
http://www.netsoc.ucd.ie/~timf