[xliff-tools] The Fuzzy Flag

Wed Feb 16 08:49:33 PST 2005

Rodolfo M. Raya wrote:
> > This flag also occurs as side effect from msgmerge. I would therefore
> > not use the <target> element in this case, and instead give
> > <alt-trans><source>...</source> <target>...</target></alt-trans>
>
> There is a subtle problem with this approach.
>
> The <source> element of the <trans-unit>  and the <source> inside the
> <alt-trans> elements will have exactly the same content as both are
> extracted from "msgid". This implicitly means a 100% match.
>
> XLIFF editors need to measure the differences between the <source> in
> <trans-unit> and the <source> included in an <alt-trans> element. The
> difference is used for sorting the different <alt-trans> elements
> according to relevance and to automatically accept good matches
> extracted from TM database.

If your tool contains logic saying

    If an <alt-trans> is a 100% match and there is no <target> in the
    containing <trans-unit>, promote the <alt-trans>'s <target> to
    the containing <trans-unit>.

then it will not work reliably. The <alt-trans> can come from different
sources: translation memory pertaining to the same project and translator,
translation memory coming from a different project or different translators,
automatic translation attempts, dictionary lookups. There needs to be an
indicator for the level of trust that an <alt-trans> can have, so that
the rule mentioned above is only executed when the origin of the
<alt-trans> is trusted. An <alt-trans> attribute like 'origin' should do
it.
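
To make the rule concrete, here is a minimal sketch in Python of a promotion
step that checks the origin before acting. The 'origin' values ("project-tm",
"po-fuzzy") and the function name are made up for illustration, XML namespaces
are left out, and "100% match" is simplified to plain text equality of the two
<source> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of origin labels that this tool is willing to trust.
TRUSTED_ORIGINS = {"project-tm"}


def promote_trusted_matches(trans_unit, trusted=TRUSTED_ORIGINS):
    """Promote an exact-match <alt-trans> target only if its origin is trusted.

    Returns True if a <target> was inserted into the <trans-unit>.
    """
    if trans_unit.find("target") is not None:
        return False  # never overwrite an existing translation
    source = trans_unit.find("source")
    if source is None:
        return False
    for alt in trans_unit.findall("alt-trans"):
        alt_source = alt.find("source")
        alt_target = alt.find("target")
        if alt_source is None or alt_target is None:
            continue
        # Simplified "100% match": identical source text.
        exact = alt_source.text == source.text
        if exact and alt.get("origin") in trusted:
            target = ET.Element("target")
            target.text = alt_target.text
            # Place the new <target> directly after <source>.
            trans_unit.insert(list(trans_unit).index(source) + 1, target)
            return True
    return False
```

With this check, an <alt-trans> that has identical source text but an
untrusted origin stays a mere proposal, and the translator has to accept it
explicitly.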

The po2xliff converter should then set this 'origin' value to an untrusted
one that prevents automatic insertion of <target>.

For the non-automatic case, where an explicit translator action is needed,
we have the choice between putting the fuzzy translation into the <target>
and labeling it with a certain state, or putting it into an <alt-trans>.

Putting it into the <target> is not so good, because
  - the "state" is something related to the workflow between translators,
    QA, etc., not to how much a proposed translation can be trusted.
  - the translator is more tempted to say "OK" to a wrong fuzzy translation
    when it is presented as <target> than when it is presented as
    <alt-trans><target>.

Therefore I still think <alt-trans> is the right way to map these.
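
As a rough sketch of that mapping (not the actual po2xliff code), a converter
could treat a PO entry as follows. The function name, its parameters and the
"po-fuzzy" origin label are assumptions chosen for illustration, and XML
namespaces are again left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical origin label for proposals that came from a fuzzy PO entry.
FUZZY_ORIGIN = "po-fuzzy"


def po_entry_to_trans_unit(unit_id, msgid, msgstr, fuzzy):
    """Build a <trans-unit> element from one PO entry."""
    unit = ET.Element("trans-unit", {"id": str(unit_id)})
    ET.SubElement(unit, "source").text = msgid
    if not msgstr:
        return unit  # untranslated entry: <source> only
    if fuzzy:
        # Fuzzy translation: offer it as a proposal, not as the <target>.
        alt = ET.SubElement(unit, "alt-trans", {"origin": FUZZY_ORIGIN})
        # Only the current msgid is available here, so both <source>
        # elements end up with the same content.
        ET.SubElement(alt, "source").text = msgid
        ET.SubElement(alt, "target").text = msgstr
    else:
        ET.SubElement(unit, "target").text = msgstr
    return unit
```

Serializing the result with ET.tostring(unit, encoding="unicode") shows that a
fuzzy entry produces no <target> directly under the <trans-unit>, so nothing
is there for the translator to approve by accident.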

---

Another observation is that this will only matter during the migration
phase from PO to XLIFF. After the first translation in XLIFF format is
done, this issue will not be relevant any more. Why?

With the gettext tools, the fuzzy mark appears in four situations:

 1) When the programmer has changed an English message a little bit,
    msgmerge most often finds the previous translation of the old message,
    and attaches it with "fuzzy".

 2) During this msgmerge process, new English messages are often also
    combined with a translation of an unrelated message, and marked fuzzy.

 3) The translator may decide that s/he has produced only a half-finished
    translation, and mark it fuzzy.

 4) When the translator has produced a syntactically invalid translation for
    a message tagged with 'c-format', msgmerge will notice it and mark the
    message as fuzzy.

Now let's look at the workflow when XLIFF is used. In this case it is
likely that all translation memory functionality is performed on the
XLIFF side, because - as Josep Condal noted - msgmerge's functionality
on the PO side is less reliable.

Therefore when a PO file is regenerated from an updated .POT template or
updated translations (converted from XLIFF), msgmerge will usually be
called with the option --no-fuzzy-matching, which disables cases 1)
and 2) above. Case 3) does not occur any more either: when a translator
wants to withdraw a translation temporarily, s/he can do it in such a
way that the xliff2po converter will drop the translation; but the
translation will still go into the translation memory. And case 4) is
very rare.
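
For illustration, the regeneration step could be driven from a small Python
wrapper like the one below. The --no-fuzzy-matching and --output-file options
are existing msgmerge options; the wrapper functions and the file names in the
usage note are assumptions, and msgmerge must be installed for the second
function to work.

```python
import subprocess


def msgmerge_command(po_path, pot_path, output_path):
    """Return the msgmerge argument list with fuzzy matching disabled."""
    return [
        "msgmerge",
        "--no-fuzzy-matching",          # disables cases 1) and 2)
        "--output-file=" + output_path,
        po_path,                        # existing translations (def.po)
        pot_path,                       # updated template (ref.pot)
    ]


def run_msgmerge(po_path, pot_path, output_path):
    """Run msgmerge; raises CalledProcessError if it exits with an error."""
    subprocess.run(msgmerge_command(po_path, pot_path, output_path), check=True)
```

A call such as run_msgmerge("de.po", "prog.pot", "de.new.po") would then merge
without ever attaching an old translation to a changed or unrelated message.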

Bruno