[AppStream] The AppStream XML locale apocalypse
Richard Hughes
hughsient at gmail.com
Fri Apr 7 08:36:54 UTC 2023
Counter proposal: just continue to use the underscore locale variant and
save a lot of heartache...?

R
On Thu, 6 Apr 2023, 17:15 Matthias Klumpp, <matthias at tenstral.net> wrote:
> Hi!
>
> While investigating why zh_TW/zh_CN translations were not showing up
> in AppStream-based software centers, we found out that this was due to
> the locale being listed as "zh-TW" in the XML. This was first noticed
> at KDE, which was going to edit their tools to use POSIX locale in the
> XML instead[1].
>
> I looked into the matter and found out that when using XML, the
> contents of the xml:lang attribute are not arbitrary, nor are they a
> UNIX locale; they need to follow the IETF BCP47 specification[2].
> So by assuming POSIX locale, AppStream was doing the wrong thing for a
> really long time!
>
> We didn't notice this until now for two reasons: One, AppStream's
> locale matching is quite good and will fall back to the bare language
> code if a dedicated translation wasn't found, which coincidentally is
> where POSIX and BCP47 locales are pretty much identical. So the issue
> wasn't noticeable unless your language depends on the territory
> specifier.
> Secondly though, GNOME's tooling is also wrong in many cases and seems
> to be using POSIX locale while BCP47 locale should be used.
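
To illustrate why the mismatch stayed hidden, here is a minimal Python sketch of the fallback behaviour described above. It is not AppStream's actual matching code; the function names and the `translations` dictionary (keyed by the raw xml:lang value) are hypothetical.

```python
def fallback_chain(posix_locale):
    """List lookup keys from most to least specific.

    Simplification (assumption): encoding and @modifier are dropped.
    """
    base = posix_locale.split(".", 1)[0].split("@", 1)[0]
    candidates = [base]
    if "_" in base:
        # Fall back to the bare language code, e.g. "de_DE" -> "de"
        candidates.append(base.split("_", 1)[0])
    return candidates


def pick_translation(translations, posix_locale, default="C"):
    """Return the first matching translation, else the untranslated default."""
    for candidate in fallback_chain(posix_locale):
        if candidate in translations:
            return translations[candidate]
    return translations.get(default)


# Hypothetical data: xml:lang values exactly as a BCP47-emitting tool wrote them.
translations = {"C": "Files", "de": "Dateien", "zh-TW": "檔案"}

print(pick_translation(translations, "de_DE.UTF-8"))  # matched via "de"
print(pick_translation(translations, "zh_TW.UTF-8"))  # "zh-TW" is never tried
```

In this sketch the German user is served through the bare language code, which is spelled the same in both formats, while the zh_TW user silently gets the untranslated string because neither "zh_TW" nor "zh" is a key.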
>
> I thought about this for quite a while, and I think the least bad thing
> to do here is to make AppStream use BCP47 locales, with hopefully not
> too much breakage.
> I hate making this change (it complicates locale handling in AppStream
> quite a bit, especially since BCP47 and POSIX locales can't easily be
> mapped 1:1 (see [3])), but I think doing it is the sanest option.
> Just ignoring the IETF specification and having POSIX locale there
> seemed attractive at first, but a lot of tools that translate XML do
> not handle this well and will continue to output BCP47 locale, such as
> the ones used by KDE and itstool. So even if KDE switched, we would be
> setting a trap for many other projects using alternative translation
> solutions, with no easy way for projects to solve this.
> Making this change will cause problems for projects which also did it
> "the wrong way", but on the other hand it will fix a real bug for
> projects which were using BCP47 all along.
> Since there is a specification for this, I think not following
> established practices for XML is a pretty bad choice.
>
> So, I implemented a locale mapping algorithm in AppStream that
> translates POSIX to BCP47 based on the same rules that itstool[4] uses
> to do the same task. That will work for pretty much all cases, I hope.
> AppStream will also do its best to find a good locale in case someone
> did use POSIX in xml:lang, but for performance reasons I can't
> implement anything that just accepts both locale formats - that would
> not only be a lot of engineering work for little gain, but it would
> also slow down parsing for everyone. I may adjust appstream-compose,
> though, to correct wrong locales, which should also fix the issue for
> people before it reaches users.
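
For readers unfamiliar with the two formats, a rough Python sketch of a POSIX-to-BCP47 conversion follows. It is not the code from AppStream or itstool: the function name, the small script-modifier table, and the choice to emit unknown modifiers as private-use subtags are all assumptions made for illustration.

```python
# Illustrative subset only (assumption): POSIX modifiers that name a script.
SCRIPT_MODIFIERS = {"latin": "Latn", "cyrillic": "Cyrl"}


def posix_to_bcp47(posix_locale):
    """Convert e.g. 'zh_TW.UTF-8' to 'zh-TW' and 'sr_RS@latin' to 'sr-Latn-RS'."""
    # Split off the @modifier first, then drop the .encoding part.
    head, _, modifier = posix_locale.partition("@")
    head = head.split(".", 1)[0]
    language, _, territory = head.partition("_")

    parts = [language.lower()]
    script = SCRIPT_MODIFIERS.get(modifier.lower())
    if script:
        # In BCP47 the script subtag comes before the region.
        parts.append(script)
    if territory:
        parts.append(territory.upper())
    if modifier and not script:
        # Assumption: keep unknown modifiers as a private-use subtag.
        parts.extend(["x", modifier.lower()])
    return "-".join(parts)


for name in ("zh_TW", "zh_CN.UTF-8", "sr_RS@latin", "de"):
    print(name, "->", posix_to_bcp47(name))
```

The modifier handling is where a 1:1 mapping gets awkward: a POSIX modifier may correspond to a script, a variant, or nothing BCP47 can express directly, so any real implementation needs a lookup table rather than a simple string replacement.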
>
> None of the changes are merged into master yet, but I intend to do
> that soon, unless there are objections or feedback that I haven't
> considered.
>
> So, what do you think? It's a pretty bad situation to be in, but it
> needs to be addressed somehow.
>
> Cheers,
> Matthias
>
> P.S: I think POSIX locales are better than BCP47: their format is
> a lot more versatile and expressive and can easily be split into
> individual language/territory/modifier parts, where it's less clear
> for BCP47. I wish we had only one format to identify locales,
> but that ship sailed in 1995/2001.
>
> [1]: https://invent.kde.org/sysadmin/l10n-scripty/-/merge_requests/61
> [2]: https://en.wikipedia.org/wiki/IETF_language_tag
> [3]: https://wiki.openoffice.org/wiki/LocaleMapping
> [4]: https://itstool.org/
>
> --
> I welcome VSRE emails. See http://vsre.info/
>