[Mesa-dev] [PATCH v2 8/9] python: Rework bytes/unicode string handling

Thu Aug 9 22:06:07 UTC 2018

This doesn't work with Python 2. It was also pointed out to me that we don't
handle translations in meson at all...

Here's the traceback:
make[5]: Entering directory '/home/jenkins/workspace/Leeroy_3/repos/mesa/build_m64/src/util/xmlpool'
Updating (ca) ca/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/ca.po.
Updating (de) de/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/de.po.
Updating (es) es/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/es.po.
Updating (nl) nl/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/nl.po.
Updating (fr) fr/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/fr.po.
Updating (sv) sv/LC_MESSAGES/options.mo from /home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/sv.po.
  GEN      options.h
Traceback (most recent call last):
  File "/home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/gen_xmlpool.py", line 210, in <module>
    expandMatches ([matchDESC], translations)
  File "/home/jenkins/workspace/Leeroy_3/repos/mesa/src/util/xmlpool/gen_xmlpool.py", line 133, in expandMatches
    text = (matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
  File "/usr/lib/python2.7/re.py", line 282, in _expand
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.7/sre_parse.py", line 862, in expand_template
    return sep.join(literals)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
Makefile:720: recipe for target 'options.h' failed
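
For anyone reading along: this is the message Python produces when a
non-ASCII byte string is decoded as ASCII, which Python 2 does implicitly
whenever byte strings and unicode strings are mixed. 0xc3 is the lead byte of
many accented Latin characters in UTF-8. A minimal sketch that produces the
same kind of error explicitly (the sample string is made up, not taken from
the .po files):

```python
# -*- coding: utf-8 -*-
# UTF-8 encodes "é" as the two bytes 0xc3 0xa9.
data = u"Activé".encode("utf-8")  # hypothetical sample text

message = None
failing_offset = None
try:
    # Python 2 performs this decode implicitly when bytes meet unicode.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    failing_offset = exc.start

print(message)
```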

I'm running everything up through this patch through our CI again, but
assuming everything still looks good, I'll merge everything except this patch
and the next one today.

For every patch up to this point:
Reviewed-by: Dylan Baker <dylan at pnwbakers.com>

Quoting Mathieu Bridon (2018-08-09 01:27:25)
> In both Python 2 and 3, opening a file without specifying the mode will
> open it for reading in text mode ('r').
> 
> On Python 2, the read() method of a file object opened in mode 'r' will
> return byte strings, while on Python 3 it will return unicode strings.
> 
> Explicitly specifying the binary mode ('rb') then decoding the byte
> string means we always handle unicode strings on both Python 2 and 3.
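
To make the approach concrete, here is a small standalone sketch of the
"open in binary mode, then decode" pattern. The helper name and the sample
line are hypothetical; only the 'rb' mode and the utf-8 decode come from the
patch.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def read_unicode_lines(path, encoding="utf-8"):
    """Return the lines of `path` as unicode strings on Python 2 and 3."""
    with open(path, "rb") as handle:
        # Iterating a binary file yields byte strings on both versions.
        return [raw.decode(encoding) for raw in handle]


# Write a made-up template line containing a non-ASCII character.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as handle:
        handle.write(u'DRI_CONF_DESC(fr,"Activé")\n'.encode("utf-8"))
    lines = read_unicode_lines(path)
finally:
    os.remove(path)
```
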
> 
> Which in turn means all re.match(line) will return unicode strings as
> well.
> 
> If we also make expandCString return unicode strings, we don't need the
> call to the unicode() constructor any more.
> 
> We were using the ugettext() method because it always returns unicode
> strings in Python 2, unlike the gettext() method, which returns strings
> of the same type as its input. The ugettext() method doesn't exist on
> Python 3, so we must use gettext() instead.
> 
> This is fine now that we know we only pass unicode strings to gettext().
> (the return values of expandCString)
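
As far as I understand the Python 2 documentation, GNUTranslations.gettext()
returns an 8-bit string encoded with the catalog's charset rather than
following the type of its input, which might be related to the failure
above. One possible way around it, purely as a sketch and not something I
have run against gen_xmlpool.py, is to prefer ugettext() when it exists. The
helper name and the sample message are hypothetical:

```python
import gettext


def unicode_gettext(trans):
    """Pick the lookup method that returns unicode on Python 2 and 3."""
    # Python 2 translation objects have ugettext(); on Python 3 only
    # gettext() exists and it already returns unicode strings.
    return getattr(trans, "ugettext", trans.gettext)


# NullTranslations returns the message unchanged, which is enough to
# exercise the helper without a compiled .mo catalog.
trans = gettext.NullTranslations()
translate = unicode_gettext(trans)
message = u"Enable vertical sync"  # hypothetical message id
result = translate(message)
```
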
> 
> The last hurdles are that Python 3 doesn't let us concatenate unicode
> and byte strings directly, and that Python 2's stdout wants encoded byte
> strings while Python 3's wants unicode strings.
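
Both hurdles can be illustrated in isolation. This is only a sketch: the
helper name and the sample line are made up, and the version check mirrors
the one in the patch.

```python
import sys


def for_stdout(text, encoding="utf-8"):
    """Return `text` in the form print() handles safely on this Python."""
    if sys.version_info[0] == 2:
        # Python 2 may fall back to the ascii codec for unicode output,
        # so encode explicitly.
        return text.encode(encoding)
    return text


# Python 3 refuses to concatenate unicode and byte strings; Python 2
# instead decodes the byte string as ASCII behind the scenes.
mixing_fails = False
try:
    u"abc" + b"def"
except TypeError:
    mixing_fails = True

line = for_stdout(u'DRI_CONF_DESC(de,"Aktiv")')  # hypothetical sample
print(line)
```
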
> 
> With these changes, the script gives the same output on both Python 2
> and 3.
> 
> Signed-off-by: Mathieu Bridon <bochecha at daitauha.fr>
> ---
>  src/util/xmlpool/gen_xmlpool.py | 35 +++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/src/util/xmlpool/gen_xmlpool.py b/src/util/xmlpool/gen_xmlpool.py
> index b0db183854..db20e2767f 100644
> --- a/src/util/xmlpool/gen_xmlpool.py
> +++ b/src/util/xmlpool/gen_xmlpool.py
> @@ -60,7 +60,7 @@ def expandCString (s):
>      octa = False
>      num = 0
>      digits = 0
> -    r = ''
> +    r = u''
>      while i < len(s):
>          if not escape:
>              if s[i] == '\\':
> @@ -128,16 +128,29 @@ def expandMatches (matches, translations, end=None):
>          if len(matches) == 1 and i < len(translations) and \
>                 not matches[0].expand (r'\7').endswith('\\'):
>              suffix = ' \\'
> -        # Expand the description line. Need to use ugettext in order to allow
> -        # non-ascii unicode chars in the original English descriptions.
> -        text = escapeCString (trans.ugettext (unicode (expandCString (
> -            matches[0].expand (r'\5')), "utf-8"))).encode("utf-8")
> -        print(matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
> +        text = escapeCString (trans.gettext (expandCString (
> +            matches[0].expand (r'\5'))))
> +        text = (matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
> +
> +        # In Python 2, stdout expects encoded byte strings, or else it will
> +        # encode them with the ascii 'codec'
> +        if sys.version_info.major == 2:
> +            text = text.encode('utf-8')
> +
> +        print(text)
> +
>          # Expand any subsequent enum lines
>          for match in matches[1:]:
> -            text = escapeCString (trans.ugettext (unicode (expandCString (
> -                match.expand (r'\3')), "utf-8"))).encode("utf-8")
> -            print(match.expand (r'\1"' + text + r'"\5'))
> +            text = escapeCString (trans.gettext (expandCString (
> +                match.expand (r'\3'))))
> +            text = match.expand (r'\1"' + text + r'"\5')
> +
> +            # In Python 2, stdout expects encoded byte strings, or else it will
> +            # encode them with the ascii 'codec'
> +            if sys.version_info.major == 2:
> +                text = text.encode('utf-8')
> +
> +            print(text)
>  
>          # Expand description end
>          if end:
> @@ -168,9 +181,11 @@ print("/***********************************************************************\
>  
>  # Process the options template and generate options.h with all
>  # translations.
> -template = open (template_header_path, "r")
> +template = open (template_header_path, "rb")
>  descMatches = []
>  for line in template:
> +    line = line.decode('utf-8')
> +
>      if len(descMatches) > 0:
>          matchENUM     = reENUM    .match (line)
>          matchDESC_END = reDESC_END.match (line)
> -- 
> 2.17.1
> 