[Mesa-dev] [PATCH v3] python: Rework bytes/unicode string handling
Jose Fonseca
jfonseca at vmware.com
Fri Aug 17 12:29:49 UTC 2018
This change caused one of our MSVC build machines to fail with:
scons: Building targets ...
Generating build\windows-x86-debug\util\xmlpool\options.h ...
Traceback (most recent call last):
  File "src\util\xmlpool\gen_xmlpool.py", line 221, in <module>
    print(line, end='')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
position 68: ordinal not in range(128)
scons: *** [build\windows-x86-debug\util\xmlpool\options.h] Error 1
I have no idea why that machine is affected, but AppVeyor and my local
runs are not.
Setting PYTHONIOENCODING=utf-8 helps, but then bad things still happen
when the output is loaded in src/gallium/auxiliary/pipe-loader/
But the fact is that everything was working before.
Perhaps a solution is to just start using Python 3 for the generation
scripts, as it might yield more consistent results.
Jose
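
For reference, the error in the traceback can be reproduced without SCons:
encoding U+201C with the ascii codec fails in the same way, which is roughly
what Python 2's print does when stdout has no usable encoding. The sketch
below encodes explicitly and writes bytes, so the result does not depend on
the console, the locale or PYTHONIOENCODING. The write_utf8 helper and the
sample line are made up for illustration and are not part of gen_xmlpool.py.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write a unicode string as UTF-8 bytes (hypothetical helper)."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # on Python 2, sys.stdout itself accepts byte strings.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(text.encode('utf-8'))


# Invented sample line containing the character from the traceback.
line = u'#define DESC \u201cquoted\u201d\n'

# The ascii codec cannot represent U+201C, on Python 2 and 3 alike.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly gives the same bytes whatever the console encoding is.
sink = io.BytesIO()
write_utf8(line, sink)
```

Calling write_utf8(line) with no stream sends the same bytes to stdout.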
On 10/08/18 22:17, Mathieu Bridon wrote:
> In both Python 2 and 3, opening a file without specifying the mode will
> open it for reading in text mode ('r').
>
> On Python 2, the read() method of a file object opened in mode 'r' will
> return byte strings, while on Python 3 it will return unicode strings.
>
> Explicitly specifying the binary mode ('rb') then decoding the byte
> string means we always handle unicode strings on both Python 2 and 3.
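
A minimal sketch of that pattern, using a temporary file as a stand-in for
the template header. The read_lines name and the file contents are invented
for the example; only the 'rb' plus decode('utf-8') combination comes from
the patch.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a UTF-8 file as unicode strings (hypothetical helper)."""
    lines = []
    # 'rb' yields byte strings on both Python 2 and 3; decoding each line
    # turns it into a unicode string on both.
    with open(path, 'rb') as f:
        for raw in f:
            lines.append(raw.decode('utf-8'))
    return lines


# Throwaway input file standing in for the real template header.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(u'DRI_CONF_DESC(en,"\u201cfast\u201d mode")\n'.encode('utf-8'))
    lines = read_lines(path)
finally:
    os.remove(path)
```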
>
> Which in turn means the strings extracted from every re.match(line)
> result will be unicode strings as well.
>
> If we also make expandCString return unicode strings, we don't need the
> call to the unicode() constructor any more.
>
> We were using the ugettext() method because it always returns unicode
> strings in Python 2, unlike gettext(), which returns byte strings.
> The ugettext() method doesn't exist on Python 3, so we must use the
> right method on each version of Python.
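
The method selection can be tried in isolation. This sketch assumes
gettext.NullTranslations as a stand-in for the real translation objects; it
has no catalog and returns the message unchanged, so no .mo files are
needed. The sample message is invented.

```python
import gettext
import sys

# Same selection as in the patch: ugettext() on Python 2, gettext() on Python 3.
if sys.version_info < (3, 0):
    gettext_method = 'ugettext'
else:
    gettext_method = 'gettext'

# NullTranslations performs no lookup and hands the message back, which is
# enough to check the return type on either Python version.
trans = gettext.NullTranslations()
translate = getattr(trans, gettext_method)
result = translate(u'Enable \u201cfast\u201d mode')
```

Either way, result is a unicode string, so the callers need no further
conversion.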
>
> The last hurdles are that Python 3 doesn't let us concatenate unicode
> and byte strings directly, and that Python 2's stdout wants encoded byte
> strings while Python 3's wants unicode strings.
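
The concatenation hurdle can be sketched as follows: decode any byte string
first, keep everything as unicode while building the line, and encode only
when writing. The to_text helper and the sample strings are hypothetical and
do not appear in the patch.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


prefix = u'DRI_CONF_DESC(de,"'
payload = u'Gr\u00f6\u00dfe'.encode('utf-8')  # bytes, as read from a file

# On Python 3, unicode + bytes raises TypeError. Python 2 instead attempts an
# implicit ascii decode of payload, which fails on the non-ASCII bytes.
try:
    prefix + payload
    mixed_ok = True
except (TypeError, UnicodeDecodeError):
    mixed_ok = False

# Decoding first works identically on both versions.
line = prefix + to_text(payload) + u'")'
```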
>
> With these changes, the script gives the same output on both Python 2
> and 3.
>
> Signed-off-by: Mathieu Bridon <bochecha at daitauha.fr>
> ---
> src/util/xmlpool/gen_xmlpool.py | 41 +++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 10 deletions(-)
>
> diff --git a/src/util/xmlpool/gen_xmlpool.py b/src/util/xmlpool/gen_xmlpool.py
> index b0db183854..327709c7f8 100644
> --- a/src/util/xmlpool/gen_xmlpool.py
> +++ b/src/util/xmlpool/gen_xmlpool.py
> @@ -13,6 +13,12 @@ import sys
>  import gettext
>  import re
>
> +
> +if sys.version_info < (3, 0):
> +    gettext_method = 'ugettext'
> +else:
> +    gettext_method = 'gettext'
> +
>  # Path to t_options.h
>  template_header_path = sys.argv[1]
>
> @@ -60,7 +66,7 @@ def expandCString (s):
>      octa = False
>      num = 0
>      digits = 0
> -    r = ''
> +    r = u''
>      while i < len(s):
>          if not escape:
>              if s[i] == '\\':
> @@ -128,16 +134,29 @@ def expandMatches (matches, translations, end=None):
>          if len(matches) == 1 and i < len(translations) and \
>                 not matches[0].expand (r'\7').endswith('\\'):
>              suffix = ' \\'
> -        # Expand the description line. Need to use ugettext in order to allow
> -        # non-ascii unicode chars in the original English descriptions.
> -        text = escapeCString (trans.ugettext (unicode (expandCString (
> -            matches[0].expand (r'\5')), "utf-8"))).encode("utf-8")
> -        print(matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
> +        text = escapeCString (getattr(trans, gettext_method) (expandCString (
> +            matches[0].expand (r'\5'))))
> +        text = (matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
> +
> +        # In Python 2, stdout expects encoded byte strings, or else it will
> +        # encode them with the ascii 'codec'
> +        if sys.version_info.major == 2:
> +            text = text.encode('utf-8')
> +
> +        print(text)
> +
>          # Expand any subsequent enum lines
>          for match in matches[1:]:
> -            text = escapeCString (trans.ugettext (unicode (expandCString (
> -                match.expand (r'\3')), "utf-8"))).encode("utf-8")
> -            print(match.expand (r'\1"' + text + r'"\5'))
> +            text = escapeCString (getattr(trans, gettext_method) (expandCString (
> +                match.expand (r'\3'))))
> +            text = match.expand (r'\1"' + text + r'"\5')
> +
> +            # In Python 2, stdout expects encoded byte strings, or else it will
> +            # encode them with the ascii 'codec'
> +            if sys.version_info.major == 2:
> +                text = text.encode('utf-8')
> +
> +            print(text)
>
>          # Expand description end
>          if end:
> @@ -168,9 +187,11 @@ print("/***********************************************************************\
>
>  # Process the options template and generate options.h with all
>  # translations.
> -template = open (template_header_path, "r")
> +template = open (template_header_path, "rb")
>  descMatches = []
>  for line in template:
> +    line = line.decode('utf-8')
> +
>      if len(descMatches) > 0:
>          matchENUM     = reENUM    .match (line)
>          matchDESC_END = reDESC_END.match (line)
>