[PATCH] dim: fix handling of 8-bit non-UTF-8 messages

Jani Nikula jani.nikula at linux.intel.com
Wed Dec 16 08:20:21 UTC 2020


On Tue, 15 Dec 2020, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> Worksforme, so I applied it. Thanks for figuring out what's going on here.
> -Daniel

Hmm, as a side effect this bumps the required Python version to 3.2+,
since email.message_from_binary_file() was added in 3.2. With our
dim_python=$(command -v python || command -v python3), that breaks the
script for anyone who still has Python 2 as 'python'.

Time to ditch Python 2 explicitly, I think.
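
For illustration only, here is a minimal sketch of what an explicit version
guard could look like inside the Python snippets. The function name
supports_binary_parsing and the error text are hypothetical and not part of
dim; the only fact relied on is that email.message_from_binary_file()
appeared in Python 3.2.

```python
import sys


def supports_binary_parsing(version_info=sys.version_info):
    """Return True if this interpreter has email.message_from_binary_file().

    Hypothetical helper, not part of dim. The function it checks for was
    added in Python 3.2, so anything older (including Python 2) fails.
    """
    return tuple(version_info[:2]) >= (3, 2)


# Fail early with a readable message instead of an AttributeError later.
if not supports_binary_parsing():
    sys.exit("dim: Python 3.2 or newer is required")
```

A guard like this only improves the error message. Picking python3 first
when setting dim_python would be what actually avoids running under Python 2.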


BR,
Jani.


>
> On Tue, Dec 15, 2020 at 11:37 AM Simon Ser <contact at emersion.fr> wrote:
>>
>> Python's open() function returns a file object that, in text mode,
>> decodes input bytes to a string. The encoding defaults to the locale's
>> preferred one, typically UTF-8, unless an explicit encoding param is
>> passed.
>>
>> This works fine with 7-bit and UTF-8 messages. However, when a message
>> uses an 8-bit Content-Transfer-Encoding and a non-UTF-8 charset (such as
>> iso-8859-1), Python will error out with a UnicodeDecodeError.
>>
>> To prevent this, open the file in binary mode so that Python doesn't do
>> any charset conversion under the hood.
>>
>> Signed-off-by: Simon Ser <contact at emersion.fr>
>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> ---
>>  dim | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/dim b/dim
>> index ac53ade475c4..f4366ea165a2 100755
>> --- a/dim
>> +++ b/dim
>> @@ -443,9 +443,11 @@ function check_dim_config
>>  message_get_id ()
>>  {
>>         $dim_python <<EOF
>> -from email.parser import Parser
>> -headers = Parser().parse(open('$1', 'r'))
>> -message_id = headers['message-id']
>> +import email
>> +
>> +f = open('$1', 'rb')
>> +msg = email.message_from_binary_file(f)
>> +message_id = msg['message-id']
>>  if message_id is not None:
>>      print(message_id.strip('<> \n'))
>>  EOF
>> @@ -457,12 +459,12 @@ message_print_body ()
>>  import email
>>
>>  def print_msg(file):
>> -    msg = email.message_from_file(file)
>> +    msg = email.message_from_binary_file(file)
>>      for part in msg.walk():
>>          if part.get_content_type() == 'text/plain':
>>              print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii'), 'replace'))
>>
>> -print_msg(open('$1', 'r'))
>> +print_msg(open('$1', 'rb'))
>>  EOF
>>  }
>>
>> --
>> 2.29.2
>>
>>
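
For reference, the failure and the fix from the patch above can be
reproduced standalone. This is only a sketch: the sample message, its
Message-ID, and the helper names message_id and plain_text_parts are made up
for the example, and encoding="utf-8" is passed explicitly so the text-mode
failure does not depend on the local locale.

```python
import email
import os
import tempfile

# Hypothetical 8-bit message. 0xE9 is "é" in iso-8859-1, but on its own
# it is not a valid UTF-8 sequence.
RAW = (
    b"Message-ID: <20201215.example@example.invalid>\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)


def message_id(path):
    """Parse the file as bytes and return the bare Message-ID, or None."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f)
    mid = msg["message-id"]
    return None if mid is None else mid.strip("<> \n")


def plain_text_parts(path):
    """Return every text/plain part decoded with its declared charset."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f)
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset(failobj="us-ascii")
            parts.append(part.get_payload(decode=True).decode(charset, "replace"))
    return parts


with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
    tmp.write(RAW)
path = tmp.name

# Text mode has to decode the whole file up front and trips on 0xE9.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Binary mode leaves charset handling to the email package.
found_id = message_id(path)
bodies = plain_text_parts(path)
os.remove(path)

print(text_mode_failed, found_id, bodies)
```

Reading the file as bytes defers decoding until get_content_charset() has
told us which charset each part declares, which is why the iso-8859-1 body
survives.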

-- 
Jani Nikula, Intel Open Source Graphics Center

