[PATCH] dim: fix handling of 8-bit non-UTF-8 messages
Daniel Vetter
daniel.vetter at ffwll.ch
Tue Dec 15 16:28:00 UTC 2020
Worksforme, so I applied it. Thanks for figuring out what's going on here.
-Daniel
On Tue, Dec 15, 2020 at 11:37 AM Simon Ser <contact at emersion.fr> wrote:
>
> Python's open() function will return a file object that decodes input
> bytes to an UTF-8 string. Python assumes all files are UTF-8 by default
> (unless an explicit encoding param is passed).
>
> This works fine with 7-bit and UTF-8 messages. However, when a message
> uses a 8-bit Content-Transfer-Encoding and a non-UTF-8 charset (such as
> iso-8859-1), Python will error out.
>
> To prevent this, open the file in binary mode to prevent Python from
> doing any charset conversion under-the-hood.
>
> Signed-off-by: Simon Ser <contact at emersion.fr>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
> dim | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/dim b/dim
> index ac53ade475c4..f4366ea165a2 100755
> --- a/dim
> +++ b/dim
> @@ -443,9 +443,11 @@ function check_dim_config
> message_get_id ()
> {
> $dim_python <<EOF
> -from email.parser import Parser
> -headers = Parser().parse(open('$1', 'r'))
> -message_id = headers['message-id']
> +import email
> +
> +f = open('$1', 'rb')
> +msg = email.message_from_binary_file(f)
> +message_id = msg['message-id']
> if message_id is not None:
> print(message_id.strip('<> \n'))
> EOF
> @@ -457,12 +459,12 @@ message_print_body ()
> import email
>
> def print_msg(file):
> - msg = email.message_from_file(file)
> + msg = email.message_from_binary_file(file)
> for part in msg.walk():
> if part.get_content_type() == 'text/plain':
> print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii'), 'replace'))
>
> -print_msg(open('$1', 'r'))
> +print_msg(open('$1', 'rb'))
> EOF
> }
>
> --
> 2.29.2
>
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the dim-tools
mailing list