[PATCH] dim: decode email message content charset to unicode
Rodrigo Vivi
rodrigo.vivi at intel.com
Wed Sep 16 13:54:47 UTC 2020
On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
> Email messages need two levels of decoding: First, content transfer
> encoding, such as base64 or quoted-printable. Second, charset decoding.
>
> We've done the first (with part.get_payload(decode=True)), but we've
> ignored the charset. Mostly, it has not mattered, since most email is
> ascii or utf-8 anyway, and python2 has been relaxed about it. However,
> python3 part.get_payload(decode=True) gives us binary instead of
> unicode, so we also need to do the charset decoding to get the result we
> want.
>
> The problem has likely been observed only now that 'python' no longer
> exists or points at python3 instead of python2.
>
> Use part.get_content_charset() for charset decoding, defaulting to
> 'us-ascii' source charset if nothing is specified.
>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
(Although it continues to fail with the encoded email)
Thanks,
Rodrigo.
> ---
> dim | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dim b/dim
> index c3a048db8956..3f489976c6bc 100755
> --- a/dim
> +++ b/dim
> @@ -447,7 +447,7 @@ def print_msg(file):
>      msg = email.message_from_file(file)
>      for part in msg.walk():
>          if part.get_content_type() == 'text/plain':
> -            print(part.get_payload(decode=True))
> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
>
> print_msg(open('$1', 'r'))
> EOF
> --
> 2.20.1
>
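
For reference, the two decoding levels described in the commit message can be tried outside of dim with a small standalone script. This is only a sketch under assumptions: the text_parts() helper and the sample message are made up for illustration and are not part of dim; only the get_payload(decode=True) and get_content_charset(failobj='us-ascii') calls come from the patch.

```python
import base64
import email


def text_parts(msg):
    """Yield every text/plain part of msg as a str (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # Level 1: undo the content transfer encoding (base64,
            # quoted-printable). In Python 3 this returns bytes.
            raw = part.get_payload(decode=True)
            # Level 2: decode the bytes with the declared charset,
            # falling back to us-ascii when none is specified.
            charset = part.get_content_charset(failobj='us-ascii')
            yield raw.decode(charset)


# Made-up sample: an ISO-8859-1 body sent with base64 transfer encoding.
body = base64.b64encode('caf\u00e9'.encode('iso-8859-1')).decode('ascii')
sample = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n' + body + '\n'
)
msg = email.message_from_string(sample)
decoded = list(text_parts(msg))

# Made-up sample without a charset parameter: the us-ascii default applies.
plain = email.message_from_string('Content-Type: text/plain\n\nhello\n')
decoded_plain = list(text_parts(plain))
```

Note that raw.decode() can still raise UnicodeDecodeError or LookupError when a message declares a charset that does not match its bytes or that Python does not know; the sketch does not try to handle those cases.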