[PATCH] dim: decode email message content charset to unicode

Vivi, Rodrigo rodrigo.vivi at intel.com
Wed Sep 16 14:16:15 UTC 2020



> On Sep 16, 2020, at 7:00 AM, Jani Nikula <jani.nikula at intel.com> wrote:
> 
> On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
>> On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
>>> Email messages need two levels of decoding: First, content transfer
>>> encoding, such as base64 or quoted-printable. Second, charset decoding.
>>> 
>>> We've done the first (with part.get_payload(decode=True)), but we've
>>> ignored the charset. Mostly, it has not mattered, since most email is
>>> ASCII or UTF-8 anyway, and python2 has been relaxed about it. However,
>>> python3 part.get_payload(decode=True) gives us binary instead of
>>> unicode, so we also need to do the charset decoding to get the result we
>>> want.
>>> 
>>> The problem has likely been observed only now that 'python' no longer
>>> exists or points at python3 instead of python2.
>>> 
>>> Use part.get_content_charset() for charset decoding, defaulting to
>>> 'us-ascii' source charset if nothing is specified.
>>> 
>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>> 
>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> 
>> (Although it continues to fail with the encoded email.)
> 
> Which one?

I got that gvt-next one and resent it using Outlook, so I got a real encoded case.
That doesn't work with any of our versions.
It would be good to also get the case that you said used to work in the past, to test
it as well...

> 
> BR,
> Jani.
> 
> 
>> 
>> Thanks,
>> Rodrigo.
>> 
>>> ---
>>> dim | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/dim b/dim
>>> index c3a048db8956..3f489976c6bc 100755
>>> --- a/dim
>>> +++ b/dim
>>> @@ -447,7 +447,7 @@ def print_msg(file):
>>>     msg = email.message_from_file(file)
>>>     for part in msg.walk():
>>>         if part.get_content_type() == 'text/plain':
>>> -            print(part.get_payload(decode=True))
>>> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
>>> 
>>> print_msg(open('$1', 'r'))
>>> EOF
>>> -- 
>>> 2.20.1
>>> 
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center
> _______________________________________________
> dim-tools mailing list
> dim-tools at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dim-tools
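
For reference, the two decoding levels described in the patch can be reproduced outside dim. The following is a minimal standalone sketch, assuming Python 3 and the standard library email package with its default (compat32) policy. The function decode_text_parts and the sample message are hypothetical illustrations, not code from dim. get_payload(decode=True) undoes the content transfer encoding and returns bytes, and .decode() then applies the charset from get_content_charset(), falling back to 'us-ascii' when the part declares none, as the patched print_msg does.

```python
import email
from email.message import Message
from typing import List


def decode_text_parts(msg: Message) -> List[str]:
    """Return every text/plain part of msg as a unicode string (hypothetical helper)."""
    texts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # Level 1: undo the content transfer encoding (base64,
            # quoted-printable); in Python 3 this yields bytes, not str.
            raw = part.get_payload(decode=True)
            # Level 2: decode the bytes using the declared charset,
            # defaulting to 'us-ascii' when the part names none.
            charset = part.get_content_charset(failobj='us-ascii')
            texts.append(raw.decode(charset))
    return texts


# Hypothetical two-part message: UTF-8 text sent as quoted-printable,
# plus a base64 part with no charset parameter at all.
SAMPLE = (
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/mixed; boundary="XX"\n'
    '\n'
    '--XX\n'
    'Content-Type: text/plain; charset=utf-8\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    'Gr=C3=BC=C3=9Fe, caf=C3=A9\n'
    '--XX\n'
    'Content-Type: text/plain\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'SGVsbG8sIHdvcmxkIQ==\n'
    '--XX--\n'
)

for text in decode_text_parts(email.message_from_string(SAMPLE)):
    print(text)
```

Note that with the 'us-ascii' fallback, a part that omits its charset but actually contains non-ASCII bytes will still raise UnicodeDecodeError; passing errors='replace' to decode() is one possible way to tolerate that, at the cost of replacing those characters.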


