[PATCH] dim: decode email message content charset to unicode

Jani Nikula jani.nikula at intel.com
Wed Sep 16 14:36:51 UTC 2020


On Wed, 16 Sep 2020, "Vivi, Rodrigo" <rodrigo.vivi at intel.com> wrote:
>> On Sep 16, 2020, at 7:00 AM, Jani Nikula <jani.nikula at intel.com> wrote:
>> 
>> On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
>>> On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
>>>> Email messages need two levels of decoding: First, content transfer
>>>> encoding, such as base64 or quoted-printable. Second, charset decoding.
>>>> 
>>>> We've done the first (with part.get_payload(decode=True)), but we've
>>>> ignored the charset. Mostly, it has not mattered, since most email is
>>>> ascii or utf-8 anyway, and python2 has been relaxed about it. However,
>>>> python3 part.get_payload(decode=True) gives us binary instead of
>>>> unicode, so we also need to do the charset decoding to get the result we
>>>> want.
>>>> 
>>>> The problem has likely been observed only now that 'python' no longer
>>>> exists or points at python3 instead of python2.
>>>> 
>>>> Use part.get_content_charset() for charset decoding, defaulting to
>>>> 'us-ascii' source charset if nothing is specified.
>>>> 
>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>>> 
>>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> 
>>> (Although it continues to fail with the encoded email)
>> 
>> Which one?
>
> I took that gvt-next one and resent it using Outlook, so I got a real encoded case.
> That doesn't work with any of our versions.

:(

> It would be good to get the case that you said we had working in the past to test
> as well...

Wish I knew which one it was...

J.


>
>> 
>> BR,
>> Jani.
>> 
>> 
>>> 
>>> Thanks,
>>> Rodrigo.
>>> 
>>>> ---
>>>> dim | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/dim b/dim
>>>> index c3a048db8956..3f489976c6bc 100755
>>>> --- a/dim
>>>> +++ b/dim
>>>> @@ -447,7 +447,7 @@ def print_msg(file):
>>>>     msg = email.message_from_file(file)
>>>>     for part in msg.walk():
>>>>         if part.get_content_type() == 'text/plain':
>>>> -            print(part.get_payload(decode=True))
>>>> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
>>>> 
>>>> print_msg(open('$1', 'r'))
>>>> EOF
>>>> -- 
>>>> 2.20.1
>>>> 
>> 
>> -- 
>> Jani Nikula, Intel Open Source Graphics Center
>

-- 
Jani Nikula, Intel Open Source Graphics Center
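
For reference, the two levels of decoding described in the patch can be
tried outside of dim with a small standalone script. This is only a
sketch under a few assumptions: text_parts() and the sample message are
made up for illustration, errors='replace' is an addition that the patch
does not have, and the message is parsed from bytes, whereas print_msg()
in dim reads the file in text mode.

```python
import email
from email.mime.text import MIMEText


def text_parts(msg):
    """Yield every text/plain part of msg as a str (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # Level 1: undo the content transfer encoding (base64 or
        # quoted-printable). On python3 this returns bytes.
        raw = part.get_payload(decode=True)
        # Level 2: turn the bytes into str using the declared charset,
        # falling back to us-ascii when the part does not name one.
        charset = part.get_content_charset(failobj='us-ascii')
        # errors='replace' is an assumption of this sketch, so that a
        # stray byte does not abort the whole run.
        yield raw.decode(charset, errors='replace')


# Made-up sample: a latin-1 body, which MIMEText sends as quoted-printable.
sample = MIMEText('Tested-by: J\u00f6hn Doe\n', 'plain', 'iso-8859-1')
msg = email.message_from_bytes(sample.as_bytes())

for text in text_parts(msg):
    print(text, end='')
```

Passing 'utf-8' instead of 'iso-8859-1' to MIMEText makes it use base64
for the body, which exercises the other transfer encoding through the
same code path.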

