[PATCH] dim: decode email message content charset to unicode

Jani Nikula jani.nikula at intel.com
Wed Nov 4 08:33:41 UTC 2020


On Wed, 04 Nov 2020, Dave Airlie <airlied at gmail.com> wrote:
> is this why I get
> dim apply-pull drm-next < /tmp/PULL-drm-intel-next-queued.patch
> Traceback (most recent call last):
>   File "<stdin>", line 9, in <module>
>   File "<stdin>", line 7, in print_msg
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> position 1256: ordinal not in range(128)
>
> now?
>
> just taking the pull request patch from patchwork
> https://patchwork.freedesktop.org/patch/398659/

*sigh*

When the message left here, and also when a copy arrived through a round
trip via the mailing list, it had Content-Transfer-Encoding:
quoted-printable, and the decoding works fine on the local copies, on
both python2 and python3.

The message from patchwork has Content-Transfer-Encoding: 8bit,
i.e. patchwork modified the encoding, and the decoding fails on
python3 due to invalid characters. Python2 is less picky.

With the change reverted, message_print_body() prints the message as
binary without decoding on python3. I don't know if that works by
coincidence.
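
For reference, here is a minimal sketch of the two decoding levels done in a way that does not abort on bytes that are invalid for the declared charset. This is not what dim does today. The function name decode_text_parts, the use of errors='replace', the utf-8 fallback and the two sample messages are all assumptions made for illustration. It parses from bytes, so the result does not depend on the locale used to open the file.

```python
import email


def decode_text_parts(raw_bytes):
    """Return the decoded text/plain parts of a raw message (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # Level 1: undo the content transfer encoding; this yields bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Level 2: charset decoding, defaulting to us-ascii as in the patch.
        charset = part.get_content_charset(failobj='us-ascii')
        try:
            # Assumption: substitute U+FFFD for invalid bytes instead of raising.
            texts.append(payload.decode(charset, errors='replace'))
        except LookupError:
            # Unknown charset name; assumption: fall back to utf-8.
            texts.append(payload.decode('utf-8', errors='replace'))
    return texts


# Hypothetical sample messages: same body, different transfer encodings.
QP = (b"Content-Type: text/plain; charset=utf-8\n"
      b"Content-Transfer-Encoding: quoted-printable\n"
      b"\n"
      b"Andr=C3=A9\n")
EIGHTBIT = (b"Content-Type: text/plain; charset=utf-8\n"
            b"Content-Transfer-Encoding: 8bit\n"
            b"\n"
            b"Andr\xc3\xa9\n")

if __name__ == '__main__':
    for raw in (QP, EIGHTBIT):
        for text in decode_text_parts(raw):
            print(text, end='')
```

With valid input, both transfer encodings decode to the same text. Whether silently replacing invalid bytes is acceptable for dim is a separate question; failing loudly may be preferable when the body is later fed to git.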

Everything also seems to work on the mbox downloaded from Lore [1], can
you please use that in the meantime?


BR,
Jani.


[1] https://lore.kernel.org/dri-devel/87o8kehbaj.fsf@intel.com/raw


>
> Dave.
>
> On Wed, 28 Oct 2020 at 21:16, Vivi, Rodrigo <rodrigo.vivi at intel.com> wrote:
>>
>>
>>
>> On Oct 28, 2020, at 12:46 AM, Jani Nikula <jani.nikula at intel.com> wrote:
>>
>> On Tue, 27 Oct 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
>>
>> On Mon, Oct 26, 2020 at 12:21:24PM +0200, Jani Nikula wrote:
>>
>> On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
>>
>> On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
>>
>> Email messages need two levels of decoding: First, content transfer
>> encoding, such as base64 or quoted-printable. Second, charset decoding.
>>
>> We've done the first (with part.get_payload(decode=True)), but we've
>> ignored the charset. Mostly, it has not mattered, since most email is
>> ascii or utf-8 anyway, and python2 has been relaxed about it. However,
>> python3 part.get_payload(decode=True) gives us binary instead of
>> unicode, so we also need to do the charset decoding to get the result we
>> want.
>>
>> The problem has likely been observed only now that 'python' no longer
>> exists or points at python3 instead of python2.
>>
>> Use part.get_content_charset() for charset decoding, defaulting to
>> 'us-ascii' source charset if nothing is specified.
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
>> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>>
>>
>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>
>> (Although it continues to fail with the encoded email)
>>
>>
>> Thanks, pushed, though still work to do I guess. :/
>>
>>
>> yeap... it also fails with recent gvt-fixes pull request :(
>>
>>
>> Except this is an altogether different issue. The mail parsing works
>> just fine.
>>
>> Pulling https://github.com/intel/gvt-linux tags/gvt-fixes-2020-10-27 ...
>> From https://github.com/intel/gvt-linux
>> * tag                         gvt-fixes-2020-10-27 -> FETCH_HEAD
>> dim: 401ccfa87856 ("drm/i915/gvt: Only pin/unpin intel_context along with workload"): Subject in fixes line doesn't match referenced commit:
>> dim:     e6ba76480299 (drm/i915: Remove i915->kernel_context)
>> dim: ERROR: issues in commits detected, aborting
>>
>>
>> $ git log e6ba76480299 -1 --format="%s"
>> drm/i915: Remove i915->kernel_context
>>
>>
>> This is a valid complaint.
>>
>> This is what's in the pull request:
>>
>> $ git show 401ccfa87856 | grep Fixes
>>    Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)
>>
>> And this is what it should have:
>>
>> $ dim fixes e6ba76480299 | grep Fixes
>> Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")
>>
>>
>> holy! My eyes didn't catch this, and since I assumed this old bug was the
>> cause, I had pulled gvt-fixes into drm-intel-fixes bypassing dim. :/
>>
>> I'm going to remove it, force-push and request the fix there, so we don't
>> propagate a bad tag that might break other scripts on the way.
>>
>> Sorry,
>> Rodrigo.
>>
>>
>>
>> BR,
>> Jani.
>>
>>
>>
>>
>> BR,
>> Jani.
>>
>>
>>
>> Thanks,
>> Rodrigo.
>>
>> ---
>> dim | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/dim b/dim
>> index c3a048db8956..3f489976c6bc 100755
>> --- a/dim
>> +++ b/dim
>> @@ -447,7 +447,7 @@ def print_msg(file):
>>     msg = email.message_from_file(file)
>>     for part in msg.walk():
>>         if part.get_content_type() == 'text/plain':
>> -            print(part.get_payload(decode=True))
>> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
>>
>> print_msg(open('$1', 'r'))
>> EOF
>> --
>> 2.20.1
>>
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
>>
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
>>
>>

-- 
Jani Nikula, Intel Open Source Graphics Center

