[PATCH] dim: decode email message content charset to unicode

Daniel Vetter daniel at ffwll.ch
Tue Dec 15 09:23:16 UTC 2020


On Wed, Nov 4, 2020 at 9:33 AM Jani Nikula <jani.nikula at intel.com> wrote:
>
> On Wed, 04 Nov 2020, Dave Airlie <airlied at gmail.com> wrote:
> > is this why I get
> > dim apply-pull drm-next < /tmp/PULL-drm-intel-next-queued.patch
> > Traceback (most recent call last):
> >   File "<stdin>", line 9, in <module>
> >   File "<stdin>", line 7, in print_msg
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> > position 1256: ordinal not in range(128)
> >
> > now?
> >
> > just taking the pull request patch from patchwork
> > https://patchwork.freedesktop.org/patch/398659/
>
> *sigh*
>
> When the message left here, and also when a copy arrived through a round
> trip via the mailing list, it had Content-Transfer-Encoding:
> quoted-printable, and the decoding works fine on the local copies, on
> both python2 and python3.
>
> The message from patchwork has Content-Transfer-Encoding: 8bit,
> i.e. patchwork modified the encoding, and the decoding fails on
> python3 due to invalid characters. Python2 is less picky.
>
> With the change reverted, message_print_body() prints the message as
> binary without decoding on python3. I don't know if that works by
> coincidence.
>
> Everything also seems to work on the mbox downloaded from Lore [1], can
> you please use that in the meantime?

gmail seems to do the same mangling, at least my local mailbox also
has issues. And it's with all of Thomas' pull requests. Pulling from
lore is kinda awkward.

Any ideas?
-Daniel
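Jani's diagnosis can be reproduced without patchwork or gmail. The following is a minimal Python 3 sketch, not part of dim; `RAW`, `body_from_text` and `body_from_bytes` are hypothetical names. It parses the same 8bit UTF-8 message twice. The first pass parses a str, which is what dim does when it reads the file with `open('$1', 'r')`. The second pass parses bytes. On the str path, `get_payload(decode=True)` has to turn already-decoded text back into bytes. In CPython, non-ASCII characters then come back as Latin-1-style bytes rather than the original UTF-8, so the charset decode fails. Parsing bytes preserves the original octets.

```python
import email

# One text/plain part, UTF-8 body, sent with an 8bit transfer encoding
# (the form Jani reports the patchwork download arriving in).
RAW = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Caf\xc3\xa9\n"
)

def first_text_part(msg):
    # Same selection rule as dim's print_msg: the first text/plain part.
    return next(p for p in msg.walk() if p.get_content_type() == "text/plain")

def body_from_text(raw):
    # Mimics reading the file in text mode: bytes are decoded to str first.
    part = first_text_part(email.message_from_string(raw.decode("utf-8")))
    payload = part.get_payload(decode=True)  # str -> bytes is lossy here
    return payload.decode(part.get_content_charset(failobj="us-ascii"))

def body_from_bytes(raw):
    # Parsing bytes keeps the original 8bit octets intact.
    part = first_text_part(email.message_from_bytes(raw))
    payload = part.get_payload(decode=True)
    return payload.decode(part.get_content_charset(failobj="us-ascii"))

print(repr(body_from_bytes(RAW)))
try:
    body_from_text(RAW)
except UnicodeDecodeError as exc:
    print("text path failed:", exc)
```

One possible direction for dim is to read the file in binary mode, for example with `email.message_from_binary_file(open(path, 'rb'))`. This is an untested assumption, not the fix the thread settled on, and that function exists only on Python 3.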

>
>
> BR,
> Jani.
>
>
> [1] https://lore.kernel.org/dri-devel/87o8kehbaj.fsf@intel.com/raw
>
>
> >
> > Dave.
> >
> > On Wed, 28 Oct 2020 at 21:16, Vivi, Rodrigo <rodrigo.vivi at intel.com> wrote:
> >>
> >>
> >>
> >> On Oct 28, 2020, at 12:46 AM, Jani Nikula <jani.nikula at intel.com> wrote:
> >>
> >> On Tue, 27 Oct 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> >>
> >> On Mon, Oct 26, 2020 at 12:21:24PM +0200, Jani Nikula wrote:
> >>
> >> On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> >>
> >> On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
> >>
> >> Email messages need two levels of decoding: First, content transfer
> >> encoding, such as base64 or quoted-printable. Second, charset decoding.
> >>
> >> We've done the first (with part.get_payload(decode=True)), but we've
> >> ignored the charset. Mostly, it has not mattered, since most email is
> >> ascii or utf-8 anyway, and python2 has been relaxed about it. However,
> >> python3 part.get_payload(decode=True) gives us binary instead of
> >> unicode, so we also need to do the charset decoding to get the result we
> >> want.
> >>
> >> The problem has likely been observed only now that 'python' no longer
> >> exists or points at python3 instead of python2.
> >>
> >> Use part.get_content_charset() for charset decoding, defaulting to
> >> 'us-ascii' source charset if nothing is specified.
> >>
> >> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> Cc: Daniel Vetter <daniel at ffwll.ch>
> >> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
> >>
> >>
> >> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >>
> >> (Although it continues to fail with the encoded email.)
> >>
> >>
> >> Thanks, pushed, though there's still work to do, I guess. :/
> >>
> >>
> >> yeap... it also fails with the recent gvt-fixes pull request :(
> >>
> >>
> >> Except this is an altogether different issue. The mail parsing works
> >> just fine.
> >>
> >> Pulling https://github.com/intel/gvt-linux tags/gvt-fixes-2020-10-27 ...
> >> From https://github.com/intel/gvt-linux
> >> * tag                         gvt-fixes-2020-10-27 -> FETCH_HEAD
> >> dim: 401ccfa87856 ("drm/i915/gvt: Only pin/unpin intel_context along with workload"): Subject in fixes line doesn't match referenced commit:
> >> dim:     e6ba76480299 (drm/i915: Remove i915->kernel_context)
> >> dim: ERROR: issues in commits detected, aborting
> >>
> >>
> >> $ git log e6ba76480299 -1 --format="%s"
> >> drm/i915: Remove i915->kernel_context
> >>
> >>
> >> This is a valid complaint.
> >>
> >> This is what's in the pull request:
> >>
> >> $ git show 401ccfa87856 | grep Fixes
> >>    Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)
> >>
> >> And this is what it should have:
> >>
> >> $ dim fixes e6ba76480299 | grep Fixes
> >> Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")
> >>
> >>
> >> holy! My eyes didn't catch this, and because I assumed this old bug was the cause, I had
> >> pulled gvt-fixes into drm-intel-fixes bypassing dim. :/
> >>
> >> I'm going to remove it, force-push, and request the fix there, so we don't propagate a bad
> >> tag that might break other scripts along the way.
> >>
> >> Sorry,
> >> Rodrigo.
> >>
> >>
> >>
> >> BR,
> >> Jani.
> >>
> >>
> >>
> >>
> >> BR,
> >> Jani.
> >>
> >>
> >>
> >> Thanks,
> >> Rodrigo.
> >>
> >> ---
> >> dim | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/dim b/dim
> >> index c3a048db8956..3f489976c6bc 100755
> >> --- a/dim
> >> +++ b/dim
> >> @@ -447,7 +447,7 @@ def print_msg(file):
> >>     msg = email.message_from_file(file)
> >>     for part in msg.walk():
> >>         if part.get_content_type() == 'text/plain':
> >> -            print(part.get_payload(decode=True))
> >> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
> >>
> >> print_msg(open('$1', 'r'))
> >> EOF
> >> --
> >> 2.20.1
> >>
> >>
> >> --
> >> Jani Nikula, Intel Open Source Graphics Center
> >>
> >>
> >> --
> >> Jani Nikula, Intel Open Source Graphics Center
> >>
> >>
> >> _______________________________________________
> >> dim-tools mailing list
> >> dim-tools at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dim-tools
>
> --
> Jani Nikula, Intel Open Source Graphics Center
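The two decoding levels described in the quoted commit message can be seen in isolation using Python 3's standard `email` module. This is a sketch, not dim's code. `decode_text_part` and `make_message` are illustrative names, and the body is the UTF-8 word "Café", carried once as quoted-printable and once as base64. A third message declares no charset, so it exercises the `us-ascii` fallback.

```python
import email

def decode_text_part(part):
    """Return the text of a single, non-multipart part as str."""
    # Level 1: undo the Content-Transfer-Encoding (base64, quoted-printable,
    # ...). With decode=True this returns bytes on Python 3.
    raw = part.get_payload(decode=True)
    # Level 2: charset decoding, falling back to us-ascii when none is declared.
    charset = part.get_content_charset(failobj="us-ascii")
    return raw.decode(charset)

def make_message(cte, body):
    # Builds a single-part text/plain message that declares a UTF-8 charset.
    return email.message_from_string(
        "Content-Type: text/plain; charset=utf-8\n"
        f"Content-Transfer-Encoding: {cte}\n"
        "\n"
        f"{body}\n"
    )

qp = make_message("quoted-printable", "Caf=C3=A9")
b64 = make_message("base64", "Q2Fmw6k=")
plain = email.message_from_string("Subject: no charset\n\nhello\n")

for msg in (qp, b64, plain):
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            print(repr(decode_text_part(part)))
```

Printing the resulting str is a separate step with its own encoding. The `u'\xe9'` in Dave's traceback suggests Python 2, where printing unicode to a stream with an ASCII encoding raises `UnicodeEncodeError`.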



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
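The gvt-fixes failure in the quoted thread is a formatting problem rather than a parsing one. The kernel's patch-submission guidelines write the tag as `Fixes: <sha> ("<subject>")`, with at least 12 characters of the SHA-1 and the subject in double quotes. That is the form `dim fixes` prints. Below is a hypothetical Python checker: dim itself is a shell script, and its real check may differ. The sketch shows why the unquoted line from the pull request is rejected while the `dim fixes` output passes.

```python
import re

# Hypothetical checker, not dim's implementation. Assumes the form
#   Fixes: <12-40 hex chars> ("<exact subject of the referenced commit>")
FIXES_RE = re.compile(r'^Fixes: ([0-9a-f]{12,40}) \("(.*)"\)$')

def fixes_line_ok(line, referenced_subject):
    """True if line is a well-formed Fixes: tag whose quoted subject
    matches the referenced commit's subject exactly."""
    m = FIXES_RE.match(line.strip())
    return m is not None and m.group(2) == referenced_subject

subject = "drm/i915: Remove i915->kernel_context"
from_pull = "Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)"
from_dim = 'Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")'

print(fixes_line_ok(from_pull, subject))  # subject not quoted
print(fixes_line_ok(from_dim, subject))   # format produced by dim fixes
```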

