[PATCH] dim: decode email message content charset to unicode

Daniel Vetter daniel at ffwll.ch
Tue Dec 15 11:32:34 UTC 2020


On Tue, Dec 15, 2020 at 12:26 PM Jani Nikula <jani.nikula at intel.com> wrote:
>
> On Tue, 15 Dec 2020, Daniel Vetter <daniel at ffwll.ch> wrote:
> > Adding Thomas too.
> >
> > On Tue, Dec 15, 2020 at 10:23 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> >>
> >> On Wed, Nov 4, 2020 at 9:33 AM Jani Nikula <jani.nikula at intel.com> wrote:
> >> >
> >> > On Wed, 04 Nov 2020, Dave Airlie <airlied at gmail.com> wrote:
> >> > > is this why I get
> >> > > dim apply-pull drm-next < /tmp/PULL-drm-intel-next-queued.patch
> >> > > Traceback (most recent call last):
> >> > >   File "<stdin>", line 9, in <module>
> >> > >   File "<stdin>", line 7, in print_msg
> >> > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> >> > > position 1256: ordinal not in range(128)
> >> > >
> >> > > now?
> >> > >
> >> > > just taking the pull request patch from patchwork
> >> > > https://patchwork.freedesktop.org/patch/398659/
> >> >
> >> > *sigh*
> >> >
> >> > When the message left here, and also when a copy arrived through a round
> >> > trip via the mailing list, it had Content-Transfer-Encoding:
> >> > quoted-printable, and the decoding works fine on the local copies, on
> >> > both python2 and python3.
> >> >
> >> > The message from patchwork has Content-Transfer-Encoding: 8bit,
> >> > i.e. patchwork modified the encoding, and the decoding fails on
> >> > python3 due to invalid characters. Python2 is less picky.
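Jani's point is that the transfer encoding is only packaging. A quoted-printable body and an 8bit body can carry identical bytes. The check below is a small self-contained illustration that assumes the message is parsed from bytes with `email.message_from_bytes`. The sample bodies and `body_text()` are made up for illustration and are not the real pull request mail or dim code.

```python
import email

HEADER = b"Content-Type: text/plain; charset=utf-8\n"
# Same text, two transfer encodings: quoted-printable (7-bit safe) and 8bit
# (raw UTF-8 octets in the body). Hypothetical sample bodies.
AS_QP = HEADER + b"Content-Transfer-Encoding: quoted-printable\n\nCaf=C3=A9\n"
AS_8BIT = HEADER + b"Content-Transfer-Encoding: 8bit\n\n" + "Café\n".encode("utf-8")


def body_text(raw_bytes):
    # Parsing from bytes lets the email package keep 8-bit octets intact.
    msg = email.message_from_bytes(raw_bytes)
    payload = msg.get_payload(decode=True)  # level 1: transfer decoding -> bytes
    return payload.decode(msg.get_content_charset(failobj="us-ascii"))  # level 2


print(body_text(AS_QP) == body_text(AS_8BIT))  # True
```

Both paths end up with the same `str`, so a re-encoding by patchwork or gmail should be harmless as long as the raw bytes reach the parser unchanged.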
> >> >
> >> > With the change reverted, message_print_body() prints the message as
> >> > binary without decoding on python3. I don't know if that works by
> >> > coincidence.
> >> >
> >> > Everything also seems to work with the mbox downloaded from Lore [1]; can
> >> > you please use that in the meantime?
> >>
> >> gmail seems to do the same mangling; at least my local mailbox has the
> >> same issues. And it happens with all of Thomas' pull requests. Pulling
> >> from lore is kinda awkward.
> >>
> >> Any ideas?
>
> Isn't this fixed by
>
> commit 03f281de0f9175875b8d4da0a43d9d288debb228
> Author: Jani Nikula <jani.nikula at intel.com>
> Date:   Wed Nov 18 15:11:03 2020 +0200
>
>     dim: replace message characters leading to decoding errors with U+FFFD

Nope, I already had that one. Simon debugged it; apparently the problem is
even earlier, in the python magic.
-Daniel
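The thread doesn't record what Simon found. One plausible candidate, and this is only an assumption, is the text-mode `open('$1', 'r')` in the heredoc. In text mode the email package receives already-decoded `str`, so `get_payload(decode=True)` must turn any non-ASCII text back into bytes without consulting the declared charset, and the following `.decode(charset)` can then fail. Under an ASCII locale, reading the file may also fail outright. The sketch below opens the file in binary mode instead. `msg_text()` is a hypothetical name, not part of dim.

```python
import email


def msg_text(path):
    """Hypothetical helper: return the decoded text/plain parts of a mail file."""
    # Binary mode: the email package keeps 8-bit body octets (as surrogate
    # escapes) instead of decoding them with the locale's default encoding.
    with open(path, 'rb') as f:
        msg = email.message_from_binary_file(f)
    parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # Transfer decoding -> bytes, then charset decoding -> str.
            charset = part.get_content_charset(failobj='us-ascii')
            parts.append(part.get_payload(decode=True).decode(charset, errors='replace'))
    return ''.join(parts)
```

To avoid the `UnicodeEncodeError` from Dave's traceback on an ASCII-only stdout, a caller could write the result as UTF-8 explicitly, for example with `sys.stdout.buffer.write(msg_text(path).encode('utf-8'))` on python3.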

>
> BR,
> Jani.
>
>
>
> >> -Daniel
> >>
> >> >
> >> >
> >> > BR,
> >> > Jani.
> >> >
> >> >
> >> > [1] https://lore.kernel.org/dri-devel/87o8kehbaj.fsf@intel.com/raw
> >> >
> >> >
> >> > >
> >> > > Dave.
> >> > >
> >> > > On Wed, 28 Oct 2020 at 21:16, Vivi, Rodrigo <rodrigo.vivi at intel.com> wrote:
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Oct 28, 2020, at 12:46 AM, Jani Nikula <jani.nikula at intel.com> wrote:
> >> > >>
> >> > >> On Tue, 27 Oct 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> >> > >>
> >> > >> On Mon, Oct 26, 2020 at 12:21:24PM +0200, Jani Nikula wrote:
> >> > >>
> >> > >> On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> >> > >>
> >> > >> On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
> >> > >>
> >> > >> Email messages need two levels of decoding: First, content transfer
> >> > >> encoding, such as base64 or quoted-printable. Second, charset decoding.
> >> > >>
> >> > >> We've done the first (with part.get_payload(decode=True)), but we've
> >> > >> ignored the charset. Mostly, it has not mattered, since most email is
> >> > >> ascii or utf-8 anyway, and python2 has been relaxed about it. However,
> > >> > >> on python3 part.get_payload(decode=True) gives us bytes instead of
> > >> > >> unicode str, so we also need to do the charset decoding to get the
> > >> > >> result we want.
> >> > >>
> > >> > >> The problem has likely surfaced only now that 'python' either no longer
> > >> > >> exists or points at python3 instead of python2.
> >> > >>
> >> > >> Use part.get_content_charset() for charset decoding, defaulting to
> >> > >> 'us-ascii' source charset if nothing is specified.
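The two levels can be made concrete with a minimal sketch in the same Python the dim heredoc uses. The sample message and `decode_text_part()` are hypothetical. `errors='replace'` is an addition for robustness and is not part of this patch.

```python
import email

# Hypothetical message: quoted-printable transfer encoding, ISO-8859-1 charset.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Caf=E9\n"
)


def decode_text_part(part):
    """Undo both encoding layers of a text/plain part."""
    # Level 1: content transfer encoding (quoted-printable, base64, ...) -> bytes
    raw = part.get_payload(decode=True)
    # Level 2: charset -> str, defaulting to us-ascii when none is declared
    charset = part.get_content_charset(failobj='us-ascii')
    return raw.decode(charset, errors='replace')


msg = email.message_from_bytes(RAW)
for part in msg.walk():
    if part.get_content_type() == 'text/plain':
        print(part.get_payload(decode=True))  # b'Caf\xe9\n'
        print(decode_text_part(part))         # Café
```

Stopping after level 1 leaves `b'Caf\xe9\n'`, which is only meaningful once the declared charset (here ISO-8859-1, not UTF-8) is applied.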
> >> > >>
> >> > >> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> > >> Cc: Daniel Vetter <daniel at ffwll.ch>
> >> > >> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
> >> > >>
> >> > >>
> >> > >> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> > >> Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> > >>
> > >> > >> (Although it continues to fail with the encoded email.)
> >> > >>
> >> > >>
> > >> > >> Thanks, pushed, though there's still work to do, I guess. :/
> >> > >>
> >> > >>
> > >> > >> yeap... it also fails with the recent gvt-fixes pull request :(
> >> > >>
> >> > >>
> >> > >> Except this is an altogether different issue. The mail parsing works
> >> > >> just fine.
> >> > >>
> >> > >> Pulling https://github.com/intel/gvt-linux tags/gvt-fixes-2020-10-27 ...
> >> > >> From https://github.com/intel/gvt-linux
> >> > >> * tag                         gvt-fixes-2020-10-27 -> FETCH_HEAD
> >> > >> dim: 401ccfa87856 ("drm/i915/gvt: Only pin/unpin intel_context along with workload"): Subject in fixes line doesn't match referenced commit:
> >> > >> dim:     e6ba76480299 (drm/i915: Remove i915->kernel_context)
> >> > >> dim: ERROR: issues in commits detected, aborting
> >> > >>
> >> > >>
> >> > >> $ git log e6ba76480299 -1 --format="%s"
> >> > >> drm/i915: Remove i915->kernel_context
> >> > >>
> >> > >>
> >> > >> This is a valid complaint.
> >> > >>
> >> > >> This is what's in the pull request:
> >> > >>
> >> > >> $ git show 401ccfa87856 | grep Fixes
> >> > >>    Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)
> >> > >>
> >> > >> And this is what it should have:
> >> > >>
> >> > >> $ dim fixes e6ba76480299 | grep Fixes
> >> > >> Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")
> >> > >>
> >> > >>
> > >> > >> holy! My eyes didn't catch this, and because I assumed this old bug was the cause, I had
> > >> > >> pulled gvt-fixes into drm-intel-fixes, bypassing dim. :/
> >> > >>
> > >> > >> I'm going to remove it, force-push, and request the fix there, so we don't propagate a bad
> > >> > >> tag that might break other scripts along the way.
> >> > >>
> >> > >> Sorry,
> >> > >> Rodrigo.
> >> > >>
> >> > >>
> >> > >>
> >> > >> BR,
> >> > >> Jani.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> BR,
> >> > >> Jani.
> >> > >>
> >> > >>
> >> > >>
> >> > >> Thanks,
> >> > >> Rodrigo.
> >> > >>
> >> > >> ---
> >> > >> dim | 2 +-
> >> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> > >>
> >> > >> diff --git a/dim b/dim
> >> > >> index c3a048db8956..3f489976c6bc 100755
> >> > >> --- a/dim
> >> > >> +++ b/dim
> >> > >> @@ -447,7 +447,7 @@ def print_msg(file):
> >> > >>     msg = email.message_from_file(file)
> >> > >>     for part in msg.walk():
> >> > >>         if part.get_content_type() == 'text/plain':
> >> > >> -            print(part.get_payload(decode=True))
> >> > >> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
> >> > >>
> >> > >> print_msg(open('$1', 'r'))
> >> > >> EOF
> >> > >> --
> >> > >> 2.20.1
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Jani Nikula, Intel Open Source Graphics Center
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Jani Nikula, Intel Open Source Graphics Center
> >> > >>
> >> > >>
> >> > >> _______________________________________________
> >> > >> dim-tools mailing list
> >> > >> dim-tools at lists.freedesktop.org
> >> > >> https://lists.freedesktop.org/mailman/listinfo/dim-tools
> >> >
> >> > --
> >> > Jani Nikula, Intel Open Source Graphics Center
> >>
> >>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
>
> --
> Jani Nikula, Intel Open Source Graphics Center



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

