[PATCH] dim: decode email message content charset to unicode

Vivi, Rodrigo rodrigo.vivi at intel.com
Wed Oct 28 11:16:02 UTC 2020



On Oct 28, 2020, at 12:46 AM, Jani Nikula <jani.nikula at intel.com> wrote:

On Tue, 27 Oct 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
On Mon, Oct 26, 2020 at 12:21:24PM +0200, Jani Nikula wrote:
On Wed, 16 Sep 2020, Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
On Wed, Sep 16, 2020 at 12:57:43PM +0300, Jani Nikula wrote:
Email messages need two levels of decoding: First, content transfer
encoding, such as base64 or quoted-printable. Second, charset decoding.

We've done the first (with part.get_payload(decode=True)), but we've
ignored the charset. Mostly, it has not mattered, since most email is
ascii or utf-8 anyway, and python2 has been relaxed about it. However,
python3 part.get_payload(decode=True) gives us binary instead of
unicode, so we also need to do the charset decoding to get the result we
want.

The problem has likely been observed only now that 'python' no longer
exists or points at python3 instead of python2.

Use part.get_content_charset() for charset decoding, defaulting to
'us-ascii' source charset if nothing is specified.
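
For illustration, the two decoding levels described above can be sketched
as a standalone script. This is a minimal sketch, not the code in dim: the
sample message RAW and the helper name text_parts are hypothetical and
exist only for this example. Only email.message_from_string, walk(),
get_content_type(), get_payload(decode=True) and get_content_charset()
come from the python3 standard library.

```python
import email

# Hypothetical sample message: a quoted-printable, ISO-8859-1 text/plain body.
RAW = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)


def text_parts(msg):
    """Yield each text/plain part of msg as a str (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # Level 1: undo the content transfer encoding (base64,
            # quoted-printable, ...). On python3 this returns bytes.
            raw = part.get_payload(decode=True)
            # Level 2: decode the bytes using the declared charset,
            # falling back to us-ascii when none is specified.
            charset = part.get_content_charset(failobj='us-ascii')
            yield raw.decode(charset)


msg = email.message_from_string(RAW)
bodies = list(text_parts(msg))
print(bodies)
```

Without the second step, python3 would print the bytes object from
get_payload(decode=True) rather than the text itself.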

Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Cc: Daniel Vetter <daniel at ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula at intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi at intel.com>

(Although it continues to fail with the encoded email)

Thanks, pushed, though still work to do I guess. :/

yeap... it also fails with recent gvt-fixes pull request :(

Except this is an altogether different issue. The mail parsing works
just fine.

Pulling https://github.com/intel/gvt-linux tags/gvt-fixes-2020-10-27 ...
From https://github.com/intel/gvt-linux
* tag                         gvt-fixes-2020-10-27 -> FETCH_HEAD
dim: 401ccfa87856 ("drm/i915/gvt: Only pin/unpin intel_context along with workload"): Subject in fixes line doesn't match referenced commit:
dim:     e6ba76480299 (drm/i915: Remove i915->kernel_context)
dim: ERROR: issues in commits detected, aborting


$ git log e6ba76480299 -1 --format="%s"
drm/i915: Remove i915->kernel_context

This is a valid complaint.

This is what's in the pull request:

$ git show 401ccfa87856 | grep Fixes
   Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)

And this is what it should have:

$ dim fixes e6ba76480299 | grep Fixes
Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")
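
The difference between the two lines above is only the double quotes
around the subject. As a rough sketch of that comparison (this is not how
dim implements its check, which is not shown in this thread; the regex
FIXES_RE and the function fixes_line_ok are hypothetical names made up for
this example):

```python
import re

# Hypothetical pattern for the canonical form: Fixes: <sha> ("<subject>")
FIXES_RE = re.compile(r'^\s*Fixes: ([0-9a-f]{8,40}) \("(.*)"\)$')


def fixes_line_ok(line, subject):
    """Return True if line is a quoted Fixes: line carrying exactly subject."""
    m = FIXES_RE.match(line)
    return bool(m) and m.group(2) == subject


# Subject as reported by: git log e6ba76480299 -1 --format="%s"
subject = 'drm/i915: Remove i915->kernel_context'

# The line found in the pull request (no quotes around the subject).
bad = 'Fixes: e6ba76480299 (drm/i915: Remove i915->kernel_context)'
# The line produced by: dim fixes e6ba76480299
good = 'Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context")'

print(fixes_line_ok(bad, subject), fixes_line_ok(good, subject))
```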

Holy! My eyes didn't catch this, and because I assumed this old bug was the cause, I
pulled gvt-fixes into drm-intel-fixes, bypassing dim. :/

I'm going to remove it, force-push, and request the fix there, so we don't propagate a bad
tag that might break other scripts along the way.

Sorry,
Rodrigo.



BR,
Jani.




BR,
Jani.



Thanks,
Rodrigo.

---
dim | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dim b/dim
index c3a048db8956..3f489976c6bc 100755
--- a/dim
+++ b/dim
@@ -447,7 +447,7 @@ def print_msg(file):
    msg = email.message_from_file(file)
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
-            print(part.get_payload(decode=True))
+            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))

print_msg(open('$1', 'r'))
EOF
--
2.20.1


--
Jani Nikula, Intel Open Source Graphics Center

