[PATCH] dim: replace message characters leading to decoding errors with U+FFFD

Jani Nikula jani.nikula at intel.com
Wed Nov 18 13:11:03 UTC 2020


The character set decoding added in commit b66d07db11e5 ("dim: decode
email message content charset to unicode") started raising
UnicodeDecodeError under certain conditions, specifically with Python 3
and mboxes downloaded from patchwork.

Instead of raising UnicodeDecodeError, replace bytes that can't be
decoded with U+FFFD (REPLACEMENT CHARACTER, �).
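
For illustration, a minimal standalone sketch of the behavior change,
outside of dim. The payload bytes and the helper names decode_strict and
decode_lenient are hypothetical and exist only for this example; the
only real API used is bytes.decode() with its 'replace' error handler,
which is what the patch passes.

```python
# Hypothetical payload: 0xE9 is "é" in Latin-1, but on its own it is an
# invalid sequence when the declared charset is UTF-8.
payload = b"caf\xe9 au lait"


def decode_strict(data, charset):
    # Default error handling ('strict'): raises UnicodeDecodeError on
    # any byte sequence that is invalid in the given charset.
    return data.decode(charset)


def decode_lenient(data, charset):
    # 'replace' error handling: undecodable bytes become U+FFFD
    # instead of raising.
    return data.decode(charset, 'replace')


try:
    decode_strict(payload, 'utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient = decode_lenient(payload, 'utf-8')

print(strict_failed)
print(lenient)
```

The trade-off is that the original byte is lost in the output, which
should be acceptable when the text is only printed for review.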

Reported-by: Dave Airlie <airlied at gmail.com>
Cc: Dave Airlie <airlied at gmail.com>
Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Signed-off-by: Jani Nikula <jani.nikula at intel.com>
---
 dim | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dim b/dim
index 1be1435a1a52..1572cf33f25c 100755
--- a/dim
+++ b/dim
@@ -460,7 +460,7 @@ def print_msg(file):
     msg = email.message_from_file(file)
     for part in msg.walk():
         if part.get_content_type() == 'text/plain':
-            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
+            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii'), 'replace'))
 
 print_msg(open('$1', 'r'))
 EOF
-- 
2.20.1


