[PATCH] dim: fix handling of 8-bit non-UTF-8 messages

Simon Ser contact at emersion.fr
Tue Dec 15 10:37:09 UTC 2020


Python's open() function will return a file object that decodes input
bytes to an UTF-8 string. Python assumes all files are UTF-8 by default
(unless an explicit encoding param is passed).

This works fine with 7-bit and UTF-8 messages. However, when a message
uses a 8-bit Content-Transfer-Encoding and a non-UTF-8 charset (such as
iso-8859-1), Python will error out.

To prevent this, open the file in binary mode to prevent Python from
doing any charset conversion under-the-hood.

Signed-off-by: Simon Ser <contact at emersion.fr>
Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
---
 dim | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/dim b/dim
index ac53ade475c4..f4366ea165a2 100755
--- a/dim
+++ b/dim
@@ -443,9 +443,11 @@ function check_dim_config
 message_get_id ()
 {
 	$dim_python <<EOF
-from email.parser import Parser
-headers = Parser().parse(open('$1', 'r'))
-message_id = headers['message-id']
+import email
+
+f = open('$1', 'rb')
+msg = email.message_from_binary_file(f)
+message_id = msg['message-id']
 if message_id is not None:
     print(message_id.strip('<> \n'))
 EOF
@@ -457,12 +459,12 @@ message_print_body ()
 import email
 
 def print_msg(file):
-    msg = email.message_from_file(file)
+    msg = email.message_from_binary_file(file)
     for part in msg.walk():
         if part.get_content_type() == 'text/plain':
             print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii'), 'replace'))
 
-print_msg(open('$1', 'r'))
+print_msg(open('$1', 'rb'))
 EOF
 }
 
-- 
2.29.2




More information about the dim-tools mailing list