[PATCH] dim: replace message characters leading to decoding errors with U+FFFD
Jani Nikula
jani.nikula at intel.com
Mon Nov 23 09:37:29 UTC 2020
Anyone care to review, please?
For convenience, see [1] and [2] for what's going on.
BR,
Jani.
[1] https://docs.python.org/3/howto/unicode.html#the-string-type
[2] https://docs.python.org/3/library/stdtypes.html#bytes.decode
On Wed, 18 Nov 2020, Jani Nikula <jani.nikula at intel.com> wrote:
> The character set decoding added in commit b66d07db11e5 ("dim: decode
> email message content charset to unicode") started failing with Unicode
> decoding errors under certain conditions (specifically, Python 3 and
> mboxes downloaded from patchwork).
>
> Instead of raising UnicodeDecodeErrors, replace values that can't be
> converted with U+FFFD (REPLACEMENT CHARACTER, �).
>
> Reported-by: Dave Airlie <airlied at gmail.com>
> Cc: Dave Airlie <airlied at gmail.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
> ---
> dim | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dim b/dim
> index 1be1435a1a52..1572cf33f25c 100755
> --- a/dim
> +++ b/dim
> @@ -460,7 +460,7 @@ def print_msg(file):
>      msg = email.message_from_file(file)
>      for part in msg.walk():
>          if part.get_content_type() == 'text/plain':
> -            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
> +            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii'), 'replace'))
>
> print_msg(open('$1', 'r'))
> EOF
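
For illustration, here is a minimal sketch of the behaviour change, separate from the patch itself. The helper name decode_payload and the sample bytes are made up for this example. Only bytes.decode() and its 'replace' error handler come from the Python documentation referenced in [2].

```python
def decode_payload(payload: bytes, charset: str = "us-ascii") -> str:
    """Decode payload bytes, substituting U+FFFD for anything undecodable.

    Hypothetical helper: mirrors the .decode(charset, 'replace') call in
    the patch, but is not code taken from dim.
    """
    return payload.decode(charset, "replace")


# Made-up sample: UTF-8 encoded text that is wrongly treated as us-ascii.
raw = "na\u00efve".encode("utf-8")  # b'na\xc3\xafve'

# Strict decoding (the default error handler) raises on the non-ASCII bytes.
strict_failed = False
try:
    raw.decode("us-ascii")
except UnicodeDecodeError:
    strict_failed = True

# With 'replace', each undecodable byte becomes U+FFFD instead of raising.
replaced = decode_payload(raw)
print(strict_failed, replaced)
```

The trade-off is that the output may contain replacement characters where the declared charset does not match the actual bytes, but printing the message no longer aborts.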
--
Jani Nikula, Intel Open Source Graphics Center