[systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Apr 28 07:37:41 UTC 2022


>>> Lennart Poettering <lennart at poettering.net> wrote on 27.04.2022 at 13:10
in message <Ymkksza00BPhDMGq at gardel-login>:
> On Wed, 27.04.22 09:09, Ulrich Windl (Ulrich.Windl at rz.uni-regensburg.de)
> wrote:
> 
>> Hi!
>>
>> Having written an RFC 3164 compatible syslog daemon, I noticed that
>> systemd created syslog messages with non-ASCII characters.
>> The problem is that a remote syslogd can hardly guess the correct
>> character set (I'm using rsyslog to forward local messages to a remote
>> server).
> 
> It's 2022. I think at this point, software should always assume the
> charset is UTF-8 if it doesn't have a reason to believe otherwise.
> 
> It's kinda what we started to do all across our codebase really. We'll
> use UTF-8 for everything by default. For some things where people
> complain sufficiently loudly we'll conditionalize them so that we have
> some fallback in place if we know for sure UTF-8 is not OK, but the
> default we do is always and everywhere UTF-8.
> 
>> Example of such message:
>> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
>> path below legacy directory /var/run/, updating /var/run/svnserve →
>> /run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
>>
>> (The arrow is encoded as three bytes (\xe2\x86\x92))
>>
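The byte count mentioned above is easy to check. A minimal Python sketch, assuming only that the arrow in the message is U+2192 (RIGHTWARDS ARROW); the variable names are illustrative:

```python
# The arrow from the tmpfiles message, assumed to be U+2192 RIGHTWARDS ARROW.
arrow = "\u2192"

# Encode it the way a UTF-8 sender would put it on the wire.
encoded = arrow.encode("utf-8")
print(encoded)       # b'\xe2\x86\x92'
print(len(encoded))  # 3

# Every one of these bytes has the high bit set, so a strictly 7-bit
# (ASCII-only) receiver cannot interpret any of them.
high_bit_set = all(b >= 0x80 for b in encoded)
print(high_bit_set)  # True
```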
>> RFC 5424 syslog messages require the use of a BOM (%xEF.BB.BF) at the
>> beginning of a message if the message uses UTF-8:
> 
> We do not implement RFC 5424, as glibc doesn't support that. In fact
> we don't even implement RFC 3164 in full, since glibc generates the
> messages in a very specific format only.
> 
>>
>>       MSG             = MSG-ANY / MSG-UTF8
>>       MSG-ANY         = *OCTET ; not starting with BOM
>>       MSG-UTF8        = BOM UTF-8-STRING
>>       BOM             = %xEF.BB.BF
>>
>> Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also
>> if non-ASCII (i.e.: UTF-8) encoded characters are used?
> 
> There's plenty of software that doesn't support RFC 5424, and putting a
> BOM first is certainly not implemented in any of those. I think BOM is
> hideous and defaulting to UTF-8 generally safe. If we'd put BOM first,
> these messages would likely not be compatible with a large variety of
> consumers anymore, because they can't handle BOM. This would be worse

That's a non-argument:
You say you don't adhere to any of the standards, and claim that if you did,
things would break. ???
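For reference, a receiver that honours the ABNF quoted above could tell the two MSG forms apart roughly as follows. This is only a sketch: `classify_msg` is a hypothetical helper, not part of any existing syslog daemon, and treating a BOM followed by invalid UTF-8 as opaque octets is my own assumption.

```python
import codecs

# codecs.BOM_UTF8 is b'\xef\xbb\xbf', i.e. the %xEF.BB.BF from the grammar.
BOM = codecs.BOM_UTF8


def classify_msg(msg: bytes):
    """Hypothetical helper: map a MSG field to (kind, value).

    Returns ("MSG-UTF8", str) when the payload starts with the BOM and the
    remainder decodes as UTF-8, otherwise ("MSG-ANY", bytes).
    """
    if msg.startswith(BOM):
        try:
            return "MSG-UTF8", msg[len(BOM):].decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: a BOM followed by invalid UTF-8 is malformed,
            # so the octets are passed on unchanged.
            return "MSG-ANY", msg
    # No BOM: the grammar makes no statement about the character set.
    return "MSG-ANY", msg


print(classify_msg(BOM + "a \u2192 b".encode("utf-8")))
print(classify_msg(b"plain ASCII text"))
```

The point of the distinction is that only the first form tells the receiver which character set was used; for the second form it has to guess.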

> than the status quo I am sure, since if we just send UTF-8 things
> should generally just work fine for any software that either a) also
> defaults to UTF-8 when encountering an 8bit char or b) is agnostic to
> charsets and just passes data through.

Yes, stick your head in the sand and hope the problems are gone when you look
up again... ;-)
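To make option a) concrete: a consumer that "defaults to UTF-8" still needs some plan for bytes that are not valid UTF-8. A minimal sketch, assuming a single-byte fallback charset; `decode_syslog_text` and the latin-1 default are my own illustrative choices, not what rsyslog or systemd actually do.

```python
def decode_syslog_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical helper: decode message bytes, preferring UTF-8."""
    # Tolerate (and drop) a leading UTF-8 BOM if the sender added one.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    try:
        # Strict decoding, so invalid sequences are detected, not hidden.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is in the configured single-byte
        # charset. latin-1 never fails, but the result may be wrong.
        return raw.decode(fallback)


utf8_line = "updating /var/run/svnserve \u2192 /run/svnserve".encode("utf-8")
print(decode_syslog_text(utf8_line))
print(decode_syslog_text(b"caf\xe9"))  # not valid UTF-8, uses the fallback
```

The fallback branch is exactly the guess a remote syslogd is forced to make when the sender gives no indication of the character set.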

> 
> So, yeah, we might be stretching standards and tradition a bit, but
> it actually works out quite well so far.

A good argument for driving without a seat belt, BTW.

Regards,
Ulrich

> 
> Lennart
> 
> --
> Lennart Poettering, Berlin




