[Spice-devel] [PATCH spice-gtk 3/4] util: add unix2dos and dos2unix

Sat Aug 24 06:07:04 PDT 2013

Hi,

On 08/24/2013 02:56 PM, Marc-André Lureau wrote:
>
>
> ----- Original Message -----
>> Hi,
>>
>> On 08/24/2013 02:32 PM, Marc-André Lureau wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> Hi,
>>>>
>>>> On 08/24/2013 02:17 PM, Marc-André Lureau wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> +
>>>>>>> +    if (!g_utf8_validate(str, len, NULL)) {
>>>>>>> +        g_set_error_literal(error, G_CONVERT_ERROR,
>>>>>>> +                            G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
>>>>>>> +                            "Invalid byte sequence in conversion input");
>>>>>>> +        return -1;
>>>>>>> +    }
>>>>>>
>>>>>>
>>>>>> And once you simply treat this as a regular C-string without worrying
>>>>>> about multi-byte encodings you can also drop this.
>>>>>
>>>>> Actually, during implementation, I have encountered/produced invalid
>>>>> utf8 that will break later on in gtk+, so I prefer to validate the
>>>>> production.
>>>>
>>>> Thinking more about this, if we want to do utf-8 validation, it should not
>>>> be done here, but rather in gtk/channel-main.c, since this code only gets
>>>> called in certain guest-line-end + direction cases, and if we want to do
>>>> utf-8 validation we should always do it.
>>>
>>> Perhaps, although the difference is that here we do parse/modify the
>>> string, so it's important to check we don't produce garbage.
>>
>> Right, but since garbage in = garbage out, you're not only checking that
>> the conversion code did not foo-bar, you're also validating the original
>> input, at which point it makes sense to me to always do that even when
>> not doing conversion.
>
> In one case, it's a pass-through, the caller and the destination are responsible for validation.
>
> But here, we do parse and modify, so it's necessary to validate.
>
> I am not strictly against validating UTF-8 all the time, but I don't think it belongs to the messenger.

I agree that validation is best left up to the receiver, but in that case we should simply
never verify, as I suggested in the first place. Line-ending conversion only inserts / removes
single-byte characters, and since these can never be part of a multi-byte character in UTF-8,
we cannot make the input any more (or less) broken than it was.
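
To illustrate the point, here is a minimal sketch of a purely byte-wise
conversion. It is not the code from the patch: unix2dos_bytes is a hypothetical
helper, it uses plain C rather than GLib, and leaving an LF that already follows
a CR untouched is an assumption made for the example. The only bytes it ever
looks for or inserts are 0x0A and 0x0D, which are below 0x80, whereas every byte
of a multi-byte UTF-8 sequence is 0x80 or above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the function from the patch: converts LF to
 * CRLF byte by byte, with no UTF-8 decoding. Returns a malloc'ed,
 * NUL-terminated buffer (caller frees) or NULL on allocation failure. */
static char *unix2dos_bytes(const char *src, size_t len, size_t *out_len)
{
    assert(src != NULL);

    /* Count the LFs that lack a preceding CR to size the output exactly. */
    size_t extra = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            extra++;
    }

    char *dst = malloc(len + extra + 1);
    if (dst == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* 0x0A and 0x0D are below 0x80; every byte of a multi-byte UTF-8
         * sequence is 0x80 or above, so this never matches inside one. */
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[j++] = '\r';
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    if (out_len != NULL)
        *out_len = j;
    return dst;
}
```

All bytes other than the inserted CRs are copied through unchanged, so valid
input stays valid and invalid input stays exactly as invalid as it was.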

I really think we are doing ourselves a disservice by validating only when doing line-ending
conversion, since we will then likely get difficult-to-debug bugs, where we get invalid UTF-8
in and end up rejecting it only in some cases (while most receivers will likely accept it and
make the best of it). If we follow the "receiver should validate" reasoning (with the receiver
deciding whether to reject outright, or simply insert some ? chars or some such) to its logical
conclusion, we should simply never validate.

Regards,

Hans