[Spice-devel] [PATCH 07/17] Get rid of C-style memset initializations, use C++ style aggregates

Fri Feb 23 10:52:14 UTC 2018

> On 22 Feb 2018, at 19:03, Frediano Ziglio <fziglio at redhat.com> wrote:
> 
>> 
>> [Indeed, I had not sent this reply, took time to check standard and
>> compilers]
>> 
>>> On 20 Feb 2018, at 12:31, Frediano Ziglio <fziglio at redhat.com> wrote:
>>> 
>>>>> On 20 Feb 2018, at 10:39, Lukáš Hrázký <lhrazky at redhat.com> wrote:
>>>>> 
>>>>> On Tue, 2018-02-20 at 10:18 +0100, Christophe de Dinechin wrote:
>>>>>>> On Feb 19, 2018, at 7:19 PM, Lukáš Hrázký <lhrazky at redhat.com> wrote:
>>>>>>> 
>>>>>>> On Fri, 2018-02-16 at 17:15 +0100, Christophe de Dinechin wrote:
>>>>>>>> From: Christophe de Dinechin <dinechin at redhat.com>
>>>>>>>> 
>>>>>>>> Signed-off-by: Christophe de Dinechin <dinechin at redhat.com>
>>>>>>>> ---
>>>>>>>> src/spice-streaming-agent.cpp | 47
>>>>>>>> ++++++++++++++++++++++++++-----------------
>>>>>>>> 1 file changed, 28 insertions(+), 19 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/src/spice-streaming-agent.cpp
>>>>>>>> b/src/spice-streaming-agent.cpp
>>>>>>>> index 69c27a3..1e19e43 100644
>>>>>>>> --- a/src/spice-streaming-agent.cpp
>>>>>>>> +++ b/src/spice-streaming-agent.cpp
>>>>>>>> @@ -181,19 +181,24 @@ write_all(int fd, const void *buf, const size_t
>>>>>>>> len)
>>>>>>>>  return written;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> -static int spice_stream_send_format(int streamfd, unsigned w,
>>>>>>>> unsigned
>>>>>>>> h, unsigned c)
>>>>>>>> +static int spice_stream_send_format(int streamfd, unsigned w,
>>>>>>>> unsigned
>>>>>>>> h, uint8_t c)
>>>>>>>> {
>>>>>>>> -
>>>>>>>> -    SpiceStreamFormatMessage msg;
>>>>>>>> -    const size_t msgsize = sizeof(msg);
>>>>>>>> -    const size_t hdrsize  = sizeof(msg.hdr);
>>>>>>>> -    memset(&msg, 0, msgsize);
>>>>>>>> -    msg.hdr.protocol_version = STREAM_DEVICE_PROTOCOL;
>>>>>>>> -    msg.hdr.type = STREAM_TYPE_FORMAT;
>>>>>>>> -    msg.hdr.size = msgsize - hdrsize; /* includes only the body? */
>>>>>>>> -    msg.msg.width = w;
>>>>>>>> -    msg.msg.height = h;
>>>>>>>> -    msg.msg.codec = c;
>>>>>>>> +    const size_t msgsize = sizeof(SpiceStreamFormatMessage);
>>>>>>>> +    const size_t hdrsize  = sizeof(StreamDevHeader);
>>>>>>>> +    SpiceStreamFormatMessage msg = {
>>>>>>>> +        .hdr = {
>>>>>>>> +            .protocol_version = STREAM_DEVICE_PROTOCOL,
>>>>>>>> +            .padding = 0,       // Workaround GCC "not implemented"
>>>>>>>> bug
>>>>>>>> +            .type = STREAM_TYPE_FORMAT,
>>>>>>>> +            .size = msgsize - hdrsize
>>>>>>>> +        },
>>>>>>>> +        .msg = {
>>>>>>>> +            .width = w,
>>>>>>>> +            .height = h,
>>>>>>>> +            .codec = c,
>>>>>>>> +            .padding1 = { }
>>>>>>>> +        }
>>>>>>>> +    };
>>>>>>>>  syslog(LOG_DEBUG, "writing format\n");
>>>>>>>>  std::lock_guard<std::mutex> stream_guard(stream_mtx);
>>>>>>>>  if (write_all(streamfd, &msg, msgsize) != msgsize) {
>>>>>>>> @@ -204,14 +209,18 @@ static int spice_stream_send_format(int
>>>>>>>> streamfd,
>>>>>>>> unsigned w, unsigned h, unsign
>>>>>>>> 
>>>>>>>> static int spice_stream_send_frame(int streamfd, const void *buf,
>>>>>>>> const
>>>>>>>> unsigned size)
>>>>>>>> {
>>>>>>>> -    SpiceStreamDataMessage msg;
>>>>>>>> -    const size_t msgsize = sizeof(msg);
>>>>>>>>  ssize_t n;
>>>>>>>> +    const size_t msgsize = sizeof(SpiceStreamFormatMessage);
>>>>>>>> +    SpiceStreamDataMessage msg = {
>>>>>>>> +        .hdr = {
>>>>>>>> +            .protocol_version = STREAM_DEVICE_PROTOCOL,
>>>>>>>> +            .padding = 0,       // Workaround GCC "not implemented"
>>>>>>>> bug
>>>>>>>> +            .type = STREAM_TYPE_DATA,
>>>>>>>> +            .size = size  /* includes only the body? */
>>>>>>>> +        },
>>>>>>>> +        .msg = {}
>>>>>>>> +    };
>>>>>>> 
>>>>>>> So, someone should find out if we can use the designated initializers,
>>>>>>> I suppose it depends on the compilers on all platforms we care about
>>>>>>> supporting them?
>>>>>>> 
>>>>>>> I wasn't able to find much useful information so far. Anyone knows in
>>>>>>> which version of gcc it was introduced?
>>>>>> 
>>>>>> My experience is that clang has no issue with it in any version.
>>>>>> 
>>>>>> GCC supports it in C++ since at least 4.8.5, as tested on RHEL 7.4 with
>>>>>> the following short example:
>>>>>> 
>>>>>> struct Thing { int x; float y; };
>>>>>> 
>>>>>> int foo()
>>>>>> {
>>>>>> Thing x = { x: 10, y:20 };
>>>>>> Thing y = { .x = 10, .y = 20 };
>>>>>> Thing z { 10, 20 };
>>>>>> Thing t { .x = 10, .y = 20 };
>>>>>> }
>>>>>> 
>>>>>> It has, however, trouble with out-of-order initializations, with the
>>>>>> same
>>>>>> message in 4.8.5 as on Fedora 25 (6.4.1):
>>>>>> 
>>>>>> glop.cpp:9:30: sorry, unimplemented: non-trivial designated initializers
>>>>>> not supported
>>>>>> Thing a { .y = 10, .x = 20 };
>>>>>> 
>>>>>> The “not implemented” message is a bit scary, but the fact that the code
>>>>>> as written has been supported as far back as 4.8.5 makes me confident
>>>>>> that we are not in some experimental section of the compiler.
>>>>> 
>>>>> I've found this on the matter:
>>>>> 
>>>>> https://gcc.gnu.org/ml/gcc/2014-09/msg00207.html
>>>>> 
>>>>> That doesn't give much confidence... Is this documented anywhere? I
>>>>> mean I've only used Google and haven't found anything…
>>>> 
>>>> I think it’s a low priority thing because the “fix” in source code is so
>>>> easy
>>>> (reorder the fields), and the fix complicated. According to the message
>>>> you
>>>> referenced, presently, GCC just checks that the annotations match the
>>>> names,
>>>> but otherwise ignores them and proceeds like for a POD init. If it were to
>>>> be implemented, it would have to support a lot of corner cases mentioned
>>>> in
>>>> the message, so this makes the fix non-trivial.
>>>> 
>>> 
>>> This is not a bug but a feature. In C++20 fields have to be in the right
>>> order so when clang will start implementing correctly the standard will
>>> have to fix it. In C++ fields are initialized in order.
>>> About the title aggregates are also a C style actually as we are supposed
>>> to be C++11 is more a C style than a C++ style.
>>> But is not this that worry me. memset(0) and aggregate initializers are
>>> quite different, is not only a question of style. Consider this:
>>> 
>>>  struct xxx {
>>>     float f;
>>>     int i;
>>>  };
>>>  xxx v { };
>>> 
>>> this initialize f and i to 0.0 and 0 respectively. On some machines
>>> the memset does not set f to 0.0 (depends on floating point
>>> representation).
>> 
>>> But not a big deal, we don't use floating point.
>>> But padding worry me a bit more, consider this:
>>> 
>>> struct xxx {
>>>    uint8_t c;
>>>    uint16_t n;
>>> };
>>> xxx v { 1, 2 }; // or v { .c = 1, .n = 2 };
>> 
>> All padding is initialized to zero in the case of zero-initialization, see
>> http://en.cppreference.com/w/cpp/language/zero_initialization. So if you are
>> really concerned, you can write:
>> 
>> xxx v = { };
>> x.c = 1; x.n = 2;
>> 
> 
> 
> Source:
> 
> void aaa()
> {
>    struct {
>        char c;
>        int i;
>    } x = { 1, 2 };
>    write(0, &x, sizeof(x));
> }
> 
> Code:
> 
> 0000000000000610 <_Z3aaav>:
>     610:       48 83 ec 18             sub    $0x18,%rsp
>     614:       31 ff                   xor    %edi,%edi
>     616:       ba 08 00 00 00          mov    $0x8,%edx
>     61b:       48 89 e6                mov    %rsp,%rsi
>     61e:       c6 04 24 01             movb   $0x1,(%rsp)
>     622:       c7 44 24 04 02 00 00    movl   $0x2,0x4(%rsp)
>     629:       00
>     62a:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
>     631:       00 00
>     633:       48 89 44 24 08          mov    %rax,0x8(%rsp)
>     638:       31 c0                   xor    %eax,%eax
>     63a:       e8 00 00 00 00          callq  63f <_Z3aaav+0x2f>
>     63f:       48 8b 44 24 08          mov    0x8(%rsp),%rax
>     644:       64 48 33 04 25 28 00    xor    %fs:0x28,%rax
>     64b:       00 00
>     64d:       75 05                   jne    654 <_Z3aaav+0x44>
>     64f:       48 83 c4 18             add    $0x18,%rsp
>     653:       c3                      retq
>     654:       e8 00 00 00 00          callq  659 <_Z3aaav+0x49>
>     659:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> 
> so I assume I have a compiler bug here or that the code you wrote
> is potentially not zero filling the paddings.

No, it’s not a compiler bug. Read-carefully what I wrote. The standard distinguishes zero-initialization with {} and regular initialization with values between the brackets. For zero initialization, there is a guarantee of zero padding. Otherwise there is none. I know, this is stupid, I believe you can thank some Redmond company for that inconsistency.

If your compiler still does that after you do:

struct foo { char c; int i; };
foo x = {};
foo = foo { 1, 2 }

then yes, your compiler has a bug.

> 
>> Now, while your concerns are correct by the word of the standard, they apply
>> equally well to default copy and assignment. So unless we disable default
>> copies on the C structs, we have the same risk there, including when we
>> return values…
>> 
>> It’s not a real risk, though, for multiple different reasons, the primary one
>> being that we should *never ever* depend on the value padding!!! If we do,
>> we have a more serious issue elsewhere (see below), and whoever did that
>> deserves to spend time debugging his own mess ;-)
>>> 
>>> now, we know there's a byte between c and n. What's the value of this byte?
>>> In the case of memset is 0, in the case of aggregate initializers... we
>>> don't
>>> know! That is only the bytes occupied by the fields are initialized, not
>>> the
>>> padding. Currently is not a big issue, there are no implicit holes and all
>>> bytes are initialized but is this became false we would potentially have a
>>> security issue as we are sending the full raw structure to an external
>>> entity.
>> 
>> While I understand what you are saying, the problem would only arise if the
>> following chain of events happened:
>> 
>> 1) We insert a new field in an (presently nonexistent) implicit padding
>> 2) We rely on that insertion not relocating the fields behind, i.e. we do it
>> “on purpose” because we are so “smart” (asking for trouble)
>> 3) We do so without also adding capabilities or version check and assume the
>> field exists on the other end (really asking for trouble)
>> 4) We access the field in new software that talks to old software which does
>> not have the field (REALLY asking for trouble)
>> 5) We rely on that new field being initialized at 0 by the “legacy” other
>> end, when we know it’s not true (REALLY REALLY asking for trouble)
>> 6) We depend on the struct never being copied on either side, which would
>> erase that zero guarantee (Uh oh!)
>> 7) We never thought of using the “packed” attribute for that struct… (at that
>> point, why bother mentioning this?)
>> 8) We compiled with gcc at -O0 to ensure it would not zero-init the padding
>> 9) There is a security issue due to the 8 steps above, but we claim the
>> problem is to use C++-style init
>> 
> 
> Only 7 apply here for good reasons. Try to see why ip protocol take into account alignment.

Yes, but you need all others at the same time for the problem to occur. It’s not an “or”, it’s an “and”.

> 
>> That seems far-fetched enough that I’d argue we have more pressing concerns.
>> 
>>> At the moment the entity should be considered more secure but for
>>> instance moving the same code on the server size would cause a security
>>> issue. To avoid that any future structure should have no implicit padding
>>> but this imposes an implementation detail of the protocol definition in a
>>> project written in C (spice-protocol) just to satisfy some much higher
>>> level style detail. It does seem to me quite easy to break this is the
>>> future. In this respect also there are some condition which are even harder
>>> to avoid. Consider something like
>>> 
>>> struct xxx {
>>>    uint8_t type_of_union;
>>>    uint8_t padding;
>>>    union {
>>>       uint8_t type1;
>>>       uint16_t type2;
>>>    }
>>> };
>>> xxx v { };
>>> 
>>> now all the bytes are covered by fields however the last byte is not
>>> potentially initialized as C++ mandate that only first no static field
>>> of an union is initialized.
>>> 
>>> Yes, aggregate initialization is a bit easier to read but seems that
>>> the cost is potentially high.
>> 
>> None of the cases you evoke happens in our current codebase. We have explicit
>> padding, no unions. So not a problem today and in foreseeable future.
>> 
> 
> I said that. Current code is not affected but what I'm complaining is
> that is not robust and requires attentions.

I understood that. My counter-point was that the robustness issue is NOT in the initializers, since the same (lack of) guarantees also applies to default copy constructors and assignments.

> 
>>> 
>>>>> 
>>>>> OTOH, if the extension is robust against random mistakes and typos, and
>>>>> GCC and clang are the only compilers we care about, I think it should
>>>>> be fine?
>>>> 
>>>> If we agree that a message that contains “not supported” if we mess up is
>>>> OK,
>>>> I think that’s fine.
>>>> 
>>>> However, I believe I need to beef up the comment and explain what the
>>>> message
>>>> is and what the fix is.
>>>> 
>>>>> 
>>>>> Lukas
>>>>> 
>>>>>> The alternatives are:
>>>>>> - Not tagging fields at all
>>>>>> - Tagging them “the old way”, i.e. field:value, but that has been
>>>>>> deprecated since 2.5 and actually has the same bug as .field = value, so
>>>>>> no benefit.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Lukas
>>>>>>> 
>>>>>>>> -    memset(&msg, 0, msgsize);
>>>>>>>> -    msg.hdr.protocol_version = STREAM_DEVICE_PROTOCOL;
>>>>>>>> -    msg.hdr.type = STREAM_TYPE_DATA;
>>>>>>>> -    msg.hdr.size = size; /* includes only the body? */
>>>>>>>>  std::lock_guard<std::mutex> stream_guard(stream_mtx);
>>>>>>>>  n = write_all(streamfd, &msg, msgsize);
>>>>>>>>  syslog(LOG_DEBUG,
>>>>>>>> @@ -379,7 +388,7 @@ do_capture(int streamfd, const char *streamport,
>>>>>>>> FILE *f_log)
>>>>>>>> 
>>>>>>>>          if (frame.stream_start) {
>>>>>>>>              unsigned width, height;
>>>>>>>> -                unsigned char codec;
>>>>>>>> +                uint8_t codec;
>>>>>>>> 
>>>>>>>>              width = frame.size.width;
>>>>>>>>              height = frame.size.height;
>>> 
>>> Maybe even better if the type is SpiceVideoCodecType here or use auto
>>> to avoid any possible truncation.
>> 
>> Makes sense to change for clarity, but StreamMsgFormat codec is uint8_t.
>> 
> 
> I know, C has no specification for enumeration size so we can't use enums
> in a binary structure.

So there is a cast / truncation somewhere (with no risk given the values of the enum). The question is where do we put it.

My choice was that Message<…> encapsulates a StreamXYZ, so I respected the datatypes in StreamXYZ, not the data types in the surrounding code. Does that make sense?

> 
> Frediano