[Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

Francisco Jerez currojerez at riseup.net
Fri Jun 26 06:53:46 PDT 2015

Davin McCall <davmac at davmac.org> writes:

> On 26/06/15 13:18, Francisco Jerez wrote:
>> Davin McCall <davmac at davmac.org> writes:
>>> On 26/06/15 11:08, Erik Faye-Lund wrote:
>>>> On Thu, Jun 25, 2015 at 1:48 AM, Davin McCall <davmac at davmac.org> wrote:
>>>>> This is an alternative to my earlier patch [1] (and it is now constructed
>>>>> properly using git format-patch).
>>>>> Quick background:
>>>>> There is a problem in exec_list due to it directly including a trio
>>>>> of 'struct exec_node *' members to implement two overlapping sentinel
>>>>> nodes. The sentinel nodes do not exist as exec_node objects and so
>>>>> should not be accessed as such, according to C99 6.5 paragraph 7.
>>>>> When this strict aliasing rule is violated the compiler may re-order
>>>>> reads and writes in unexpected ways, such as demonstrated in another
>>>>> email [2].
>>>>> The problem only manifests if compiling without -fno-strict-aliasing.
>>>>> This patch addresses the issue by introducing some new methods for
>>>>> setting the 'next' and 'prev' members of the exec_node structure, which
>>>>> avoid the aliasing restrictions by way of casting the exec_node pointer
>>>>> (to an exec_node-pointer-pointer) whenever it may actual point to a
>>>>> sentinel node. Essentially an exec_node can be seen as an array of two
>>>>> exec_node pointers, and this view is compatible with the sentinel
>>>>> structure in exec_list.
>>>>> Compared to the previous patch, this patch is much less intrusive, and
>>>>> does not increase the storage requirements of the exec_list structure.
>>>>> While I'm not proposing that -fno-strict-aliasing no longer be used for
>>>>> Mesa builds, this patch represents a step in that direction. With this
>>>>> patch applied, a working Mesa library can be built, although bugs may
>>>>> be present (and could be triggered only when using particular compiler
>>>>> versions and options). FWIW file sizes with and without strict aliasing:
>>>>> (gcc 4.8.4, -O3 -fomit-frame-pointer -march=corei7).
>>>>>               -fno-strict-aliasing:        with strict aliasing:
>>>>> libGL.so          699188                  699188    (no change)
>>>>> *_dri.so         9575876                 9563104    (-2772)
>>>>> (dri bundle includes r300, r600, kms_swrast and swrast).
>>>>> So, not a huge win, size-wise. Dave Airlie reports a 30K difference in
>>>>> his r600_dri.so build however [3].
>>>>> In terms of performance:
>>>>> (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)
>>>>> -fno-strict-aliasing:
>>>>> glmark2 Score: 244
>>>>> real    5m34.707s
>>>>> user    11m36.192s
>>>>> sys    0m29.596s
>>>>> with strict aliasing:
>>>>> glmark2 Score: 247
>>>>> real    5m34.438s
>>>>> user    11m29.904s
>>>>> sys    0m29.556s
>>>>> Again, only a very small improvement when strict aliasing is enabled.
>>>>> With the above in mind it is reasonable to question whether this patch
>>>>> is worthwhile. However, it's done, and it seems to work, so I offer it
>>>>> for review.
>>>>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087179.html
>>>>> [2] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087246.html
>>>>> [3] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087206.html
>>>>> ---
>>>>>    src/glsl/list.h | 123
>>>>> ++++++++++++++++++++++++++++++++++++--------------------
>>>>>    1 file changed, 80 insertions(+), 43 deletions(-)
>>>>> diff --git a/src/glsl/list.h b/src/glsl/list.h
>>>>> index 15fcd4a..cfbe5a9 100644
>>>>> --- a/src/glsl/list.h
>>>>> +++ b/src/glsl/list.h
>>>>> @@ -76,6 +76,12 @@
>>>>>    #include "util/ralloc.h"
>>>>>    struct exec_node {
>>>>> +   /**
>>>>> +    * Accessing these fields directly may be ill-advised; if the
>>>>> 'exec_node'
>>>>> +    * is actually a sentinel node embedded in the exec_list structure, it
>>>>> may
>>>>> +    * be a strict-aliasing violation (C99 6.5 paragraph 7). Use the methods
>>>>> +    * provided instead.
>>>>> +    */
>>>>>       struct exec_node *next;
>>>>>       struct exec_node *prev;
>>>>> @@ -140,35 +146,55 @@ exec_node_init(struct exec_node *n)
>>>>>       n->prev = NULL;
>>>>>    }
>>>>> +/**
>>>>> + * Strict-aliasing safe method for setting the next pointer for any
>>>>> + * node, including sentinel nodes.
>>>>> + */
>>>>> +static inline void
>>>>> +exec_node_set_next(struct exec_node *n, struct exec_node *next)
>>>>> +{
>>>>> +   ((struct exec_node **)n)[0] = next;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * Strict-aliasing safe method for setting the next pointer for any
>>>>> + * node, including sentinel nodes.
>>>>> + */
>>>>> +static inline void
>>>>> +exec_node_set_prev(struct exec_node *n, struct exec_node *next)
>>>>> +{
>>>>> +   ((struct exec_node **)n)[1] = next;
>>>>> +}
>>>>> +
>>>>>    static inline const struct exec_node *
>>>>>    exec_node_get_next_const(const struct exec_node *n)
>>>>>    {
>>>>> -   return n->next;
>>>>> +   return ((const struct exec_node **)n)[0];
>>>>>    }
>>>> How exactly is this supposed to be strict-aliasing safe?
>>>> Here's the wording from the C++14 spec:
>>>> "If a program attempts to access the stored value of an object through
>>>> a glvalue of other than one of the following types the behavior is
>>>> undefined:
>>>> * the dynamic type of the object,
>>>> * a cv-qualified version of the dynamic type of the object,
>>>> * a type similar (as defined in 4.4) to the dynamic type of the object,
>>>> * a that is the signed or unsigned type corresponype tding to the
>>>> dynamic type of the object,
>>>> * a type that is the signed or unsigned type corresponding to a
>>>> cv-qualified version of the dynamic type of the object,
>>>> * an aggregate or union type that includes one of the aforementioned
>>>> types among its elements or non-static data members (including,
>>>> recursively, an element or non-static data member of a subaggregate or
>>>> contained union),
>>>> * a type that is a (possibly cv-qualified) base class type of the
>>>> dynamic type of the object,
>>>> * a char or unsigned char type."
>>>> You cast an 'struct exec_node *' to 'struct exec_node **', and read
>>>> it. Those are different types, and are not among the exceptions listed
>>>> above...
>>> In for eg:
>>>      return ((const struct exec_node **)n)[0]
>>> ... The stored value of 'n' is not accessed by any other type than the
>>> type of n itself. This value is then cast to a different pointer type.
>>> You are mistaken if you think that the cast accesses the stored value of
>>> n. The other "stored value" access that it occurs in that expression is
>>> to the object pointed at by the result of the cast. As the exec_node
>>> structure is defined as:
>>>       struct exec_node *next;
>>>       struct exec_node *prev;
>>> ... then the value being accessed is the 'next' member of the exec_node
>>> struct, not the exec_node struct itself. I.e. store value of type
>>> 'struct exec_node *' is being accessed via a glvalue of that same type.
>>> Putting it another way, casting some pointer p, of type 'struct
>>> exec_node *', to 'struct exec node **', is effectively the same as '&
>>> p->next', but without the de-reference of p (which is the potential an
>>> aliasing violation that the patch avoids).
>>> Although I don't think this is your concern, the cast is valid due to:
>>> C99 p13:
>>> "[...]. A pointer to a structure object, suitably converted, points to
>>> its initial member (or if that member is a
>>> bit-field, then to the unit in which it resides), and vice versa."
>>> I don't have C++14, so I've checked C++11. I'm less familiar with the
>>> C++ standard text in general, so I'm not sure where exactly the
>>> equivalent to the C99 is, but I am quite certain that the rules are
>>> essentially the same.
>>> Perhaps this snippet is all that's really required:
>>> C++11 1.8: "Unless an object is a bit-field or a base class subobject of
>>> zero size, the address of that object is the address
>>> of the first byte it occupies."
>>> There's also 4.10p2:
>>> "A prvalue of type “pointer to cv T,” where T is an object type, can be
>>> converted to a prvalue of type “pointer to cv void”. The result of
>>> converting a “pointer to cv T” to a “pointer to cv void” points to the
>>> start of the storage location where the object of type T resides [...]"
>>> ... which to some extent implies that maybe you should cast first to
>>> "void *" then to the member type, but in practice this isn't necessary.
>> That may be right, but it still seems kind of difficult to enforce that
>> other users of the linked list won't end up breaking the strict aliasing
>> rule by accident unless we change its data structures completely
>> (e.g. you still have exec_list objects pointed to by exec_node pointers
>> what makes it really tempting for other users to bypass the wrappers you
>> have introduced and break the rule again).
> Yes. I believe there are such cases in the code already; the patch does 
> not set out to correct all the aliasing violations. Such a patch would 
> be quite large. This was an attempt to make some progress, not resolve 
> the problem completely.

Yes, of course.  The question is whether it's worth the considerable
amount of work you mention to fix all users of a linked list
implementation that by design encourages them to violate the strict
aliasing rule.  The amount of effort required to migrate users to a
safer linked list implementation may be of the same order of magnitude.

>> Another issue is that AFAICT this introduces a new kind of UB e.g. when
>> you do "((const struct exec_node **)n)[1]".  The layout of the exec_node
>> (or exec_list) object "n" points to is unspecified, and the array
>> subscript past the end of the pointed-to object is undefined because of
>> section 6.5.6 of the C99 spec regarding the additive operators:
>> | 7 For the purposes of these operators, a pointer to an object that is
>> |   not an element of an array behaves the same as a pointer to the
>> |   first element of an array of length one with the type of the object
>> |   as its element type.
>> |
>> | 8 [..] If the result points one past the last element of the array
>> |   object, it shall not be used as the operand of a unary * operator
>> |   that is evaluated.
>> (Array subscripting is later on defined in terms of the '+' and '*'
>> operators)
> It's true that this is what the standard says, but this is one of those 
> cases where actual compiler implementations are more permissive than the 
> standard, as a matter of practicality.
> That being said, the code could be made to work without the array 
> subscripting by casting first to 'char *' before doing the pointer 
> arithmetic, and using 'offsetof' to be certain of the correct address of 
> the next/prev members.

That sounds a less dubious, but...

>> That said, this problem is IMHO worth fixing regardless of the
>> performance improvement, this linked list implementation relies on UB
>> which we work around by passing a non-standard compiler flag to, AFAIK,
>> only GCC and Clang, there's no guarantee that other compilers won't take
>> advantage of the strict aliasing rule and miscompile this code.
> I agree.
>> We may
>> just as well replace this linked list with another of the many
>> implementations we have, as someone else suggested earlier.
> The main problem is that the exec_node structure allows making a 
> determination of whether a node is at the head or the end of the list 
> without requiring a reference to the list structure itself. The list 
> implementation in src/util/list.h is on the other hand a simple 
> doubly-linked list where the list head is also the list tail and neither 
> can be distinguished from other list nodes (for exec_node, the next 
> pointer will be NULL in the tail sentinel, and the prev pointer will be 
> NULL in the head sentinel).

> The amount of effort required to use the src/util/list.h implementation 
> wherever the exec_list is currently used would be large, and would 
> probably incur a memory overhead because in various places you would 
> need to store a reference to the list as well as the node where you 
> previously only needed a reference to the node.
I'd be surprised if it incurred any measurable performance penalty not
offset by the improvement from '-fstrict-aliasing', might be worth

> The other alternative would be to make src/util/list.h use the exec_list 
> implementation, and hope that there's no code that requires on the list 
> being circular. You'd still have the aliasing violation to contend with, 
> too, unless you use my patch. So, even with this as an ultimate goal, 
> the patch I've provided makes progress. If there was concern about the 
> casting hoops necessary to avoid strict aliasing, the alternative would 
> be to use version 1 of the patch, which split the head sentinel and tail 
> sentinel into two nodes.
Your first approach seemed quite reasonable IMHO.  Were you able to
measure any performance regression from it?


> Davin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150626/206bcad6/attachment.sig>

More information about the mesa-dev mailing list