[Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

Fri Jun 26 08:41:32 PDT 2015

On 26/06/15 14:53, Francisco Jerez wrote:
> Davin McCall <davmac at davmac.org> writes:
>
>> On 26/06/15 13:18, Francisco Jerez wrote:
>>> Davin McCall <davmac at davmac.org> writes:
>>>
>>>> On 26/06/15 11:08, Erik Faye-Lund wrote:
>>>>> On Thu, Jun 25, 2015 at 1:48 AM, Davin McCall <davmac at davmac.org> wrote:
>>>>>> This is an alternative to my earlier patch [1] (and it is now constructed
>>>>>> properly using git format-patch).
>>>>>>
>>>>>> Quick background:
>>>>>> There is a problem in exec_list due to it directly including a trio
>>>>>> of 'struct exec_node *' members to implement two overlapping sentinel
>>>>>> nodes. The sentinel nodes do not exist as exec_node objects and so
>>>>>> should not be accessed as such, according to C99 6.5 paragraph 7.
>>>>>> When this strict aliasing rule is violated the compiler may re-order
>>>>>> reads and writes in unexpected ways, such as demonstrated in another
>>>>>> email [2].
>>>>>>
>>>>>> The problem only manifests if compiling without -fno-strict-aliasing.
>>>>>>
>>>>>> This patch addresses the issue by introducing some new methods for
>>>>>> setting the 'next' and 'prev' members of the exec_node structure, which
>>>>>> avoid the aliasing restrictions by way of casting the exec_node pointer
>>>>>> (to an exec_node-pointer-pointer) whenever it may actual point to a
>>>>>> sentinel node. Essentially an exec_node can be seen as an array of two
>>>>>> exec_node pointers, and this view is compatible with the sentinel
>>>>>> structure in exec_list.
>>>>>>
>>>>>> Compared to the previous patch, this patch is much less intrusive, and
>>>>>> does not increase the storage requirements of the exec_list structure.
>>>>>>
>>>>>> While I'm not proposing that -fno-strict-aliasing no longer be used for
>>>>>> Mesa builds, this patch represents a step in that direction. With this
>>>>>> patch applied, a working Mesa library can be built, although bugs may
>>>>>> be present (and could be triggered only when using particular compiler
>>>>>> versions and options). FWIW file sizes with and without strict aliasing:
>>>>>>
>>>>>> (gcc 4.8.4, -O3 -fomit-frame-pointer -march=corei7).
>>>>>>
>>>>>>                -fno-strict-aliasing:        with strict aliasing:
>>>>>> libGL.so          699188                  699188    (no change)
>>>>>> *_dri.so         9575876                 9563104    (-2772)
>>>>>>
>>>>>> (dri bundle includes r300, r600, kms_swrast and swrast).
>>>>>>
>>>>>> So, not a huge win, size-wise. Dave Airlie reports a 30K difference in
>>>>>> his r600_dri.so build however [3].
>>>>>>
>>>>>> In terms of performance:
>>>>>>
>>>>>> (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)
>>>>>>
>>>>>> -fno-strict-aliasing:
>>>>>>
>>>>>> glmark2 Score: 244
>>>>>> real    5m34.707s
>>>>>> user    11m36.192s
>>>>>> sys    0m29.596s
>>>>>>
>>>>>> with strict aliasing:
>>>>>>
>>>>>> glmark2 Score: 247
>>>>>> real    5m34.438s
>>>>>> user    11m29.904s
>>>>>> sys    0m29.556s
>>>>>>
>>>>>> Again, only a very small improvement when strict aliasing is enabled.
>>>>>>
>>>>>> With the above in mind it is reasonable to question whether this patch
>>>>>> is worthwhile. However, it's done, and it seems to work, so I offer it
>>>>>> for review.
>>>>>>
>>>>>>
>>>>>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087179.html
>>>>>> [2] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087246.html
>>>>>> [3] http://lists.freedesktop.org/archives/mesa-dev/2015-June/087206.html
>>>>>> ---
>>>>>> [...]
>>>>> [...] 
>>> That may be right, but it still seems kind of difficult to enforce that
>>> other users of the linked list won't end up breaking the strict aliasing
>>> rule by accident unless we change its data structures completely
>>> (e.g. you still have exec_list objects pointed to by exec_node pointers
>>> what makes it really tempting for other users to bypass the wrappers you
>>> have introduced and break the rule again).
>> Yes. I believe there are such cases in the code already; the patch does
>> not set out to correct all the aliasing violations. Such a patch would
>> be quite large. This was an attempt to make some progress, not resolve
>> the problem completely.
>>
> Yes, of course.  The question is whether it's worth the considerable
> amount of work you mention to fix all users of a linked list
> implementation that by design encourages them to violate the strict
> aliasing rule.  The amount of effort required to migrate users to a
> safer linked list implementation may be of the same order of magnitude.

Well, you could give 'next' and 'prev' nastier names easily enough :) If 
it was purely C++ this would probably be easier, because 'next' and 
'prev' could be made private.

Anyway, I take your point.

>
>>> Another issue is that AFAICT this introduces a new kind of UB e.g. when
>>> you do "((const struct exec_node **)n)[1]".  The layout of the exec_node
>>> (or exec_list) object "n" points to is unspecified, and the array
>>> subscript past the end of the pointed-to object is undefined because of
>>> section 6.5.6 of the C99 spec regarding the additive operators:
>>>
>>> | 7 For the purposes of these operators, a pointer to an object that is
>>> |   not an element of an array behaves the same as a pointer to the
>>> |   first element of an array of length one with the type of the object
>>> |   as its element type.
>>> |
>>> | 8 [..] If the result points one past the last element of the array
>>> |   object, it shall not be used as the operand of a unary * operator
>>> |   that is evaluated.
>>>
>>> (Array subscripting is later on defined in terms of the '+' and '*'
>>> operators)
>> It's true that this is what the standard says, but this is one of those
>> cases where actual compiler implementations are more permissive than the
>> standard, as a matter of practicality.
>>
>> That being said, the code could be made to work without the array
>> subscripting by casting first to 'char *' before doing the pointer
>> arithmetic, and using 'offsetof' to be certain of the correct address of
>> the next/prev members.
>>
> That sounds a less dubious, but...
>
>>> That said, this problem is IMHO worth fixing regardless of the
>>> performance improvement, this linked list implementation relies on UB
>>> which we work around by passing a non-standard compiler flag to, AFAIK,
>>> only GCC and Clang, there's no guarantee that other compilers won't take
>>> advantage of the strict aliasing rule and miscompile this code.
>> I agree.
>>
>>> We may
>>> just as well replace this linked list with another of the many
>>> implementations we have, as someone else suggested earlier.
>> The main problem is that the exec_node structure allows making a
>> determination of whether a node is at the head or the end of the list
>> without requiring a reference to the list structure itself. The list
>> implementation in src/util/list.h is on the other hand a simple
>> doubly-linked list where the list head is also the list tail and neither
>> can be distinguished from other list nodes (for exec_node, the next
>> pointer will be NULL in the tail sentinel, and the prev pointer will be
>> NULL in the head sentinel).
>>
> Right.
>
>> The amount of effort required to use the src/util/list.h implementation
>> wherever the exec_list is currently used would be large, and would
>> probably incur a memory overhead because in various places you would
>> need to store a reference to the list as well as the node where you
>> previously only needed a reference to the node.
>>
> I'd be surprised if it incurred any measurable performance penalty not
> offset by the improvement from '-fstrict-aliasing', might be worth
> checking.
>
>> The other alternative would be to make src/util/list.h use the exec_list
>> implementation, and hope that there's no code that requires on the list
>> being circular. You'd still have the aliasing violation to contend with,
>> too, unless you use my patch. So, even with this as an ultimate goal,
>> the patch I've provided makes progress. If there was concern about the
>> casting hoops necessary to avoid strict aliasing, the alternative would
>> be to use version 1 of the patch, which split the head sentinel and tail
>> sentinel into two nodes.
>>
> Your first approach seemed quite reasonable IMHO.  Were you able to
> measure any performance regression from it?

No, but I need to do a more thorough evaluation. I'll do this when I get 
the chance.

Thanks,

Davin