[Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation

Ian Romanick idr at freedesktop.org
Tue Jun 23 19:35:14 PDT 2015

On 06/23/2015 07:01 PM, Dave Airlie wrote:
> On 24 June 2015 at 11:44, Ian Romanick <idr at freedesktop.org> wrote:
>> On 06/24/2015 03:59 PM, Davin McCall wrote:
>>> Hi Ian,
>>> On 23/06/15 23:26, Ian Romanick wrote:
>>>> On 06/23/2015 02:36 PM, Thomas Helland wrote:
>>>>> 2015-06-24 23:05 GMT+02:00 Davin McCall <davmac at davmac.org>:
>>>>>> Hi - I'm new here.
>>>>>> I've recently started poking the Mesa codebase for little reason other than
>>>>>> personal interest. In the "help wanted" section of the website it mentions
>>>>>> aliasing violations as a target for newcomers to fix, so with that in mind
>>>>>> I've attached a patch (against git head) which resolves a few of them, by
>>>>>> targeting the linked list implementation (list.h) used in the GLSL
>>>>>> compiler/optimizers. This change slightly increases the storage requirements
>>>>>> for a list (adds one word) but resolves the blatant aliasing violation that
>>>>>> was caused by the trick used to conserve that word in the first place.
>>>>>> (I toyed with another approach - using a single sentinel node for both the
>>>>>> head and tail of a list - but this was much more invasive, and meant that
>>>>>> you could no longer check whether a particular node was a sentinel node
>>>>>> unless you had a reference to the list, so I gave up and went with this
>>>>>> simpler approach).
>>>>>> The most essential change is in the 'exec_list' structure. Three fields
>>>>>> 'head', 'tail' and 'tail_pred' are removed, and two separate sentinel nodes
>>>>>> are inserted in their place. The old 'head' is replaced by
>>>>>> 'head_sentinel.next', 'tail_pred' by 'tail_sentinel.prev', and tail (always
>>>>>> NULL) by 'head_sentinel.prev' and 'tail_sentinel.next' (both always NULL).
>>>> NAK.  The datastructure is correct as-is.  It has been in common use
>>>> since at least 1985.  See the references in the header file.
>>> I understand the data structure and how it is supposed to work; the
>>> issue is that the trick it employs cannot be implemented in C without
>>> breaking the strict aliasing rules (or at least, the current
>>> implementation in Mesa breaks the strict aliasing rules). The current
>>> code works correctly only with the -fno-strict-aliasing compiler option.
>>> The issue is that a pair of 'exec_node *' do not constitute an exec_node
>>> in the eyes of the language spec, even though exec_node is declared as
>>> holding two such pointers. Consider (from src/glsl/list.h):
>>>     static inline void
>>>     exec_list_make_empty(struct exec_list *list)
>>>     {
>>>        list->head = (struct exec_node *) & list->tail;
>>>        list->tail = NULL;
>>>        list->tail_pred = (struct exec_node *) & list->head;
>>>     }
>>> 'list->head' is of type 'struct exec_node *', and so should point at a
>>> 'struct  exec_node'. In the code above it is instead co-erced to point
>>> at a 'struct exec_node *' (list->tail). That in itself doesn't break the
>>> alias rules, but then:
>>>     static inline bool
>>>     exec_node_is_tail_sentinel(const struct exec_node *n)
>>>     {
>>>        return n->next == NULL;
>>>     }
>>> In 'exec_node_is_tail_sentinel', the sentinel is not actually an
>>> exec_node - it is &list->tail. So, if the parameter n does refer to the
>>> sentinel, then it does not point to an actual exec_node structure.
>>> However, it is de-referenced (by 'n->next') which breaks the strict
>>> aliasing rules. This means that the method above can only ever return
>>> false, unless it violates the aliasing rules.
>>> (The above method could be fixed by casting n to an 'struct exec_node
>>> **' and then comparing '**n' against NULL. However, there are other
>>> similar examples throughout the code that I do not think would be so
>>> trivial).
>>> I can quote the relevant parts of the standard if necessary, but your
>>> tone somewhat implies that you wouldn't even consider this patch?
>> Please quote the spec.  I have a hard time believing that the compiler
>> magically decides that an exec_node* is not really an exec_node* and
>> just does whatever it wants (which sounds like what you're describing).
>>  That sounds pretty rubbish... and I'm surprised that anything works in
>> such an environment.
>> Mesa has also had -fno-strict-aliasing for as long as I can remember,
>> and this structure has only been used here for a few years.  The whole
>> thing just doesn't smell right.
> Oh we've always had aliasing problems this is just one, you can't
> expect one person to fix them all at once.

But "With this patch, I can build a working (though perhaps not 100%
bug-free) Mesa without using -fno-strict-aliasing during compilation."
is a pretty strong statement that doesn't completely jibe many years of
Mesa using -fno-strict-aliasing before exec_list was added.

> But making headway is a good thing.
> You can't have
> struct exec_list *p;
> struct exec_node *p2 = (struct exec_list *)p
> And do things with p2 and hope that p will get them, because
> the compiler wants to store things its doing to p in registers,
> and when you go and do something in p2 it can't work out it's the
> same thing, so it has to spill/reload.

Which I think is different from what Davin was saying, but I may be
misunderstanding the whole thing.  That's why I'd like to see spec
language.  The part that really gets me is that this is across a
function boundary... that's generally a sacred line, so I'm surprised
that the compiler is allowed to disregard what it's told in that scenario.

I'd also like to see assembly dumps with and without
-fno-strict-aliasing of some place where this goes wrong.  If you,
Davin, or someone else can point out a specific function that actually
does the wrong thing, I can generate assembly myself.

For that matter... how the heck is the ralloc (or any memory allocator)
implementation valid?

> Dave.

More information about the mesa-dev mailing list