[Mesa-dev] [PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction predicate.

Mon Sep 30 11:54:59 PDT 2013

Paul Berry <stereotype441 at gmail.com> writes:

> On 15 September 2013 00:10, Francisco Jerez <currojerez at riseup.net> wrote:
>
>> Analogous to the GLSL IR predicate with the same name.  This patch
>> fixes the three dead code elimination passes and the VEC4/FS
>> instruction scheduling passes so they leave instructions with side
>> effects alone.
>>
>> At some point it might be interesting to have the instruction
>> scheduler calculate the exact memory dependencies between atomic ops,
>> but they're rare enough that it seems unlikely that it will make any
>> practical difference.
>>
>
> Does ARB_shader_atomic_counters guarantee that order is properly preserved
> between atomic read operations?  In other words, if I do this in a shader:
>
> atomic_uint counter1;
> atomic_uint counter2;
>
> void main() {
>   ...
>   uint a = atomicCounter(counter1);
>   uint b = atomicCounter(counter2);
> }
>
> can I be guaranteed that the read from counter2 will happen after the read
> from counter1?  I can't tell from reading the spec but I'm inclined to
> think we should assume this is guaranteed, just to be on the safe side.
>
> If we make this assumption, then I believe the has_side_effects() predicate
> is not enough to guarantee the proper ordering.  We would need the
> scheduling code to use a stronger predicate, requires_exact_ordering(),
> which returns true for both SHADER_OPCODE_UNTYPED_ATOMIC and
> SHADER_OPCODE_UNTYPED_SURFACE_READ, to ensure that atomic counter reads
> don't get reordered with respect to each other.
>

The ARB_shader_atomic_counters extension is very unspecific in that
regard.  AFAICT the implementation is allowed to do whatever it wants as
long as the uniqueness guarantee is preserved.  The
ARB_shader_image_load_store is much more specific and it doesn't require
the implementation to preserve any particular ordering between read
operations, so I think it would make sense to have the same behavior
for both extensions.

>
>[...]
>> @@ -1943,31 +1943,26 @@ fs_visitor::dead_code_eliminate_local()
>>                 get_dead_code_hash_entry(ht, inst->dst.reg,
>>                                          inst->dst.reg_offset);
>>
>> -            if (inst->is_partial_write()) {
>> -               /* For a partial write, we can't remove any previous dead
>> code
>> -                * candidate, since we're just modifying their result, but
>> we can
>> -                * be dead code eliminiated ourselves.
>> -                */
>> -               if (entry) {
>> -                  entry->data = inst;
>> +            if (entry) {
>> +               if (inst->is_partial_write()) {
>> +                  /* For a partial write, we can't remove any previous
>> dead code
>> +                   * candidate, since we're just modifying their result.
>> +                   */
>>
>
> I'm not terribly familiar with this code, so this may be a stupid question,
> but:
>
> Previous to this patch, if entry was non-NULL and inst->is_partial_write(),
> we would set entry->data = inst.  With your rewrite, that doesn't happen
> anymore.  That seems like a problem.
>
>[...]
>
> Previously, we would only remove the entry from the hashtable if entry was
> non-NULL and !inst->is_partial_write().  Now we remove it whenever entry is
> non-NULL, regardless of whether inst->is_partial_write().  This also seems
> like a problem.
>
>>[...]
>
> Preveiously, we wouldn't insert the instruction in the dead code hash if
> entry was non-NULL and inst->is_partial_write().  We no longer do that
> check--was that an intentional change?
>

All these changes were intentional.  The old code did the following:
 - For partial writes with a matching hash table entry for the
   destination register, the existing entry was replaced with the
   current instruction.
 - For partial writes with no matching hash table entry a new entry was
   created for the current instruction.
 - For full writes with a matching hash table entry, the previous
   instruction was dead code-eliminated, and the hash table entry was
   replaced with the current instruction.
 - For full writes with no matching hash table entry a new entry was
   created for the current instruction.

The four conditions are preserved with the new code, with the difference
that we skip the insertion step for instructions with side effects
because they can never be dead code-eliminated.

>
>> -            }
>>           }
>>        }
>>     }
>> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> index 5530683..a688336 100644
>> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> @@ -562,7 +562,8 @@ fs_instruction_scheduler::calculate_deps()
>>        schedule_node *n = (schedule_node *)node;
>>        fs_inst *inst = (fs_inst *)n->inst;
>>
>> -      if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT)
>> +      if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
>> +         inst->has_side_effects())
>>           add_barrier_deps(n);
>>
>>        /* read-after-write deps. */
>> @@ -795,6 +796,9 @@ vec4_instruction_scheduler::calculate_deps()
>>        schedule_node *n = (schedule_node *)node;
>>        vec4_instruction *inst = (vec4_instruction *)n->inst;
>>
>> +      if (inst->has_side_effects())
>> +         add_barrier_deps(n);
>> +
>>
>
> add_barrier_deps() provides a stronger scheduling guarantee than we need.
> It guarantees that the given instruction executes prior to *all*
> instructions that follow it and after *all* instructions that precede it.
> Actually, the only dependencies we need are between two atomic writes,
> between an atomic write and an atomic read, and (depending what we decide
> about my question at the top of this email) possibly between two atomic
> reads.
>
> I'd prefer to see us add just the dependencies we really need, since that
> will give the scheduler more freedom to move code around.
>
My plan is to implement some sort of memory dependency tracking
mechanism in a separate patch series.  Adding dependencies between all
atomic instructions would be possible but still unnecessarily strict.  I
don't think it's worth the effort, meanwhile it's unlikely to make much
of a difference in comparison to the "naive" add_barrier_deps()
implementation, and it would still be complex enough that doing it would
keep me away from working on the "correct" implementation for a
considerable amount of time. :P

>[...]

Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130930/aeddf731/attachment.pgp>