[Mesa-dev] [PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction predicate.

Thu Sep 26 11:11:12 PDT 2013

On 15 September 2013 00:10, Francisco Jerez <currojerez at riseup.net> wrote:

> Analogous to the GLSL IR predicate with the same name.  This patch
> fixes the three dead code elimination passes and the VEC4/FS
> instruction scheduling passes so they leave instructions with side
> effects alone.
>
> At some point it might be interesting to have the instruction
> scheduler calculate the exact memory dependencies between atomic ops,
> but they're rare enough that it seems unlikely that it will make any
> practical difference.
>

Does ARB_shader_atomic_counters guarantee that order is properly preserved
between atomic read operations?  In other words, if I do this in a shader:

atomic_uint counter1;
atomic_uint counter2;

void main() {
  ...
  uint a = atomicCounter(counter1);
  uint b = atomicCounter(counter2);
}

can I be guaranteed that the read from counter2 will happen after the read
from counter1?  I can't tell from reading the spec but I'm inclined to
think we should assume this is guaranteed, just to be on the safe side.

If we make this assumption, then I believe the has_side_effects() predicate
is not enough to guarantee the proper ordering.  We would need the
scheduling code to use a stronger predicate, requires_exact_ordering(),
which returns true for both SHADER_OPCODE_UNTYPED_ATOMIC and
SHADER_OPCODE_UNTYPED_SURFACE_READ, to ensure that atomic counter reads
don't get reordered with respect to each other.

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp               | 25
> +++++++++-------------
>  .../drivers/dri/i965/brw_schedule_instructions.cpp |  6 +++++-
>  src/mesa/drivers/dri/i965/brw_shader.cpp           | 11 ++++++++++
>  src/mesa/drivers/dri/i965/brw_shader.h             |  7 ++++++
>  src/mesa/drivers/dri/i965/brw_vec4.cpp             |  2 +-
>  5 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4afe37b..453752c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1805,7 +1805,7 @@ fs_visitor::dead_code_eliminate()
>     foreach_list_safe(node, &this->instructions) {
>        fs_inst *inst = (fs_inst *)node;
>
> -      if (inst->dst.file == GRF) {
> +      if (inst->dst.file == GRF && !inst->has_side_effects()) {
>           assert(this->virtual_grf_end[inst->dst.reg] >= pc);
>           if (this->virtual_grf_end[inst->dst.reg] == pc) {
>              inst->remove();
> @@ -1943,31 +1943,26 @@ fs_visitor::dead_code_eliminate_local()
>                 get_dead_code_hash_entry(ht, inst->dst.reg,
>                                          inst->dst.reg_offset);
>
> -            if (inst->is_partial_write()) {
> -               /* For a partial write, we can't remove any previous dead
> code
> -                * candidate, since we're just modifying their result, but
> we can
> -                * be dead code eliminiated ourselves.
> -                */
> -               if (entry) {
> -                  entry->data = inst;
> +            if (entry) {
> +               if (inst->is_partial_write()) {
> +                  /* For a partial write, we can't remove any previous
> dead code
> +                   * candidate, since we're just modifying their result.
> +                   */
>

I'm not terribly familiar with this code, so this may be a stupid question,
but:

Previous to this patch, if entry was non-NULL and inst->is_partial_write(),
we would set entry->data = inst.  With your rewrite, that doesn't happen
anymore.  That seems like a problem.

>                 } else {
> -                  insert_dead_code_hash(ht, inst->dst.reg,
> inst->dst.reg_offset,
> -                                        inst);
> -               }
> -            } else {
> -               if (entry) {
>                    /* We're completely updating a channel, and there was a
>                     * previous write to the channel that wasn't read.
>  Kill it!
>                     */
>                    fs_inst *inst = (fs_inst *)entry->data;
>                    inst->remove();
>                    progress = true;
> -                  _mesa_hash_table_remove(ht, entry);
>                 }
>
> +               _mesa_hash_table_remove(ht, entry);
>

Previously, we would only remove the entry from the hashtable if entry was
non-NULL and !inst->is_partial_write().  Now we remove it whenever entry is
non-NULL, regardless of whether inst->is_partial_write().  This also seems
like a problem.

> +            }
> +
> +            if (!inst->has_side_effects())
>                 insert_dead_code_hash(ht, inst->dst.reg,
> inst->dst.reg_offset,
>                                       inst);
>

Preveiously, we wouldn't insert the instruction in the dead code hash if
entry was non-NULL and inst->is_partial_write().  We no longer do that
check--was that an intentional change?

> -            }
>           }
>        }
>     }
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 5530683..a688336 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -562,7 +562,8 @@ fs_instruction_scheduler::calculate_deps()
>        schedule_node *n = (schedule_node *)node;
>        fs_inst *inst = (fs_inst *)n->inst;
>
> -      if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT)
> +      if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
> +         inst->has_side_effects())
>           add_barrier_deps(n);
>
>        /* read-after-write deps. */
> @@ -795,6 +796,9 @@ vec4_instruction_scheduler::calculate_deps()
>        schedule_node *n = (schedule_node *)node;
>        vec4_instruction *inst = (vec4_instruction *)n->inst;
>
> +      if (inst->has_side_effects())
> +         add_barrier_deps(n);
> +
>

add_barrier_deps() provides a stronger scheduling guarantee than we need.
It guarantees that the given instruction executes prior to *all*
instructions that follow it and after *all* instructions that precede it.
Actually, the only dependencies we need are between two atomic writes,
between an atomic write and an atomic read, and (depending what we decide
about my question at the top of this email) possibly between two atomic
reads.

I'd prefer to see us add just the dependencies we really need, since that
will give the scheduler more freedom to move code around.

       /* read-after-write deps. */
>        for (int i = 0; i < 3; i++) {
>           if (inst->src[i].file == GRF) {
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 53364a5..7a47c6c 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -566,6 +566,17 @@ backend_instruction::is_control_flow()
>     }
>  }
>
> +bool
> +backend_instruction::has_side_effects() const
> +{
> +   switch (opcode) {
> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> +      return true;
> +   default:
> +      return false;
> +   }
> +}
> +
>  void
>  backend_visitor::dump_instructions()
>  {
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h
> b/src/mesa/drivers/dri/i965/brw_shader.h
> index 55769ff..88a5673 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.h
> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> @@ -46,6 +46,13 @@ public:
>     bool is_math();
>     bool is_control_flow();
>
> +   /**
> +    * True if the instruction has side effects other than writing to
> +    * its destination registers.  You are expected not to reorder or
> +    * optimize these out unless you know what you are doing.
> +    */
> +   bool has_side_effects() const;
> +
>     enum opcode opcode; /* BRW_OPCODE_* or FS_OPCODE_* */
>
>     uint32_t predicate;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 6549d4e..1867b22 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -313,7 +313,7 @@ vec4_visitor::dead_code_eliminate()
>     foreach_list_safe(node, &this->instructions) {
>        vec4_instruction *inst = (vec4_instruction *)node;
>
> -      if (inst->dst.file == GRF) {
> +      if (inst->dst.file == GRF && !inst->has_side_effects()) {
>           assert(this->virtual_grf_end[inst->dst.reg] >= pc);
>           if (this->virtual_grf_end[inst->dst.reg] == pc) {
>              inst->remove();
> --
> 1.8.3.4
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130926/f80a2528/attachment-0001.html>