[Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

Fri Jan 25 23:42:58 UTC 2019

On Fri, Jan 25, 2019 at 10:48 AM Eduardo Lima Mitev <elima at igalia.com> wrote:
>
> There are a bunch of instructions emitted on ir3_compiler_nir related to
> offset computations for IO opcodes (ssbo, image, etc). This small series
> explores the possibility of moving these instructions to NIR, where we
> have higher chances of optimizing them.
>
> The series introduces a new, freedreno specific NIR pass,
> 'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
> early on ir3_optimize_nir(), and the goal is to centralize all these
> computations there, hoping that later NIR passes will produce better
> code than what is currently emitted.

I can think of a few other things that would be interesting to lower
to driver specific nir opcodes (imul and various lowering for tex
instructions come to mind.. but probably also ubo and ssbo address
calculation.. maybe it could even make sense for some of the single
src alu instructions that translate into multiple ir3 instructions,
not sure)..

Are you thinking about having separate passes for each?  I guess at
least for alu instructions we might be able to use nir_algebraic, so
having things split up might be easier.

> So far, we have just implemented byte-offset computation for image store
> and atomics. This seemed like a good first target given the amount of
> instructions being emitted for it by the backend.
>
> This is an RFC series because there are a few open questions, but we
> wanted to gather feedback early, in case this effort turns out not to
> be worth it; and also hoping that somebody else will give it a try
> against other shaders and on other gens (we have just tried this on
> a5xx).
>
> * We have so far been unable to see any improvement in generated code
> (not a penalty either). shader-db has not been especially useful. Few
> shaders there exercise image store or image atomic ops, and of those
> that do, most require higher versions of GLSL than what freedreno
> supports, so they get skipped. The few that do actually run don't
> show any meaningful difference.

I guess it would be easy enough to construct shaders that would
benefit from this, but maybe that is cheating..

Possibly UBO and SSBO is a better target, I guess there you might be
more likely to see patterns of access of successive elements (ie.
foo[idx], foo[idx+1], etc)?
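
As a rough illustration of the pattern I mean (plain C, not NIR; the
stride value and both helper names are made up for the example), the
byte offset of foo[idx+1] can be derived from the offset of foo[idx]
with a single add instead of a second multiply, which is the kind of
thing CSE/algebraic passes could expose once the math lives in NIR:

```c
#include <stdint.h>

/* Hypothetical element stride in bytes; not taken from the series. */
#define ELEM_STRIDE 16u

/* Naive form: every access recomputes the full byte offset. */
static uint32_t offset_naive(uint32_t idx)
{
    return idx * ELEM_STRIDE;
}

/* Reuse form: the offset of the next element is the previous
 * element's offset plus one stride, so no second multiply. */
static uint32_t offset_from_prev(uint32_t prev_offset)
{
    return prev_offset + ELEM_STRIDE;
}
```

Both forms agree for any idx, including on unsigned wraparound, since
(idx + 1) * stride == idx * stride + stride modulo 2^32.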

Anyways, since we don't try to do (and I'd rather not do) any sort of
CSE post nir->ir3, I think starting to introduce more ir3-specific
nir->nir lowering seems like a thing we need, so I'm pretty happy that
someone is looking at this :-)

BR,
-R

> Then other shaders picked from test suites are simple enough not to
> produce any difference in code either.
>
> There is still on-going work looking for cases where the pass helps
> instruction stats, whether writing custom shaders or porting complex
> shaders from shader-db to run on GLES 310.
>
> There is though an open question here as to whether moving backend
> code to NIR is a benefit in and of itself.
>
> * The series adds a nir_op_imad opcode that didn't exist before, and
> is perhaps not generally useful even for freedreno outside this pass,
> because it maps to IR3_MAD_S24 which is probably not suitable for
> generic integer multiply-add.
>
> * The pass currently has 2 alternative code-paths to emit the
> multiplication by the bytes-per-pixel of an image format. In one
> case, since this value can be obtained at compile time, it is
> emitted as an immediate by nir_imul_imm. The other alternative is
> emitting an nir_imul with an SSA value that will map to
> image_dims[0] at shader runtime.
>
> The former case is uglier but produces better code (a single SHL
> instruction), whereas the latter involves a generic imul, for which
> the backend emits a lot of code to cover for overflow.
>
> The doubt here is whether we should introduce a (lower precision)
> version of imul that maps directly to IR3_IMUL_S.
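
To make the trade-off above concrete, here is a minimal sketch in plain
C (not ir3 or NIR code). The function names are hypothetical, and the
24-bit helper is only an assumed model of a lower-precision multiply
(sign-extend the low 24 bits of each operand, then keep the low 32 bits
of the product); it is not a statement of what the hardware does:

```c
#include <stdint.h>

/* Generic path: a full 32-bit multiply by bytes-per-pixel. */
static uint32_t byte_offset_mul(uint32_t texel, uint32_t bpp)
{
    return texel * bpp;
}

/* Immediate path: when bpp is a compile-time power of two, the
 * multiply collapses into a single left shift. */
static uint32_t byte_offset_shl(uint32_t texel, uint32_t bpp_log2)
{
    return texel << bpp_log2;
}

/* ASSUMED model of a 24-bit signed multiply: only the low 24 bits of
 * each operand take part, interpreted as a signed value. */
static uint32_t mul_s24_model(int32_t a, int32_t b)
{
    int32_t x = a & 0xFFFFFF;
    int32_t y = b & 0xFFFFFF;
    if (x & 0x800000) x -= 0x1000000;  /* sign-extend bit 23 */
    if (y & 0x800000) y -= 0x1000000;
    return (uint32_t)((int64_t)x * (int64_t)y);
}
```

Under this model the three helpers agree for small operands (for
example texel 100 with 4 bytes per pixel), while an operand that needs
more than 24 bits makes the lower-precision result diverge from the
generic one, which is the concern with using it as a general imul.
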
>
>
> A live (WIP) tree of the series can be found at:
> <https://gitlab.freedesktop.org/elima/mesa/commits/wip/fd-compiler-io>
>
> We plan to continue moving computations to the pass if we see
> good opportunities.
>
> Feedback very welcome,
>
> cheers,
> Eduardo
>
> Eduardo Lima Mitev (4):
>   nir: Add a new intrinsic 'load_image_stride'
>   nir: Add a new ALU nir_op_imad
>   ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
>   ir3: Use ir3_nir_lower_sampler_io pass
>
>  src/compiler/nir/nir_intrinsics.py           |   2 +
>  src/compiler/nir/nir_opcodes.py              |   1 +
>  src/freedreno/Makefile.sources               |   1 +
>  src/freedreno/ir3/ir3_compiler_nir.c         |  61 ++--
>  src/freedreno/ir3/ir3_nir.c                  |   1 +
>  src/freedreno/ir3/ir3_nir.h                  |   1 +
>  src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++++++++++++++++++
>  7 files changed, 383 insertions(+), 33 deletions(-)
>  create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c
>
> --
> 2.20.1