[Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

Fri Jan 25 15:48:29 UTC 2019

There are a bunch of instructions emitted on ir3_compiler_nir related to
offset computations for IO opcodes (ssbo, image, etc). This small series
explores the possibility of moving these instructions to NIR, where we
have higher chances of optimizing them.

The series introduces a new, freedreno specific NIR pass,
'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
early on ir3_optimize_nir(), and the goal is to centralize all these
computations there, hoping that later NIR passes will produce better
code than what is currently emitted.

So far, we have just implemented byte-offset computation for image store
and atomics. This seemed like a good first target given the amount of
instructions being emitted for it by the backend.

This is an RFC series because there are a few open questions, but we
wanted to gather feedback already now, in case this effort is something
not worth it; and also hoping that somebody else will give it a try
against other shaders and on other gens (we have just tried this on
a5xx).

* We have so far been unable to see any improvement in generated code
(not a penalty either). shader-db has not been specially useful. Few
shaders there exercise image store or image atomic ops, and of those
that do, most require higher versions of GLSL than what freedreno
supports, so they get skipped. The few that do actually run, don't
show any meaningful difference.

Then other shaders picked from tests suites are simple enough not to
produce any difference in code either.

There is still on-going work looking for cases where the pass helps
instruction stats, whether writing custom shaders or porting complex
shader from shader-db to run on GLES 310.

There is though an open question here as to whether moving backend
code to NIR is a benefit in and of itself.

* The series adds a nir_op_imad opcode that didn't exist before, and
perhaps not generally useful even for freedreno outside this pass,
because it maps to IR3_MAD_S24 which is probably not suitable for
generic integer multiply-add.

* The pass currently has 2 alternative code-paths to emit the
multiplication by the bytes-per-pixel of an image format. In one
case, since this value can be obtained at compile time, it is
emitted as an immediate by nir_imul_imm. The other alternative is
emitting an nir_imul with an SSA value that will map to
image_dims[0] at shader runtime.

The former case is uglier but produces better code (a single SHL
instruction), whereas the latter involves a generic imul, for which
the backend emits a lot of code to cover for overflow.

The doubt here is whether we should introduce a (lower precision)
version of imul that maps directly to IR3_IMUL_S.

A live (WIP) tree of the series can be found at:
<https://gitlab.freedesktop.org/elima/mesa/commits/wip/fd-compiler-io>

We plan to continue moving computations to the pass if we see
good opportunities.

Feedback very welcome,

cheers,
Eduardo

Eduardo Lima Mitev (4):
  nir: Add a new intrinsic 'load_image_stride'
  nir: Add a new ALU nir_op_imad
  ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
  ir3: Use ir3_nir_lower_sampler_io pass

 src/compiler/nir/nir_intrinsics.py           |   2 +
 src/compiler/nir/nir_opcodes.py              |   1 +
 src/freedreno/Makefile.sources               |   1 +
 src/freedreno/ir3/ir3_compiler_nir.c         |  61 ++--
 src/freedreno/ir3/ir3_nir.c                  |   1 +
 src/freedreno/ir3/ir3_nir.h                  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++++++++++++++++++
 7 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

-- 
2.20.1