[Mesa-dev] [Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

Mon Jan 28 14:45:41 UTC 2019

On Mon, Jan 28, 2019 at 3:32 AM Eduardo Lima Mitev <elima at igalia.com> wrote:
>
> On 1/26/19 12:42 AM, Rob Clark wrote:
> > On Fri, Jan 25, 2019 at 10:48 AM Eduardo Lima Mitev <elima at igalia.com> wrote:
> >>
> >> There are a bunch of instructions emitted on ir3_compiler_nir related to
> >> offset computations for IO opcodes (ssbo, image, etc). This small series
> >> explores the possibility of moving these instructions to NIR, where we
> >> have higher chances of optimizing them.
> >>
> >> The series introduces a new, freedreno specific NIR pass,
> >> 'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
> >> early on ir3_optimize_nir(), and the goal is to centralize all these
> >> computations there, hoping that later NIR passes will produce better
> >> code than what is currently emitted.
> >
> > I can think of a few other things that would be interesting to lower
> > to driver specific nir opcodes (imul and various lowering for tex
> > instructions come to mind.. but probably also ubo and ssbo address
> > calculation.. maybe it could even make sense for some of the single
> > src alu instructions that translate into multiple ir3 instructions,
> > not sure)..
> >
>
> Yes, the plan is to abstract to NIR whatever brings us a benefit in
> instruction stats. There is also the question of just simplifying the
> backend compiler, provided we don't hurt produced code.

yeah, simplifying nir->ir3 would itself be a worthwhile thing

> > Are you thinking about having separate passes for each?  I guess at
> > least for alu instructions we might able to use nir_algebraic so
> > having things split up might be easier.
> >
>
> I haven't thought too much about this yet, but it seems to make sense
> having at least 2 passes, one for I/O and one for ALUs.

I haven't thought about it too much, but I think this makes sense

> >> So far, we have just implemented byte-offset computation for image store
> >> and atomics. This seemed like a good first target given the amount of
> >> instructions being emitted for it by the backend.
> >>
> >> This is an RFC series because there are a few open questions, but we
> >> wanted to gather feedback already now, in case this effort is something
> >> not worth it; and also hoping that somebody else will give it a try
> >> against other shaders and on other gens (we have just tried this on
> >> a5xx).
> >>
> >> * We have so far been unable to see any improvement in generated code
> >> (not a penalty either). shader-db has not been specially useful. Few
> >> shaders there exercise image store or image atomic ops, and of those
> >> that do, most require higher versions of GLSL than what freedreno
> >> supports, so they get skipped. The few that do actually run, don't
> >> show any meaningful difference.
> >
> > I guess it would be easy enough to construct shaders that would
> > benefit from this, but maybe that is cheating..
> >
> > Possibly UBO and SSBO is a better target, I guess there you might be
> > more likely to see patterns of access of successive elements (ie.
> > foo[idx], foo[idx+1], etc)?
> >
>
> I took a first stab at SSBO's load/store/atomic, where the offset is
> divided by 4 in the backend, but was bitten by IR3_STGB requiring both
> the original byte-offset and the dword-offset (in src1 and src2
> respectively). So trivially emitting a nir_shr on the offset didn't buy
> us anything. I have in the backlog to revisit this, turning the offset
> into a 2-component reg so we can hold the original byte-offset and the
> offset divided by 4.

hmm, or just two src's?  Anyways, it seems like there is some
potential to fold that shift into other math, perhaps if you had a
shader that did something like 'somessbo[val << 1]'?

fwiw, I think ssbo's and images get a bit simpler, or rather less alu
instructions..  a5xx and earlier wanted both idx/coords and offset
(Ie. idx/coords used for clamping out of bounds access, but offset is
what is actually used to fetch/store).  With a6xx it becomes just
idx/coords, no offset calc.. which *hopefully* means we can deal with
tiled images more easily.  But a6xx has it's own challenge with one of
the instruction srcs being a vecN with both input and output values.
See 'instr_cat6_a6xx_t'.

(note: we don't use ldc for UBOs yet, still sticking with the general
purpose ldg.  Maybe we should switch to get rid of some address
calculation.)

> > Anyways, since we don't try to do (and I'd rather not do) any sort of
> > CSE post nir->ir3 I think starting to introduce more ir3 specific
> > nir->nir lowering seems like a thing we need, so I'm pretty happy that
> > someone is looking at this :-)
> >
>
> Thanks, that's encouraging. Lets see how far we can get :).
>

cool

BR,
-R