[Mesa-dev] [RFC] nir: compiler options for addressing modes

Wed Apr 15 08:16:28 PDT 2015

On Wed, Apr 15, 2015 at 10:32 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>
>> But more immediately, I hit a sort of snag:  I cannot seem to narrow
>> from 32b to 16b at the same time I move to address register.  Which
>> ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to
>> get it into address register...  which sort of defeats the purpose of
>> this whole exercise.  I need to do some more r/e around this, but it
>> may end up being better the way it was before.  And if we end up
>> needing to do the shl in half-precision registers, then solving this
>> in NIR would (I think) require NIR to be aware of half-precision.
>> Which sounds useful, but -EBIGGER_FIRES
>
> I don't quite understand... is this just a problem with using
> registers? Would the entire sequence of operations need to be in 16
> bits, or can you have whatever instruction computed your address do
> the conversion to 16-bit as part of the output? If it's the latter,
> you can just re-emit a 16-bit-outputting version of it and use that,
> although it's a bit of a hack.

Well, the problem is that if the shl is done in NIR, I commonly end up with:

  cov.f32s32 r0.x, r0.x
  shl.b r0.x, r0.x, 2
  cov.s32s16 hr0.x, r0.x
  mova a0.x, hr0.x

vs

  cov.f32s16 hr0.x, r0.x
  shl.b hr0.x, hr0.x, 2
  mova a0.x, hr0.x

(In both cases there are four nops between each instruction if I don't
have other instructions I can schedule in between, so the extra
instruction here could translate to 4 cycles in a lot of cases.)
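
As a rough illustration of that cost, here is a minimal sketch that
counts issue slots for the two sequences above. It is not ir3 code:
the issue_slots helper and the "every instruction depends on the
previous one, nothing else to schedule" model are assumptions made
only for this example, and the nop count is the figure quoted above.

```python
# Hypothetical cost model, for illustration only: each instruction depends
# on the one before it, and every such dependency is padded with nops
# because no unrelated instruction is available to fill the gap.
NOPS_BETWEEN = 4  # figure taken from the message above


def issue_slots(seq, nops_between=NOPS_BETWEEN):
    """Count issue slots for a fully dependent chain with no filler."""
    if not seq:
        return 0
    return len(seq) + nops_between * (len(seq) - 1)


# shl done in NIR: convert to s32, shift, narrow to s16, then mova.
shl_in_nir = [
    "cov.f32s32 r0.x, r0.x",
    "shl.b r0.x, r0.x, 2",
    "cov.s32s16 hr0.x, r0.x",
    "mova a0.x, hr0.x",
]

# shl done in the backend: convert straight to s16, shift, then mova.
shl_in_backend = [
    "cov.f32s16 hr0.x, r0.x",
    "shl.b hr0.x, hr0.x, 2",
    "mova a0.x, hr0.x",
]

if __name__ == "__main__":
    print("shl in NIR:    ", issue_slots(shl_in_nir), "slots")
    print("shl in backend:", issue_slots(shl_in_backend), "slots")
```

Under this worst-case model the extra cov costs its own slot plus the
nops that pad it; with other instructions available to schedule in the
gaps, the difference shrinks.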

I tried to see if I could do something like 'shl.b hr0.x, r0.x, 2' but
that didn't seem to work. Possibly the narrowing on ALU ops only
works for float.  (The ISA has all these fun holes like that, where
the instruction encoding supports things, but the designers apparently
didn't think it was worth spending transistors on the less common
cases.)

> Long-term, support for half-precision in NIR is definitely in the
> cards, but it'll probably have to wait until fp64 support as they're
> both very similar wrt changes we have to make in the IR. Unless
> someone has a burning desire to do half-precision first :).

It would be useful, or at least my understanding is that half-precision
should be faster for me.  Although I don't see a particularly good way
to solve that until we figure out how to get TGSI out from the middle.
I guess I need to take a look at the tie-ins between
vbo/uniform/texture/etc state in mesa st and tgsi, and figure out how
to make that work without a glsl_to_tgsi pass so drivers can request
NIR directly.  That probably comes after flow control in ir3 backend,
but *eventually* I'll be really interested in proper mediump support.

BR,
-R