[Mesa-dev] [RFC] nir: compiler options for addressing modes

Tue Apr 14 13:59:18 PDT 2015

On Tue, Apr 14, 2015 at 12:56 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark <robdclark at gmail.com> wrote:
>>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt <eric at anholt.net> wrote:
>>>> Rob Clark <robdclark at gmail.com> writes:
>>>>
>>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>>
>>>>> Add compiler options so driver can request the units for address value,
>>>>> for the various _indirect intrinsics.  This way, the front end can
>>>>> insert the appropriate multiply/shift operations to get the address into
>>>>> the units that the driver needs, avoiding need for driver to insert
>>>>> these on the backend (and loose out on some optimizing potential).
>>>>>
>>>>> The addressing modes are specified per var/input/output/uniform/ubo.
>>>>> I'm not sure about other drivers, but for freedreno we want byte
>>>>> addressing for ubo's and register addressing for the rest.
>>>>>
>>>>> NOTE: so far just updated ttn and freedreno, and included everything in
>>>>> one commit to make it easier to see how it fits together.  The driver
>>>>> patches would naturally be separate (since drivers not setting these
>>>>> options get the current/default behavior, ie. splitting out won't hurt
>>>>> bisectability).  Also the ttn part could be split out as long as it
>>>>> comes before freedreno or vc4 parts.
>>>>
>>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>>> index 6531237..e8bce9f 100644
>>>>> --- a/src/glsl/nir/nir.h
>>>>> +++ b/src/glsl/nir/nir.h
>>>>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>>>>>     exec_node_data(nir_function_overload, \
>>>>>                    exec_list_get_head(&(func)->overload_list), node)
>>>>>
>>>>> +/**
>>>>> + * Addressing mode used by the backend for various sorts of indirect
>>>>> + * access.  This allows the frontend to generate the appropriate
>>>>> + * multiply or shift operations to convert the address param for the
>>>>> + * corresponding _indirect intrinsic, rather than relying on the
>>>>> + * driver to insert them after the various other NIR optimization
>>>>> + * passes.
>>>>> + */
>>>>> +typedef enum {
>>>>> +   /** No address conversion, units in elements of array member: */
>>>>> +   nir_addressing_mode_none = 0,
>>>>> +   /** Address converted to units of registers (ie. float/int32): */
>>>>> +   nir_addressing_mode_register = 1,
>>>>> +   /** Address converted to units of bytes: */
>>>>> +   nir_addressing_mode_byte = 2,
>>>>> +} nir_addressing_mode;
>>>>> +
>>>>>  typedef struct nir_shader_compiler_options {
>>>>>     bool lower_ffma;
>>>>>     bool lower_flrp;
>>>>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>>>>      * are simulated by floats.)
>>>>>      */
>>>>>     bool native_integers;
>>>>> +
>>>>> +   /**
>>>>> +    * Addressing mode for corresponding _indirect intrinsics:
>>>>> +    */
>>>>> +   nir_addressing_mode var_addressing_mode;
>>>>> +   nir_addressing_mode input_addressing_mode;
>>>>> +   nir_addressing_mode output_addressing_mode;
>>>>> +   nir_addressing_mode uniform_addressing_mode;
>>>>> +   nir_addressing_mode ubo_addressing_mode;

What is var_addressing_mode?  Sorry for not bringing that up before.

>>>>> +
>>>>>  } nir_shader_compiler_options;
>>>>>
>>>>>  typedef struct nir_shader {
>>>>> --
>>>>> 2.1.0
>>>>
>>>> This seems like a good idea, as it'll mean we can have passes that do
>>>> things like scalarizing uniform loads, which I think could be productive
>>>> for us.
>>>>
>>>> However, I find the "register" size name confusing -- it seems you mean
>>>> a channel in a register, while on both the pieces of hardware I've
>>>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>>>> "dword" or "float", with a slight preference for dword.
>>>
>>> yeah, I was having trouble picking a good name for that one.. I'd
>>> originally had float, but that sounds funny when it could just as
>>> easily be uint.  (And also, I have half-precision registers that would
>>> be nice to use some day, etc.)  I think "dword" is a better
>>> alternative.
>>
>> I'm not sure what I think of this.  However, having things better
>> defined is a good idea and this may be a step in the right direction.
>>
>> Yes, please use either dword or float for 32-bits.  Also, what is this
>> addressing_mode_none?  If we want a vec4 addressing mode, we should
>> just call it vec4.
>
> well, tbh, it is vec4 for ttn, but I wasn't completely clear on how it
> behaves if you actually have vec2/vec3..  so "none" just means "do
> what you were doing before, whatever that was"..
>
> if glsl_to_nir is vec4 based, then I can rename it to vec4..
>
>>
>>>> Also, should this apply to just the indirect variants? I'd think they
>>>> would apply to the const_index component, as well.  Either that, or we
>>>> make const_index always be bytes, maybe.
>>>
>>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>>> isn't very consistent w/ dwords/slots in the other cases, it might
>>> make sense..  otoh, that wouldn't change the code the driver emitted.
>>> So I guess I could really go either way.
>>
>> Yes.  The indirect source and const_index should have the same units.
>> If your hardware takes a const_index in bytes and an indirect in
>> dwords, that's not NIR's problem.  We need to keep it consistent.
>
> well, only the compiler takes const_index (so you can easily convert
> to whatever units your hw wants at compile time), so fairly trivial to
> make that in units of whatever we want.  Making it the same for
> consistency sounds reasonable.
>
>>>> This patch should probably edit the comment in nir_intrinsics.h about
>>>> addressing, too.
>>>
>>> hmm, yeah, that comment doesn't really make any sense since (afaict)
>>> NIR doesn't *really* know if you are vec or scalar backend..
>>
>> We also need to update nir_lower_io, nir_lower_locals_to_regs and
>> nir_lower_globals_to_regs.  Having these things in global NIR
>> structures and then just implementing it for TGSI -> NIR -> your
>> backend is going to lead to a lot of confusion.
>
> hmm, I can't say I know completely what to do there.. in particular
> the whole lower_io is a bit strange.. why doesn't the f/e just
> directly product the correct intrinsics?  The whole either
> load_var{mode=uniform} vs load_uniform intrinsic thing seems strange..

Ok, that was a bit of a red hairing.  For now, a comment that lower_io
doesn't respect them is probably sufficient.