[Mesa-dev] [RFC] nir: compiler options for addressing modes

Tue Apr 14 12:56:09 PDT 2015

On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt <eric at anholt.net> wrote:
>>> Rob Clark <robdclark at gmail.com> writes:
>>>
>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>
>>>> Add compiler options so driver can request the units for address value,
>>>> for the various _indirect intrinsics.  This way, the front end can
>>>> insert the appropriate multiply/shift operations to get the address into
>>>> the units that the driver needs, avoiding need for driver to insert
>>>> these on the backend (and loose out on some optimizing potential).
>>>>
>>>> The addressing modes are specified per var/input/output/uniform/ubo.
>>>> I'm not sure about other drivers, but for freedreno we want byte
>>>> addressing for ubo's and register addressing for the rest.
>>>>
>>>> NOTE: so far just updated ttn and freedreno, and included everything in
>>>> one commit to make it easier to see how it fits together.  The driver
>>>> patches would naturally be separate (since drivers not setting these
>>>> options get the current/default behavior, ie. splitting out won't hurt
>>>> bisectability).  Also the ttn part could be split out as long as it
>>>> comes before freedreno or vc4 parts.
>>>
>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>> index 6531237..e8bce9f 100644
>>>> --- a/src/glsl/nir/nir.h
>>>> +++ b/src/glsl/nir/nir.h
>>>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>>>>     exec_node_data(nir_function_overload, \
>>>>                    exec_list_get_head(&(func)->overload_list), node)
>>>>
>>>> +/**
>>>> + * Addressing mode used by the backend for various sorts of indirect
>>>> + * access.  This allows the frontend to generate the appropriate
>>>> + * multiply or shift operations to convert the address param for the
>>>> + * corresponding _indirect intrinsic, rather than relying on the
>>>> + * driver to insert them after the various other NIR optimization
>>>> + * passes.
>>>> + */
>>>> +typedef enum {
>>>> +   /** No address conversion, units in elements of array member: */
>>>> +   nir_addressing_mode_none = 0,
>>>> +   /** Address converted to units of registers (ie. float/int32): */
>>>> +   nir_addressing_mode_register = 1,
>>>> +   /** Address converted to units of bytes: */
>>>> +   nir_addressing_mode_byte = 2,
>>>> +} nir_addressing_mode;
>>>> +
>>>>  typedef struct nir_shader_compiler_options {
>>>>     bool lower_ffma;
>>>>     bool lower_flrp;
>>>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>>>      * are simulated by floats.)
>>>>      */
>>>>     bool native_integers;
>>>> +
>>>> +   /**
>>>> +    * Addressing mode for corresponding _indirect intrinsics:
>>>> +    */
>>>> +   nir_addressing_mode var_addressing_mode;
>>>> +   nir_addressing_mode input_addressing_mode;
>>>> +   nir_addressing_mode output_addressing_mode;
>>>> +   nir_addressing_mode uniform_addressing_mode;
>>>> +   nir_addressing_mode ubo_addressing_mode;
>>>> +
>>>>  } nir_shader_compiler_options;
>>>>
>>>>  typedef struct nir_shader {
>>>> --
>>>> 2.1.0
>>>
>>> This seems like a good idea, as it'll mean we can have passes that do
>>> things like scalarizing uniform loads, which I think could be productive
>>> for us.
>>>
>>> However, I find the "register" size name confusing -- it seems you mean
>>> a channel in a register, while on both the pieces of hardware I've
>>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>>> "dword" or "float", with a slight preference for dword.
>>
>> yeah, I was having trouble picking a good name for that one.. I'd
>> originally had float, but that sounds funny when it could just as
>> easily be uint.  (And also, I have half-precision registers that would
>> be nice to use some day, etc.)  I think "dword" is a better
>> alternative.
>
> I'm not sure what I think of this.  However, having things better
> defined is a good idea and this may be a step in the right direction.
>
> Yes, please use either dword or float for 32-bits.  Also, what is this
> addressing_mode_none?  If we want a vec4 addressing mode, we should
> just call it vec4.

well, tbh, it is vec4 for ttn, but I wasn't completely clear on how it
behaves if you actually have vec2/vec3..  so "none" just means "do
what you were doing before, whatever that was"..

if glsl_to_nir is vec4 based, then I can rename it to vec4..

>
>>> Also, should this apply to just the indirect variants? I'd think they
>>> would apply to the const_index component, as well.  Either that, or we
>>> make const_index always be bytes, maybe.
>>
>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>> isn't very consistent w/ dwords/slots in the other cases, it might
>> make sense..  otoh, that wouldn't change the code the driver emitted.
>> So I guess I could really go either way.
>
> Yes.  The indirect source and const_index should have the same units.
> If your hardware takes a const_index in bytes and an indirect in
> dwords, that's not NIR's problem.  We need to keep it consistent.

well, only the compiler takes const_index (so you can easily convert
to whatever units your hw wants at compile time), so fairly trivial to
make that in units of whatever we want.  Making it the same for
consistency sounds reasonable.

>>> This patch should probably edit the comment in nir_intrinsics.h about
>>> addressing, too.
>>
>> hmm, yeah, that comment doesn't really make any sense since (afaict)
>> NIR doesn't *really* know if you are vec or scalar backend..
>
> We also need to update nir_lower_io, nir_lower_locals_to_regs and
> nir_lower_globals_to_regs.  Having these things in global NIR
> structures and then just implementing it for TGSI -> NIR -> your
> backend is going to lead to a lot of confusion.

hmm, I can't say I know completely what to do there.. in particular
the whole lower_io is a bit strange.. why doesn't the f/e just
directly product the correct intrinsics?  The whole either
load_var{mode=uniform} vs load_uniform intrinsic thing seems strange..

BR,
-R

> --Jason