[Mesa-dev] [RFC] nir: compiler options for addressing modes

Tue Apr 14 12:55:36 PDT 2015

On Tue, Apr 14, 2015 at 3:47 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Tue, Apr 14, 2015 at 12:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt <eric at anholt.net> wrote:
>>>>> Rob Clark <robdclark at gmail.com> writes:
>>>>>
>>>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>>>
>>>>>> Add compiler options so driver can request the units for address value,
>>>>>> for the various _indirect intrinsics.  This way, the front end can
>>>>>> insert the appropriate multiply/shift operations to get the address into
>>>>>> the units that the driver needs, avoiding need for driver to insert
>>>>>> these on the backend (and loose out on some optimizing potential).
>>>>>>
>>>>>> The addressing modes are specified per var/input/output/uniform/ubo.
>>>>>> I'm not sure about other drivers, but for freedreno we want byte
>>>>>> addressing for ubo's and register addressing for the rest.
>>>>>>
>>>>>> NOTE: so far just updated ttn and freedreno, and included everything in
>>>>>> one commit to make it easier to see how it fits together.  The driver
>>>>>> patches would naturally be separate (since drivers not setting these
>>>>>> options get the current/default behavior, ie. splitting out won't hurt
>>>>>> bisectability).  Also the ttn part could be split out as long as it
>>>>>> comes before freedreno or vc4 parts.
>>>>>
>>>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>>>> index 6531237..e8bce9f 100644
>>>>>> --- a/src/glsl/nir/nir.h
>>>>>> +++ b/src/glsl/nir/nir.h
>>>>>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>>>>>>     exec_node_data(nir_function_overload, \
>>>>>>                    exec_list_get_head(&(func)->overload_list), node)
>>>>>>
>>>>>> +/**
>>>>>> + * Addressing mode used by the backend for various sorts of indirect
>>>>>> + * access.  This allows the frontend to generate the appropriate
>>>>>> + * multiply or shift operations to convert the address param for the
>>>>>> + * corresponding _indirect intrinsic, rather than relying on the
>>>>>> + * driver to insert them after the various other NIR optimization
>>>>>> + * passes.
>>>>>> + */
>>>>>> +typedef enum {
>>>>>> +   /** No address conversion, units in elements of array member: */
>>>>>> +   nir_addressing_mode_none = 0,
>>>>>> +   /** Address converted to units of registers (ie. float/int32): */
>>>>>> +   nir_addressing_mode_register = 1,
>>>>>> +   /** Address converted to units of bytes: */
>>>>>> +   nir_addressing_mode_byte = 2,
>>>>>> +} nir_addressing_mode;
>>>>>> +
>>>>>>  typedef struct nir_shader_compiler_options {
>>>>>>     bool lower_ffma;
>>>>>>     bool lower_flrp;
>>>>>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>>>>>      * are simulated by floats.)
>>>>>>      */
>>>>>>     bool native_integers;
>>>>>> +
>>>>>> +   /**
>>>>>> +    * Addressing mode for corresponding _indirect intrinsics:
>>>>>> +    */
>>>>>> +   nir_addressing_mode var_addressing_mode;
>>>>>> +   nir_addressing_mode input_addressing_mode;
>>>>>> +   nir_addressing_mode output_addressing_mode;
>>>>>> +   nir_addressing_mode uniform_addressing_mode;
>>>>>> +   nir_addressing_mode ubo_addressing_mode;
>>>>>> +
>>>>>>  } nir_shader_compiler_options;
>>>>>>
>>>>>>  typedef struct nir_shader {
>>>>>> --
>>>>>> 2.1.0
>>>>>
>>>>> This seems like a good idea, as it'll mean we can have passes that do
>>>>> things like scalarizing uniform loads, which I think could be productive
>>>>> for us.
>>>>>
>>>>> However, I find the "register" size name confusing -- it seems you mean
>>>>> a channel in a register, while on both the pieces of hardware I've
>>>>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>>>>> "dword" or "float", with a slight preference for dword.
>>>>
>>>> yeah, I was having trouble picking a good name for that one.. I'd
>>>> originally had float, but that sounds funny when it could just as
>>>> easily be uint.  (And also, I have half-precision registers that would
>>>> be nice to use some day, etc.)  I think "dword" is a better
>>>> alternative.
>>>
>>> I'm not sure what I think of this.  However, having things better
>>> defined is a good idea and this may be a step in the right direction.
>>>
>>> Yes, please use either dword or float for 32-bits.  Also, what is this
>>> addressing_mode_none?  If we want a vec4 addressing mode, we should
>>> just call it vec4.
>>>
>>>>> Also, should this apply to just the indirect variants? I'd think they
>>>>> would apply to the const_index component, as well.  Either that, or we
>>>>> make const_index always be bytes, maybe.
>>>>
>>>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>>>> isn't very consistent w/ dwords/slots in the other cases, it might
>>>> make sense..  otoh, that wouldn't change the code the driver emitted.
>>>> So I guess I could really go either way.
>>>
>>> Yes.  The indirect source and const_index should have the same units.
>>> If your hardware takes a const_index in bytes and an indirect in
>>> dwords, that's not NIR's problem.  We need to keep it consistent.
>>
>> One issue is that for UBO's, adreno (or at least the way I implemented
>> it) wants the offsets in bytes, but for regular uniforms, it wants it
>> in 32-bit units. UBO loads are done from memory, while regular
>> uniforms are loaded from a register-like file. Seems like this sort of
>> thing should be in target-specific lowering (done before
>> optimizations), which will then be able to properly propagate/combine
>> any operations.
>
> Yeah.  If you want to have a backend-specific pass that goes through
> and adds a multiply on all the load/store intrinsics and then re-run
> optimizations, that's toatally fine.  However, I don't think we want
> all of NIR to support the "base offset in bytes but indirect offsets
> in owords" addressing mode.

Actually I don't think that was ever desired -- UBOs want both base
offset and indirect offset in the same units, as do uniforms. Just
different units for uniforms and UBO's.