[Mesa-dev] [PATCH v3 002/104] nir: Add a deref instruction type

Sun Apr 8 21:54:21 UTC 2018

On Sun, Apr 8, 2018 at 11:40 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Sun, Apr 8, 2018 at 5:20 PM, Bas Nieuwenhuizen
> <bas at basnieuwenhuizen.nl> wrote:
>>>>>>>>>> +
>>>>>>>>>> +   /** The mode of the underlying variable */
>>>>>>>>>> +   nir_variable_mode mode;
>>>>>>>>>
>>>>>>>>> In fact, it seems like deref->mode is unused outside of nir_print and
>>>>>>>>> nir_validate.. for logical addressing we can get the mode from the
>>>>>>>>> deref_var->var at the start of the chain, and deref->mode has no
>>>>>>>>> meaning for physical addressing (where the mode comes from the
>>>>>>>>> pointer).
>>>>>>>>>
>>>>>>>>> So maybe just drop deref->mode?
>>>>>>>>
>>>>>>>> Isn't it still useful with logical addressing in case a var is not
>>>>>>>> immediately available? (think VK_KHR_variable_pointers)
>>>>>>>
>>>>>>> not sure, maybe this should just also use fat-pointers like physical
>>>>>>> addressing does??
>>>>>>>
>>>>>>>> Also I could see this being useful in physical addressing too to avoid
>>>>>>>> all passes working with derefs needing to do the constant folding?
>>>>>>>
>>>>>>> The problem is that you don't necessarily know the type at compile
>>>>>>> time (and in the case where you do, you need to do constant folding to
>>>>>>> figure it out)
>>>>>>
>>>>>> So I have two considerations here
>>>>>>
>>>>>> 1) for vulkan you always know the mode, even when you don't know the var.
>>>>>> 2) In CL the mode can still get annotated in the source program (CL C
>>>>>> non-generic pointers) in cases in which we cannot reasonably figure it
>>>>>> out with just constant folding. In those cases the mode is extra
>>>>>> information that you really lose.
>>>>>
>>>>> so, even in cl 1.x, you could do things like 'somefxn(foo ? global_ptr
>>>>> : local_ptr)'.. depending on how much we inline all the things, that
>>>>> might not get CF'd away.
>>
>> How does this even work btw? somefxn has a definition, and the
>> definition specifies a mode for the argument right? (which is
>> implicitly __private if the app does not specify anything?)
>
> iirc, the cl spec has an example something along these lines..
>
> it doesn't require *physical* storage for anything where you don't
> know what the ptr type is, however.. so fat ptrs in ssa space works
> out
>
>>>>
>>>> But something like
>>>> __constant int *ptr_value = ...;
>>>> store ptr in complex data structure.
>>>> __constant int* ptr2 = load from complex data structure.
>>>>
>>>> Without explicitly annotating ptr2 it is unlikely that constant
>>>> folding would find that ptr2 is pointing to __constant address space.
>>>> Hence removing the modes loses valuable information that you cannot
>>>> get back by constant folding. However, if you have a pointer with
>>>> unknown mode, we could have a special mode (or mode_all?) and you can
>>>> use the uvec2 representation in that case?
>>>
>>> hmm, I'm not really getting how deref->mode could magically have
>>> information that fatptr.y doesn't have.. if the mode is known, vtn
>>> could stash it in fatptr.y and everyone is happy?  If vtn doesn't know
>>> this, then I don't see how deref->mode helps..
>>
>> You mean insert it into the fatptr every time deref_cast is called?
>>
>> Wouldn't that blow up the IR size significantly for very little benefit?
>
> in an easy to clean up way, so meh?

We can't clean it up if we want to keep the information. Also nir is
pretty slow to compile already, so I'd rather not add a significant
number of instructions for very little benefit.
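
To make the trade-off concrete, here is a rough sketch of the two ways
a pass could get at the mode. All the types and names below are
hypothetical stand-ins for illustration, not the actual nir structs
from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real nir_deref_instr definitions. */
enum sketch_mode {
   SKETCH_MODE_UNKNOWN,
   SKETCH_MODE_GLOBAL,
   SKETCH_MODE_SHARED,
   SKETCH_MODE_CONSTANT
};

enum sketch_deref_type {
   SKETCH_DEREF_VAR,
   SKETCH_DEREF_CAST,
   SKETCH_DEREF_ARRAY
};

struct sketch_deref {
   enum sketch_deref_type type;
   struct sketch_deref *parent; /* NULL for var and cast heads */
   enum sketch_mode mode;       /* annotation carried on every deref */
};

/* Option A: the mode is stored on the deref, so any pass can read it. */
static enum sketch_mode
sketch_mode_direct(const struct sketch_deref *d)
{
   return d->mode;
}

/* Option B: no per-deref mode. Walk to the head of the chain; only a
 * variable head gives us a mode for free (the head's mode field stands
 * in for the variable's mode here). A cast head would need its source
 * value to be constant-folded first, so we report "unknown". */
static enum sketch_mode
sketch_mode_from_head(const struct sketch_deref *d)
{
   while (d->parent != NULL)
      d = d->parent;
   return d->type == SKETCH_DEREF_VAR ? d->mode : SKETCH_MODE_UNKNOWN;
}
```

With a chain that starts at a cast of a __constant pointer, option A
still reports constant while option B has nothing to go on unless the
cast source folds.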

>
>>
>>>
>>>>>
>>>>> I think I'm leaning towards using fat ptrs for the vk case, since I
>>>>> guess that is a case where you could always expect
>>>>> nir_src_as_const_value() to work, to get the variable mode.  If for no
>>>>> other reason than I guess these deref's, if the var is not known,
>>>>> start w/ deref_cast, and it would be ugly for deref_cast to have to
>>>>> work differently for compute vs vk.  But maybe Jason already has some
>>>>> thoughts about it?
>>>>
>>>> I'd like to avoid fat pointers altogether on AMD since we would not
>>>> use them even for CL. A generic pointer is just a uint64_t for us, with
>>>> no bitfield in there for the address space.
>>>>
>>>> I think we may need to think a bit more about representation however,
>>>> as e.g. for AMD a pointer is typically 64-bits (but we can do e.g.
>>>> 32-bits for known workgroup pointers), the current deref instructions
>>>> return 32-bit, and you want something like a uvec2 as pointer
>>>> representation?
>>>
>>> afaiu, newer AMD (and NV) hw can remap shared/private into a single
>>> global address space..  But I guess that is an easy subset of the
>>> harder case where drivers need to use different instructions.. so a
>>> pretty simple lowering pass run before lower_io could remap things
>>> that use fatptrs into something that ignores fatptr.y.  Then opt
>>> passes make fatptr.y go away.  So both AMD and hw that doesn't have a
>>> flat address space are happy.
>>
>> But then you run into other issues, like how are you going to stuff a
>> 64-bit fatptr.x + a ?-bit fatptr.y into a 64-bit value for Physical64
>> addressing? Also this means we have to track the sources back to
>> the cast/var any time we want to do anything at all with any deref
>> which seems less efficient to me than just stuffing the deref in
>> there.
>
> so fat ptrs only have to exist in ssa space, not be stored to
> something with a physically defined size..

How does storing __generic pointers work then? Those still need the
fat bit for your hw, right?
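
A small sketch of what I mean, assuming a full 64-bit address in
fatptr.x and a separate tag in fatptr.y. The struct, the mode values
and the helper names are made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fat pointer as it might exist in SSA space:
 * x is the address, y is the address-space tag. */
struct sketch_fat_ptr {
   uint64_t x;
   uint32_t y;
};

#define SKETCH_MODE_UNKNOWN 0u
#define SKETCH_MODE_SHARED  2u

/* Storing to a Physical64 slot: only 64 bits are available, so with a
 * full 64-bit address the tag has nowhere to go and is dropped. */
static uint64_t
sketch_store_ptr(struct sketch_fat_ptr p)
{
   return p.x;
}

/* Loading it back: the address survives, the tag does not. */
static struct sketch_fat_ptr
sketch_load_ptr(uint64_t slot)
{
   struct sketch_fat_ptr p = { slot, SKETCH_MODE_UNKNOWN };
   return p;
}
```

So a pointer that round-trips through memory comes back without its
tag, unless the hardware can spare address bits for it.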

>
> As far as tracking things to the head of the chain of deref
> instructions, that is a matter of a simple helper or two.  Not like
> the chain of deref's is going to be 1000's of instructions..
>
>> Also, what would the something which ignores fatptr.y be? I'd assume
>> that would be the normal deref based stuff, but requiring fatptr
>> contradicts that?
>
> if you have a flat address space, maybe a pass (or option for
> lower_io) to just convert everything to load/store_global (since
> essentially what these GPUs are doing is remapping shared/private into
> the global address space)

But I'd like to only do that if we really don't know the mode
statically, since it is somewhat slower/less flexible. (Also, radv is
unlikely to use nir_lower_io for a lot of this stuff since we can
lower derefs into LLVM GEPs directly.)

Hence, if we want to benefit from the cases where we know the mode
statically, we must not lower the fatptr, but then we are stuck with
the whole fatptr mess.
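
Roughly the selection I have in mind, as a sketch only. The enums and
the helper are hypothetical, and it assumes hardware where the global
path can also reach shared/private memory.

```c
#include <assert.h>

/* Hypothetical modes and access kinds, for illustration only. */
enum sketch_mode {
   SKETCH_MODE_UNKNOWN,
   SKETCH_MODE_GLOBAL,
   SKETCH_MODE_SHARED,
   SKETCH_MODE_PRIVATE
};

enum sketch_access {
   SKETCH_ACCESS_GLOBAL,
   SKETCH_ACCESS_SHARED,
   SKETCH_ACCESS_PRIVATE
};

/* Use the mode-specific access when the mode is known statically and
 * fall back to the (slower, less flexible) global path otherwise. */
static enum sketch_access
sketch_pick_access(enum sketch_mode mode)
{
   switch (mode) {
   case SKETCH_MODE_SHARED:
      return SKETCH_ACCESS_SHARED;
   case SKETCH_MODE_PRIVATE:
      return SKETCH_ACCESS_PRIVATE;
   case SKETCH_MODE_GLOBAL:
   case SKETCH_MODE_UNKNOWN:
   default:
      return SKETCH_ACCESS_GLOBAL;
   }
}
```

This only works if the mode is still attached to the deref by the time
we lower it.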

>
> BR,
> -R