[Mesa-dev] [PATCH v3 002/104] nir: Add a deref instruction type

Rob Clark robdclark at gmail.com
Sun Apr 8 21:40:10 UTC 2018


On Sun, Apr 8, 2018 at 5:20 PM, Bas Nieuwenhuizen
<bas at basnieuwenhuizen.nl> wrote:
>>>>>>>>> +
>>>>>>>>> +   /** The mode of the underlying variable */
>>>>>>>>> +   nir_variable_mode mode;
>>>>>>>>
>>>>>>>> In fact, it seems like deref->mode is unused outside of nir_print and
>>>>>>>> nir_validate.. for logical addressing we can get the mode from the
>>>>>>>> deref_var->var at the start of the chain, and deref->mode has no
>>>>>>>> meaning for physical addressing (where the mode comes from the
>>>>>>>> pointer).
>>>>>>>>
>>>>>>>> So maybe just drop deref->mode?
>>>>>>>
>>>>>>> Isn't it still useful with logical addressing in case a var is not
>>>>>>> immediately available? (think VK_KHR_variable_pointers)
>>>>>>
>>>>>> not sure, maybe this should just also use fat-pointers like physical
>>>>>> addressing does??
>>>>>>
>>>>>>> Also I could see this being useful in physical addressing too to avoid
>>>>>>> all passes working with derefs needing to do the constant folding?
>>>>>>
>>>>>> The problem is that you don't necessarily know the type at compile
>>>>>> time (and in the case where you do, you need to do constant folding to
>>>>>> figure it out)
>>>>>
>>>>> So I have two considerations here
>>>>>
>>>>> 1) for vulkan you always know the mode, even when you don't know the var.
>>>>> 2) In CL the mode can still get annotated in the source program (CL C
>>>>> non-generic pointers) in cases in which we cannot reasonably figure it
>>>>> out with just constant folding. In those cases the mode is extra
>>>>> information that you really lose.
>>>>
>>>> so, even in cl 1.x, you could do things like 'somefxn(foo ? global_ptr
>>>> : local_ptr)'.. depending on how much we inline all the things, that
>>>> might not get CF'd away.
>
> How does this even work btw? somefxn has a definition, and the
> definition specifies a mode for the argument right? (which is
> implicitly __private if the app does not specify anything?)

iirc, the CL spec has an example along these lines.

It doesn't require *physical* storage for anything where you don't
know what the ptr type is, however, so fat ptrs in SSA space work
out.
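
To make the fat-ptr-in-SSA idea concrete, here is a rough sketch in plain C. Everything named fp_* is made up for illustration (these are not NIR types or Mesa helpers), and the store/load half assumes that anything with physical storage has a declared address space, which is my reading of the spec rather than something checked against it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address-space tags; not Mesa's nir_variable_mode values. */
enum fp_mode {
   FP_MODE_GLOBAL,
   FP_MODE_LOCAL,
   FP_MODE_PRIVATE,
};

/* Hypothetical fat pointer that only ever lives in SSA space:
 * .x is the address, .y is the mode. */
struct fp_fatptr {
   uint64_t x;
   uint32_t y;
};

/* 'foo ? global_ptr : local_ptr': both components are selected together,
 * so the mode travels with the address even if foo is not constant. */
static struct fp_fatptr
fp_select(bool cond, struct fp_fatptr a, struct fp_fatptr b)
{
   return cond ? a : b;
}

/* Storing to memory keeps only the address. Assumption: the declared
 * type of the storage location already fixes the mode. */
static uint64_t
fp_store_ptr(struct fp_fatptr p, enum fp_mode declared)
{
   assert(p.y == (uint32_t)declared);
   return p.x;
}

/* Loading re-attaches the mode taken from the declared type. */
static struct fp_fatptr
fp_load_ptr(uint64_t stored, enum fp_mode declared)
{
   struct fp_fatptr p = { stored, (uint32_t)declared };
   return p;
}
```

The point of the sketch is only that the second component never needs a physically defined size: it is dropped on store and rebuilt on load.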

>>>
>>> But something like
>>> __constant int *ptr_value = ...;
>>> store ptr in complex data structure.
>>> __constant int* ptr2 = load from complex data structure.
>>>
>>> Without explicitly annotating ptr2 it is unlikely that constant
>>> folding would find that ptr2 is pointing to __constant address space.
>>> Hence removing the modes loses valuable information that you cannot
>>> get back by constant folding. However, if you have a pointer with
>>> unknown mode, we could have a special mode (or mode_all?) and you can
>>> use the uvec2 representation in that case?
>>
>> hmm, I'm not really getting how deref->mode could magically have
>> information that fatptr.y doesn't have.. if the mode is known, vtn
>> could stash it in fatptr.y and everyone is happy?  If vtn doesn't know
>> this, then I don't see how deref->mode helps..
>
> You mean insert it into the fatptr every time deref_cast is called?
>
> Wouldn't that blow up the IR size significantly for very little benefit?

in an easy to clean up way, so meh?

>
>>
>>>>
>>>> I think I'm leaning towards using fat ptrs for the vk case, since I
>>>> guess that is a case where you could always expect
>>>> nir_src_as_const_value() to work, to get the variable mode.  If for no
>>>> other reason than I guess these deref's, if the var is not known,
>>>> start w/ deref_cast, and it would be ugly for deref_cast to have to
>>>> work differently for compute vs vk.  But maybe Jason already has some
>>>> thoughts about it?
>>>
>>> I'd like to avoid fat pointers altogether on AMD since we would not
>>> use them even for CL. A generic pointer is just a uint64_t for us, with
>>> no bitfield in there for the address space.
>>>
>>> I think we may need to think a bit more about representation however,
>>> as e.g. for AMD a pointer is typically 64-bits (but we can do e.g.
>>> 32-bits for known workgroup pointers), the current deref instructions
>>> return 32-bit, and you want something like a uvec2 as pointer
>>> representation?
>>
>> afaiu, newer AMD (and NV) hw can remap shared/private into a single
>> global address space..  But I guess that is an easy subset of the
>> harder case where drivers need to use different instructions.. so a
>> pretty simple lowering pass run before lower_io could remap things
>> that use fatptrs into something that ignores fatptr.y.  Then opt
>> passes make fatptr.y go away.  So both AMD and hw that doesn't have a
>> flat address space are happy.
>
> But then you run into other issues, like how are you going to stuff a
> 64-bit fatptr.x + a ?-bit fatptr.y into a 64-bit value for Physical64
> addressing? Also this means we have to track the sources back to
> the cast/var any time we want to do anything at all with any deref
> which seems less efficient to me than just stuffing the deref in
> there.

So fat ptrs only have to exist in SSA space; they are never stored to
something with a physically defined size.

As far as tracking things to the head of the chain of deref
instructions, that is a matter of a simple helper or two.  It's not like
the chain of derefs is going to be 1000s of instructions.
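
Roughly the kind of helper I have in mind, as a sketch only. The chain_* structs below are simplified stand-ins invented for this mail, not the deref instruction from the patch, and "unknown" is a placeholder for whatever a cast-headed chain ends up meaning:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the actual NIR definitions. */
enum chain_deref_type {
   CHAIN_DEREF_VAR,
   CHAIN_DEREF_CAST,
   CHAIN_DEREF_ARRAY,
   CHAIN_DEREF_STRUCT,
};

enum chain_mode {
   CHAIN_MODE_UNKNOWN,
   CHAIN_MODE_GLOBAL,
   CHAIN_MODE_SHARED,
   CHAIN_MODE_PRIVATE,
};

struct chain_var {
   enum chain_mode mode;
};

struct chain_deref {
   enum chain_deref_type type;
   const struct chain_deref *parent; /* NULL at the head (var or cast) */
   const struct chain_var *var;      /* only set for CHAIN_DEREF_VAR */
};

/* Walk parent links until we reach the head of the chain. */
static const struct chain_deref *
chain_head(const struct chain_deref *d)
{
   assert(d != NULL);
   while (d->parent != NULL)
      d = d->parent;
   return d;
}

/* Mode comes from the variable at the head; a cast-headed chain has no
 * variable, so the mode would have to come from the pointer instead. */
static enum chain_mode
chain_mode_of(const struct chain_deref *d)
{
   const struct chain_deref *head = chain_head(d);
   if (head->type == CHAIN_DEREF_VAR)
      return head->var->mode;
   return CHAIN_MODE_UNKNOWN;
}
```

The walk is linear in the chain length, which is why I don't think it matters in practice.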

> Also, what would the something which ignores fatptr.y be? I'd assume
> that would be the normal deref based stuff, but requiring fatptr
> contradicts that?

if you have a flat address space, maybe a pass (or option for
lower_io) to just convert everything to load/store_global (since
essentially what these GPUs are doing is remapping shared/private into
the global address space)
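
As a sketch of what such a pass would decide per access (all lower_* names are hypothetical, and the op names only echo load/store_global from above, they are not existing intrinsics I'm quoting):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode tags, i.e. what fatptr.y would carry. */
enum lower_mode {
   LOWER_MODE_GLOBAL,
   LOWER_MODE_SHARED,
   LOWER_MODE_PRIVATE,
};

/* Hypothetical load flavours a backend might need to distinguish. */
enum lower_load_op {
   LOWER_LOAD_GLOBAL,
   LOWER_LOAD_SHARED,
   LOWER_LOAD_PRIVATE,
};

/* With a flat address space the mode is ignored and everything becomes
 * a global load; otherwise the mode picks the instruction. */
static enum lower_load_op
lower_pick_load(bool flat_address_space, enum lower_mode mode)
{
   if (flat_address_space)
      return LOWER_LOAD_GLOBAL;

   switch (mode) {
   case LOWER_MODE_GLOBAL:
      return LOWER_LOAD_GLOBAL;
   case LOWER_MODE_SHARED:
      return LOWER_LOAD_SHARED;
   case LOWER_MODE_PRIVATE:
      return LOWER_LOAD_PRIVATE;
   }

   assert(!"unknown mode");
   return LOWER_LOAD_GLOBAL;
}
```

Once nothing reads the mode on the flat path, the usual opt passes can make fatptr.y go away.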

BR,
-R

