[Mesa-dev] [PATCH v3 06/19] RFC: nir/vtn: "raw" pointer support
Karol Herbst
kherbst at redhat.com
Sun Mar 25 10:35:39 UTC 2018
On Sun, Mar 25, 2018 at 12:18 AM, Rob Clark <robdclark at gmail.com> wrote:
> On Fri, Mar 23, 2018 at 5:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Fri, Mar 23, 2018 at 2:15 PM, Karol Herbst <kherbst at redhat.com> wrote:
>>>
>>> On Fri, Mar 23, 2018 at 10:07 PM, Jason Ekstrand <jason at jlekstrand.net>
>>> wrote:
>>> > +list
>>> >
>>> > On Fri, Mar 23, 2018 at 1:45 PM, Karol Herbst <kherbst at redhat.com>
>>> > wrote:
>>> >>
>>> >> On Fri, Mar 23, 2018 at 9:30 PM, Jason Ekstrand <jason at jlekstrand.net>
>>> >> wrote:
>>> >> > As I've been rewriting core NIR deref handling, I've been thinking
>>> >> > about this problem quite a bit. One objective I have is to actually
>>> >> > make UBO and SSBO access go through derefs instead of just being an
>>> >> > offset and index so that the compiler can better reason about them.
>>> >> > In particular, I want to be able to start doing load/store
>>> >> > elimination on SSBOs, SLM, and whatever CL has which would be great
>>> >> > for everyone's compute performance (GL, Vulkan, CL, etc.).
>>> >> >
>>> >> > I would be lying if I said I had a full plan but I do have part of a
>>> >> > plan. In my patch which adds the deref instructions, I add a new
>>> >> > "cast" deref type which takes an arbitrary value as its source and
>>> >> > kicks out a deref with a type. Whenever we discover that the source
>>> >> > of the cast is actually another deref which is compatible (same type
>>> >> > etc.), copy propagation gets rid of the cast for you. The idea is
>>> >> > that, instead of doing a load_raw(raw_ptr), you would do a
>>> >> > load((type *)raw_ptr).
>>> >> >
>>> >> > Right now, most of the core NIR optimizations will throw a fit if
>>> >> > they ever see a cast. This is intentional because it requires us to
>>> >> > manually go through and handle casts. This would mean that, at the
>>> >> > moment, you would have to lower to load_raw intrinsics almost
>>> >> > immediately after coming out of SPIR-V.
>>> >> >
>>> >>
>>> >> Well it gets more fun with OpenCL 2.0, where you can have generic
>>> >> pointers where you only know the type at creation time. You can also
>>> >> declare generic pointers as function inputs, in such a way that you
>>> >> never actually know where you have to load from if you only have that
>>> >> one function. So the actual load operation depends on when you create
>>> >> the initial pointer variable (you can cast from X to generic, but not
>>> >> the other way around).
>>> >>
>>> >> Which in the end means you can end up with load(generic_ptr), and only
>>> >> by following the chain up to its creation (with function inlining in
>>> >> mind) do you know the actual memory target.
>>> >
>>> >
>>> > Yup. And there will always be crazy cases where you can't actually
>>> > follow it and you have to emit a pile of code to load different ways
>>> > depending on some bits somewhere that tell you how to load it. I'm well
>>> > aware of the insanity. :-) This is part of the reason why I'm glad I'm
>>> > not trying to write an OpenCL 2.0 driver.
>>> >
>>> > This insanity is exactly why I'm suggesting the pointer casting. Sure,
>>> > you may not know the data type until the actual load. In that case,
>>> > you end up with the cast being right before the load. If you don't
>>> > know the storage class, maybe you have to switch and do multiple casts
>>> > based on some bits. Alternatively, if you don't know the storage
>>> > class, we can just let the deref mode be 0 for "I don't know", or
>>> > maybe multiple bits for "these are the things it might be". In any
>>> > case, I think we can handle it.
>>> >
>>>
>>> There shouldn't be a situation where we don't know, except when you
>>> don't inline all functions. I think Rob had the idea of fat pointers
>>> where a pointer is a vec2 and the 2nd component contains the actual
>>> pointer type and you end up with a switch over the type to get the
>>> correct storage class. And if the compiler inlines all functions, it
>>> should be able to optimize that switch away.
>>
>>
>> Right. Today, we live in a world where all functions are inlined. Sadly, I
>> fear that world may come to an end one of these days. :(
>>
>
> fwiw, so far I'm mostly caring about the inline-all-the-fxns case..
>
> for the cases where we don't know what sort of pointer we have, Karol
> (iirc?) suggested name-mangling functions, which seems semi-sane.. but
> I've mostly tried to ignore that for now until we have more basic
> things working.
>
> Possibly we need a compiler option to lower everything to
> load/store_global (or maybe "raw" is a better name?) for hw that can
> remap local memory into a single address space and use the same
> load/store instructions. I think that should be at least enough to
> move forward with nv hw + fxn calls. Less so for intel/adreno but
> from my PoV I'm willing to solve that problem later.
>
I don't think this works out, because it isn't only about local vs
global. We also have private memory pointers, which you can assign to
generic pointers, and I am sure most compilers will use registers for
private memory if they can. Private memory pointers show up when, for
example, you take the address of a stack variable.
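
To illustrate the point, here is a rough C sketch of the fat pointer idea
from earlier in the thread, with private memory as a third case next to
global and local. Every name in it (mem_space, fat_ptr, memories,
load_generic) is made up for this example and is not NIR or Mesa API. The
three arrays are only stand-ins for separate memories.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage classes a generic OpenCL 2.0 pointer may refer to. */
enum mem_space { MEM_GLOBAL, MEM_LOCAL, MEM_PRIVATE };

/* Hypothetical fat pointer: the "vec2" idea, an address plus a tag that
 * records which storage class the pointer was created from. */
struct fat_ptr {
    uint32_t offset;      /* offset into the tagged space, in 32-bit words */
    enum mem_space space; /* storage class fixed when the pointer is created */
};

/* Stand-ins for three separate memories. A real compiler may keep the
 * private one in registers, so it need not be addressable memory at all. */
struct memories {
    uint32_t global_mem[16];
    uint32_t local_mem[16];
    uint32_t private_mem[16];
};

/* Load through a generic pointer by switching over its tag. If everything
 * is inlined and the tag is a known constant, the switch can fold away. */
static uint32_t load_generic(const struct memories *m, struct fat_ptr p)
{
    assert(p.offset < 16);
    switch (p.space) {
    case MEM_GLOBAL:  return m->global_mem[p.offset];
    case MEM_LOCAL:   return m->local_mem[p.offset];
    case MEM_PRIVATE: return m->private_mem[p.offset];
    }
    assert(!"unknown memory space");
    return 0;
}
```

Lowering everything to one load_global/"raw" instruction amounts to
dropping the switch and always taking the first case, which gives the wrong
result as soon as the pointer came from a stack variable.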
> BR,
> -R