[Mesa-dev] [PATCH v3 06/19] RFC: nir/vtn: "raw" pointer support

Sat Mar 24 23:18:27 UTC 2018

On Fri, Mar 23, 2018 at 5:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Fri, Mar 23, 2018 at 2:15 PM, Karol Herbst <kherbst at redhat.com> wrote:
>>
>> On Fri, Mar 23, 2018 at 10:07 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > +list
>> >
>> > On Fri, Mar 23, 2018 at 1:45 PM, Karol Herbst <kherbst at redhat.com>
>> > wrote:
>> >>
>> >> On Fri, Mar 23, 2018 at 9:30 PM, Jason Ekstrand <jason at jlekstrand.net>
>> >> wrote:
>> >> > As I've been rewriting core NIR deref handling, I've been thinking
>> >> > about this problem quite a bit.  One objective I have is to actually
>> >> > make UBO and SSBO access go through derefs instead of just being an
>> >> > offset and index so that the compiler can better reason about them.
>> >> > In particular, I want to be able to start doing load/store
>> >> > elimination on SSBOs, SLM, and whatever CL has which would be great
>> >> > for everyone's compute performance (GL, Vulkan, CL, etc.).
>> >> >
>> >> > I would be lying if I said I had a full plan but I do have part of a
>> >> > plan.  In my patch which adds the deref instructions, I add a new
>> >> > "cast" deref type which takes an arbitrary value as its source and
>> >> > kicks out a deref with a type.  Whenever we discover that the source
>> >> > of the cast is actually another deref which is compatible (same type
>> >> > etc.), copy propagation gets rid of the cast for you.  The idea is
>> >> > that, instead of doing a load_raw(raw_ptr), you would do a
>> >> > load((type *)raw_ptr).
>> >> >
>> >> > Right now, most of the core NIR optimizations will throw a fit if
>> >> > they ever see a cast.  This is intentional because it requires us to
>> >> > manually go through and handle casts.  This would mean that, at the
>> >> > moment, you would have to lower to load_raw intrinsics almost
>> >> > immediately after coming out of SPIR-V.
>> >> >
>> >>
>> >> Well it gets more fun with OpenCL 2.0 where you can have generic
>> >> pointers where you only know the type at creation time. You can also
>> >> declare generic pointers as function inputs in a way that you never
>> >> actually know from where you have to load if you only have that one
>> >> function. So the actual load operation depends on when you create the
>> >> initial pointer variable (you can cast from X to generic, but not the
>> >> other way around).
>> >>
>> >> Which in the end means you can end up with load(generic_ptr) and only
>> >> by following the chain up to its creation (with function inlining in
>> >> mind) do you know the actual memory target.
>> >
>> >
>> > Yup.  And there will always be crazy cases where you can't actually
>> > follow it and you have to emit a pile of code to load different ways
>> > depending on some bits somewhere that tell you how to load it.  I'm
>> > well aware of the insanity. :-)  This is part of the reason why I'm
>> > glad I'm not trying to write an OpenCL 2.0 driver.
>> >
>> > This insanity is exactly why I'm suggesting the pointer casting.  Sure,
>> > you may not know the data type until the actual load.  In that case,
>> > you end up with the cast being right before the load.  If you don't
>> > know the storage class, maybe you have to switch and do multiple casts
>> > based on some bits.  Alternatively, if you don't know the storage
>> > class, we can just let the deref mode be 0 for "I don't know", or maybe
>> > multiple bits for "these are the things it might be".  In any case, I
>> > think we can handle it.
>> >
>>
>> there shouldn't be a situation where we don't know, except when you
>> don't inline all functions. I think Rob had the idea of fat pointers
>> where a pointer is a vec2 and the 2nd component contains the actual
>> pointer type and you end up with a switch over the type to get the
>> correct storage class. And if the compiler inlines all functions, it
>> should be able to optimize that switch away.
>
>
> Right.  Today, we live in a world where all functions are inlined.  Sadly, I
> fear that world may come to an end one of these days. :(
>

fwiw, so far I'm mostly caring about the inline-all-the-fxns case..

for the cases where we don't know what sort of pointer we have, Karol
(iirc?) suggested name-mangling functions, which seems semi-sane.. but
I've mostly tried to ignore that for now until we have more basic
things working.
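
For reference, the fat-pointer fallback mentioned earlier in the thread
could look roughly like the C sketch below.  This is just an illustration
of the idea, not NIR code: the enum, the struct, the three arrays and
load_generic() are all made-up names, and real hw would obviously use
different load instructions rather than array indexing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage classes; not actual NIR or SPIR-V enums. */
enum storage_class { SC_GLOBAL, SC_LOCAL, SC_PRIVATE };

/* "vec2" fat pointer: the raw offset plus a tag saying where it points. */
struct fat_ptr {
   uint32_t offset;
   enum storage_class sc;
};

/* Stand-ins for the separate memories the hw would have. */
static uint32_t global_mem[16];
static uint32_t local_mem[16];
static uint32_t private_mem[16];

/* Load through a generic pointer by switching on its tag. */
static uint32_t
load_generic(struct fat_ptr p)
{
   switch (p.sc) {
   case SC_GLOBAL:  return global_mem[p.offset];
   case SC_LOCAL:   return local_mem[p.offset];
   case SC_PRIVATE: return private_mem[p.offset];
   }
   assert(!"unknown storage class");
   return 0;
}
```

If everything is inlined and the tag is a compile-time constant at the
point of the load, the switch should fold down to a single case.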

Possibly we need a compiler option to lower everything to
load/store_global (or maybe "raw" is a better name?) for hw that can
remap local memory into a single address space and use the same
load/store instructions.  I think that should be at least enough to
move forward with nv hw + fxn calls.  Less so for intel/adreno but
from my PoV I'm willing to solve that problem later.
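
To make that a bit more concrete, here is a rough C sketch of what such a
lowering would mean, assuming hw that exposes local memory through a
window in one flat address space.  The window base, the flat_mem array
and the function names are all hypothetical, picked only for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address spaces; not actual NIR modes. */
enum addr_space { AS_GLOBAL, AS_LOCAL };

/* Assumed base of the window where the hw maps local memory. */
#define LOCAL_WINDOW_BASE 0x1000u

/* Stand-in for the single flat address space. */
static uint8_t flat_mem[0x2000];

/* Turn a (space, offset) pair into one flat address. */
static uint32_t
to_flat_addr(enum addr_space as, uint32_t offset)
{
   return as == AS_LOCAL ? LOCAL_WINDOW_BASE + offset : offset;
}

/* After lowering, every access uses the same raw load/store. */
static uint32_t
load_raw(uint32_t addr)
{
   uint32_t v;
   assert(addr + sizeof(v) <= sizeof(flat_mem));
   memcpy(&v, &flat_mem[addr], sizeof(v));
   return v;
}

static void
store_raw(uint32_t addr, uint32_t v)
{
   assert(addr + sizeof(v) <= sizeof(flat_mem));
   memcpy(&flat_mem[addr], &v, sizeof(v));
}
```

With this, a function taking a generic pointer never needs to know the
storage class; only the point where the pointer is created does.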

BR,
-R