[Mesa-dev] [PATCH v3 057/104] nir,spirv: Rework function calls

Tue Apr 10 18:53:45 UTC 2018

On Tue, Apr 10, 2018 at 1:50 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Tue, Apr 10, 2018 at 1:17 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Tue, Apr 10, 2018 at 9:59 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>
>>> On Tue, Apr 10, 2018 at 11:55 AM, Jason Ekstrand <jason at jlekstrand.net>
>>> wrote:
>>> > On Tue, Apr 10, 2018 at 8:17 AM, Rob Clark <robdclark at gmail.com> wrote:
>>> >>
>>> >> On Tue, Apr 10, 2018 at 11:04 AM, Jason Ekstrand <jason at jlekstrand.net>
>>> >> wrote:
>>> >> > On Tue, Apr 10, 2018 at 6:20 AM, Rob Clark <robdclark at gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Mon, Apr 9, 2018 at 10:52 PM, Jason Ekstrand
>>> >> >> <jason at jlekstrand.net>
>>> >> >> wrote:
>>> >> >> > + A bunch of potentially interested parties.
>>> >> >> >
>>> >> >> > On Mon, Apr 9, 2018 at 4:25 PM, Caio Marcelo de Oliveira Filho
>>> >> >> > <caio.oliveira at intel.com> wrote:
>>> >> >> >>
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> >  typedef struct {
>>> >> >> >> > -   nir_parameter_type param_type;
>>> >> >> >> > -   const struct glsl_type *type;
>>> >> >> >> > +   uint8_t num_components;
>>> >> >> >> > +   uint8_t bit_size;
>>> >> >> >> >  } nir_parameter;
>>> >> >> >>
>>> >> >> >> (...)
>>> >> >> >>
>>> >> >> >> > @@ -683,18 +692,12 @@ validate_tex_instr(nir_tex_instr *instr,
>>> >> >> >> > validate_state *state)
>>> >> >> >> >  static void
>>> >> >> >> >  validate_call_instr(nir_call_instr *instr, validate_state
>>> >> >> >> > *state)
>>> >> >> >> >  {
>>> >> >> >> > -   if (instr->return_deref == NULL) {
>>> >> >> >> > -      validate_assert(state,
>>> >> >> >> > glsl_type_is_void(instr->callee->return_type));
>>> >> >> >> > -   } else {
>>> >> >> >> > -      validate_assert(state, instr->return_deref->deref.type
>>> >> >> >> > ==
>>> >> >> >> > instr->callee->return_type);
>>> >> >> >> > -      validate_deref_var(instr, instr->return_deref, state);
>>> >> >> >> > -   }
>>> >> >> >> > -
>>> >> >> >> >     validate_assert(state, instr->num_params ==
>>> >> >> >> > instr->callee->num_params);
>>> >> >> >> >
>>> >> >> >> >     for (unsigned i = 0; i < instr->num_params; i++) {
>>> >> >> >> > -      validate_assert(state, instr->callee->params[i].type ==
>>> >> >> >> > instr->params[i]->deref.type);
>>> >> >> >> > -      validate_deref_var(instr, instr->params[i], state);
>>> >> >> >> > +      validate_src(&instr->params[i], state,
>>> >> >> >> > +                   instr->callee->params[i].bit_size,
>>> >> >> >> > +                   instr->callee->params[i].num_components);
>>> >> >> >> >     }
>>> >> >> >> >  }
>>> >> >> >>
>>> >> >> >> Question: I might be misreading, but it seems like we are losing
>>> >> >> >> the
>>> >> >> >> type information for functions. Isn't that something worth
>>> >> >> >> keeping,
>>> >> >> >> maybe in some other way, e.g. load_param specifying the expected
>>> >> >> >> type?
>>> >> >> >
>>> >> >> >
>>> >> >> > That's a very good question!  To be honest, I'm not sure what the
>>> >> >> > answer
>>> >> >> > is.
>>> >> >> > At the moment, the type information is fairly useless for most of
>>> >> >> > what
>>> >> >> > we
>>> >> >> > use functions for.  Really, all we need is something that NIR can
>>> >> >> > inline.
>>> >> >> > As it is, we're not really preserving the types from SPIR-V
>>> >> >> > because
>>> >> >> > of
>>> >> >> > the
>>> >> >> > gymnastics we're doing to handle pointers.
>>> >> >> >
>>> >> >> > If we did want to preserve types, we'd need to have more detailed
>>> >> >> > type
>>> >> >> > information.  In particular, we'd need to be able to provide
>>> >> >> > pointer
>>> >> >> > types
>>> >> >> > and maybe combined image-sampler types.  And along with those
>>> >> >> > pointer
>>> >> >> > types,
>>> >> >> > we'd need to somehow express those pointer's storage requirements.
>>> >> >> >
>>> >> >> > The philosophy behind this commit is that, if we don't have a good
>>> >> >> > match
>>> >> >> > to
>>> >> >> > SPIR-V anyway, we might as well just chuck that information and do
>>> >> >> > whatever
>>> >> >> > makes our lives the easiest.  My philosophy here may be flawed and
>>> >> >> > I'm
>>> >> >> > happy
>>> >> >> > to hear arguments in favor of keeping the information.  The best
>>> >> >> > argument I
>>> >> >> > can come up with for keeping the information is if we find
>>> >> >> > ourselves
>>> >> >> > wanting
>>> >> >> > to do some sort of linking in the future where we have to match
>>> >> >> > functions by
>>> >> >> > both name and type.  If we want to do that, however, we'll need
>>> >> >> > all
>>> >> >> > the
>>> >> >> > SPIR-V type information.
>>> >> >> >
>>> >> >>
>>> >> >> We do end up wanting the type information for cl kernels.  This is
>>> >> >> maybe a slightly different case from calls within shader code (ie.
>>> >> >> when both caller and callee are in shader).
>>> >> >
>>> >> >
>>> >> > Yes, I think it is.  Question: Is there a distinction in CL between
>>> >> > functions which are entrypoints callable from the API and functions
>>> >> > which
>>> >> > are helpers?  i.e. Can you call an entrypoint as a helper?
>>> >> >
>>> >>
>>> >> There is the __kernel annotation.  And you know the entry point name
>>> >> when compiling.  However I'm not sure anything prevents one entry
>>> >> point from calling another.
>>> >
>>> >
>>> > That would be worth investigating.
>>> >
>>>
>>> fwiw, at least the cl to spv compiler seems to allow it.  (Although in
>>> my simple examples it also inlines the called function before things
>>> end up in spv.)
>>
>>
>> Interesting.
>>
>>>
>>> >>
>>> >> I'm not sure we want the calling convention to be the same internally
>>> >> as for kernel entry points so in that case, if we aren't inlining
>>> >> everything, we might end up generating two versions of a function (or
>>> >> possibly a shim.. or possibly between the two based on size.. or??)
>>> >
>>> >
>>> > Having a shim seems like a reasonable plan.
>>> >
>>> >>
>>> >> >>
>>> >> >> Although I'd kinda like
>>> >> >> to think that we don't need to make vtn aware of this distinction.
>>> >> >
>>> >> >
>>> >> > Someone has to be aware of it. :-)  There are lots of places in
>>> >> > spirv_to_nir
>>> >> > were we take the SPIR-V and do something slightly different with it
>>> >> > than
>>> >> > the
>>> >> > obvious translation.  Also, using function parameters for this is a
>>> >> > significant anachronism because no other shader I/O in NIR has ever
>>> >> > worked
>>> >> > that way.
>>> >> >
>>> >> >>
>>> >> >> So just to throw out an idea.  What if vtn just used load_deref for
>>> >> >> everything, and in the case of fxn params it just points to a local
>>> >> >> var with type nir_var_param?  (Or something roughly like that.)
>>> >> >> Then
>>> >> >> lower_io lowers this to load_param.
>>> >> >
>>> >> >
>>> >> > That's kind-of what the original thing did.  However, for SPIR-V
>>> >> > helper
>>> >> > functions we have to be able to pass through pointers, SSA values
>>> >> > with
>>> >> > arbitrary type, and image/sampler pointers.  SSA values can be
>>> >> > handled
>>> >> > by
>>> >> > just making a variable and storing them to it.  Pointers are tricky
>>> >> > because
>>> >> > they're not really copy-in/out.  For images, samplers, and pointers,
>>> >> > we
>>> >> > have
>>> >> > a pile of "try to patch up the deref chain" code in
>>> >> > nir_inline_functions
>>> >> > that's rather tricky.  The moral of the story is that "just use
>>> >> > variables"
>>> >> > is not nearly as obvious of a choice as it looks.
>>> >>
>>> >> So, I'm considering just adding nir_variable ptr back to
>>> >> nir_parameter.  That would be rather easy, and gives something that
>>> >> some lowering pass could fixup data.driver_loc with parameter
>>> >> position.  That would be rather easy for driver backends:
>>> >>
>>> >>   param_loc = fxn->params[param_idx]->var->data.driver_loc
>>> >
>>> >
>>> > Why can't you just add a mode nir_var_kernel_param and keep a running
>>> > uint32_t offset in the builder.  When you process an OpParameter in an
>>> > entrypoint, create a nir_var_kernel_param variable and set
>>> > var->data.location = offset and add the size of the param to offset.
>>> > Then
>>>
>>> So far, with the new deref chain, I've managed to keep knowledge of
>>> physical layout of things in lower_io and out of vtn.. it would be
>>> nice to keep it that way.  Otherwise vtn needs to know about
>>> size/alignment rules.
>>
>>
>> Another option (which maybe you've already brought up without me
>> understanding) would be to make nir_var_kernel_param a local variable type
>> and have the list of params stored in the function next to the locals.  Then
>> you would know exactly what params are associated with a given function impl
>> and can assign locations at-will.
>
> That sounds fairly close to what I was suggesting with adding the var
> to nir_parameter, just in nir_function_impl instead of nir_function
>
>>
>>>
>>> > the OpParameter can return an access chain to the newly created
>>> > interface
>>> > variable.  I fail to see why load_param is needed at all here.
>>>
>>> I'd prefer not to have this end up like load_input (or load_uniform)
>>> in the driver (if that is what you had in mind).. *maybe* we could end
>>> up treating this like a UBO, although blob does try to use uniforms
>>> when possible (like when it doesn't need a pointer or byte
>>> addressing), since that is faster.  So I think we should have a
>>> distinct intrinsic.. whether that be load_param or something
>>> different.
>>
>>
>> What is it about load_input and load_uniform that you don't like?  I'm
>> really trying to understand what's going on here.  Clearly, there's
>> something but I don't know what it is.
>
> Well, a couple things..
>
> 1) for (perhaps hysterical) reasons, for gallium drivers
> load_input/load_uniform have offset in units of 32b dwords.  Basically
> this matched how TGSI works (and therefore, I guess, how tgsi_to_nir
> worked, and the st_glsl_to_nir path after that..)
>
> (This actually works fine for ir3, since hw sticks VS inputs into
> registers for us, and non-ubo uniforms are basically just a different
> bank of read-only registers.)
>
> 2) my current thinking to be able to use uniforms when possible (since
> they are faster, can be directly used as src to other instructions in
> many cases, etc), but still be able to do pointers and byte
> addressing, I was thinking about using the same ubo as both a ubo and
> uniform upload buffer.  So the benefit of a different intrinsic is
> that I don't have to shoe-horn that special compute kernel logic into
> load_input/load_uniform handling
>
>
>>
>>>
>>> Also, fwiw, we need to be able to do pointer math on function parameters
>>> ;-)
>>>
>>> __kernel void test_fn(__global int *res, __constant struct SomeStruct
>>> b[5], int c)
>>> {
>>>     __constant struct SomeStruct *s = &b[c];
>>>     if (c < 3)
>>>         s++;
>>>     *res = s->s.b;
>>> }
>>
>>
>> I figured you would.  That's part of my confusion about why you want to
>> treat them the same as parameters you pass from one function to another.
>> They don't work at all like local variables.

So, actually although it looks like pointer math on kernel params,
what it looks like in spirv is more like:

__kernel void test_fn(__global int *res, __constant struct SomeStruct
*b, int c) ...

ie. it is already a pointer, so we don't need to generate a pointer
from something that isn't one already.

I *think* this matches how clover works.  I'm not 100% sure, because
there is enough indirection and templates and c++ trickiness that it
is a bit hard to tell exactly what is going on, and without fxn params
implemented I can't get far enough to just run under gdb and see ;-)

I think the nir_variable pointer in nir_parameter would work just
fine.  (And I guess it is kind of ok if we don't get this perfect the
first time and want to change it later, since unlike deref chains to
instr, it is a pretty minor flag day sorta thing.)

BR,
-R

>
> it does make it possibly attractive to fit this into the deref
> scheme.. I think..
>
> BR,
> -R