[Mesa-dev] [PATCH 16/78] i965/nir/vec4: Implement store_output intrinsic

Eduardo Lima Mitev elima at igalia.com
Fri Jul 24 04:10:55 PDT 2015


On 07/23/2015 05:39 PM, Jason Ekstrand wrote:
> On Thu, Jul 23, 2015 at 1:01 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>> On 07/23/2015 05:20 AM, Jason Ekstrand wrote:
>>> On Wed, Jul 22, 2015 at 4:37 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>>>> On 07/13/2015 01:57 PM, Jason Ekstrand wrote:
>>>>> On Wed, Jul 8, 2015 at 11:54 PM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>>>>>> On 06/30/2015 06:51 PM, Jason Ekstrand wrote:
>>>>>>> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>>>>>>>> The index into the output_reg array where to store the destination register is
>>>>>>>> fetched from the nir_outputs map built during nir_setup_outputs stage.
>>>>>>>>
>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580
>>>>>>>> ---
>>>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 17 +++++++++++++++--
>>>>>>>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>>>>>> index 8a2d335..55d4490 100644
>>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>>>>>> @@ -520,10 +520,23 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     case nir_intrinsic_store_output_indirect:
>>>>>>>> +      has_indirect = true;
>>>>>>>>        /* fallthrough */
>>>>>>>> -   case nir_intrinsic_store_output:
>>>>>>>> -      /* @TODO: Not yet implemented */
>>>>>>>> +   case nir_intrinsic_store_output: {
>>>>>>>> +      int offset = instr->const_index[0];
>>>>>>>> +      int output = nir_outputs[offset];
>>>>>>>> +
>>>>>>>> +      src = get_nir_src(instr->src[0], nir_output_types[offset]);
>>>>>>>> +      dest = dst_reg(src);
>>>>>>>> +
>>>>>>>> +      dest.writemask = brw_writemask_for_size(instr->num_components);
>>>>>>>> +
>>>>>>>> +      if (has_indirect)
>>>>>>>> +         dest.reladdr = new(mem_ctx) src_reg(get_nir_src(instr->src[1]));
>>>>>>>> +
>>>>>>>> +      output_reg[output] = dest;
>>>>>>>
>>>>>>> I'm very confused about the amount of indirection going on here.  It
>>>>>>> seems to me that we should be setting these outputs up in
>>>>>>> setup_outputs() rather than storring off a map from ints to other ints
>>>>>>> and setting it up here.  I didn't make this comment on the patch for
>>>>>>> setup_outputs() because I wanted to wait to see it used before I
>>>>>>> commented on it.
>>>>>>>
>>>>>>> I'm guessing you did it this way because the nir_assign_var_locations
>>>>>>> is giving you bogus values.  If so, then it might be better to just
>>>>>>> assign variable locations in setup_outputs() rather than having a
>>>>>>> remap table.  The whole point of nir_lower_io is to make IO easy for
>>>>>>> the back-end.  If you need a re-map table, then it's no longer making
>>>>>>> it easy and we need to think more about what's going on.
>>>>>>> --Jason
>>>>>>>
>>>>>>
>>>>>> That double indirection felt bad since the beginning, but it was needed
>>>>>> to store the original variable's location (var->data.location). Let me
>>>>>> explain:
>>>>>>
>>>>>> We are (re)using the plumbering in vec4_visitor to setup URB, so the
>>>>>> only thing we need to do is to store the out register in "output_reg"
>>>>>> map at the correct location. And that location is given by the original
>>>>>> location in the shader (var->data.location).
>>>>>>
>>>>>> So, in this case, "nir_assign_var_locations" pass, which constructs
>>>>>> var->data.driver_location, is not useful to us, except to give us
>>>>>> consecutive indexes to construct the other map we have, the type map,
>>>>>> which is needed to carry the correct type from the original variable to
>>>>>> the output register.
>>>>>
>>>>> If nir_assign_var_locations isn't doing anything for you, don't call
>>>>> it.  You'll need to do something with var->data.driver_location.  If
>>>>> what you really want is var->data.location, then just copy that to
>>>>> var->data.driver_location when you do nir_setup_outputs.  Or
>>>>> (depending on how the URB setup works, I don't actually know), put the
>>>>> actual URB location in var->data.driver_location when you walk the
>>>>> outputs.
>>>>>
>>>>> From there, you have two options.  One would be to setup output_reg at
>>>>> the same time with the correct types right away and emit a MOV when
>>>>> you get a store_output.  (Copy propagation should clean up the MOV.)
>>>>> For what it's worth, I don't think the type matters; a URB write just
>>>>> writes data to something so as long as you don't have a type mismatch
>>>>> in a MOV, the hardware won't care.
>>>>>
>>>>> The other option, would be to directly emit the URB write in
>>>>> store_output.  At the moment, it may be better to take the first
>>>>> option since that better matches what the FS does right now.  But both
>>>>> should work fine.
>>>>>
>>>>
>>>> Thanks for these hints, they were very useful.
>>>>
>>>> I rewrote the implementation of store_output intrinsic to avoid the
>>>> setup phase completely. The type, as you suggested, was not important as
>>>> long as they match while MOVing the contents of output_reg. To guarantee
>>>> that, I had to patch the emit_urb_slot() to guarantee the types always
>>>> match. This code is shared with vec4_visitor, so it makes sense to move
>>>> the safeguards there instead of having both backends provide the correct
>>>> register type in output_reg entries.
>>>> For reference, this is the patch that implements it:
>>>> https://github.com/Igalia/mesa/commit/8c703937f285c0b3a1e7bf6681c7ed7fe09815aa
>>>
>>> Seems reasonable.
>>>
>>>> I also put var->data.location in const_index[1] of the intrinsic op, and
>>>> disabled nir_assign_var_locations() for output variables, since I don't
>>>> need var->data.driver_location. I could have used const_index[0], but I
>>>> prefer to leave driver_location there, and use const_index[1], to avoid
>>>> breaking any driver that rely on current layout of const_index (like
>>>> FS-nir). I think it is a safer approach.
>>>
>>> You're not going to break anything by going through the output
>>> variables and setting driver_location equal to location.  The whole
>>> point of driver_location is to store some backend-specific index for
>>> the variable.  In other words, to do exactly what you're doing.  The
>>> assign_var_locations calls are simply convenience functions for
>>> setting the driver_location field.  In other words, using
>>> driver_location and const_index[0] is *exactly* what you should do.
>>>
>>
>> Well, FS-nir relies on const_index[0] being data.driver_location. So at
>> the very least I have to put a condition like:
>>
>> if (scalar)
>>    const_index[0] = var->data.driver_location
>> else
>>    const_index[0] = var->data.location
>>
>> Otherwise we directly break our own FS-nir pass.
>>
>> My first implementation did that, but since this is common NIR code
>> (theoretically) shared with other backends, putting var->data.location
>> in const_index[0] for all non-scalar backends seemed like a bad idea.
>> Specially considering that this is very dependent on the implementation
>> of URB file in vec4_visitor, with the output_reg intermediate map and
>> all. That's why I decided to play safe on pure-NIR side, having both
>> driver_location and location available to backends.
>>
>> But if you think I can ignore this then I'm all for it too.
> 
> What I meant was more like the following in brw_nir.c:
> 
> if (is_scalar) {
>    assign_var_locations(&nir->outputs, &nir->num_outputs, true);
> } else {
>    foreach_list_typed(nir_variable, var, node, &nir->outputs)
>       var->data.driver_location = var->data.location;
> }
> 
> and then just let nir_lower_io use the driver location all the time.
> 
> Does that make more sense?
> 

It does, indeed. I will update the patch and send it again to the list
as part of the v2 series that you are already reviewing.

Thanks!

>>> --Jason
>>>
>>>> All in all, the store_output implementation got much simpler.
>>>>
>>>>>> So, before knowing that I could modify nir_lower_io, my best shot at
>>>>>> transferring the original variable location was to create this
>>>>>> nir_outputs map. Now, what I have done is to put that value in
>>>>>> const_index[1] of the intrinsic instruction, which was previously
>>>>>> unused. What do you think?
>>>>>>
>>>>>> That removes the offset to offset map, but we still need the type map.
>>>>>>
>>>>>> About your comment on initializing the register during setup stage, I'm
>>>>>> a bit confused: the register that we need to store is not available
>>>>>> during setup stage, because we still don't have local registers allocated.
>>>>>
>>>>> What do you mean?  Because you don't have the destination of the
>>>>> output_write intrinsic allocated?  Even if the register has a file of
>>>>> BAD_FILE, you could still store the type there.  Also, as I said
>>>>> above, the hardware shouldn't care about the types of data.  As long
>>>>> as the URB write code doesn't accidentally do a float -> int
>>>>> conversion or something, we should be fine.
>>>>> --Jason
>>>>>
>>>>>>>>        break;
>>>>>>>> +   }
>>>>>>>>
>>>>>>>>     case nir_intrinsic_load_vertex_id:
>>>>>>>>        unreachable("should be lowered by lower_vertex_id()");
>>>>>>>> --
>>>>>>
>>>>>
>>>>
>>>
>>
> 



More information about the mesa-dev mailing list