[Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau

Thu Jan 4 19:56:44 UTC 2018

On January 4, 2018 12:51:15 Karol Herbst <kherbst at redhat.com> wrote:

> On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst <kherbst at redhat.com> wrote:
>>> significant changes to last series:
>>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>>>   no good plan on how to properly implement those
>>
>> What's the issue? They should map as well as the TGSI ones. (Since the
>> TGSI ones are just the GLSL ones.)
>>
>
> it is a bit ugly, because usually all input vars are lowered away, so
> that they become load_input intrinsics. The interpolateAt* inputs are
> not lowered, so they need special handling:
>
> lowered (input is centroid):
> vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */
> /* component=0 */ /* packed:centroid_qualified */
> vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */
> /* component=1 */ /* packed:centroid_qualified */
>
> not lowered:
> decl_var  INTERP_MODE_NONE vec2 in@unqualified-temp
> vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) ()
>
> I kind of wished I could have a load_input intrinsic with a flag or
> load_input_at_centroid, so that I end up with the same code in the
> end.

In i965, we use the NIR explicit input interpolation intrinsics.  I'm on my 
phone so I can't give more details easily.

>>> * arb_gpu_shader5.texturegatheroffsets (nir internal assert)
>>>   glsl_to_nir.cpp:2082: virtual void 
>>>   {anonymous}::nir_visitor::visit(ir_texture*): Assertion 
>>>   `ir->offset->type->is_vector() || ir->offset->type->is_scalar()' failed.
>>
>> This is because nir doesn't support the 4-offset tg4 variant. This is
>> expected (by nir) to be lowered in GLSL to 4 separate gathers, but
>> isn't because nvc0 doesn't set the caps to make st/mesa do that.
>> Either set that cap based on whether NIR is used, or teach nir about
>> the 4-offset tg4 (which the nvidia hw supports directly btw).
>>
>
> well I would prefer the last one obviously, but nir gives me a
> nir_texop_tg4 in other tests, it is just those mentioned above where
> it fails.

I would prefer that as well.  There's no reason NIR can't support it so we 
may as well add support.  We should also move the lowering from 
spirv_to_nir to nir_lower_tex so that spirv_to_nir can give you the 
unlowered version you want.
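
For reference, the lowering under discussion can be modelled roughly as 
below. This is only a toy sketch, not NIR or st/mesa code: toy_texture, 
fetch and the gather_* helpers are hypothetical names, coordinates are 
integer texel positions, and addressing is assumed to be clamp-to-edge. It 
relies on the GLSL rule that each of the four offsets selects the i0_j0 
texel of its footprint, which is the .w component of a single-offset gather.

```c
#include <assert.h>

#define TEX_W 4
#define TEX_H 4

/* Hypothetical single-channel texture, used only for this illustration. */
typedef struct {
    float texel[TEX_H][TEX_W];
} toy_texture;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp-to-edge texel fetch (an assumption of this sketch). */
static float fetch(const toy_texture *t, int x, int y)
{
    assert(t);
    return t->texel[clamp_int(y, 0, TEX_H - 1)][clamp_int(x, 0, TEX_W - 1)];
}

/* Single-offset gather: 2x2 footprint in GLSL order
 * (i0_j1, i1_j1, i1_j0, i0_j0). */
static void gather_offset(const toy_texture *t, int x, int y,
                          const int off[2], float out[4])
{
    int bx = x + off[0];
    int by = y + off[1];

    out[0] = fetch(t, bx,     by + 1);
    out[1] = fetch(t, bx + 1, by + 1);
    out[2] = fetch(t, bx + 1, by);
    out[3] = fetch(t, bx,     by);
}

/* 4-offset gather done directly: offset i picks the i0_j0 texel. */
static void gather_offsets_native(const toy_texture *t, int x, int y,
                                  const int offs[4][2], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = fetch(t, x + offs[i][0], y + offs[i][1]);
}

/* Lowered form: four single-offset gathers, keeping .w of each. */
static void gather_offsets_lowered(const toy_texture *t, int x, int y,
                                   const int offs[4][2], float out[4])
{
    for (int i = 0; i < 4; i++) {
        float tmp[4];
        gather_offset(t, x, y, offs[i], tmp);
        out[i] = tmp[3];
    }
}
```

Both functions should return the same four values; the lowered one just 
fetches twelve texels it then throws away, which is why hardware with a 
native 4-offset tg4 would rather see the unlowered op.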

>>> * some int64 stuff related to compound types
>>
>> As I mentioned, you either have to fix RA (I don't recommend this), or
>> you have to stop using 64-bit Value's for storage. Use 32-bit Value's,
>> and merge/split them all the time around 64-bit ops like the TGSI FE
>> does (which was implemented that way largely due to the way TGSI
>> works, but is a happy coincidence that it also works around some of
>> the RA shortcomings). And additionally you may need to improve the
>> merge splits pass to avoid some of the pain.
>>
>> You could also just disable int64 for now - it's not important.
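
The split/merge approach described above can be pictured with a small 
stand-alone sketch. It is plain C, not nv50_ir code: pair32, split64, 
merge64 and add64_via_pairs are hypothetical names, and the 64-bit add 
simply stands in for any 64-bit op. Storage stays 32-bit; a 64-bit value 
exists only between the merge and the split.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage form: a 64-bit value kept as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair32;

/* Split a 64-bit value into its low and high 32-bit halves. */
static pair32 split64(uint64_t v)
{
    pair32 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Merge two 32-bit halves back into one 64-bit value. */
static uint64_t merge64(pair32 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add on 32-bit storage: merge the sources, do the op, and split
 * the result right away so no 64-bit value outlives the op. */
static pair32 add64_via_pairs(pair32 a, pair32 b)
{
    pair32 r = split64(merge64(a) + merge64(b));
    assert(merge64(r) == merge64(a) + merge64(b)); /* round-trip sanity check */
    return r;
}
```
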
>>
>>> * various extensions
>>> * variable-indexing (related to above mentioned packing issue)
>>> * glsl-4.20.execution.vs_in
>>> * some variable-indexing issues related to unaligned memory accesses
>>
>> The variable-indexing stuff is extremely important to work out, since
>> it points to a fundamental problem in the approach to the conversion.
>>
>
> well the normal variable indexing stuff works if I disable
> nir_compact_varyings, which we might want to do anyway for nouveau for
> now. Or I teach MemoryOpt to not merge things for unaligned addresses.
>
> I have to take a more focused look at the fails anyway
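
The "don't merge unaligned accesses" idea could look roughly like the check 
below. This is a sketch under assumptions, not the actual nouveau pass: 
mem_access and can_merge are hypothetical names, offsets are assumed to be 
known at compile time, and the merged access is assumed to need natural 
alignment to its own size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one memory access with a static byte offset. */
typedef struct {
    uint32_t offset; /* byte offset from the base address */
    uint32_t size;   /* access size in bytes */
} mem_access;

/* Two accesses may be merged only if b directly follows a, the combined
 * size is a power of two, and a's offset is aligned to that size. */
static bool can_merge(mem_access a, mem_access b)
{
    assert(a.size != 0 && b.size != 0);

    uint32_t merged = a.size + b.size;
    bool pow2 = (merged & (merged - 1)) == 0;
    bool adjacent = a.offset + a.size == b.offset;
    bool aligned = pow2 && (a.offset % merged) == 0;

    return adjacent && aligned;
}
```

With two 4-byte loads, offsets 0 and 4 can be merged into one 8-byte load, 
while offsets 4 and 8 cannot, because the combined access would start at an 
address that is not 8-byte aligned.
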
>
>>> * some geometry shader fails
>>
>> Have you done any testing with nv50? It should largely work out, but
>> there are some things you have to be careful about. The TGSI frontend
>> generates IR that is capable of being processed by both the nv50 and
>> nvc0 lowering/RA/emission logic; you would want to ensure that a NIR
>> frontend would be able to do this too. If you don't have access to a
>> Tesla-era GPU, I can act as a tester in a limited capacity.
>>
>
> I have a tesla GPU.
>
>> Sounds like this is still all pretty experimental and has a lot of
>> deep issues given the fail/crash count... IMHO not ready for merging.
>> Also you really need to come up with a workable solution to the
>> immediates issue.
>>
>
> well I could just store them like it is done with TGSI and just put
> loadImms where accessed, but this doesn't really fit the NIR logic
> here. Maybe there is a NIR pass to move them around, so that the issue
> is less significant. Or maybe I always check if the source contains a
> const value and use loadImm instead of getting the stored immediate
> value. Yeah I think the last idea would be less painful, we just end
> up with more dead instructions after converting.

What is the nature of the immediates problem?  We may have a similar issue.