[Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau

Thu Jan 4 19:57:59 UTC 2018

On Thu, Jan 4, 2018 at 8:56 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On January 4, 2018 12:51:15 Karol Herbst <kherbst at redhat.com> wrote:
>
>> On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>
>>> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst <kherbst at redhat.com> wrote:
>>>>
>>>> significant changes to last series:
>>>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>>>>   no good plan on how to properly implement those
>>>
>>>
>>> What's the issue? They should map as well as the TGSI ones. (Since the
>>> TGSI ones are just the GLSL ones.)
>>>
>>
>> it is a bit ugly, because usually all input vars are lowered away, so
>> that they become load_input intrinsics. So they need special handling;
>>
>> lowered (input is centroid):
>> vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */
>> /* component=0 */ /* packed:centroid_qualified */
>> vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */
>> /* component=1 */ /* packed:centroid_qualified */
>>
>> not lowered:
>> decl_var  INTERP_MODE_NONE vec2 in@unqualified-temp
>> vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) ()
>>
>> I kind of wished I could have a load_input intrinsic with a flag or
>> load_input_at_centroid, so that I end up with the same code in the
>> end.
>
>
> In i965, we use the NIR explicit input interpolation intrinsics.  I'm on my
> phone so I can't give more details easily.
>
>>>> * arb_gpu_shader5.texturegatheroffsets (nir internal assert)
>>>>   glsl_to_nir.cpp:2082: virtual void
>>>> {anonymous}::nir_visitor::visit(ir_texture*): Assertion
>>>> `ir->offset->type->is_vector() || ir->offset->type->is_scalar()' failed.
>>>
>>>
>>> This is because nir doesn't support the 4-offset tg4 variant. This is
>>> expected (by nir) to be lowered in GLSL to 4 separate gathers, but
>>> isn't because nvc0 doesn't set the caps to make st/mesa do that.
>>> Either set that cap based on whether NIR is used, or teach nir about
>>> the 4-offset tg4 (which the nvidia hw supports directly btw).
>>>
>>
>> well I would prefer the last one obviously, but nir gives me a
>> nir_texop_tg4 in other tests, it is just those mentioned above where
>> it fails.
>
>
> I would prefer that as well.  There's no reason NIR can't support it so we
> may as well add support.  We should also move the lowering from spirv_to_nir
> to nir_lower_tex so that spirv_to_nir can give you the unlowered version you
> want.
>
>
>>>> * some int64 stuff related to compound types
>>>
>>>
>>> As I mentioned, you either have to fix RA (I don't recommend this), or
>>> you have to stop using 64-bit Value's for storage. Use 32-bit Value's,
>>> and merge/split them all the time around 64-bit ops like the TGSI FE
>>> does (which was implemented that way largely due to the way TGSI
>>> works, but is a happy coincidence that it also works around some of
>>> the RA shortcomings). And additionally you may need to improve the
>>> merge splits pass to avoid some of the pain.
>>>
>>> You could also just disable int64 for now - it's not important.
>>>
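
For reference, the split/merge approach described above can be sketched outside of codegen. This is only a standalone illustration under stated assumptions: split64, merge64 and add64 are hypothetical helper names, not nv50_ir API, and plain integers stand in for Values. It shows 64-bit data being stored as two 32-bit halves that are merged right before a 64-bit op and split again right after it.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helpers for illustration only; none of this is nv50_ir API.
// A 64-bit quantity is stored as two 32-bit halves: {low, high}.
using Half2 = std::pair<uint32_t, uint32_t>;

// Split a 64-bit value into its low and high 32-bit halves.
static Half2 split64(uint64_t v)
{
    return { static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32) };
}

// Merge two 32-bit halves back into one 64-bit value.
static uint64_t merge64(uint32_t lo, uint32_t hi)
{
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// A 64-bit add wrapped in merge/split: the storage stays 32-bit and only
// the operation itself sees a 64-bit value.
static Half2 add64(Half2 a, Half2 b)
{
    uint64_t r = merge64(a.first, a.second) + merge64(b.first, b.second);
    return split64(r);
}
```

In the real frontend the halves would be 32-bit Values and the merge/split would be instructions, so the cost is extra instructions that a later pass has to clean up.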
>>>> * various extensions
>>>> * variable-indexing (related to above mentioned packing issue)
>>>> * glsl-4.20.execution.vs_in
>>>> * some variable-indexing issues related to unaligned memory accesses
>>>
>>>
>>> The variable-indexing stuff is extremely important to work out, since
>>> it points to a fundamental problem in the approach to the conversion.
>>>
>>
>> well the normal variable indexing stuff works if I disable
>> nir_compact_varyings, which we might want to do anyway for nouveau for
>> now. Or I teach MemoryOpt to not merge things for unaligned addresses.
>>
>> I have to take a more focused look at the fails anyway
>>
>>>> * some geometry shader fails
>>>
>>>
>>> Have you done any testing with nv50? It should largely work out, but
>>> there are some things you have to be careful about. The TGSI frontend
>>> generates IR that is capable of being processed by both the nv50 and
>>> nvc0 lowering/RA/emission logic; you would want to ensure that a NIR
>>> frontend would be able to do this too. If you don't have access to a
>>> Tesla-era GPU, I can act as a tester in a limited capacity.
>>>
>>
>> I have a tesla GPU.
>>
>>> Sounds like this is still all pretty experimental and has a lot of
>>> deep issues given the fail/crash count... IMHO not ready for merging.
>>> Also you really need to come up with a workable solution to the
>>> immediates issue.
>>>
>>
>> well I could just store them like it is done with TGSI and just put
>> loadImms where accessed, but this doesn't really fit the NIR logic
>> here. Maybe there is a NIR pass to move them around, so that the issue
>> is less significant. Or maybe I always check if the source contains a
>> const value and use loadImm instead of getting the stored immediate
>> value. Yeah I think the last idea would be less painful, we just end
>> up with more dead instructions after converting.
>
>
> What is the nature of the immediate problem?  We may have a similar issue.
>
>

we don't do rescheduling, so all the immediates are at the top of the shader.
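
A toy model of the "check if the source contains a const value and use loadImm" idea from earlier in the thread might look like the following. It is a sketch under loud assumptions: Src, Op and convert are made-up names, the output is just text, and nothing here is NIR or nv50_ir API. The point is only that each constant source gets its own load-immediate directly in front of the instruction that uses it, instead of referring to a definition at the top of the shader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy IR for illustration; not NIR and not nv50_ir.
struct Src {
    bool isConst;   // true if this source is a known constant
    uint32_t imm;   // constant value, only meaningful if isConst
    int reg;        // register number, only meaningful if !isConst
};

struct Op {
    std::string name;
    int dst;
    std::vector<Src> srcs;
};

// Convert ops to text. Every constant source is materialized with its own
// "loadimm" immediately before the using instruction.
std::vector<std::string> convert(const std::vector<Op> &ops)
{
    std::vector<std::string> out;
    int nextTmp = 1000; // arbitrary start for temporaries holding immediates
    for (const Op &op : ops) {
        std::string line = op.name + " r" + std::to_string(op.dst);
        for (const Src &s : op.srcs) {
            int reg = s.reg;
            if (s.isConst) {
                reg = nextTmp++;
                out.push_back("loadimm r" + std::to_string(reg) + " " +
                              std::to_string(s.imm));
            }
            line += " r" + std::to_string(reg);
        }
        out.push_back(line);
    }
    return out;
}
```

The original top-of-shader constant definitions are left unused in this scheme, which matches the "more dead instructions after converting" trade-off mentioned above.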