[Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau

Thu Jan 4 20:01:28 UTC 2018

On January 4, 2018 13:58:00 Karol Herbst <kherbst at redhat.com> wrote:

> On Thu, Jan 4, 2018 at 8:56 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On January 4, 2018 12:51:15 Karol Herbst <kherbst at redhat.com> wrote:
>>
>>> On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>
>>>> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst <kherbst at redhat.com> wrote:
>>>>>
>>>>> significant changes to last series:
>>>>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>>>>>   no good plan on how to properly implement those
>>>>
>>>>
>>>> What's the issue? They should map as well as the TGSI ones. (Since the
>>>> TGSI ones are just the GLSL ones.)
>>>>
>>>
>>> it is a bit ugly, because usually all input vars are lowered away, so
>>> that they become load_input intrinsics. So they need special handling:
>>>
>>> lowered (input is centroid):
>>> vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */
>>> /* component=0 */ /* packed:centroid_qualified */
>>> vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */
>>> /* component=1 */ /* packed:centroid_qualified */
>>>
>>> not lowered:
>>> decl_var  INTERP_MODE_NONE vec2 in@unqualified-temp
>>> vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp)
>>> ()
>>>
>>> I kind of wish I could have a load_input intrinsic with a flag or
>>> load_input_at_centroid, so that I end up with the same code in the
>>> end.
>>
>>
>> In i965, we use the NIR explicit input interpolation intrinsics.  I'm on my
>> phone so I can't give more details easily.
>>
>>>>> * arb_gpu_shader5.texturegatheroffsets (nir internal assert)
>>>>>   glsl_to_nir.cpp:2082: virtual void
>>>>> {anonymous}::nir_visitor::visit(ir_texture*): Assertion
>>>>> `ir->offset->type->is_vector() || ir->offset->type->is_scalar()' failed.
>>>>
>>>>
>>>> This is because nir doesn't support the 4-offset tg4 variant. This is
>>>> expected (by nir) to be lowered in GLSL to 4 separate gathers, but
>>>> isn't because nvc0 doesn't set the caps to make st/mesa do that.
>>>> Either set that cap based on whether NIR is used, or teach nir about
>>>> the 4-offset tg4 (which the nvidia hw supports directly btw).
>>>>
>>>
>>> well I would prefer the last one obviously, but nir gives me a
>>> nir_texop_tg4 in other tests, it is just those mentioned above where
>>> it fails.
>>
>>
>> I would prefer that as well.  There's no reason NIR can't support it so we
>> may as well add support.  We should also move the lowering from spirv_to_nir
>> to nir_lower_tex so that spirv_to_nir can give you the unlowered version you
>> want.
>>
>>
>>>>> * some int64 stuff related to compound types
>>>>
>>>>
>>>> As I mentioned, you either have to fix RA (I don't recommend this), or
>>>> you have to stop using 64-bit Value's for storage. Use 32-bit Value's,
>>>> and merge/split them all the time around 64-bit ops like the TGSI FE
>>>> does (which was implemented that way largely due to the way TGSI
>>>> works, but is a happy coincidence that it also works around some of
>>>> the RA shortcomings). And additionally you may need to improve the
>>>> merge splits pass to avoid some of the pain.
>>>>
>>>> You could also just disable int64 for now - it's not important.
>>>>
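To make the split/merge suggestion above concrete, here is a minimal
sketch of keeping a 64-bit quantity as two 32-bit halves and only
combining them around a 64-bit operation. It is a standalone toy, not
nouveau codegen: Pair32, split64, merge64 and add64 are hypothetical
names chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical storage type: a 64-bit value kept as two 32-bit halves.
struct Pair32 {
    uint32_t lo;
    uint32_t hi;
};

// Split a 64-bit value into its low and high 32-bit halves.
Pair32 split64(uint64_t v)
{
    return { static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32) };
}

// Merge two 32-bit halves back into one 64-bit value.
uint64_t merge64(Pair32 p)
{
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit add expressed purely on 32-bit halves with an explicit carry.
Pair32 add64(Pair32 a, Pair32 b)
{
    uint32_t lo = a.lo + b.lo;            // wraps modulo 2^32
    uint32_t carry = (lo < a.lo) ? 1u : 0u;
    return { lo, a.hi + b.hi + carry };
}
```

In a real frontend the halves would be separate 32-bit values handed to
RA, with merge/split emitted only right around the 64-bit op.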
>>>>> * various extensions
>>>>> * variable-indexing (related to above mentioned packing issue)
>>>>> * glsl-4.20.execution.vs_in
>>>>> * some variable-indexing issues related to unaligned memory accesses
>>>>
>>>>
>>>> The variable-indexing stuff is extremely important to work out, since
>>>> it points to a fundamental problem in the approach to the conversion.
>>>>
>>>
>>> well the normal variable indexing stuff works if I disable
>>> nir_compact_varyings, which we might want to do anyway for nouveau for
>>> now. Or I teach MemoryOpt to not merge things for unaligned addresses.
>>>
>>> I have to take a more focused look at the fails anyway
>>>
>>>>> * some geometry shader fails
>>>>
>>>>
>>>> Have you done any testing with nv50? It should largely work out, but
>>>> there are some things you have to be careful about. The TGSI frontend
>>>> generates IR that is capable of being processed by both the nv50 and
>>>> nvc0 lowering/RA/emission logic; you would want to ensure that an nir
>>>> frontend would be able to do this too. If you don't have access to a
>>>> Tesla-era GPU, I can act as a tester in a limited capacity.
>>>>
>>>
>>> I have a tesla GPU.
>>>
>>>> Sounds like this is still all pretty experimental and has a lot of
>>>> deep issues given the fail/crash count... IMHO not ready for merging.
>>>> Also you really need to come up with a workable solution to the
>>>> immediates issue.
>>>>
>>>
>>> well I could just store them like it is done with TGSI and just put
>>> loadImms where accessed, but this doesn't really fit the NIR logic
>>> here. Maybe there is a NIR pass to move them around, so that the issue
>>> is less significant. Or maybe I always check if the source contains a
>>> const value and use loadImm instead of getting the stored immediate
>>> value. Yeah I think the last idea would be less painful, we just end
>>> up with more dead instructions after converting.
>>
>>
>> What is the nature of the immediate problem?  We may have a similar issue.
>>
>>
>
>  we don't do rescheduling, so all the immediates are at the top of the shader.

Ah.  In that case, re-emitting seems like a reasonable plan.  You could also 
run global code motion though that may not get you quite what you want either.
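
For illustration, a rough sketch of the re-emitting idea: while
converting, remember each constant instead of emitting it where it is
defined, and emit a fresh immediate load right before every use. This is
a toy IR, not NIR or nvir; Instr, OutInstr, convert and the op strings
are all hypothetical names made up for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy input instruction (hypothetical, stands in for the frontend IR).
struct Instr {
    std::string op;         // "load_const", "add", "mul", ...
    int dst;                // SSA index defined by this instruction
    std::vector<int> srcs;  // SSA indices read by this instruction
    uint32_t imm;           // constant value, only used by "load_const"
};

// Toy output instruction (hypothetical, stands in for the backend IR).
struct OutInstr {
    std::string op;         // "loadimm" or the original op
    int dst;
    std::vector<int> srcs;
    uint32_t imm;           // only used by "loadimm"
};

// Convert, re-materializing every constant source next to its user
// instead of keeping one definition at the top of the shader.
std::vector<OutInstr> convert(const std::vector<Instr> &in)
{
    std::map<int, uint32_t> consts;  // SSA index -> constant value
    int nextTemp = 0;                // first index not used by the input
    for (const Instr &i : in)
        nextTemp = std::max(nextTemp, i.dst + 1);

    std::vector<OutInstr> out;
    for (const Instr &i : in) {
        if (i.op == "load_const") {
            consts[i.dst] = i.imm;   // remember it, emit nothing here
            continue;
        }
        OutInstr o{i.op, i.dst, {}, 0};
        for (int s : i.srcs) {
            auto it = consts.find(s);
            if (it == consts.end()) {
                o.srcs.push_back(s); // ordinary value, use as-is
                continue;
            }
            int t = nextTemp++;      // fresh value for this one use
            out.push_back(OutInstr{"loadimm", t, {}, it->second});
            o.srcs.push_back(t);
        }
        out.push_back(o);
    }
    return out;
}
```

Here the constant definitions are dropped outright rather than left
behind as dead instructions; either way each immediate ends up next to
its user, so no rescheduling is needed to get it there.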