[Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau

Thu Jan 4 18:50:40 UTC 2018

On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst <kherbst at redhat.com> wrote:
>> significant changes to last series:
>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>>   no good plan on how to properly implement those
>
> What's the issue? They should map as well as the TGSI ones. (Since the
> TGSI ones are just the GLSL ones.)
>

It is a bit ugly, because usually all input vars are lowered away so
that they become plain load_input intrinsics. The interpolateAt* ones are
not, so they need special handling:

Lowered (input is centroid):
vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */ /* component=0 */ /* packed:centroid_qualified */
vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */ /* component=1 */ /* packed:centroid_qualified */

Not lowered:
decl_var  INTERP_MODE_NONE vec2 in@unqualified-temp
vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) ()

I kind of wish I had a load_input intrinsic with a flag, or a
load_input_at_centroid, so that both cases end up as the same code.
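To illustrate that wish, here is a minimal C sketch of an input load that carries its interpolation location as a flag. All names (`interp_loc`, `input_var`, `input_load`, `emit_loads`, `lower_load`, `lower_interp_at`, `same_loads`) are hypothetical and do not correspond to real NIR or nouveau code; the sketch only shows that a centroid-qualified input and interpolateAtCentroid() on an unqualified input could lower to identical loads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: models an input load that carries its interpolation
 * location as a flag. Not real NIR identifiers. */
enum interp_loc { INTERP_LOC_PIXEL, INTERP_LOC_CENTROID, INTERP_LOC_SAMPLE };

struct input_var {
   unsigned base;             /* input slot */
   unsigned ncomps;           /* vector width, at most 4 */
   enum interp_loc qualifier; /* location from the declaration */
};

struct input_load {
   unsigned base, component;
   enum interp_loc loc;
};

/* Emit one scalar load per component, all interpolated at `loc`. */
static unsigned emit_loads(const struct input_var *v, enum interp_loc loc,
                           struct input_load *out)
{
   assert(v->ncomps <= 4);
   for (unsigned c = 0; c < v->ncomps; c++) {
      out[c].base = v->base;
      out[c].component = c;
      out[c].loc = loc;
   }
   return v->ncomps;
}

/* Ordinary read: the location comes from the declaration ("centroid in"). */
static unsigned lower_load(const struct input_var *v, struct input_load *out)
{
   return emit_loads(v, v->qualifier, out);
}

/* interpolateAt*(): the location comes from the call instead. */
static unsigned lower_interp_at(const struct input_var *v, enum interp_loc loc,
                                struct input_load *out)
{
   return emit_loads(v, loc, out);
}

/* True if two lowered sequences match, i.e. the backend sees one form. */
static bool same_loads(const struct input_load *a, const struct input_load *b,
                       unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      if (a[i].base != b[i].base || a[i].component != b[i].component ||
          a[i].loc != b[i].loc)
         return false;
   return true;
}
```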

>> * arb_gpu_shader5.texturegatheroffsets (nir internal assert)
>>   glsl_to_nir.cpp:2082: virtual void {anonymous}::nir_visitor::visit(ir_texture*): Assertion `ir->offset->type->is_vector() || ir->offset->type->is_scalar()' failed.
>
> This is because nir doesn't support the 4-offset tg4 variant. This is
> expected (by nir) to be lowered in GLSL to 4 separate gathers, but
> isn't because nvc0 doesn't set the caps to make st/mesa do that.
> Either set that cap based on whether NIR is used, or teach nir about
> the 4-offset tg4 (which the nvidia hw supports directly btw).
>

Well, I would obviously prefer the latter, but nir already gives me a
nir_texop_tg4 in other tests; it only fails in the tests mentioned above.
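The GLSL-side lowering described above can be modelled roughly as follows: the four-offset gather is replaced by four single-offset gathers, and component i of the result is the i0j0 texel (the .w component in GLSL's gather ordering) of the gather that uses offsets[i]. This is a toy C sketch on a 4x4 integer texture, not Mesa code; `fetch`, `gather_offset` and `gather_offsets_lowered` are hypothetical helpers and texel addressing is simplified.

```c
#include <assert.h>

/* Toy 4x4 single-channel texture; illustrative only, not Mesa/NIR code. */
#define TEX_W 4
#define TEX_H 4

struct vec4i { int x, y, z, w; };

/* Fetch one texel; this toy version requires in-bounds coordinates. */
static int fetch(int tex[TEX_H][TEX_W], int x, int y)
{
   assert(x >= 0 && x < TEX_W && y >= 0 && y < TEX_H);
   return tex[y][x];
}

/* Single-offset gather: the 2x2 footprint whose i0j0 texel is at
 * (x + ox, y + oy), returned in GLSL gather order (i0j1, i1j1, i1j0, i0j0),
 * so .w holds the i0j0 texel. */
static struct vec4i gather_offset(int tex[TEX_H][TEX_W], int x, int y,
                                  int ox, int oy)
{
   int i0 = x + ox, j0 = y + oy;
   struct vec4i r = {
      fetch(tex, i0, j0 + 1),     /* i0j1 */
      fetch(tex, i0 + 1, j0 + 1), /* i1j1 */
      fetch(tex, i0 + 1, j0),     /* i1j0 */
      fetch(tex, i0, j0),         /* i0j0 */
   };
   return r;
}

/* Four-offset gather lowered to four single-offset gathers: component i
 * of the result is the i0j0 texel (.w) of the gather using offsets[i]. */
static struct vec4i gather_offsets_lowered(int tex[TEX_H][TEX_W], int x, int y,
                                           int offsets[4][2])
{
   int c[4];
   for (int i = 0; i < 4; i++)
      c[i] = gather_offset(tex, x, y, offsets[i][0], offsets[i][1]).w;
   struct vec4i r = { c[0], c[1], c[2], c[3] };
   return r;
}
```

The cost is four texture instructions instead of one, which is why teaching nir about the 4-offset tg4 (supported directly by the hardware, per the reply above) would be preferable.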

>> * some int64 stuff related to compound types
>
> As I mentioned, you either have to fix RA (I don't recommend this), or
> you have to stop using 64-bit Value's for storage. Use 32-bit Value's,
> and merge/split them all the time around 64-bit ops like the TGSI FE
> does (which was implemented that way largely due to the way TGSI
> works, but is a happy coincidence that it also works around some of
> the RA shortcomings). And additionally you may need to improve the
> merge splits pass to avoid some of the pain.
>
> You could also just disable int64 for now - it's not important.
>
>> * various extensions
>> * variable-indexing (related to above mentioned packing issue)
>> * glsl-4.20.execution.vs_in
>> * some variable-indexing issues related to unaligned memory accesses
>
> The variable-indexing stuff is extremely important to work out, since
> it belies a fundamental problem in some approach to the conversion.
>

Well, the normal variable-indexing stuff works if I disable
nir_compact_varyings, which we might want to do for nouveau for now
anyway. Alternatively, I could teach MemoryOpt not to merge accesses at
unaligned addresses.
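A minimal sketch of the kind of check MemoryOpt could apply before combining two accesses, assuming the combined access must be contiguous, 8 or 16 bytes wide and naturally aligned. `can_merge` is hypothetical; the actual MemoryOpt code and the access widths the hardware supports may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate (not nouveau's actual MemoryOpt logic): may two
 * accesses, given as byte offset and size, be combined into one wider
 * access? Assumes only 8- and 16-byte, naturally aligned accesses. */
static bool can_merge(unsigned off_a, unsigned size_a,
                      unsigned off_b, unsigned size_b)
{
   assert(size_a && size_b);
   /* Order the two accesses by address. */
   if (off_b < off_a) {
      unsigned t = off_a; off_a = off_b; off_b = t;
      t = size_a; size_a = size_b; size_b = t;
   }
   if (off_a + size_a != off_b)
      return false;                 /* not contiguous */
   unsigned size = size_a + size_b;
   if (size != 8 && size != 16)
      return false;                 /* unsupported width */
   return off_a % size == 0;        /* reject unaligned merged access */
}
```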

I have to take a more focused look at the failures anyway.
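Ilia's suggestion for the int64 problem (keep 32-bit Values in storage and merge/split around each 64-bit op) can be sketched in plain C like this. `val64`, `merge64`, `split64` and `add64` are hypothetical names; in the compiler the merge and split would be IR instructions rather than shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Storage form: every 64-bit value lives in two 32-bit halves. */
struct val64 { uint32_t lo, hi; };

/* Merge the halves into a 64-bit value just before a 64-bit op. */
static uint64_t merge64(struct val64 v)
{
   return ((uint64_t)v.hi << 32) | v.lo;
}

/* Split the 64-bit result back into halves right after the op. */
static struct val64 split64(uint64_t x)
{
   struct val64 v = { (uint32_t)x, (uint32_t)(x >> 32) };
   return v;
}

/* A 64-bit op: merge the sources, operate, split the result. */
static struct val64 add64(struct val64 a, struct val64 b)
{
   return split64(merge64(a) + merge64(b));
}
```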

>> * some geometry shader fails
>
> Have you done any testing with nv50? It should largely work out, but
> there are some things you have to be careful about. The TGSI frontend
> generates IR that is capable of being processed by both the nv50 and
> nvc0 lowering/RA/emission logic, would want to ensure that an nir
> frontend would be able to do this too. If you don't have access to a
> Tesla-era GPU, I can act as a tester in a limited capacity.
>

I have a Tesla GPU.

> Sounds like this is still all pretty experimental and has a lot of
> deep issues given the fail/crash count... IMHO not ready for merging.
> Also you really need to come up with a workable solution to the
> immediates issue.
>

Well, I could store them the way it is done with TGSI and emit loadImm
wherever they are accessed, but that doesn't really fit the NIR model.
Maybe there is a NIR pass that moves them around so that the issue
becomes less significant. Or I could always check whether a source is a
constant value and emit a loadImm instead of using the stored immediate
value. Yeah, I think the last idea would be the least painful; we just
end up with more dead instructions after the conversion.
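The last option could look roughly like this: when converting a source, check whether its SSA def is a constant and, if so, materialize a fresh loadImm at the use instead of referencing the stored immediate. The toy state (`ssa_info`, `emitter`, `operand`, `get_src`) is hypothetical, not the actual converter; it just counts emitted loadImm instructions to show the one-per-use behaviour. The loadImm emitted for the original constant definition would then have no uses left, which is where the extra dead instructions come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical converter state: which SSA defs are constants. */
#define MAX_SSA 16

struct ssa_info {
   bool is_const[MAX_SSA];
   uint32_t const_val[MAX_SSA];
};

struct emitter {
   unsigned num_load_imm; /* loadImm instructions emitted at use sites */
};

/* Hypothetical operand: an immediate materialized here, or an SSA ref. */
struct operand {
   bool imm;
   uint32_t value; /* immediate value, or SSA index when !imm */
};

/* Resolve a source: constants get a fresh loadImm right at the use site,
 * everything else refers to the SSA value directly. */
static struct operand get_src(struct emitter *e, const struct ssa_info *s,
                              unsigned ssa)
{
   assert(ssa < MAX_SSA);
   struct operand op;
   if (s->is_const[ssa]) {
      e->num_load_imm++;        /* one loadImm per use */
      op.imm = true;
      op.value = s->const_val[ssa];
   } else {
      op.imm = false;
      op.value = ssa;
   }
   return op;
}
```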

>   -ilia