[Bug 92760] Add FP64 support to the i965 shader backends
bugzilla-daemon at freedesktop.org
Tue Feb 16 07:41:20 UTC 2016
https://bugs.freedesktop.org/show_bug.cgi?id=92760
--- Comment #55 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #54)
> (In reply to Iago Toral from comment #53)
> > (In reply to Iago Toral from comment #52)
> > > (In reply to Iago Toral from comment #51)
> > > > I noticed that nir_lower_locals_to_regs can insert MOVs of 64-bit things and
> > > > we need to catch these in our double splitting pass for the vec4 backend.
> > > > However, I am a bit confused here because nir_lower_locals_to_regs injects
> > > > nir_registers and not SSA definitions so the double splitting pass can't
> > > > handle the generated NIR after it at the moment:
> > > >
> > > > decl_reg vec4 64 r0[4]
> > > > (...)
> > > > vec4 64 ssa_6 = intrinsic load_ubo (ssa_0, ssa_5) () ()
> > > > r0[3] = imov ssa_6
> > > > (...)
> > > > vec4 64 ssa_12 = imov r0[0 + ssa_11]
> > > >
> > > > If this is correct and expected, then I guess we will have to amend the
> > > > double splitting pass to handle nir_registers as well, right?
> > >
> > > Or maybe we should split dvec3/4 loads into two dvec2 loads plus a 64-bit
> > > vec3/4 operation. So far I was working with the assumption that vecN
> > > operations and dvec loads were the two cases where the vec4 backend could
> > > see writes bigger than a dvec2. I actually implemented that for UBOs and
> > > SSBOs but seeing this, maybe it is better to split them into dvec2 loads.
> >
> > FYI, I went ahead and implemented this, but the problem persists, as you can
> > see here:
> >
> > vec2 64 ssa_6 = intrinsic load_ubo (ssa_0, ssa_5) () ()
> > vec1 32 ssa_7 = load_const (0x00000010 /* 0.000000 */)
> > vec1 32 ssa_8 = iadd ssa_5, ssa_7
> > vec2 64 ssa_9 = intrinsic load_ubo (ssa_0, ssa_8) () ()
> > vec4 64 ssa_10 = vec4 ssa_6, ssa_6.y, ssa_9, ssa_9.y
> > r0[3] = imov ssa_10
> >
> > so it seems like we really need to run the splitting pass after
> > nir_lower_locals_to_regs and make it handle nir_register destinations too.
> >
> > That said, are we still interested in breaking ubo/ssbo/shared-variable
> > loads into dvec2 chunks? That would make the backend implementation slightly
> > easier, but not too much. The current implementation I have can already
> > handle loads of dvec3/dvec4 without much effort:
> >
> > https://github.com/Igalia/mesa/commit/efe6233e7bb00fc583d393058b559091e73729f9
>
> I think the real solution here is to make the back-end handle dvec4 movs.
> You wouldn't have to handle any other dvec4 ALU ops but MOV may be a special
> case. We could try and do some sort of splitting operation on variables
> prior to running vars_to_regs, but I think that may not be worth it.
Yeah, that could be a way to go about this. I'll do that and see how it works.
Thanks Jason!
How about dvec3/4 loads (ubo, ssbo, shared variables)? Do we want to split
these or is it fine to handle them in the backend? I think they are easy enough
to handle in the backend (I am doing that already), but it is not difficult to
have NIR split them either.
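
For reference, the dvec2 splitting discussed above boils down to something
like the sketch below. This is not Mesa code: struct load_chunk and
split_dvec_load() are hypothetical names made up for illustration, and it only
models the offset arithmetic (a second load 16 bytes past the first, as in the
NIR dump quoted earlier), not the actual NIR rewriting.

```c
#include <assert.h>

/* Hypothetical description of one load produced by the split.
 * This is an illustration only, not a Mesa/NIR type. */
struct load_chunk {
   unsigned byte_offset;    /* added to the offset of the original load */
   unsigned num_components; /* 64-bit components read by this chunk: 1 or 2 */
};

/* Split a load of num_components doubles (1..4) into chunks of at most
 * two doubles (16 bytes) each. Returns the number of chunks written. */
static unsigned
split_dvec_load(unsigned num_components, struct load_chunk out[2])
{
   assert(num_components >= 1 && num_components <= 4);

   unsigned count = 0;
   for (unsigned done = 0; done < num_components; done += 2) {
      unsigned left = num_components - done;
      out[count].byte_offset = done * 8; /* each double is 8 bytes */
      out[count].num_components = left < 2 ? left : 2;
      count++;
   }
   return count;
}
```

With this, a dvec4 load becomes two dvec2 loads at offsets 0 and 16, and a
dvec3 load becomes a dvec2 load at offset 0 plus a single-double load at
offset 16. dvec2 and smaller loads are left as a single chunk.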