[Bug 92760] Add FP64 support to the i965 shader backends
bugzilla-daemon at freedesktop.org
Tue Feb 2 00:53:01 PST 2016
https://bugs.freedesktop.org/show_bug.cgi?id=92760
--- Comment #45 from Iago Toral <itoral at igalia.com> ---
(In reply to Connor Abbott from comment #44)
> (In reply to Jason Ekstrand from comment #43)
> > (In reply to Iago Toral from comment #42)
> > > (In reply to Iago Toral from comment #41)
> > > > Hi Connor, I have a question about the brw_nir_split_doubles pass that you
> > > > wrote for the vec4 backend. The pass does not lower nir_op_vec3/4 on purpose
> > > > with this comment:
> > > >
> > > > /* These ops are the ones that group up dvec2's and doubles into dvec3's
> > > > * and dvec4's when necessary, so we don't lower them. If they're
> > > > * unnecessary, copy propagation will clean them up.
> > > > */
> > > >
> > > > However, this obviously leads to 64-bit instructions writing to channels ZW,
> > > > which we don't want to have since our NIR->vec4 pass expects that any 64-bit
> > > > operation won't have a writemask including channels other than XY.
> > > >
> > > > Right now, the lower_vec_to_movs pass that we run right after the
> > > > nir_from_ssa pass seems to generate MOVs that write to each channel of the
> > > > vecN instruction dest, so with this, it generates MOVs with 64-bit things
> > > > that write to components Z and W of a dvec3/4.
> > > >
> > > > I suppose your idea was to break up ALU operations, then group them back as
> > > > vec3/vec4 operations so we don't lose track of the original size of the data
> > > > elements involved in the operations. If that is the case, I think we can
> > > > disable lower_vec_to_movs() on dvec3/dvec4 and let the NIR->vec4 pass handle
> > > > those. Does this make sense to you? Did you have a different idea about how
> > > > this should work?
> > >
> > > Or maybe you expected that the MOVs in lower_vec_to_movs would always be
> > > coalesced so we would never really emit instructions to generate the vec3/4
> > > at all? This is not happening because of the presence of source modifiers in
> > > the instructions that use the result of the vecN operation. I suppose we can
> > > detect these cases and fix them by inserting a MOV to a temporary with the
> > > source modifier and then rewriting the instruction to consume this instead
> > > of the original value.
> >
> > I believe that was the intention. With full scalarizing, this works because
> > every instruction reads or writes exactly one component by the time you're
> > done. With breaking things into vec2's, you can have things cross channels
> > and, as you observed, this creates problems. Adding movs seems like the
> > right idea, but I think you want to add vecN instructions instead. That
> > way, you get the nice property that each source of the vecN only reads one
> > channel so the vec that combines results gets copy-propagated into the
> > source of the vec that sets up the source. If you don't need the vecN on
> > the source and can just swizzle (what you want), copy-prop should take care
> > of it.
>
> I think that what I intended was actually to disable lower_vec_to_movs for
> dvec3/dvec4. I *think* this is essentially what you're saying. The issue
> with representing this with MOVs is that some of the MOVs that
> lower_vec_to_movs creates may straddle the boundary between the first and
> second half of the dvec4, which would be more work to handle in the backend
> than a single dvec3/4 operation with a separate source for each component.
> This is pretty similar to how scalarizing works in the FS backend, with the
> rub being that copy propagation can still decide to give you, say, a yz
> swizzle of a dvec4 as a source. This is similar to how, in FS, you can get a
> component of a larger vector as a source, except there we can just offset it
> but here we have to do extra work if the source straddles the boundary. We
> can still be guaranteed that all destinations are dvec2 or smaller, though,
> since everything larger than a dvec2 is only going to be created by a load
> op or a dvec3/4 op.
Thank you both for the fast replies! Based on your comments, it looks like I
should experiment with breaking up the vec3/4 operations into vec2s in
the NIR->vec4 translator, so I'll start there and see how that works out.
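
As a rough illustration of the splitting discussed above, here is a minimal
sketch in C. It is not Mesa code: the struct and both helper names are
hypothetical, and it assumes a writemask with one bit per 64-bit component
(X=1, Y=2, Z=4, W=8) and swizzles given as component indices 0..3. It shows
how a dvec3/dvec4 writemask might be divided into an XY half and a ZW half,
and how one could detect a source swizzle (such as yz) that straddles the
boundary between the two halves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; none of these names exist in Mesa. */

/* Assumed encoding: one writemask bit per 64-bit component. */
enum { MASK_XY = 0x3, MASK_ZW = 0xc };

struct dvec_halves {
   unsigned lo; /* writemask for the half holding X and Y */
   unsigned hi; /* writemask for the half holding Z and W, rebased to XY */
};

/* Split a dvec3/dvec4 writemask so each half only ever writes XY. */
static struct dvec_halves
split_dvec_writemask(unsigned writemask)
{
   struct dvec_halves h;

   assert((writemask & ~0xfu) == 0);
   h.lo = writemask & MASK_XY;
   h.hi = (writemask & MASK_ZW) >> 2;
   return h;
}

/* Return true if the swizzle reads from both the XY half and the ZW half,
 * which is the case that would need extra work in the backend.
 */
static bool
swizzle_straddles_halves(const unsigned *swizzle, unsigned count)
{
   bool reads_lo = false;
   bool reads_hi = false;

   for (unsigned i = 0; i < count; i++) {
      assert(swizzle[i] < 4);
      if (swizzle[i] < 2)
         reads_lo = true;
      else
         reads_hi = true;
   }
   return reads_lo && reads_hi;
}
```

With this encoding, a full dvec4 write (0xf) becomes two XY writes, and a yz
source is flagged as straddling while xy and zw sources are not.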