[Bug 92760] Add FP64 support to the i965 shader backends

Mon Feb 1 08:22:53 PST 2016

https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #44 from Connor Abbott <cwabbott0 at gmail.com> ---
(In reply to Jason Ekstrand from comment #43)
> (In reply to Iago Toral from comment #42)
> > (In reply to Iago Toral from comment #41)
> > > Hi Connor, I have a question about the brw_nir_split_doubles pass that you
> > > wrote for the vec4 backend. The pass does not lower nir_op_vec3/4 on purpose
> > > with this comment:
> > > 
> > > /* These ops are the ones that group up dvec2's and doubles into dvec3's
> > >  * and dvec4's when necessary, so we don't lower them. If they're
> > >  * unnecessary, copy propagation will clean them up.
> > >  */
> > > 
> > > However, this obviously leads to 64-bit instructions writing to channels ZW,
> > > which we don't want, since our NIR->vec4 pass expects that no 64-bit
> > > operation has a writemask including channels other than XY.
> > > 
> > > Right now, the lower_vec_to_movs pass that we run right after the
> > > nir_from_ssa pass seems to generate MOVs that write to each channel of the
> > > vecN instruction's destination, so here it generates 64-bit MOVs that
> > > write to components Z and W of a dvec3/4.
> > > 
> > > I suppose your idea was to break up ALU operations, then group them back as
> > > vec3/vec4 operations so we don't lose track of the original size of the data
> > > elements involved in the operations. If that is the case, I think we can
> > > disable lower_vec_to_movs() on dvec3/dvec4 and let the nir-vec4 pass handle
> > > those. Does this make sense to you? Did you have a different idea about how
> > > this should work?
> > 
> > Or maybe you expected that the MOVs in lower_vec_to_movs would always be
> > coalesced so we would never really emit instructions to generate the vec3/4
> > at all? This is not happening because of the presence of source modifiers in
> > the instructions that use the result of the vecN operation. I suppose we can
> > detect these cases and fix them by inserting a MOV to a temporary with the
> > source modifier and then rewriting the instruction to consume this instead
> > of the original value.
> 
> I believe that was the intention.  With full scalarizing, this works because
> every instruction reads or writes exactly one component by the time you're
> done.  With breaking things into vec2's, you can have things cross channels
> and, as you observed, this creates problems.  Adding movs seems like the
> right idea, but I think you want to add vecN instructions instead.  That
> way, you get the nice property that each source of the vecN only reads one
> channel so the vec that combines results gets copy-propagated into the
> source of the vec that sets up the source.  If you don't need the vecN on
> the source and can just swizzle (what you want), copy-prop should take care
> of it.
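
To make the problem in the quoted comments concrete, here is a small standalone model in C. It is a sketch, not Mesa code: `WRITEMASK_*`, `fp64_writemask_is_legal()` and `count_illegal_fp64_movs()` are hypothetical names. It assumes, as described above, that the vec4 backend only accepts 64-bit writes to x and y, and it simplifies lower_vec_to_movs to one MOV per destination channel (the real pass may merge channels that read the same source).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical writemask bits, one per component (x = bit 0 ... w = bit 3). */
enum {
   WRITEMASK_X = 1u << 0,
   WRITEMASK_Y = 1u << 1,
   WRITEMASK_Z = 1u << 2,
   WRITEMASK_W = 1u << 3,
};

/* Assumed backend rule: a 64-bit instruction may only write x and y,
 * i.e. at most one dvec2 worth of data. */
static bool
fp64_writemask_is_legal(unsigned writemask)
{
   return (writemask & ~(unsigned)(WRITEMASK_X | WRITEMASK_Y)) == 0;
}

/* Simplified model of lowering a 64-bit vecN into per-channel MOVs: one MOV
 * per destination component, each writing a single writemask bit.
 * Returns how many of those MOVs break the x/y-only rule. */
static unsigned
count_illegal_fp64_movs(unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);

   unsigned illegal = 0;
   for (unsigned c = 0; c < num_components; c++) {
      if (!fp64_writemask_is_legal(1u << c))
         illegal++;
   }
   return illegal;
}
```

In this model a dvec2 produces no illegal MOVs, a dvec3 produces one (z) and a dvec4 produces two (z and w), which is the situation the quoted comments run into once the MOVs are not coalesced away.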

I think that what I intended was actually to disable lower_vec_to_movs for
dvec3/dvec4. I *think* this is essentially what you're saying. The issue with
representing this with MOVs is that some of the MOVs that lower_vec_to_movs
creates may straddle the boundary between the first and second halves of the
dvec4, which would be more work to handle in the backend than a single dvec3/4
operation with a separate source for each component. This is pretty similar to
how scalarizing works in the FS backend, with the rub being that copy
propagation can still hand you, say, a yz swizzle of a dvec4 as a source. In
FS you can likewise get one component of a larger vector as a source, but
there we can just offset into it, whereas here we have to do extra work if the
source straddles the boundary. We can still be guaranteed that all
destinations are dvec2 or smaller, though, since anything larger than a dvec2
is only ever created by a load op or a dvec3/4 op.
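
The two checks described above can be sketched as a standalone C model. This is not Mesa code: `should_lower_vec_to_movs()`, `dvec4_read_straddles()` and `dvec4_rebase_swizzle()` are hypothetical names. The sketch assumes a dvec4 is handled as two dvec2 halves (x,y in the first, z,w in the second), that 64-bit vecN instructions wider than a dvec2 are kept whole, and that a source read confined to one half can be turned into a plain read of that half, while a straddling read needs extra backend work.

```c
#include <assert.h>
#include <stdbool.h>

/* Keep 64-bit dvec3/dvec4 vecN instructions whole; lower everything else. */
static bool
should_lower_vec_to_movs(unsigned bit_size, unsigned num_components)
{
   return !(bit_size == 64 && num_components > 2);
}

/* swz[i] is the source component (0 = x ... 3 = w) read by channel i and
 * n is the number of channels read. Returns true if the read touches both
 * dvec2 halves of a dvec4 (for example a .yz swizzle). */
static bool
dvec4_read_straddles(const unsigned *swz, unsigned n)
{
   bool low = false, high = false;

   for (unsigned i = 0; i < n; i++) {
      assert(swz[i] < 4);
      if (swz[i] < 2)
         low = true;
      else
         high = true;
   }
   return low && high;
}

/* For a read that stays inside one half, rewrite it as a read of that half
 * alone: returns the half index (0 or 1) and rebases the swizzle in place
 * so it only uses x and y. Must not be called on a straddling read. */
static unsigned
dvec4_rebase_swizzle(unsigned *swz, unsigned n)
{
   assert(n > 0 && !dvec4_read_straddles(swz, n));

   unsigned half = swz[0] / 2;
   for (unsigned i = 0; i < n; i++)
      swz[i] -= 2 * half;
   return half;
}
```

With this model a `.zw` read becomes an ordinary `.xy` read of the second half, much like offsetting into a larger vector in the FS backend, while a `.yz` read is reported as straddling and is the case that needs extra handling.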
