<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760#c45">Comment # 45</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">bug 92760</a> from <a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> Iago Toral</a> <pre>(In reply to Connor Abbott from <a href="show_bug.cgi?id=92760#c44">comment #44</a>)
> (In reply to Jason Ekstrand from <a href="show_bug.cgi?id=92760#c43">comment #43</a>)
> > (In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c42">comment #42</a>)
> > > (In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c41">comment #41</a>)
> > > > Hi Connor, I have a question about the brw_nir_split_doubles pass that you
> > > > wrote for the vec4 backend. The pass does not lower nir_op_vec3/4 on purpose
> > > > with this comment:
> > > >
> > > > /* These ops are the ones that group up dvec2's and doubles into dvec3's
> > > >  * and dvec4's when necessary, so we don't lower them. If they're
> > > >  * unnecessary, copy propagation will clean them up.
> > > >  */
> > > >
> > > > However, this obviously leads to 64-bit instructions writing to channels ZW,
> > > > which we don't want to have since our Nir->vec4 pass expects that any 64-bit
> > > > operation won't have a writemask including channels other than XY.
> > > >
> > > > Right now, the lower_vec_to_movs pass that we run right after the
> > > > nir_from_ssa pass seems to generate MOVs that write to each channel of the
> > > > vecN instruction dest, so with this, it generates MOVs with 64-bit things
> > > > that write to components Z and W of a dvec3/4.
> > > >
> > > > I suppose your idea was to break up ALU operations, then group them back as
> > > > vec3/vec4 operations so we don't lose track of the original size of the data
> > > > elements involved in the operations. If that is the case, I think we can
> > > > disable lower_vec_to_movs() on dvec3/dvec4 and let the nir-vec4 pass handle
> > > > those. Does this make sense to you? Did you have a different idea about how
> > > > this should work?
> > >
> > > Or maybe you expected that the MOVs in lower_vec_to_movs would always be
> > > coalesced so we would never really emit instructions to generate the vec3/4
> > > at all? This is not happening because of the presence of source modifiers in
> > > the instructions that use the result of the vecN operation. I suppose we can
> > > detect these cases and fix them by inserting a MOV to a temporary with the
> > > source modifier and then rewriting the instruction to consume this instead
> > > of the original value.
> >
> > I believe that was the intention. With full scalarizing, this works because
> > every instruction reads or writes exactly one component by the time you're
> > done. With breaking things into vec2's, you can have things cross channels
> > and, as you observed, this creates problems. Adding movs seems like the
> > right idea, but I think you want to add vecN instructions instead. That
> > way, you get the nice property that each source of the vecN only reads one
> > channel so the vec that combines results gets copy-propagated into the
> > source of the vec that sets up the source. If you don't need the vecN on
> > the source and can just swizzle (what you want), copy-prop should take care
> > of it.
>
> I think that what I intended was actually to disable lower_vec_to_mov's for
> dvec3/dvec4. I *think* this is essentially what you're saying.
> The issue with representing this with MOV's is that some of the MOV's that
> lower_vec_to_movs creates may straddle the boundary between the first and
> second half of the dvec4, which would be more work to handle in the backend
> than a single dvec3/4 operation with a separate source for each component.
> This is pretty similar to how scalarizing works in the FS backend, with the
> rub being that copy propagation can still decide to give you, say, a yz
> swizzle of a dvec4 as a source. This is similar to how, in FS, you can get a
> component of a larger vector as a source, except there we can just offset it
> but here we have to do extra work if the source straddles the boundary. We
> can still be guaranteed that all destinations are dvec2 or smaller, though,
> since everything larger than a dvec2 is only going to be created by a load
> op or a dvec3/4 op.

Thank you both for the fast replies! Based on your comments it looks like I
should be experimenting with breaking up the vec3/4 operations into vec2's in
the NIR->vec4 translator, so I'll start there and see how that works out.</pre> </div> </body> </html>