<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760#c53">Comment # 53</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">bug 92760</a> from <a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> Iago Toral</a> <pre>(In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c52">comment #52</a>)
> (In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c51">comment #51</a>)
> > I noticed that nir_lower_locals_to_regs can insert MOVs of 64-bit things and
> > we need to catch these in our double splitting pass for the vec4 backend.
> > However, I am a bit confused here because nir_lower_locals_to_regs injects
> > nir_registers and not SSA definitions so the double splitting pass can't
> > handle the generated NIR after it at the moment:
> >
> > decl_reg vec4 64 r0[4]
> > (...)
> > vec4 64 ssa_6 = intrinsic load_ubo (ssa_0, ssa_5) () ()
> > r0[3] = imov ssa_6
> > (...)
> > vec4 64 ssa_12 = imov r0[0 + ssa_11]
> >
> > If this is correct and expected, then I guess we will have to amend the
> > double splitting pass to handle nir_registers as well, right?
>
> Or maybe we should split dvec3/4 loads into two dvec2 loads plus a 64-bit
> vec3/4 operation. So far I was working with the assumption that vecN
> operations and dvec loads were the two cases where the vec4 backend could
> see writes bigger than a dvec2. I actually implemented that for UBOs and
> SSBOs but seeing this, maybe it is better to split them into dvec2 loads.
FYI, I went ahead and implemented this, but the problem persists, as you can
see here:

vec2 64 ssa_6 = intrinsic load_ubo (ssa_0, ssa_5) () ()
vec1 32 ssa_7 = load_const (0x00000010 /* 0.000000 */)
vec1 32 ssa_8 = iadd ssa_5, ssa_7
vec2 64 ssa_9 = intrinsic load_ubo (ssa_0, ssa_8) () ()
vec4 64 ssa_10 = vec4 ssa_6, ssa_6.y, ssa_9, ssa_9.y
r0[3] = imov ssa_10

So it seems like we really need to run the splitting pass after
nir_lower_locals_to_regs and make it handle nir_register destinations too.

That said, are we still interested in breaking UBO/SSBO/shared-variable loads
into dvec2 chunks? That would make the backend implementation slightly easier,
but not by much. The current implementation I have can already handle loads of
dvec3/dvec4 without much effort:

<a href="https://github.com/Igalia/mesa/commit/efe6233e7bb00fc583d393058b559091e73729f9">https://github.com/Igalia/mesa/commit/efe6233e7bb00fc583d393058b559091e73729f9</a></pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the QA Contact for the bug.</li> </ul> </body> </html>