<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760#c21">Comment # 21</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">bug 92760</a> from <a class="email" href="mailto:cwabbott0@gmail.com" title="Connor Abbott <cwabbott0@gmail.com>"> Connor Abbott</a> <pre>(In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c20">comment #20</a>)
&gt; Connor, I have a question about the code we generate for the double unpack
&gt; x/y opcodes, which looks like this (in SIMD8):
&gt;
&gt; mov(8) g2&lt;1&gt;UD g18.1&lt;8,4,2&gt;UD { align1 1Q };
&gt; mov(8) g28&lt;2&gt;UD g18&lt;8,4,2&gt;UD { align1 1Q };
&gt;
&gt; Each of these instructions is intended to copy 4 UD elements from the source
&gt; to the destination, however, the execution size of the instructions is set to
&gt; 8, not 4, which means that we have a vertical dimension of 2 and we are
&gt; actually operating on more data elements than we need. Shouldn't these two
&gt; instructions have an execution size of 4 instead?

No, it's meant to copy 8 UD elements -- it's decomposing an 8-component
fp64 value, which takes up 2 SIMD8 registers, into two 32-bit values, each
of which takes up 1 SIMD8 register. Unlike in the vec4 backend, in FS fp64
values have to have the same number of components as fp32 values (since we
always operate on 8 or 16 pixels/vertices/things at a time, and there's no
way to decompose that in NIR), which means that they're twice as large.

If you're wondering why the destination has a stride of 2 in the second
instruction... I don't have the rest of the assembly, but it's probably
because we're going to operate on the upper bits and then reconstruct the
double in g28, so we're avoiding another copy here by copying the low bits
directly to g28.
Nice job, optimizer!

&gt;
&gt; The thing is that I tried to set the execution size to 4 (this also requires
&gt; that I set force_writemask_all to true) but that produces GPU hangs and some
&gt; regressions in the fp64 tests, so I guess I am missing something here...

Well, I'm not sure why you got GPU hangs, but... meh. The current code is
correct as-is. I wouldn't expect to find bugs in basic, used-all-the-time
stuff like this, since I did get a large portion of the tests to pass
already. The bugs exist, but they're not here :)</pre> </div> </body> </html>