<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760#c22">Comment # 22</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Add FP64 support to the i965 shader backends" href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">bug 92760</a> from <a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> Iago Toral</a> <pre>(In reply to Connor Abbott from <a href="show_bug.cgi?id=92760#c21">comment #21</a>)
> (In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c20">comment #20</a>)
> > Connor, I have a question about the code we generate for the double unpack
> > x/y opcodes, which looks like this (in SIMD8):
> >
> > mov(8) g2<1>UD g18.1<8,4,2>UD { align1 1Q };
> > mov(8) g28<2>UD g18<8,4,2>UD { align1 1Q };
> >
> > Each of these instructions is intended to copy 4 UD elements from the source
> > to the destination; however, the execution size of the instructions is set to
> > 8, not 4, which means that we have a vertical dimension of 2 and we are
> > actually operating on more data elements than we need. Shouldn't these two
> > instructions have an execution size of 4 instead?
>
> No, it's meant to copy 8 UD elements -- it's decomposing an 8-component fp64
> value, which takes up 2 SIMD8 registers, into 2 32-bit values, each of which
> takes up 1 SIMD8 register. Unlike in the vec4 backend, in FS fp64 values
> have to have the same number of components as fp32 values (since we always
> operate on 8 or 16 pixels/vertices/things at a time, and there's no way to
> decompose that in NIR), which means that they're twice as large.

Right, I figured this out after thinking about it for a while. I had lost track of the fact that this is a DF operation done with integers, and got confused.
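
For reference, here is a minimal Python sketch of how I understand the
region addressing in those two movs. It is not driver code: gather_region
and the sample values are made up for illustration, and it assumes that a
GRF holds 8 dwords, that the low dword of each double is stored first, and
that g18.1 means an offset of one dword.

```python
import struct

def gather_region(dwords, start, exec_size, vstride, width, hstride):
    # Read exec_size dword channels from a flat dword array using a
    # region with the given vertical stride, width and horizontal stride.
    out = []
    for ch in range(exec_size):
        row, col = divmod(ch, width)
        out.append(dwords[start + row * vstride + col * hstride])
    return out

# Eight sample doubles, i.e. one SIMD8 fp64 value spanning two registers.
doubles = [1.5, -2.0, 3.25, 0.0, 1e10, -0.5, 2.0 ** -20, 7.0]

# Lay the doubles out as 16 dwords, low half first (assumed layout).
dwords = []
for d in doubles:
    bits = int.from_bytes(struct.pack("!d", d), "big")
    dwords += [bits % 2**32, bits // 2**32]

# Source region of the first mov: start at dword 1, region 8,4,2.
high = gather_region(dwords, 1, 8, 8, 4, 2)
# Source region of the second mov: start at dword 0, region 8,4,2.
low = gather_region(dwords, 0, 8, 8, 4, 2)

# Recombine the halves to check that all eight doubles were covered.
rebuilt = [
    struct.unpack("!d", (hi * 2**32 + lo).to_bytes(8, "big"))[0]
    for hi, lo in zip(high, low)
]
```

With an execution size of 8 the region walks two rows of four elements, so
it touches every other dword across both registers and yields one half of
each of the eight doubles. Calling gather_region with an execution size of
4 would only reach the first four.
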
> If you're
> wondering about why the destination has a stride of 2 in the second
> instruction... I don't have the rest of the assembly, but it's probably
> because we're going to operate on the upper bits and then reconstruct the
> double in g28, so we're avoiding another copy here by copying the low bits
> directly to g28. Nice job, optimizer!

Yeah, I was curious about that too, and after inspecting the assembly it does seem to be what you suggest.

> >
> > The thing is that I tried to set the execution size to 4 (this also requires
> > that I set force_writemask_all to true) but that produces GPU hangs and some
> > regressions in the fp64 tests, so I guess I am missing something here...
>
> Well, I'm not sure why you got GPU hangs, but... meh. The current code is
> correct as-is. I wouldn't expect to find bugs in basic, used-all-the-time
> stuff like this, since I did get a large portion of the tests to pass
> already. The bugs exist, but they're not here :)

Agreed. Thanks for all the feedback! I think I am starting to have a better grasp of how things need to work with fp64 on the backend now, so hopefully I won't need to keep bugging you too much :)</pre> </div> </body> </html>