[Bug 92760] Add FP64 support to the i965 shader backends
bugzilla-daemon at freedesktop.org
Wed Dec 2 22:52:03 PST 2015
https://bugs.freedesktop.org/show_bug.cgi?id=92760
--- Comment #22 from Iago Toral <itoral at igalia.com> ---
(In reply to Connor Abbott from comment #21)
> (In reply to Iago Toral from comment #20)
> > Connor, I have a question about the code we generate for the double unpack
> > x/y opcodes, which looks like this (in SIMD8):
> >
> > mov(8) g2<1>UD g18.1<8,4,2>UD { align1 1Q };
> > mov(8) g28<2>UD g18<8,4,2>UD { align1 1Q };
> >
> > Each of these instructions is intended to copy 4 UD elements from the source
> > to the destination; however, the execution size of the instructions is set to
> > 8, not 4, which means that we have a vertical dimension of 2 and we are
> > actually operating on more data elements than we need. Shouldn't these two
> > instructions have an execution size of 4 instead?
>
> No, it's meant to copy 8 UD elements -- it's decomposing an 8-component fp64
> value, which takes up 2 SIMD8 registers, into 2 32-bit values, each of which
> takes up 1 SIMD8 register. Unlike in the vec4 backend, in FS fp64 values
> have to have the same number of components as fp32 values (since we always
> operate on 8 or 16 pixels/vertices/things at a time, and there's no way to
> decompose that in NIR), which means that they're twice as large.
Right, I figured this out after thinking about it for a while. I lost track of
the fact that this is a DF operation done with integers and got confused.
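
For reference, the region arithmetic behind those two movs can be sketched in
Python. This is only an illustration, not Mesa or hardware code: the helper
names (region_offsets, unpack_doubles) are hypothetical, and it assumes the 8
doubles sit back to back in g18-g19 in little-endian order, with offsets
counted in UD (32-bit) units from the start of g18.

```python
import struct

def region_offsets(subreg, vstride, width, hstride, exec_size):
    """UD offsets read by a <vstride,width,hstride> source region.

    Channel ch reads row (ch // width) and column (ch % width) of the region.
    """
    return [subreg + (ch // width) * vstride + (ch % width) * hstride
            for ch in range(exec_size)]

def unpack_doubles(values):
    """Split 8 doubles into low and high 32-bit halves, as the two movs do."""
    # Assumption: 8 little-endian doubles = 16 UD slots spanning g18 and g19.
    dwords = struct.unpack("<16I", struct.pack("<8d", *values))
    lo = [dwords[i] for i in region_offsets(0, 8, 4, 2, 8)]  # g18<8,4,2>UD
    hi = [dwords[i] for i in region_offsets(1, 8, 4, 2, 8)]  # g18.1<8,4,2>UD
    return lo, hi

if __name__ == "__main__":
    print(region_offsets(1, 8, 4, 2, 8))  # exec size 8: two rows of 4
    print(region_offsets(1, 8, 4, 2, 4))  # exec size 4: first row only
```

With an execution size of 8 the region spans two rows of 4 elements, so it
covers one 32-bit half of all 8 doubles. With an execution size of 4 only the
first row is read, which would leave half of the channels untouched.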
> If you're
> wondering about why the destination has a stride of 2 in the second
> instruction... I don't have the rest of the assembly, but it's probably
> because we're going to operate on the upper bits and then reconstruct the
> double in g28, so we're avoiding another copy here by copying the low bits
> directly to g28. Nice job, optimizer!
Yeah, I was curious about that too and inspecting the assembly it does seem to
be what you suggest.
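
The effect of the <2> destination stride can be sketched the same way. Again
this is only an illustration under assumptions: the helper names (dst_offsets,
repack) are hypothetical, and the second write into the odd UD slots stands in
for whatever later instruction rebuilds the double, which is not shown in this
thread.

```python
import struct

def dst_offsets(subreg, hstride, exec_size):
    """UD offsets written by a destination region with the given stride."""
    return [subreg + ch * hstride for ch in range(exec_size)]

def repack(lo, hi):
    """Rebuild 8 doubles from their low and high 32-bit halves."""
    dwords = [0] * 16  # g28-g29 viewed as 16 UD slots
    # g28<2>UD: the low halves land in the even slots, already in place.
    for ch, off in enumerate(dst_offsets(0, 2, 8)):
        dwords[off] = lo[ch]
    # Assumed later write (e.g. to g28.1<2>UD): high halves fill the odd slots.
    for ch, off in enumerate(dst_offsets(1, 2, 8)):
        dwords[off] = hi[ch]
    # Assumption: little-endian layout, low dword first in each double.
    return list(struct.unpack("<8d", struct.pack("<16I", *dwords)))

if __name__ == "__main__":
    print(dst_offsets(0, 2, 8))
    print(repack([0] * 8, [0x3FF00000] * 8))
```

Because the low halves are written straight into the slots they will occupy in
the final double, no separate copy of them is needed afterwards.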
> >
> > The thing is that I tried to set the execution size to 4 (this also requires
> > that I set force_writemask_all to true) but that produces GPU hangs and some
> > regressions in the fp64 tests, so I guess I am missing something here...
>
> Well, I'm not sure why you got GPU hangs, but... meh. The current code is
> correct as-is. I wouldn't expect to find bugs in basic, used-all-the-time
> stuff like this, since I did get a large portion of the tests to pass
> already. The bugs exist, but they're not here :)
Agreed.
Thanks for all the feedback! I think I am starting to have a better grasp on
how things need to work with fp64 on the backend now, so hopefully I won't need
to keep bugging you too much :)