[Bug 92760] Add FP64 support to the i965 shader backends

Mon Nov 30 23:02:39 PST 2015

https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #17 from Iago Toral <itoral at igalia.com> ---
Heu Connor, thanks a lot for looking into this:

(In reply to Connor Abbott from comment #15)
> Hey, since you posted a link to a branch with some updated patches, I
> noticed a few things:
> 
> 1. Your patch " i965/fs: Fix register regioning for fp64" isn't correct. In
> particular, it's not correct that "The size of the type is irrelevant, since
> we manipulate DF as F with stride 2." The stride of a register is in units
> of the typesize, and we don't "manipulate DF as F with stride 2" anywhere at
> all in the backend (you can check for yourself!), so we need to take the
> type size into account when figuring out if the source crosses a SIMD8
> register boundary. I suspect this is just papering over bugs due to
> reverting "XXX remove exec size hack" -- the right thing to do would be to
> fix the exec_size of any instructions we create that would otherwise hit
> this hack so we can remove it. The other possibility I can think of is that
> the crashes are due to Matt's regioning validation falling over, in which
> case you'll need to fix that. In any case, as-is you're going to produce a
> region of <8,8,1> for doubles which is incorrect -- you want a region of
> <4,4,1> since only the vertical stride can cross the SIMD8 register
> boundary, which my original i965-fp64-v3 branch does correctly.

Ok, I actually created a new testing branch because I wasn't too sure about
this patch yet. The thing that confused me is that text from the BDW docs:

"In Align1 mode, all regioning parameters must use the syntax of a pair of
packed floats, including channel selects and channel enables.

// Example:
mov (8) r10.0.xyzw:df r11.0.xyzw:df
// The above instruction moves four double floats. The .x picks the
// low 32 bits and the .y picks the high 32 bits of the double float."

This talks about align1 mode, but it actually seems to mean Align16... I am
surprised that this actually fixed so much stuff and did not seem to create any
trouble, I was thinking that if we are producing invalid regioning parameters
we would either hang the GPU or at least see incorrect results somewhere...

Anyway, I think we should go back to trying to remove the exec_size hack as you
suggest before trying to fix more tests, otherwise it seems that we are likely
to be papering over broken things.

> 2. "i965/fs: fix ceil/floor for doubles" -- sorry, you're not going to get
> off that easily :) You can't use floats here, since you're going to run into
> range and precision issues. If the piglit tests don't test for those, then
> they need to be fixed. Since we don't have HW support for
> ceil/floor/trunc/etc., you're going to need to implement them using integer
> operations. I was planning on doing this in nir_lower_double_ops.c. I think
> trunc is the easiest to implement using integer ops -- you don't need to
> worry about the sign, you just have to mask off the appropriate number of
> bits in the mantissa based on the exponent -- but the rub is that 64-bit
> integer arithmetic isn't implemented yet in NIR or the backend, and it
> doesn't even exist prior to Broadwell, so you have to split the double into
> 32-bit chunks and operate on those. Then once you've implemented trunc, you
> can implement all the other missing operations in terms  of it. Have fun :)

Yeah, I noticed that a bit later, piglit tests don't really check for this, so
they need to be fixed, there is only one test for trunc that catches this kind
of thing. I'll dig more, thanks!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-3d-bugs/attachments/20151201/8fba2fe4/attachment-0001.html>