[Bug 92760] Add FP64 support to the i965 shader backends

Tue Dec 1 18:23:49 PST 2015

https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #19 from Connor Abbott <cwabbott0 at gmail.com> ---
(In reply to Iago Toral from comment #17)
> Heu Connor, thanks a lot for looking into this:
> 
> (In reply to Connor Abbott from comment #15)
> > Hey, since you posted a link to a branch with some updated patches, I
> > noticed a few things:
> > 
> > 1. Your patch " i965/fs: Fix register regioning for fp64" isn't correct. In
> > particular, it's not correct that "The size of the type is irrelevant, since
> > we manipulate DF as F with stride 2." The stride of a register is in units
> > of the typesize, and we don't "manipulate DF as F with stride 2" anywhere at
> > all in the backend (you can check for yourself!), so we need to take the
> > type size into account when figuring out if the source crosses a SIMD8
> > register boundary. I suspect this is just papering over bugs due to
> > reverting "XXX remove exec size hack" -- the right thing to do would be to
> > fix the exec_size of any instructions we create that would otherwise hit
> > this hack so we can remove it. The other possibility I can think of is that
> > the crashes are due to Matt's regioning validation falling over, in which
> > case you'll need to fix that. In any case, as-is you're going to produce a
> > region of <8,8,1> for doubles which is incorrect -- you want a region of
> > <4,4,1> since only the vertical stride can cross the SIMD8 register
> > boundary, which my original i965-fp64-v3 branch does correctly.
> 
> Ok, I actually created a new testing branch because I wasn't too sure about
> this patch yet. The thing that confused me is that text from the BDW docs:
> 
> "In Align1 mode, all regioning parameters must use the syntax of a pair of
> packed floats, including channel selects and channel enables.
> 
> // Example:
> mov (8) r10.0.xyzw:df r11.0.xyzw:df
> // The above instruction moves four double floats. The .x picks the
> // low 32 bits and the .y picks the high 32 bits of the double float."
> 
> This talks about align1 mode, but it actually seems to mean Align16...

Yeah, seems like a mistake. The PRM's can be... less than reliable at times.

> I am
> surprised that this actually fixed so much stuff and did not seem to create
> any trouble, I was thinking that if we are producing invalid regioning
> parameters we would either hang the GPU or at least see incorrect results
> somewhere...

While working on bringing stuff up, I also ran into a lot of scenarios like
that. "The test shows green, so it must work!" At one point, I thought that you
actually could do sqrt/round/etc. on doubles, but it turned out that I was just
doing double comparisons wrong :) and I only wrote the patch that yours
(mostly) reverts after hitting an assertion in the simulator. A stride of
<8,8,1> may work, but it's much safer to follow the documented restrictions in
the PRM (unless the PRM is just plain wrong!). It's a little unfortunate that
Matt's assembly validator didn't catch this... it might be nice to fix it up to
test for this.

> 
> Anyway, I think we should go back to trying to remove the exec_size hack as
> you suggest before trying to fix more tests, otherwise it seems that we are
> likely to be papering over broken things.
> 
> > 2. "i965/fs: fix ceil/floor for doubles" -- sorry, you're not going to get
> > off that easily :) You can't use floats here, since you're going to run into
> > range and precision issues. If the piglit tests don't test for those, then
> > they need to be fixed. Since we don't have HW support for
> > ceil/floor/trunc/etc., you're going to need to implement them using integer
> > operations. I was planning on doing this in nir_lower_double_ops.c. I think
> > trunc is the easiest to implement using integer ops -- you don't need to
> > worry about the sign, you just have to mask off the appropriate number of
> > bits in the mantissa based on the exponent -- but the rub is that 64-bit
> > integer arithmetic isn't implemented yet in NIR or the backend, and it
> > doesn't even exist prior to Broadwell, so you have to split the double into
> > 32-bit chunks and operate on those. Then once you've implemented trunc, you
> > can implement all the other missing operations in terms  of it. Have fun :)
> 
> Yeah, I noticed that a bit later, piglit tests don't really check for this,
> so they need to be fixed, there is only one test for trunc that catches this
> kind of thing. I'll dig more, thanks!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-3d-bugs/attachments/20151202/fb119107/attachment.html>