[Bug 92760] Add FP64 support to the i965 shader backends

Wed Mar 2 07:32:41 UTC 2016

https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #62 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #60)
> (In reply to Iago Toral from comment #58)
> > (In reply to Jason Ekstrand from comment #30)
> > > Created attachment 120957 [details]
> > > NIR indirect lowering pass
> > > 
> > > (In reply to Iago Toral from comment #29)
> > > > Hey Jason/Connor,
> > > > 
> > > > the lowering of trunc for doubles has some code that looks like this
> > > > (pseudo-code):
> > > > 
> > > > if (exponent < 0) {
> > > >    mask = 0x0
> > > > } else if (exponent > 52) {
> > > >    mask = 0x7fffffffffffffff;
> > > > } else {
> > > >    /* This is a 64-bit integer op, needs to be split into hi/lo 32-bit ops */
> > > >    mask =  (1LL << frac_bits) - 1;
> > > > }
> > > > 
> > > > The current implementation I have works fine using bcsel. It looks something
> > > > like this (again, pseudo-code):
> > > > 
> > > > mask = bcsel(exponent < 0,
> > > >              0x7fffffffffffffff,
> > > >              bcsel(exponent > 52,
> > > >                    0x0000000000000000,
> > > >                    (1LL << frac_bits) -1))
> > > > 
> > > > My problem with this is that "(1LL << frac_bits) - 1" is a 64-bit integer
> > > > operation that we have to implement in terms of hi/lo 32-bit integer
> > > > operations (at least until we support 64-bit integers), so it is really a
> > > > bunch of instructions. Because I use bcsel, it means that we generate code
> > > > for that even if exponent is not in [1..51], which is not ideal.
> > > 
> > > Right.  I would encourage you not to use if's too much because branching may
> > > be more expensive than bcsel depending on what paths different invocations
> > > take.  However, if one side of the if is overwhelmingly more likely than the
> > > other, then control-flow is probably a good idea.
> > 
> > I have been revisiting this. Because if statements in NIR are strictly
> > scalar, this lowering needs to be scalarized as well. I wonder if the
> > scalarized code resulting of this defeats the purpose of using the if
> > statement for the vec4 backend, since we lose the ability to use vector
> > instructions.
> > 
> > Some quick experiments with a simple trunc() test show these results
> > (#instructions):
> > 
> > backed         bcsel      if (unscalarized)     if (scalarized)
> > ----------------------------------------------------------------
> > vec4            65              69                   102
> > fs (simd8)      67              85                    85
> > fs (simd16)     95             119                   119
> > 
> > bcsel implementations have less overall instructions as expected, although
> > as discussed before, if implementations may be better in some cases since
> > they might end up executing less instructions in some cases. However, it is
> > clear that the required scalarization for the if statement in the vec4
> > backend makes things much worse, to a point that  I am not sure any more
> > that this is a win in this scenario.
> > 
> > So we have 2 options again:
> > 
> > 1) Go back to the bcsel implementation for both backends.
> > 2) Pass an is_scalar flag to the lowering pass, choose the bcsel
> > implementation for non scalar backends and the scalarized if implementation
> > for scalar.
> > 
> > 2) _might_ be better overall from a performance standpoint but I wonder if
> > it is worth having two different implementations of this. This decision
> > would also affect the implementation of roundEven().
> > 
> > What do you think?
> 
> I'm sorry that it has taken so long for me to get back to you.  We've had a
> lot going on at the office lately.

Don't worry at all, we were not blocked on this. We have both implementations
so it is only a matter of deciding which one to go with.

> I think (1) is probably the best option for a couple of reasons:
> 
>  a) Simplicity, as you mentioned, and the ability to vectorize in vec4.
> 
>  b) Even in simd8 mode, you are executing 8 threads at a time and you will
> only actually skip instructions if all 8 threads take the same side of the
> branch.  While this may happen fairly frequently depending on the context,
> there's still a decent chance that all of the instructions will get executed
> anyway.
> 
>  c) Even if (b) isn't a problem, using bcsel gives us much more freedom when
> scheduling instructions.  Any sort of control flow acts as an instruction
> scheduling barrier and prevents us from moving things around.  This is a
> problem for both register pressure (causing spilling) and instruction
> latency.  If we use bcsel, there's a decent chance that we can move
> instructions around and hide latency enough to make up for the extra
> instructions being executed.

Yep, that makes a good case for 1), let's go with that.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-3d-bugs/attachments/20160302/8d68fdcc/attachment.html>