[Bug 92760] Add FP64 support to the i965 shader backends

Tue Feb 23 08:13:50 UTC 2016

https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #58 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #30)
> Created attachment 120957 [details]
> NIR indirect lowering pass
> 
> (In reply to Iago Toral from comment #29)
> > Hey Jason/Connor,
> > 
> > the lowering of trunc for doubles has some code that looks like this
> > (pseudo-code):
> > 
> > if (exponent < 0) {
> >    mask = 0x0
> > } else if (exponent > 52) {
> >    mask = 0x7fffffffffffffff;
> > } else {
> >    /* This is a 64-bit integer op, needs to be split into hi/lo 32-bit ops */
> >    mask =  (1LL << frac_bits) - 1;
> > }
> > 
> > The current implementation I have works fine using bcsel. It looks something
> > like this (again, pseudo-code):
> > 
> > mask = bcsel(exponent < 0,
> >              0x7fffffffffffffff,
> >              bcsel(exponent > 52,
> >                    0x0000000000000000,
> >                    (1LL << frac_bits) -1))
> > 
> > My problem with this is that "(1LL << frac_bits) - 1" is a 64-bit integer
> > operation that we have to implement in terms of hi/lo 32-bit integer
> > operations (at least until we support 64-bit integers), so it is really a
> > bunch of instructions. Because I use bcsel, it means that we generate code
> > for that even if exponent is not in [1..51], which is not ideal.
> 
> Right.  I would encourage you not to use if's too much because branching may
> be more expensive than bcsel depending on what paths different invocations
> take.  However, if one side of the if is overwhelmingly more likely than the
> other, then control-flow is probably a good idea.

I have been revisiting this. Because if statements in NIR are strictly scalar,
this lowering needs to be scalarized as well. I wonder if the scalarized code
resulting from this defeats the purpose of using the if statement for the vec4
backend, since we lose the ability to use vector instructions.

Some quick experiments with a simple trunc() test show these results
(#instructions):

backend        bcsel      if (unscalarized)     if (scalarized)
----------------------------------------------------------------
vec4            65              69                   102
fs (simd8)      67              85                    85
fs (simd16)     95             119                   119

bcsel implementations have fewer instructions overall, as expected, although as
discussed before, the if implementations may still be better in some cases
since they can end up executing fewer instructions. However, it is clear that
the scalarization required for the if statement makes things much worse in the
vec4 backend (102 instructions versus 65 for bcsel), to the point that I am no
longer sure this is a win in this scenario.
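
To make the trade-off concrete, here is a minimal host-side C sketch of the two
shapes being compared. It is not the NIR pass itself: the helper names
(split_double, join_double, bcsel) are made up for illustration, the mask is
assumed to mean "bits to clear" as in the bcsel pseudo-code quoted above, and
the shift count is masked to 5 bits only to keep the C well defined. It says
nothing about instruction counts on real hardware; it only shows that the
select form always evaluates the hi/lo mask arithmetic while the branch form
skips it outside 0 <= exponent <= 52.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into hi/lo 32-bit words and back (illustrative helpers). */
static void split_double(double x, uint32_t *hi, uint32_t *lo)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)bits;
}

static double join_double(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Stand-in for bcsel: both value operands are evaluated before the call. */
static uint32_t bcsel(int cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}

/* Select form: the hi/lo mask for the middle case is always computed. */
double trunc_bcsel(double x)
{
    uint32_t hi, lo;
    split_double(x, &hi, &lo);
    int exponent = (int)((hi >> 20) & 0x7ffu) - 1023;
    int frac_bits = 52 - exponent;

    /* (1LL << frac_bits) - 1 expressed as two 32-bit halves. */
    uint32_t part = (1u << ((uint32_t)frac_bits & 31u)) - 1u;
    uint32_t mid_lo = bcsel(frac_bits >= 32, 0xffffffffu, part);
    uint32_t mid_hi = bcsel(frac_bits >= 32, part, 0u);

    uint32_t mask_lo = bcsel(exponent < 0, 0xffffffffu,
                             bcsel(exponent > 52, 0u, mid_lo));
    uint32_t mask_hi = bcsel(exponent < 0, 0x7fffffffu,
                             bcsel(exponent > 52, 0u, mid_hi));
    return join_double(hi & ~mask_hi, lo & ~mask_lo);
}

/* Branch form: the mask arithmetic only runs for 0 <= exponent <= 52. */
double trunc_if(double x)
{
    uint32_t hi, lo;
    split_double(x, &hi, &lo);
    int exponent = (int)((hi >> 20) & 0x7ffu) - 1023;

    if (exponent < 0)
        return join_double(hi & 0x80000000u, 0u);  /* keep only the sign */
    if (exponent > 52)
        return x;                                  /* already integral */

    int frac_bits = 52 - exponent;
    if (frac_bits >= 32)
        return join_double(hi & ~((1u << (frac_bits - 32)) - 1u), 0u);
    return join_double(hi, lo & ~((1u << frac_bits) - 1u));
}
```

Both functions should agree on every input; the difference is only in how much
work runs for values whose exponent falls outside the middle range.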

So we have 2 options again:

1) Go back to the bcsel implementation for both backends.
2) Pass an is_scalar flag to the lowering pass, choose the bcsel implementation
for non-scalar backends and the scalarized if implementation for scalar ones.

2) _might_ be better overall from a performance standpoint but I wonder if it
is worth having two different implementations of this. This decision would also
affect the implementation of roundEven().
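
As a rough sketch of what option 2) could look like, the choice might be
reduced to a single helper that the trunc() lowering (and later roundEven())
would consult. The enum and function names below are hypothetical and do not
exist in Mesa; only the is_scalar flag comes from the proposal above.

```c
#include <stdbool.h>

/* Hypothetical: which shape the double-precision lowering should emit. */
enum trunc_lowering {
    TRUNC_LOWERING_BCSEL,         /* no control flow, keeps vector ops */
    TRUNC_LOWERING_SCALARIZED_IF  /* one if per component */
};

/* Hypothetical helper: scalar backends get the if form, vec4 gets bcsel. */
static enum trunc_lowering pick_trunc_lowering(bool is_scalar)
{
    return is_scalar ? TRUNC_LOWERING_SCALARIZED_IF : TRUNC_LOWERING_BCSEL;
}
```

Keeping the decision in one place would at least limit the duplication to the
two emit paths themselves.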

What do you think?
