<br><br><div class="gmail_quote">On Mon, Feb 22, 2010 at 1:39 PM, Siarhei Siamashka <span dir="ltr">&lt;<a href="mailto:siarhei.siamashka@gmail.com">siarhei.siamashka@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im">On Friday 19 February 2010, Luca Barbato wrote:<br>
> On 02/19/2010 12:57 PM, Siarhei Siamashka wrote:<br>
> > Adding small increments to the values at the end of loop iteration could<br>
> > be the biggest source of precision loss. Replacing this with explicit<br>
> > calculation like 'pdx = pdx0 + cx * n' should improve precision and maybe<br>
> > allow to use floats freely. And floats work better with SIMD on any<br>
> > platforms.<br>
><br>
> And all the SIMD we are covering have a multiply-accumulate instruction<br>
> that would be in use, I'm a bit more concerned about the sqrt usage<br>
> though...<br>
<br>
</div>The usage of sqrt is probably not a fatal performance problem.<br>
<br>
ARM11 VFP has a separate DS pipeline which can calculate divides or square<br>
roots simultaneously with the other operations. So it's only a matter of<br>
hiding the very high latency of the square root calculation.<br>
<br>
ARM Cortex-A8 has special SIMD instructions intended to help calculate<br>
reciprocals and reciprocal square roots using the Newton-Raphson method.<br>
<br>
SSE has SIMD instructions for calculating square roots.<br>
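<br></blockquote><div><br>Note that the SSE approximation instructions for reciprocal and reciprocal square root (rcpps, rsqrtps) are unfit for this task on their own, as they have<br>very little precision (about 12 bits). At least one Newton-Raphson iteration must be done to get<br>usable values. The plain SSE square root (sqrtps) is full precision, but at a higher latency.<br>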
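<br>To make the refinement step concrete, here is a minimal sketch using SSE intrinsics. The function names rsqrt_nr_ps and sqrt_nr_ps are made up for illustration and are not taken from any existing code. It takes the rsqrtps estimate y0 and applies one Newton-Raphson step, y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which should bring the result close to full single precision:<br>

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps and friends */

/*
 * Hypothetical helper (name is illustrative only).
 * Approximates 1/sqrt(x) for four packed floats: the hardware estimate
 * (about 12 bits) followed by one Newton-Raphson step:
 *     y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
 */
static __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    __m128 y0  = _mm_rsqrt_ps(x);                    /* low-precision estimate */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y0), y0);  /* x * y0 * y0 */

    return _mm_mul_ps(y0, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

/*
 * Hypothetical helper: sqrt(x) computed as x * (1/sqrt(x)).
 * Only meant for x > 0; x == 0 would give 0 * inf = NaN and needs
 * separate handling if it can occur.
 */
static __m128 sqrt_nr_ps(__m128 x)
{
    return _mm_mul_ps(x, rsqrt_nr_ps(x));
}
```

The same pattern works for rcpps, with the step y1 = y0 * (2 - x * y0). Whether this beats a plain sqrtps depends on the CPU and on how well the extra multiplies can be overlapped with other work, so it needs benchmarking.<br>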
<br><br></div></div>