[Mesa-dev] [PATCH] swr/rast: Use gather instruction for i32gather_ps on simd16/avx512
Cherniak, Bruce
bruce.cherniak at intel.com
Tue Nov 14 17:36:33 UTC 2017
Reviewed-by: Bruce Cherniak <bruce.cherniak at intel.com>
> On Nov 13, 2017, at 8:03 PM, Tim Rowley <timothy.o.rowley at intel.com> wrote:
>
> Speed up avx512 platforms; fixes performance regression caused
> by swithc to simdlib.
>
> Cc: mesa-stable at lists.freedesktop.org
> ---
> .../drivers/swr/rasterizer/common/simdlib_512_avx512.inl | 12 +-----------
> 1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/rasterizer/common/simdlib_512_avx512.inl b/src/gallium/drivers/swr/rasterizer/common/simdlib_512_avx512.inl
> index 95e4c31909..c13b9f616a 100644
> --- a/src/gallium/drivers/swr/rasterizer/common/simdlib_512_avx512.inl
> +++ b/src/gallium/drivers/swr/rasterizer/common/simdlib_512_avx512.inl
> @@ -484,17 +484,7 @@ SIMD_WRAPPER_2(unpacklo_ps);
> template<ScaleFactor ScaleT>
> static SIMDINLINE Float SIMDCALL i32gather_ps(float const* p, Integer idx) // return *(float*)(((int8*)p) + (idx * ScaleT))
> {
> - uint32_t *pOffsets = (uint32_t*)&idx;
> - Float vResult;
> - float* pResult = (float*)&vResult;
> - for (uint32_t i = 0; i < SIMD_WIDTH; ++i)
> - {
> - uint32_t offset = pOffsets[i];
> - offset = offset * static_cast<uint32_t>(ScaleT);
> - pResult[i] = *(float const*)(((uint8_t const*)p + offset));
> - }
> -
> - return vResult;
> + return _mm512_i32gather_ps(idx, p, static_cast<int>(ScaleT));
> }
>
> static SIMDINLINE Float SIMDCALL load1_ps(float const *p) // return *p (broadcast 1 value to all elements)
> --
> 2.14.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list