[Intel-gfx] [PATCH 1/7] drm: Relax alignment constraint for destination address
Lucas De Marchi
lucas.demarchi at intel.com
Tue Mar 1 07:28:31 UTC 2022
On Tue, Feb 22, 2022 at 08:22:00PM +0530, Balasubramani Vivekanandan wrote:
>There is no need for the destination address to be aligned to 16 byte
>boundary to be able to use the non-temporal instructions while copying.
>Non-temporal instructions are used only for loading from the source
>address which has alignment constraints.
>We only need to take care of using the right instructions, based on
>whether destination address is aligned or not, while storing the data to
>the destination address.
>
>__memcpy_ntdqu is copied from i915/i915_memcpy.c
>
>Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>Cc: Maxime Ripard <mripard at kernel.org>
>Cc: Thomas Zimmermann <tzimmermann at suse.de>
>Cc: David Airlie <airlied at linux.ie>
>Cc: Daniel Vetter <daniel at ffwll.ch>
>Cc: Chris Wilson <chris.p.wilson at intel.com>
>
>Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan at intel.com>
>---
> drivers/gpu/drm/drm_cache.c | 44 ++++++++++++++++++++++++++++++++-----
> 1 file changed, 38 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
>index c3e6e615bf09..a21c1350eb09 100644
>--- a/drivers/gpu/drm/drm_cache.c
>+++ b/drivers/gpu/drm/drm_cache.c
>@@ -278,18 +278,50 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> kernel_fpu_end();
> }
>
>+static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
>+{
>+ kernel_fpu_begin();
>+
>+ while (len >= 4) {
>+ asm("movntdqa (%0), %%xmm0\n"
>+ "movntdqa 16(%0), %%xmm1\n"
>+ "movntdqa 32(%0), %%xmm2\n"
>+ "movntdqa 48(%0), %%xmm3\n"
>+ "movups %%xmm0, (%1)\n"
>+ "movups %%xmm1, 16(%1)\n"
>+ "movups %%xmm2, 32(%1)\n"
>+ "movups %%xmm3, 48(%1)\n"
>+ :: "r" (src), "r" (dst) : "memory");
>+ src += 64;
>+ dst += 64;
>+ len -= 4;
>+ }
>+ while (len--) {
>+ asm("movntdqa (%0), %%xmm0\n"
>+ "movups %%xmm0, (%1)\n"
>+ :: "r" (src), "r" (dst) : "memory");
>+ src += 16;
>+ dst += 16;
ok, this takes care of the tail
>+ }
>+
>+ kernel_fpu_end();
>+}
>+
> /*
> * __drm_memcpy_from_wc copies @len bytes from @src to @dst using
>- * non-temporal instructions where available. Note that all arguments
>- * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
>- * of 16.
>+ * non-temporal instructions where available. Note that @src must be aligned to
>+ * 16 bytes and @len must be a multiple of 16.
> */
> static void __drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
> {
>- if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
>+ if (unlikely(((unsigned long)src | len) & 15)) {
> memcpy(dst, src, len);
>- else if (likely(len))
>- __memcpy_ntdqa(dst, src, len >> 4);
>+ } else if (likely(len)) {
>+ if (IS_ALIGNED((unsigned long)dst, 16))
We may want to just extend this function to handle an unaligned dst,
but that can be done on top.
Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
Lucas De Marchi
>+ __memcpy_ntdqa(dst, src, len >> 4);
>+ else
>+ __memcpy_ntdqu(dst, src, len >> 4);
>+ }
> }
>
> /**
>--
>2.25.1
>
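For anyone following along, here is a rough userspace sketch of the dispatch the patch ends up with. It is only an illustration under stated assumptions: the names sketch_memcpy_from_wc, sketch_copy_blocks and enum copy_path are hypothetical and not part of the patch, and plain memcpy() stands in for the movntdqa load plus aligned/unaligned store sequences, so nothing here is actually non-temporal. It shows the two points above: only src and len gate the fast path, and the copy loop handles four 16-byte blocks per iteration before mopping up the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the possible outcomes; not part of the patch. */
enum copy_path {
	COPY_PLAIN_MEMCPY,	/* src or len is not a multiple of 16 */
	COPY_ALIGNED_STORE,	/* the patch would call __memcpy_ntdqa here */
	COPY_UNALIGNED_STORE,	/* the patch would call __memcpy_ntdqu here */
	COPY_NOTHING,		/* len == 0, nothing to do */
};

/*
 * Stand-in for the asm loops: nblocks counts 16-byte blocks, as in the
 * patch (len >> 4). Four blocks per iteration, then the tail one by one.
 */
static void sketch_copy_blocks(unsigned char *dst, const unsigned char *src,
			       size_t nblocks)
{
	while (nblocks >= 4) {
		memcpy(dst, src, 64);
		src += 64;
		dst += 64;
		nblocks -= 4;
	}
	while (nblocks--) {
		memcpy(dst, src, 16);
		src += 16;
		dst += 16;
	}
}

/* Mirrors the checks in __drm_memcpy_from_wc after this patch. */
static enum copy_path sketch_memcpy_from_wc(void *dst, const void *src,
					     size_t len)
{
	/* Only src and len decide whether the fast path is usable. */
	if (((uintptr_t)src | len) & 15) {
		memcpy(dst, src, len);
		return COPY_PLAIN_MEMCPY;
	}
	if (!len)
		return COPY_NOTHING;

	assert(((uintptr_t)src & 15) == 0 && (len & 15) == 0);
	sketch_copy_blocks(dst, src, len >> 4);

	/* dst alignment only selects the store flavour. */
	return ((uintptr_t)dst & 15) ? COPY_UNALIGNED_STORE
				     : COPY_ALIGNED_STORE;
}
```

With a 16-byte-aligned src and an 80-byte len, an aligned dst takes the aligned-store branch and dst + 1 takes the unaligned-store branch; in both cases the loop copies one group of four blocks followed by a single tail block.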