[PATCH 34/36] x86-64: Always use copy_user_generic_unrolled for small copies

Chris Wilson chris at chris-wilson.co.uk
Wed May 31 20:28:00 UTC 2017

As tested on Broadwell, for small cache-hot copies the unrolled assembly
code is more than twice as fast as copy_user_enhanced_fast_string, though
beyond the L1 cache the fast-string routine excels. Preferring
copy_user_generic_unrolled() for small copies (here, up to 512 bytes)
gives a 10% performance increase for a lightweight, frequently used ioctl
that reads/writes an 8-byte struct. Because that ioctl is called through a
generic ioctl dispatcher, the struct size is not known at compile time.
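
For illustration only, the size-based dispatch can be sketched in plain
userspace C. This is not the kernel code: copy_unrolled(), copy_bulk(),
copy_dispatch() and SMALL_COPY_MAX are hypothetical names, memcpy() stands
in for the fast-string routine, and there is no fault handling, so the
"bytes not copied" return value is always 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant mirroring the 512-byte cutoff used in the patch. */
#define SMALL_COPY_MAX 512

/*
 * Userspace stand-in for copy_user_generic_unrolled(): moves 8 bytes per
 * iteration, then finishes the tail one byte at a time. Returns the number
 * of bytes NOT copied, which is always 0 here because nothing can fault.
 */
static unsigned copy_unrolled(void *to, const void *from, unsigned len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (len >= 8) {
		uint64_t w;

		memcpy(&w, s, sizeof(w));	/* alignment-safe 8-byte load */
		memcpy(d, &w, sizeof(w));	/* alignment-safe 8-byte store */
		d += 8;
		s += 8;
		len -= 8;
	}
	for (; len; len--)
		*d++ = *s++;

	return 0;
}

/* Stand-in for the string-instruction variants used for larger copies. */
static unsigned copy_bulk(void *to, const void *from, unsigned len)
{
	memcpy(to, from, len);
	return 0;
}

/*
 * The length is only known at run time (as with a generic ioctl
 * dispatcher), so the choice of routine is made by a branch on len.
 */
static unsigned copy_dispatch(void *to, const void *from, unsigned len)
{
	if (len <= SMALL_COPY_MAX)
		return copy_unrolled(to, from, len);

	return copy_bulk(to, from, len);
}
```

With this sketch, an 8-byte struct passed to copy_dispatch() takes the
unrolled path, while a buffer larger than 512 bytes falls through to the
bulk routine.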

Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index ff2d65baa988..10f83cba3285 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
 	unsigned ret;
 
+	if (len <= 512)
+		return copy_user_generic_unrolled(to, from, len);
+
 	/*
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
