[PATCH 34/36] x86-64: Always use copy_user_generic_unrolled for small copies
chris at chris-wilson.co.uk
Wed May 31 20:28:00 UTC 2017
As tested on Broadwell, for small cache-hot copies the unrolled assembly
code is more than twice as fast as copy_user_enhanced_fast_string(),
though beyond the L1 cache the fast-string routine excels. Preferring
copy_user_generic_unrolled() for small copies gives a 10% performance
increase for a lightweight, frequently used ioctl that reads/writes an
8-byte struct. That ioctl is called through a generic ioctl dispatcher,
so the struct size is not known at compile time and the size check has
to happen at run time.
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
arch/x86/include/asm/uaccess_64.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index ff2d65baa988..10f83cba3285 100644
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
+	if (len <= 512)
+		return copy_user_generic_unrolled(to, from, len);
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
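
The run-time size check added by the hunk above can be illustrated with a
small user-space sketch. This is not the kernel code: copy_dispatch(),
copy_small_unrolled(), copy_large() and SMALL_COPY_MAX are hypothetical
names invented for illustration, memcpy() merely stands in for the
string-instruction variants, and the sketch says nothing about relative
speed. It only shows the shape of the dispatch, assuming the same
512-byte cutoff and the same "bytes not copied" return convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, mirroring the 512-byte threshold in the patch. */
#define SMALL_COPY_MAX 512

/*
 * Stand-in for copy_user_generic_unrolled(): a plain loop that moves
 * eight bytes per iteration, then finishes the tail bytewise.
 * Returns the number of bytes NOT copied; always 0 here because a
 * user-space copy between valid buffers cannot fault.
 */
static size_t copy_small_unrolled(void *to, const void *from, size_t len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (len >= 8) {
		d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
		d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
		d += 8;
		s += 8;
		len -= 8;
	}
	while (len--)
		*d++ = *s++;
	return 0;
}

/* Stand-in for the string-instruction variants: defer to memcpy(). */
static size_t copy_large(void *to, const void *from, size_t len)
{
	memcpy(to, from, len);
	return 0;
}

/* Pick the copy routine at run time, based only on the length. */
static size_t copy_dispatch(void *to, const void *from, size_t len)
{
	assert(to != NULL && from != NULL);

	if (len <= SMALL_COPY_MAX)
		return copy_small_unrolled(to, from, len);
	return copy_large(to, from, len);
}
```

A caller that only learns the length at run time, such as a generic
dispatcher, would simply call copy_dispatch(dst, src, len) and treat a
non-zero return as a short copy.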