[RFC] drm: Optimise drm_ioctl() for small user args

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Tue May 30 13:24:23 UTC 2017


On ti, 2017-05-30 at 13:55 +0100, Chris Wilson wrote:
> When looking at simple ioctls coupled with conveniently small user
> parameters, the overhead of the syscall and drm_ioctl() present large
> low hanging fruit. Profiling trivial microbenchmarks around
> i915_gem_busy_ioctl, the low hanging fruit comprises of the call to
> copy_user(). Those calls are only inlined by the macro where the
> constant is known at compile-time, but the ioctl argument size depends
> on the ioctl number. To help the compiler, explicitly add switches for
> the small sizes that expand to simple moves to/from user. Doing the
> multiple inlines does add significant code bloat, so it is very
> debatable as to its value. Back to the trivial, but frequently used,
> example of i915_gem_busy_ioctl() on a Broadwell avoiding the call gives
> us a 15-25% improvement:
> 
> 			 before		  after
> 	single		100.173ns	 84.496ns
> 	parallel (x4)	204.275ns	152.957ns
> 
> On a baby Broxton nuc:
> 
> 			before           after
> 	single		245.355ns	199.477ns
> 	parallel (x2)	280.892ns	232.726ns
> 
> Looking at the cost distribution by moving an equivalent switch into
> arch/x86/lib/usercopy, the overhead to the copy user is split almost
> equally between the function call and the actual copy itself. It seems
> copy_user_enhanced_fast_string simply is not that good at small (single
> register) copies. Something as simple as
> 
> @@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
>  {
>         unsigned ret;
> 
> +       if (len <= 16)
> +               return copy_user_generic_unrolled(to, from, len);
> 
> is enough to speed up i915_gem_busy_ioctl() by 10% :|
> 
> Note that this overhead may entirely be x86 specific.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>

I think this should be integrated into __copy_{to,from}_user directly,
but in the meanwhile the code is;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


More information about the dri-devel mailing list