[Intel-gfx] [PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

Chris Wilson chris at chris-wilson.co.uk
Sun Dec 14 04:59:41 PST 2014


On Sat, Dec 13, 2014 at 07:08:22PM -0800, Ben Widawsky wrote:
> Any GEM driver which has very large objects and a slow CPU is subject to very
> long waits simply for clflushing incoherent objects. Generally, each individual
> object is not a problem, but if you have very large objects, or very many
> objects, the flushing begins to show up in profiles. Because on x86 we know the
> cache size, we can easily determine when an object will use all the cache, and
> forgo iterating over each cacheline.
> 
> We need to be careful when using wbinvd. wbinvd() is itself potentially slow
> because it requires synchronizing the flush across all CPUs so they have a
> coherent view of memory. This can result in either stalling work being done on
> other CPUs, or this call itself stalling while waiting for a CPU to accept the
> interrupt. wbinvd() also has the downside of invalidating all cachelines,
> so we don't want to use it unless we're sure we already own most of the
> cachelines.
> 
> The current algorithm is very naive. I think it can be tweaked more, and it
> would be good if someone else gave it some thought. I am pretty confident that in
> i915 we can even skip the IPI in the execbuf path with minimal code change (or
> perhaps just some verification of the existing code). It would be nice to hear what
> other developers who depend on this code think.
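
To restate the idea in rough code (an illustrative sketch only, not the
patch itself; the function name is made up, and it assumes it lives in
drm_cache.c next to drm_clflush_page()):

static void sketch_flush_pages(struct page *pages[], unsigned long num_pages)
{
	/* boot_cpu_data.x86_cache_size is reported in KiB; shifting by
	 * PAGE_SHIFT - 10 converts that to 4 KiB pages. */
	const unsigned long cache_pages =
		boot_cpu_data.x86_cache_size >> (PAGE_SHIFT - 10);

	if (cache_pages && num_pages < cache_pages) {
		unsigned long i;

		/* Object is smaller than the LLC: flush it line by line */
		mb();
		for (i = 0; i < num_pages; i++)
			drm_clflush_page(pages[i]);
		mb();
	} else {
		/* Object covers the whole cache: one wbinvd is cheaper, but
		 * it stalls every CPU and drops everyone's cachelines. */
		wbinvd_on_all_cpus();
	}
}

Where to draw that line is what drm_cache_should_clflush() below tries
to answer.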

One of the things wbinvd is considered evil for is that it blocks the
CPU for an indeterminate amount of time - upsetting latency critical
aspects of the OS. For example, x86/mm has similar code that uses
wbinvd for large flush ranges, which caused a bit of controversy with RT:

http://linux-kernel.2935.n7.nabble.com/PATCH-x86-Use-clflush-instead-of-wbinvd-whenever-possible-when-changing-mapping-td493751.html

and the use of wbinvd in the nvidia driver has also been noted as
evil by RT folks.

However, as wbinvd still exists, it can't be all that bad...

> Cc: Intel GFX <intel-gfx at lists.freedesktop.org>
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
>  drivers/gpu/drm/drm_cache.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index d7797e8..6009c2d 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
>  		drm_clflush_page(*pages++);
>  	mb();
>  }
> +
> +static bool
> +drm_cache_should_clflush(unsigned long num_pages)
> +{
> +	const int cache_size = boot_cpu_data.x86_cache_size;

Maybe const int cpu_cache_size = boot_cpu_data.x86_cache_size >> 2; /* in pages */

Just to make it clear where the factor of 4 is required.
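
i.e. an untested sketch just to show how the units line up (the quoted
hunk above is trimmed, so the comparison at the end is illustrative
rather than what the patch actually does):

static bool
drm_cache_should_clflush(unsigned long num_pages)
{
	/* x86_cache_size is in KiB; with 4 KiB pages, >> 2 gives pages */
	const unsigned long cpu_cache_size =
		boot_cpu_data.x86_cache_size >> 2; /* in pages */

	/* Unknown cache size: stay with clflush rather than risk wbinvd */
	return !cpu_cache_size || num_pages < cpu_cache_size;
}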

How standalone is this patch? What happens if you just apply it by
itself? I presume it wasn't all that effective, since you needed the
additional patches to prevent superfluous flushes. It should still
reduce the time it takes to bind framebuffers, etc., but on its own I
expect it to worsen overall performance, as we would do the repeated
wbinvd in execbuffer.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
