[Mesa-dev] [PATCH] anv/query: Use snooping on !LLC platforms
Chris Wilson
chris at chris-wilson.co.uk
Fri Apr 7 08:31:35 UTC 2017
On Thu, Apr 06, 2017 at 08:49:54PM -0700, Jason Ekstrand wrote:
> Commit b2c97bc789198427043cd902bc76e194e7e81c7d which made us start
> using a busy-wait for individual query results also messed up cache
> flushing on !LLC platforms. For one thing, I forgot the mfence after
> the clflush so memory access wasn't properly getting fenced.
Ah, I thought the saving grace was that a cacheline access from the same
cpu would be forced to wait. It wasn't completely broken, as the second
clflush to the same cacheline is a full barrier, and there was always a
second blocking clflush prior to the final check. :)
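A sketch of that old polling pattern, for illustration (hypothetical
helper, not the actual anv code; the point is that the clflush issued
on each iteration serializes against the clflush from the previous
one, so the final read sees an invalidated line even without mfence):

   static uint64_t poll_query_slot(volatile uint64_t *slot)
   {
      uint64_t value;
      do {
         /* Invalidate the (possibly stale) cacheline before reading.
          * clflush is only ordered against clflushes to the same
          * line, so iteration N's flush acts as the barrier for
          * iteration N-1's.
          */
         __builtin_ia32_clflush((const void *)slot);
         value = *slot;
      } while (value == 0);
      return value;
   }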
Besides, mfence is not as strong a barrier for clflush vs the gpu as the
cpu SDM implies (this affects byt and bsw; later Atoms are unknown, but
since they share other memory characteristics they are probably also
affected). In the kernel I resorted to doing a
for_each_cacheline { clflush } clflush; mfence to ensure the invalidate
was successful.
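Roughly this, as a userspace sketch (for_each_cacheline written out as
a plain loop; assumes 64-byte cachelines, which holds on these parts):

   #include <stddef.h>
   #include <stdint.h>
   #include <emmintrin.h> /* _mm_mfence() */

   #define CACHELINE 64

   static void clflush_invalidate_range(void *addr, size_t size)
   {
      char *p = (char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
      char *end = (char *)addr + size;

      /* Flush every cacheline covering the range... */
      for (; p < end; p += CACHELINE)
         __builtin_ia32_clflush(p);

      /* ...then one extra trailing clflush (ordered against the
       * earlier flush of its line) plus mfence, to make sure the
       * invalidate has actually completed before the cpu reads.
       */
      __builtin_ia32_clflush(end - 1);
      _mm_mfence();
   }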
> More
> importantly, however, was that we were clflushing the whole query range
> and then waiting for individual queries and then trying to read the
> results without clflushing again. Getting the clflushing both correct
> and efficient is very subtle and painful. Instead, let's side-step the
> problem by just snooping.
> ---
> src/intel/vulkan/genX_query.c | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
> index 235da8b..2c70b4f 100644
> --- a/src/intel/vulkan/genX_query.c
> +++ b/src/intel/vulkan/genX_query.c
> @@ -90,6 +90,14 @@ VkResult genX(CreateQueryPool)(
> if (result != VK_SUCCESS)
> goto fail;
>
> + /* For query pools, we set the caching mode to I915_CACHING_CACHED. On LLC
> + * platforms, this does nothing. On non-LLC platforms, this means snooping
> + * which comes at a slight cost. However, the buffers aren't big, won't be
> + * written frequently, and trying to handle the flushing manually without
> + * doing too much flushing is extremely painful.
> + */
> + anv_gem_set_caching(device, pool->bo.gem_handle, I915_CACHING_CACHED);
> +
> pool->bo.map = anv_gem_mmap(device, pool->bo.gem_handle, 0, size, 0);
>
> *pQueryPool = anv_query_pool_to_handle(pool);
> @@ -132,11 +140,8 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
> }
>
> static bool
> -query_is_available(struct anv_device *device, uint64_t *slot)
> +query_is_available(uint64_t *slot)
> {
> - if (!device->info.has_llc)
> - __builtin_ia32_clflush(slot);
> -
> return *(volatile uint64_t *)slot;
> }
>
> @@ -145,7 +150,7 @@ wait_for_available(struct anv_device *device,
> struct anv_query_pool *pool, uint64_t *slot)
> {
> while (true) {
> - if (query_is_available(device, slot))
> + if (query_is_available(slot))
> return VK_SUCCESS;
>
> int ret = anv_gem_busy(device, pool->bo.gem_handle);
> @@ -159,7 +164,7 @@ wait_for_available(struct anv_device *device,
> } else {
> assert(ret == 0);
> /* The BO is no longer busy. */
> - if (query_is_available(device, slot)) {
> + if (query_is_available(slot)) {
> return VK_SUCCESS;
> } else {
> VkResult status = anv_device_query_status(device);
> @@ -204,13 +209,6 @@ VkResult genX(GetQueryPoolResults)(
>
> void *data_end = pData + dataSize;
>
> - if (!device->info.has_llc) {
> - uint64_t offset = firstQuery * pool->stride;
> - uint64_t size = queryCount * pool->stride;
> - anv_invalidate_range(pool->bo.map + offset,
> - MIN2(size, pool->bo.size - offset));
> - }
> -
> VkResult status = VK_SUCCESS;
> for (uint32_t i = 0; i < queryCount; i++) {
> uint64_t *slot = pool->bo.map + (firstQuery + i) * pool->stride;
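For anyone following along, the set-caching call above is a thin
wrapper around the i915 set-caching ioctl; roughly this (a sketch, not
the verbatim anv helper; error handling and EINTR retry elided):

   #include <stdint.h>
   #include <sys/ioctl.h>
   #include <drm/i915_drm.h> /* drm_i915_gem_caching, I915_CACHING_* */

   static int gem_set_caching(int fd, uint32_t handle, uint32_t caching)
   {
      /* Ask the kernel to treat the BO's pages as cacheable; on !LLC
       * platforms that means snooped, so cpu reads stay coherent with
       * gpu writes without any manual clflushing.
       */
      struct drm_i915_gem_caching arg = {
         .handle = handle,
         .caching = caching, /* e.g. I915_CACHING_CACHED */
      };

      return ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
   }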
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre