[Mesa-dev] [PATCH] anv/query: Busy-wait for available query entries

Thu Apr 6 06:24:09 UTC 2017

On Wed, 2017-04-05 at 09:31 -0700, Jason Ekstrand wrote:
> On Wed, Apr 5, 2017 at 12:24 AM, Iago Toral <itoral at igalia.com>
> wrote:
> > On Tue, 2017-04-04 at 19:21 -0700, Jason Ekstrand wrote:
> > > Before, we were just looking at whether or not the user wanted us
> > to
> > > wait and waiting on the BO.  This instead makes us busy-loop on
> > each
> > > query until it's available.  This reduces some of the pipeline
> > > bubbles
> > > we were getting and improves performance of The Talos Principle
> > on
> > > medium settings (where the GPU isn't overloaded) by around 20% on
> > my
> > > SkyLake gt4.
> > 
> > Do you think busy looping is a good strategy in general? In
> > scenarios
> > where the GPU is busier, isn't this expected to have a negative
> > effect
> > on performance?
> The reason why this gets better performance is not because it's a
> busy wait rather than asking the kernel to wait.  It's because it
> lets us wait on a single query rather than on the whole buffer. 
> Talos packs hundreds of queries into a single pool and the uses of
> that pool may be split across multiple command buffers.  Waiting for
> the whole BO causes us to basically wait for the GPU to be idle in
> order to get the one query result we want.
Ok,  that makes a lot more sense :)
> > 
> > > ---
> > 
> > >  src/intel/vulkan/genX_query.c | 72
> > 
> > > ++++++++++++++++++++++++++++++> > +++++++++----
> > 
> > >  1 file changed, 66 insertions(+), 6 deletions(-)
> > 
> > >
> > 
> > > diff --git a/src/intel/vulkan/genX_query.> > c
> > 
> > > b/src/intel/vulkan/genX_query.> > c
> > 
> > > index 7ea9404..ebf99d2 100644
> > 
> > > --- a/src/intel/vulkan/genX_query.> > c
> > 
> > > +++ b/src/intel/vulkan/genX_query.> > c
> > 
> > > @@ -131,6 +131,64 @@ cpu_write_query_result(void *dst_slot,
> > 
> > > VkQueryResultFlags flags,
> > 
> > >     }
> > 
> > >  }
> > 
> > >  
> > 
> > > +#define NSEC_PER_SEC 1000000000
> > 
> > > +
> > 
> > > +static bool
> > 
> > > +query_is_available(struct anv_device *device, uint64_t *slot)
> > 
> > > +{
> > 
> > > +   if (!device->info.has_llc)
> > 
> > > +      __builtin_ia32_clflush(> > slot);
> > 
> > > +
> > 
> > > +   return slot[0];
> > 
> > > +}
> > 
> > > +
> > 
> > > +static VkResult
> > 
> > > +wait_for_available(struct anv_device *device,
> > 
> > > +                   struct anv_query_pool *pool, uint64_t *slot)
> > 
> > > +{
> > 
> > > +   while (true) {
> > 
> > > +      struct timespec start;
> > 
> > > +      clock_gettime(CLOCK_> > MONOTONIC, &start);
> > 
> > > +
> > 
> > > +      while (true) {
> > 
> > > +         if (!device->info.has_llc)
> > 
> > > +            __builtin_ia32_> > clflush(slot);
> > 
> > > +
> > 
> > > +         if (query_is_available(device, slot))
> > 
> > > +            return VK_SUCCESS;
> > 
> > > +
> > 
> > > +         struct timespec current;
> > 
> > > +         clock_gettime(CLOCK_> > MONOTONIC, ¤t);
> > 
> > > +
> > 
> > > +         if (current.tv_nsec < start.tv_nsec) {
> > 
> > > +            current.tv_nsec += NSEC_PER_SEC;
> > 
> > > +            current.tv_sec -= 1;
> > 
> > > +         }
> > 
> > > +
> > 
> > > +         /* If we've been looping for more than 1 ms, break out of
> > 
> > > the busy
> > 
> > > +          * loop and ask the kernel if the buffer is actually busy.
> > 
> > > +          */
> > 
> > > +         if (current.tv_sec > start.tv_sec ||
> > 

> > 

> > This should be && instead of ||, right?
> > No.  Thanks to the subtraction above, if current.tv_sec > start.tv_sec, then it's been at least one whole second.
Right, apparently I can't read properly any more...
> > 
> > > +             current.tv_nsec - start.tv_nsec > 1000000)
> > 
> > > +            break;
> > 
> > > +      }
> > 
> > > +
> > 
> > > +      VkResult result = anv_device_wait(device, &pool->bo, 0);
> > 
> > > +      switch (result) {
> > 
> > > +      case VK_SUCCESS:
> > 
> > > +         /* The BO is no longer busy.  If we haven't seen
> > 
> > > availability yet,
> > 
> > > +          * then we never will.
> > 
> > > +          */
> > 
> > > +         return query_is_available(device, slot) ? VK_SUCCESS :
> > 
> > > VK_NOT_READY;
> > 

> > 
> > By the comment above it sounds like returning VK_NOT_READY isn't what
> > 
> > we want here. If we don't expect the query to ever be available at this
> > 
> > point, returning VK_NOT_READY will only prompt the application to call
> > 
> > us again... shouldn't we return an error in this case?

> > This case shouldn't happen.  If the client is using the API correctly, all of the query results it's trying to get should already have Begin/End submitted to the GPU.  However, if the client does something bogus, I don't want to land them in an infinite busy-loop.  Returning VK_NOT_READY seemed reasonable, but we could make it some error or other.
If this can only happen in the case of incorrect API usage I guess it
doesn't really matter much.
Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> > 
> > > +      case VK_TIMEOUT:
> > 
> > > +         /* The BO is still busy, keep waiting. */
> > 
> > > +         continue;
> > 
> > > +      default:
> > 
> > > +         return result;
> > 
> > > +      }
> > 
> > > +   }
> > 
> > > +}
> > 
> > > +
> > 
> > >  VkResult genX(GetQueryPoolResults)(
> > 
> > >      VkDevice                 > >                    _device,
> > 
> > >      VkQueryPool              > >                    queryPool,
> > 
> > > @@ -154,12 +212,6 @@ VkResult genX(GetQueryPoolResults)(
> > 
> > >     if (pData == NULL)
> > 
> > >        return VK_SUCCESS;
> > 
> > >  
> > 
> > > -   if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> > 
> > > -      VkResult result = anv_device_wait(device, &pool->bo,
> > 
> > > INT64_MAX);
> > 
> > > -      if (result != VK_SUCCESS)
> > 
> > > -         return result;
> > 
> > > -   }
> > 
> > > -
> > 
> > >     void *data_end = pData + dataSize;
> > 
> > >  
> > 
> > >     if (!device->info.has_llc) {
> > 
> > > @@ -176,6 +228,14 @@ VkResult genX(GetQueryPoolResults)(
> > 
> > >        /* Availability is always at the start of the slot */
> > 
> > >        bool available = slot[0];
> > 
> > >  
> > 
> > > +      if (!available && (flags & VK_QUERY_RESULT_WAIT_BIT)) {
> > 
> > > +         status = wait_for_available(device, pool, slot);
> > 
> > > +         if (status != VK_SUCCESS)
> > 
> > > +            return status;
> > 
> > > +
> > 
> > > +         available = true;
> > 
> > > +      }
> > 
> > > +
> > 
> > >        /* From the Vulkan 1.0.42 spec:
> > 
> > >         *
> > 
> > >         *    "If VK_QUERY_RESULT_WAIT_BIT and
> > 
> > > VK_QUERY_RESULT_PARTIAL_BIT are


> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170406/e725774f/attachment.html>