[Mesa-dev] [PATCH v3 2/2] anv/query: Busy-wait for available query entries

Wed Apr 5 18:02:18 UTC 2017

On Wed, Apr 5, 2017 at 10:45 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:

> On Wed, Apr 05, 2017 at 10:28:53AM -0700, Jason Ekstrand wrote:
> > Before, we were just looking at whether or not the user wanted us to
> > wait and waiting on the BO.  Some clients, such as the Serious engine,
> > use a single query pool for hundreds of individual query results where
> > the writes for those queries may be split across several command
> > buffers.  In this scenario, the individual query we're looking for may
> > become available long before the BO is idle so waiting on the query pool
> > BO to be finished is wasteful. This commit makes us instead busy-loop on
> > each query until it's available.
> >
> > This significantly reduces pipeline bubbles and improves performance of
> > The Talos Principle on medium settings (where the GPU isn't overloaded
> > with drawing) by around 20% on my SkyLake gt4.
> > ---
> >  src/intel/vulkan/genX_query.c | 52 ++++++++++++++++++++++++++++++
> ++++++++-----
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_query.c
> b/src/intel/vulkan/genX_query.c
> > index 7ea9404..0d303a6 100644
> > --- a/src/intel/vulkan/genX_query.c
> > +++ b/src/intel/vulkan/genX_query.c
> > @@ -131,6 +131,44 @@ cpu_write_query_result(void *dst_slot,
> VkQueryResultFlags flags,
> >     }
> >  }
> >
> > +static bool
> > +query_is_available(struct anv_device *device, volatile uint64_t *slot)
> > +{
> > +   if (!device->info.has_llc)
> > +      __builtin_ia32_clflush(slot);
> > +
> > +   return slot[0];
> > +}
> > +
> > +static VkResult
> > +wait_for_available(struct anv_device *device,
> > +                   struct anv_query_pool *pool, uint64_t *slot)
> > +{
> > +   while (true) {
> > +      if (query_is_available(device, slot))
> > +         return VK_SUCCESS;
> > +
> > +      VkResult result = anv_device_bo_busy(device, &pool->bo);
>
> Ah, but you can use the simpler check here because you follow up with a
> query_is_available() so you know whether or not the hang clobbered the
> result.
>
> If the query is not available but the bo is idle, you might then went to
> check for a reset in case it was due to a lost device. GEM_BUSY is
> lockless, but GEM_RESET_STATS currently takes the big struct_mutex and
> so has non-deterministic and often quite large latencies.
>

anv_device_bo_busy does that for us. :-)  In particular, it queries
GEM_RESET_STATS but only if the BO is not busy.  This way, so long as the
BO is busy, we only do GEM_BUSY but as soon as it is not busy we do a
single GEM_RESET_STATS and return VK_ERROR_DEVICE_LOST if we've gotten a
hang.  The theory (the same for wait) is that, so long as the BO is still
busy, we either haven't hung or we haven't figured out that we've hung.  We
could, in theory have two batches where the first one hangs and the second
is ok, so this isn't quite true.  However, the important thing is to never
return VK_SUCCESS to the user when results are invalid.  Both
anv_device_wait and anv_device_bo_busy only do an early return of
VK_NOT_READY or VK_TIMEOUT and don't return VK_SUCCESS until after they've
called GEM_RESET_STATS to ensuring that we haven't hung.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170405/3c135b5b/attachment.html>