[Mesa-dev] [PATCH v3 2/2] anv/query: Busy-wait for available query entries
Jason Ekstrand
jason at jlekstrand.net
Wed Apr 5 20:00:16 UTC 2017
On Wed, Apr 5, 2017 at 11:18 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:
> On Wed, Apr 05, 2017 at 11:02:18AM -0700, Jason Ekstrand wrote:
> > On Wed, Apr 05, 2017 at 10:45 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > On Wed, Apr 05, 2017 at 10:28:53AM -0700, Jason Ekstrand wrote:
> > > Before, we were just looking at whether or not the user wanted us to
> > > wait and waiting on the BO. Some clients, such as the Serious engine,
> > > use a single query pool for hundreds of individual query results where
> > > the writes for those queries may be split across several command
> > > buffers. In this scenario, the individual query we're looking for may
> > > become available long before the BO is idle so waiting on the query
> > > pool BO to be finished is wasteful. This commit makes us instead
> > > busy-loop on each query until it's available.
> > >
> > > This significantly reduces pipeline bubbles and improves performance of
> > > The Talos Principle on medium settings (where the GPU isn't overloaded
> > > with drawing) by around 20% on my SkyLake gt4.
> > > ---
> > >  src/intel/vulkan/genX_query.c | 52 ++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 46 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
> > > index 7ea9404..0d303a6 100644
> > > --- a/src/intel/vulkan/genX_query.c
> > > +++ b/src/intel/vulkan/genX_query.c
> > > @@ -131,6 +131,44 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
> > >     }
> > >  }
> > >
> > > +static bool
> > > +query_is_available(struct anv_device *device, volatile uint64_t *slot)
> > > +{
> > > +   if (!device->info.has_llc)
> > > +      __builtin_ia32_clflush(slot);
> > > +
> > > +   return slot[0];
> > > +}
> > > +
> > > +static VkResult
> > > +wait_for_available(struct anv_device *device,
> > > +                   struct anv_query_pool *pool, uint64_t *slot)
> > > +{
> > > +   while (true) {
> > > +      if (query_is_available(device, slot))
> > > +         return VK_SUCCESS;
> > > +
> > > +      VkResult result = anv_device_bo_busy(device, &pool->bo);
> >
> > Ah, but you can use the simpler check here because you follow up with a
> > query_is_available() so you know whether or not the hang clobbered the
> > result.
> >
> > If the query is not available but the bo is idle, you might then want to
> > check for a reset in case it was due to a lost device. GEM_BUSY is
> > lockless, but GEM_RESET_STATS currently takes the big struct_mutex and
> > so has non-deterministic and often quite large latencies.
> >
> > anv_device_bo_busy does that for us. :-)  In particular, it queries
> > GEM_RESET_STATS but only if the BO is not busy. This way, so long as the
> > BO is busy, we only do GEM_BUSY, but as soon as it is not busy we do a
> > single GEM_RESET_STATS and return VK_ERROR_DEVICE_LOST if we've gotten a
> > hang. The theory (the same for wait) is that, so long as the BO is still
> > busy, we either haven't hung or we haven't figured out that we've hung.
> > We could, in theory, have two batches where the first one hangs and the
> > second is ok, so this isn't quite true. However, the important thing is
> > to never return VK_SUCCESS to the user when results are invalid. Both
> > anv_device_wait and anv_device_bo_busy only do an early return of
> > VK_NOT_READY or VK_TIMEOUT and don't return VK_SUCCESS until after
> > they've called GEM_RESET_STATS to ensure that we haven't hung.
>
> It's the ordering though. Currently it is
>
> !query_is_available
> !busy
> get_reset_stats <-- slow
> return query_is_available.
>
> I'm arguing that you want
>
> !query_is_available
> !busy
> if (query_is_available) return SUCCESS
> return get_reset_stats <-- slow.
>
> with the observation that if the gpu completed the availability write
> before the hang, the result is not affected by the unlikely hang. You
> really, really want
> to avoid get_reset_stats on low latency paths (for the next few months
> at least).
>
Ok, that makes more sense.
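The agreed-upon ordering is easy to get subtly wrong, so here is a standalone C sketch of the loop Chris describes. The `fake_device` struct and its helpers are hypothetical stand-ins for `anv_device`, GEM_BUSY, and GEM_RESET_STATS (they are not the real driver API); only the control flow mirrors the proposal: spin on GEM_BUSY, re-check availability once the BO goes idle, and fall through to the slow reset-stats query only when the result really is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver state and ioctls. */
struct fake_device {
   bool bo_busy;          /* would be a GEM_BUSY ioctl */
   bool hung;             /* would be a GEM_RESET_STATS ioctl */
   uint64_t query_slot;   /* nonzero once the GPU wrote availability */
   int reset_stats_calls; /* counts trips down the slow path */
};

static bool query_is_available(struct fake_device *dev) {
   return dev->query_slot != 0;
}

static bool bo_busy(struct fake_device *dev) {
   return dev->bo_busy;
}

static int get_reset_stats(struct fake_device *dev) {
   dev->reset_stats_calls++;  /* the expensive, struct_mutex-taking call */
   return dev->hung ? -1 : 0; /* -1 stands in for VK_ERROR_DEVICE_LOST */
}

/* Returns 0 (~VK_SUCCESS) or -1 (~VK_ERROR_DEVICE_LOST).  Ordering per
 * the review: !available -> !busy -> re-check available -> only then
 * query reset stats.  (In this simulation the state never changes, so
 * callers must not pass a busy, never-available device.) */
static int wait_for_available(struct fake_device *dev) {
   while (true) {
      if (query_is_available(dev))
         return 0;
      if (bo_busy(dev))
         continue; /* still executing; keep spinning on the cheap check */
      /* BO idle: if the availability write landed before any hang, the
       * result is valid and we can skip the reset-stats ioctl entirely. */
      if (query_is_available(dev))
         return 0;
      return get_reset_stats(dev);
   }
}
```

The point of the second `query_is_available()` check is visible in the counters: a query whose availability write completed returns success without ever touching the reset-stats path, even if the device hung afterwards, while a genuinely missing result pays for exactly one reset-stats call.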