[Mesa-dev] [PATCH v3 2/2] anv/query: Busy-wait for available query entries
Chris Wilson
chris at chris-wilson.co.uk
Wed Apr 5 18:18:00 UTC 2017
On Wed, Apr 05, 2017 at 11:02:18AM -0700, Jason Ekstrand wrote:
> On Wed, Apr 5, 2017 at 10:45 AM, Chris Wilson
> <[1]chris at chris-wilson.co.uk> wrote:
>
> On Wed, Apr 05, 2017 at 10:28:53AM -0700, Jason Ekstrand wrote:
> > Before, we were just looking at whether or not the user wanted us to
> > wait and waiting on the BO. Some clients, such as the Serious engine,
> > use a single query pool for hundreds of individual query results where
> > the writes for those queries may be split across several command
> > buffers. In this scenario, the individual query we're looking for may
> > become available long before the BO is idle so waiting on the query
> pool
> > BO to be finished is wasteful. This commit makes us instead busy-loop
> on
> > each query until it's available.
> >
> > This significantly reduces pipeline bubbles and improves performance
> of
> > The Talos Principle on medium settings (where the GPU isn't overloaded
> > with drawing) by around 20% on my SkyLake gt4.
> > ---
> > src/intel/vulkan/genX_query.c | 52
> ++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_query.c
> b/src/intel/vulkan/genX_query.c
> > index 7ea9404..0d303a6 100644
> > --- a/src/intel/vulkan/genX_query.c
> > +++ b/src/intel/vulkan/genX_query.c
> > @@ -131,6 +131,44 @@ cpu_write_query_result(void *dst_slot,
> VkQueryResultFlags flags,
> > }
> > }
> >
> > +static bool
> > +query_is_available(struct anv_device *device, volatile uint64_t
> *slot)
> > +{
> > + if (!device->info.has_llc)
> > + __builtin_ia32_clflush(slot);
> > +
> > + return slot[0];
> > +}
> > +
> > +static VkResult
> > +wait_for_available(struct anv_device *device,
> > + struct anv_query_pool *pool, uint64_t *slot)
> > +{
> > + while (true) {
> > + if (query_is_available(device, slot))
> > + return VK_SUCCESS;
> > +
> > + VkResult result = anv_device_bo_busy(device, &pool->bo);
>
> Ah, but you can use the simpler check here because you follow up with a
> query_is_available() so you know whether or not the hang clobbered the
> result.
>
> If the query is not available but the bo is idle, you might then went to
> check for a reset in case it was due to a lost device. GEM_BUSY is
> lockless, but GEM_RESET_STATS currently takes the big struct_mutex and
> so has non-deterministic and often quite large latencies.
>
> anv_device_bo_busy does that for us. :-) In particular, it queries
> GEM_RESET_STATS but only if the BO is not busy. This way, so long as the
> BO is busy, we only do GEM_BUSY but as soon as it is not busy we do a
> single GEM_RESET_STATS and return VK_ERROR_DEVICE_LOST if we've gotten a
> hang. The theory (the same for wait) is that, so long as the BO is still
> busy, we either haven't hung or we haven't figured out that we've hung.
> We could, in theory have two batches where the first one hangs and the
> second is ok, so this isn't quite true. However, the important thing is
> to never return VK_SUCCESS to the user when results are invalid. Both
> anv_device_wait and anv_device_bo_busy only do an early return of
> VK_NOT_READY or VK_TIMEOUT and don't return VK_SUCCESS until after they've
> called GEM_RESET_STATS to ensuring that we haven't hung.
It's the ordering though. Currently it is
!query_is_available
!busy
get_reset_stats <-- slow
return query_is_available.
I'm arguing that you want
!query_is_available
!busy
if (query_is_available) return SUCCESS
return get_reset_stats <-- slow.
with the observation that if the gpu completed the available before the
hang, it is not affected by the unlikely hang. You really, really want
to avoid get_reset_stats on low latency paths (for the next few months
at least).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the mesa-dev
mailing list