[Mesa-dev] [PATCH] anv/query: Busy-wait for available query entries

Jason Ekstrand jason at jlekstrand.net
Wed Apr 5 16:37:16 UTC 2017


On Wed, Apr 5, 2017 at 1:27 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Tue, Apr 04, 2017 at 07:21:38PM -0700, Jason Ekstrand wrote:
> > Before, we were just looking at whether or not the user wanted us to
> > wait and waiting on the BO.  This instead makes us busy-loop on each
> > query until it's available.  This reduces some of the pipeline bubbles
> > we were getting and improves performance of The Talos Principle on
> > medium settings (where the GPU isn't overloaded) by around 20% on my
> > SkyLake gt4.
>
> Hmm. The kernel also spins, but it limits itself to only spinning on the
> active request and for a max of 2us within your process's timeslice.
> The ioctl overhead is ~100ns in this case, cheaper than a call to
> clock_gettime()! Looks like the advantage here is that you do not limit
> yourself. A much simpler loop doing the same would be
>

If clock_gettime() isn't gaining me anything, I'm very happy to drop it.
It just makes things more complicated as you say.  Am I better off just
calling gem_busy in my loop?


>         while (true) {
>                 if (query_is_available())
>                         return VK_SUCCESS;
>
>                 if (!gem_busy())
>                         return query_is_available() ? VK_SUCCESS :
>                                                       VK_NOT_READY;
>         }
>
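For anyone following along, here is a minimal, self-contained sketch of the simplified loop above. It is not the driver code: `sketch_result`, `bo_busy_fn`, `struct fake_bo` and `fake_bo_busy` are hypothetical stand-ins for `VkResult`, a gem_busy-style check and the GPU, so the control flow can be exercised without a device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VK_SUCCESS / VK_NOT_READY. */
enum sketch_result { SKETCH_SUCCESS, SKETCH_NOT_READY };

/* Hypothetical callback standing in for a gem_busy-style check. */
typedef bool (*bo_busy_fn)(void *bo);

static bool
sketch_query_is_available(const volatile uint64_t *slot)
{
   /* volatile forces a fresh read of the slot on every poll. */
   return slot[0] != 0;
}

static enum sketch_result
sketch_wait_for_available(const volatile uint64_t *slot,
                          bo_busy_fn bo_busy, void *bo)
{
   while (true) {
      if (sketch_query_is_available(slot))
         return SKETCH_SUCCESS;

      /* Once the BO is idle, nothing can write the slot any more, so one
       * final check decides the result.
       */
      if (!bo_busy(bo))
         return sketch_query_is_available(slot) ? SKETCH_SUCCESS
                                                : SKETCH_NOT_READY;
   }
}

/* Fake "GPU" for illustration only: reports busy for a fixed number of
 * polls, then optionally marks the query available and goes idle.
 */
struct fake_bo {
   uint64_t *slot;
   int polls_left;
   bool writes_result;
};

static bool
fake_bo_busy(void *data)
{
   struct fake_bo *bo = data;

   if (bo->polls_left > 0) {
      bo->polls_left--;
      return true;
   }
   if (bo->writes_result)
      bo->slot[0] = 1;
   return false;
}
```

The second availability check after the busy check is what closes the race where the work completes between the two polls.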
> > ---
> >  src/intel/vulkan/genX_query.c | 72 +++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 66 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
> > index 7ea9404..ebf99d2 100644
> > --- a/src/intel/vulkan/genX_query.c
> > +++ b/src/intel/vulkan/genX_query.c
> > @@ -131,6 +131,64 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
> >     }
> >  }
> >
> > +#define NSEC_PER_SEC 1000000000
> > +
> > +static bool
> > +query_is_available(struct anv_device *device, uint64_t *slot)
> > +{
> > +   if (!device->info.has_llc)
> > +      __builtin_ia32_clflush(slot);
>
> Make the target cacheable? Your query write will then do the cacheline
> invalidation, but there's obviously a tradeoff depending on the frequency
> of snooping.
>

I'd like to do that eventually, yes.  I'm planning to make some driver-wide
changes to use better maps and caching in the future.  For now, this is the
way the rest of the driver works.


> > +
> > +   return slot[0];
> > +}
> > +
> > +static VkResult
> > +wait_for_available(struct anv_device *device,
> > +                   struct anv_query_pool *pool, uint64_t *slot)
> > +{
> > +   while (true) {
> > +      struct timespec start;
> > +      clock_gettime(CLOCK_MONOTONIC, &start);
> > +
> > +      while (true) {
> > +         if (!device->info.has_llc)
> > +            __builtin_ia32_clflush(slot);
> > +
> > +         if (query_is_available(device, slot))
> > +            return VK_SUCCESS;
> > +
> > +         struct timespec current;
> > +         clock_gettime(CLOCK_MONOTONIC, &current);
> > +
> > +         if (current.tv_nsec < start.tv_nsec) {
> > +            current.tv_nsec += NSEC_PER_SEC;
> > +            current.tv_sec -= 1;
> > +         }
> > +
> > +         /* If we've been looping for more than 1 ms, break out of the busy
> > +          * loop and ask the kernel if the buffer is actually busy.
> > +          */
> > +         if (current.tv_sec > start.tv_sec ||
> > +             current.tv_nsec - start.tv_nsec > 1000000)
> > +            break;
> > +      }
> > +
> > +      VkResult result = anv_device_wait(device, &pool->bo, 0);
>
> Using the busy-ioctl is even cheaper than wait(0).
>

Sure.  I can add an anv_device_bo_busy or just make anv_device_wait with a
timeout of 0 use the busy ioctl.


> > +      switch (result) {
> > +      case VK_SUCCESS:
> > +         /* The BO is no longer busy.  If we haven't seen availability yet,
> > +          * then we never will.
> > +          */
> > +         return query_is_available(device, slot) ? VK_SUCCESS : VK_NOT_READY;
> > +      case VK_TIMEOUT:
> > +         /* The BO is still busy, keep waiting. */
> > +         continue;
> > +      default:
> > +         return result;
> > +      }
> > +   }
> > +}
>
> --
> Chris Wilson, Intel Open Source Technology Centre
>