[Mesa-dev] [PATCH] anv/query: Busy-wait for available query entries

Chris Wilson chris at chris-wilson.co.uk
Wed Apr 5 17:07:55 UTC 2017


On Wed, Apr 05, 2017 at 09:37:16AM -0700, Jason Ekstrand wrote:
>    On Wed, Apr 5, 2017 at 1:27 AM, Chris Wilson <chris at chris-wilson.co.uk>
>    wrote:
> 
>      On Tue, Apr 04, 2017 at 07:21:38PM -0700, Jason Ekstrand wrote:
>      > Before, we were just looking at whether or not the user wanted us to
>      > wait and waiting on the BO.  This instead makes us busy-loop on each
>      > query until it's available.  This reduces some of the pipeline bubbles
>      > we were getting and improves performance of The Talos Principle on
>      > medium settings (where the GPU isn't overloaded) by around 20% on my
>      > SkyLake gt4.
> 
>      Hmm. The kernel also spins, but it limits itself to only spinning on the
>      active request and for a max of 2us within your process's timeslice.
>      The ioctl overhead is ~100ns in this case, cheaper than a call to
>      clock_gettime()! Looks like the advantage here is that you do not limit
>      yourself. A much simpler loop doing the same would be
> 
>    If clock_gettime() isn't gaining me anything, I'm very happy to drop it. 
>    It just makes things more complicated as you say.  Am I better off just
>    calling gem_busy in my loop?

Yes, for your purpose I don't think you will observe any difference in
latency from including the gem_busy() check inside your busy-wait. The
cost of the busy ioctl should be quite small (100ns or your money back),
and once you have confirmed that !query_is_available, that is going to be
much less than the typical spin duration. On the other hand, if you
typically spin for only 1us here, then the busy ioctl will be significant.
YMMV, I'm guessing a typical delay of ~100us :)
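
A minimal sketch of that shape of loop, for illustration only. Every name
here (wait_for_query, bo_busy_fn, fake_gpu, fake_bo_busy) is hypothetical
and not taken from the patch; the callback stands in for whatever wraps the
busy ioctl, and fake_bo_busy merely simulates a GPU that goes idle after a
few polls so the loop can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback standing in for a wrapper around the busy ioctl. */
typedef bool (*bo_busy_fn)(void *ctx);

enum wait_result { QUERY_READY, QUERY_NOT_READY };

/* Spin on the availability word, asking on each pass whether the buffer
 * is still busy so that we never spin on work that was never submitted. */
static enum wait_result
wait_for_query(const volatile uint64_t *available, bo_busy_fn bo_busy,
               void *ctx)
{
   while (true) {
      if (*available)
         return QUERY_READY;

      if (!bo_busy(ctx)) {
         /* The buffer went idle. The result may have landed between the
          * two checks above, so look once more before giving up. */
         return *available ? QUERY_READY : QUERY_NOT_READY;
      }
   }
}

/* Simulated GPU (an assumption made purely for testing): reports busy for
 * polls_until_idle calls, then optionally writes the result and goes idle. */
struct fake_gpu {
   volatile uint64_t available;
   int polls_until_idle;
   bool writes_result;
};

static bool
fake_bo_busy(void *ctx)
{
   struct fake_gpu *gpu = ctx;

   if (gpu->polls_until_idle > 0) {
      gpu->polls_until_idle--;
      return true;
   }
   if (gpu->writes_result)
      gpu->available = 1;
   return false;
}
```

With a fake_gpu whose writes_result is true, wait_for_query() returns
QUERY_READY once the simulated buffer goes idle; with writes_result false it
returns QUERY_NOT_READY instead of spinning forever.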

> 
>              while (true) {
>                      if (query_is_available())

Hmm, make sure the compiler doesn't cache the value of the query. The
ioctl call here will prevent the compiler from assuming memory is constant
inside the loop. If you loop around just the query, with no ioctl in
between, you may have to do a volatile cast.
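
One way such a volatile cast might look, as a sketch rather than the code
from the patch. read_once_u64 and spin_until_nonzero are hypothetical names,
and the spin bound exists only to keep the example finite. The cast relies
on compilers such as GCC and Clang honouring the volatile qualifier of the
access, which is how this idiom is used in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read through a volatile-qualified pointer so the compiler has to emit a
 * fresh load each time instead of hoisting the read out of the loop. */
static inline uint64_t
read_once_u64(const uint64_t *p)
{
   return *(const volatile uint64_t *)p;
}

/* Poll a slot that something else (for example the GPU) is expected to
 * write. The bound is an assumption added to keep the sketch finite. */
static bool
spin_until_nonzero(const uint64_t *slot, unsigned max_spins)
{
   for (unsigned i = 0; i < max_spins; i++) {
      if (read_once_u64(slot) != 0)
         return true;
   }
   return false;
}
```

Note that volatile only forces the load to be re-issued; it does not by
itself order the read against other memory accesses.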

>      > +      VkResult result = anv_device_wait(device, &pool->bo, 0);
> 
>      Using the busy-ioctl is even cheaper than wait(0).
> 
>    Sure.  I can add an anv_device_bo_busy or just make anv_device_wait with a
>    timeout of 0 use the busy ioctl.

Technically they report slightly different things: a wait will also
report on third parties using the bo, whereas busy restricts itself to
reporting on i915.ko users. Something to keep in mind.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

