[Mesa-dev] [PATCH] anv/query: Busy-wait for available query entries
Chris Wilson
chris at chris-wilson.co.uk
Wed Apr 5 08:27:15 UTC 2017
On Tue, Apr 04, 2017 at 07:21:38PM -0700, Jason Ekstrand wrote:
> Before, we were just looking at whether or not the user wanted us to
> wait and waiting on the BO. This instead makes us busy-loop on each
> query until it's available. This reduces some of the pipeline bubbles
> we were getting and improves performance of The Talos Principle on
> medium settings (where the GPU isn't overloaded) by around 20% on my
> SkyLake gt4.
Hmm. The kernel also spins, but it limits itself to only spinning on the
active request and for a max of 2us within your process's timeslice.
The ioctl overhead is ~100ns in this case, cheaper than a call to
clock_gettime()! Looks like the advantage here is that you do not limit
yourself. A much simpler loop doing the same would be
	while (true) {
		if (query_is_available())
			return VK_SUCCESS;
		if (!gem_busy())
			return query_is_available() ? VK_SUCCESS : VK_NOT_READY;
	}
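
To make the control flow of that loop concrete, here is a minimal
standalone sketch. Everything in it is an assumption for illustration:
the poll_ops callbacks, the poll_result codes (standing in for
VK_SUCCESS / VK_NOT_READY) and the fake_gpu simulation are hypothetical
and are not part of anv or the kernel interface.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VK_SUCCESS / VK_NOT_READY. */
enum poll_result { POLL_SUCCESS = 0, POLL_NOT_READY = 1 };

/* Hypothetical callbacks: one reads the availability slot, the other
 * asks "is the buffer still busy?" (the role of the busy-ioctl). */
struct poll_ops {
    bool (*query_is_available)(void *ctx);
    bool (*gem_busy)(void *ctx);
};

/* Spin until the query is available or the buffer goes idle. */
static enum poll_result
poll_until_available(const struct poll_ops *ops, void *ctx)
{
    while (true) {
        if (ops->query_is_available(ctx))
            return POLL_SUCCESS;
        /* Once the buffer is idle nothing else will write the slot,
         * so one final check decides the result. */
        if (!ops->gem_busy(ctx))
            return ops->query_is_available(ctx) ? POLL_SUCCESS
                                                : POLL_NOT_READY;
    }
}

/* Simulated device, for illustration only: it reports busy for a fixed
 * number of polls and then optionally marks the query as available. */
struct fake_gpu {
    int busy_polls_left;
    bool writes_availability;
    bool available;
};

static bool fake_query_is_available(void *ctx)
{
    return ((struct fake_gpu *)ctx)->available;
}

static bool fake_gem_busy(void *ctx)
{
    struct fake_gpu *gpu = ctx;

    if (gpu->busy_polls_left > 0) {
        gpu->busy_polls_left--;
        return true;
    }
    /* Work has retired; any availability write has landed by now. */
    if (gpu->writes_availability)
        gpu->available = true;
    return false;
}

static const struct poll_ops fake_ops = {
    fake_query_is_available,
    fake_gem_busy,
};
```

The second availability check after gem_busy() reports idle is what
closes the race where the write lands between the two checks.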
> ---
> src/intel/vulkan/genX_query.c | 72 +++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 66 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
> index 7ea9404..ebf99d2 100644
> --- a/src/intel/vulkan/genX_query.c
> +++ b/src/intel/vulkan/genX_query.c
> @@ -131,6 +131,64 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
> }
> }
>
> +#define NSEC_PER_SEC 1000000000
> +
> +static bool
> +query_is_available(struct anv_device *device, uint64_t *slot)
> +{
> + if (!device->info.has_llc)
> + __builtin_ia32_clflush(slot);
Make the target cacheable? Your query write will then do the cacheline
invalidation, but there's obviously a tradeoff depending on the frequency
of snooping.
> +
> + return slot[0];
> +}
> +
> +static VkResult
> +wait_for_available(struct anv_device *device,
> + struct anv_query_pool *pool, uint64_t *slot)
> +{
> + while (true) {
> + struct timespec start;
> + clock_gettime(CLOCK_MONOTONIC, &start);
> +
> + while (true) {
> + if (!device->info.has_llc)
> + __builtin_ia32_clflush(slot);
> +
> + if (query_is_available(device, slot))
> + return VK_SUCCESS;
> +
> + struct timespec current;
> + clock_gettime(CLOCK_MONOTONIC, &current);
> +
> + if (current.tv_nsec < start.tv_nsec) {
> + current.tv_nsec += NSEC_PER_SEC;
> + current.tv_sec -= 1;
> + }
> +
> + /* If we've been looping for more than 1 ms, break out of the busy
> + * loop and ask the kernel if the buffer is actually busy.
> + */
> + if (current.tv_sec > start.tv_sec ||
> + current.tv_nsec - start.tv_nsec > 1000000)
> + break;
> + }
> +
> + VkResult result = anv_device_wait(device, &pool->bo, 0);
Using the busy-ioctl is even cheaper than wait(0).
> + switch (result) {
> + case VK_SUCCESS:
> + /* The BO is no longer busy. If we haven't seen availability yet,
> + * then we never will.
> + */
> + return query_is_available(device, slot) ? VK_SUCCESS : VK_NOT_READY;
> + case VK_TIMEOUT:
> + /* The BO is still busy, keep waiting. */
> + continue;
> + default:
> + return result;
> + }
> + }
> +}
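
For reference, the 1 ms check in the quoted loop depends on subtracting
two timespec values with a borrow from tv_sec. A minimal sketch of the
same comparison done in plain nanoseconds follows; the helper names
timespec_diff_ns and spin_budget_exceeded are hypothetical and do not
appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Elapsed nanoseconds from start to end. Folding the seconds into the
 * result handles the tv_nsec wrap-around without an explicit borrow. */
static int64_t
timespec_diff_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * NSEC_PER_SEC +
           (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* True once more than budget_ns has elapsed since start. */
static bool
spin_budget_exceeded(const struct timespec *start,
                     const struct timespec *now, int64_t budget_ns)
{
    return timespec_diff_ns(start, now) > budget_ns;
}
```

With a budget of 1000000 ns this expresses the same 1 ms threshold as
the patch, including the case where the second rolls over mid-spin.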
--
Chris Wilson, Intel Open Source Technology Centre