[PATCH 3/5] dma-fence: Add a single fence fast path for fence merging
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Thu Jan 9 10:53:53 UTC 2025
Christian - it looks like this patch could be merged now.
Thanks,
Tvrtko
On 15/11/2024 10:21, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>
> Testing some workloads in two different scenarios, such as games running
> under Gamescope on a Steam Deck, or vkcube under a Plasma desktop, shows
> that in a significant portion of calls the dma_fence_unwrap_merge helper
> is called with just a single unsignalled fence.
>
> Therefore it is worthwhile to add a fast path for that case and so bypass
> the memory allocation and insertion sort attempts.
>
> Tested scenarios:
>
> 1) Hogwarts Legacy under Gamescope
>
> ~1500 calls per second to __dma_fence_unwrap_merge.
>
> Percentages per number of fences buckets, before and after checking for
> signalled status, sorting and flattening:
>
>     N      Before    After
>     0       0.85%
>     1      69.80%     ->     The new fast path.
>    2-9     29.36%     9%     (Ie. 91% of this bucket flattened to 1 fence)
>   10-19
>   20-40
>    50+
>
> 2) Cyberpunk 2077 under Gamescope
>
> ~2400 calls per second.
>
>     N      Before    After
>     0       0.71%
>     1      52.53%     ->     The new fast path.
>    2-9     44.38%    50.60%  (Ie. half resolved to a single fence)
>   10-19     2.34%
>   20-40     0.06%
>    50+
>
> 3) vkcube under Plasma
>
> 90 calls per second.
>
>     N      Before    After
>     0
>     1
>    2-9      100%      0%     (Ie. all resolved to a single fence)
>   10-19
>   20-40
>    50+
>
> In the case of vkcube all invocations in the 2-9 bucket were actually
> just two input fences.
>
> v2:
>  * Correct local variable name and hold on to unsignaled reference. (Christian)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Friedrich Vock <friedrich.vock at gmx.de>
> ---
> drivers/dma-buf/dma-fence-unwrap.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
> index 6345062731f1..2a059ac0ed27 100644
> --- a/drivers/dma-buf/dma-fence-unwrap.c
> +++ b/drivers/dma-buf/dma-fence-unwrap.c
> @@ -84,8 +84,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
>  					   struct dma_fence **fences,
>  					   struct dma_fence_unwrap *iter)
>  {
> +	struct dma_fence *tmp, *unsignaled = NULL, **array;
>  	struct dma_fence_array *result;
> -	struct dma_fence *tmp, **array;
>  	ktime_t timestamp;
>  	int i, j, count;
>
> @@ -94,6 +94,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
>  	for (i = 0; i < num_fences; ++i) {
>  		dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
>  			if (!dma_fence_is_signaled(tmp)) {
> +				dma_fence_put(unsignaled);
> +				unsignaled = dma_fence_get(tmp);
>  				++count;
>  			} else {
>  				ktime_t t = dma_fence_timestamp(tmp);
> @@ -107,9 +109,16 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
>  	/*
>  	 * If we couldn't find a pending fence just return a private signaled
>  	 * fence with the timestamp of the last signaled one.
> +	 *
> +	 * Or if there was a single unsignaled fence left we can return it
> +	 * directly and early since that is a major path on many workloads.
>  	 */
>  	if (count == 0)
>  		return dma_fence_allocate_private_stub(timestamp);
> +	else if (count == 1)
> +		return unsignaled;
> +
> +	dma_fence_put(unsignaled);
>
>  	array = kmalloc_array(count, sizeof(*array), GFP_KERNEL);
>  	if (!array)
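
For anyone following the reference handling in the hunks above, here is a
minimal userspace sketch of the same pattern. It is not the kernel code:
struct toy_fence, toy_fence_get(), toy_fence_put() and toy_merge_fast_path()
are hypothetical stand-ins invented for illustration, with a plain integer
in place of the kref and a flat array in place of the unwrap iterator. It
only shows that holding a reference on the most recently seen pending fence,
and dropping the previous one, leaves exactly one extra reference on the
returned fence when count == 1 and none otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: a refcount and a signaled flag. */
struct toy_fence {
	int refcount;
	bool signaled;
};

/* Take a reference; NULL is tolerated, as with dma_fence_get(). */
static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

/* Drop a reference; NULL is tolerated, as with dma_fence_put(). */
static void toy_fence_put(struct toy_fence *f)
{
	if (f)
		f->refcount--;
}

/*
 * Count the pending fences in a flat array (assumption: the real code walks
 * containers via dma_fence_unwrap_for_each instead). If exactly one fence is
 * pending, return it with an extra reference held for the caller. Otherwise
 * drop the temporary reference and return NULL so that the caller can take
 * the stub (count == 0) or the allocate-and-sort path (count > 1).
 */
static struct toy_fence *toy_merge_fast_path(struct toy_fence **fences,
					     size_t num_fences,
					     size_t *pending)
{
	struct toy_fence *unsignaled = NULL;
	size_t count = 0;
	size_t i;

	assert(pending != NULL);

	for (i = 0; i < num_fences; ++i) {
		if (!fences[i]->signaled) {
			/* Keep a reference only on the latest pending fence. */
			toy_fence_put(unsignaled);
			unsignaled = toy_fence_get(fences[i]);
			++count;
		}
	}

	*pending = count;
	if (count == 1)
		return unsignaled;

	/* Zero or several pending fences: no reference may leak. */
	toy_fence_put(unsignaled);
	return NULL;
}
```

With one signaled and one pending input the pending fence comes back with
its refcount raised by one. With two pending inputs both refcounts end up
unchanged and the function returns NULL.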