[PATCH 1/2] dma-fence: Add a single fence fast path for fence merging
Tvrtko Ursulin
tursulin at igalia.com
Wed Nov 13 17:19:46 UTC 2024
From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
Testing some workloads in two different scenarios, such as games running
under Gamescope on a Steam Deck, or vkcube under a Plasma desktop, shows
that in a significant portion of calls the dma_fence_unwrap_merge helper
is called with just a single unsignalled fence.
Therefore it is worthile to add a fast path for that case and so bypass
the memory allocation and insertion sort attempts.
Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentages per number of fences buckets, before and after checking for
signalled status, sorting and flattening:
N Before After
0 0.85%
1 69.80% -> The new fast path.
2-9 29.36% 9% (Ie. 91% of this bucket flattened to 1 fence)
10-19
20-40
50+
2) Cyberpunk 2077 under Gamescope
1050 calls per second.
N Before After
0 0.71%
1 52.53% -> The new fast path.
2-9 44.38% 50.60% (Ie. half resolved to a single fence)
10-19 2.34%
20-40 0.06%
50+
3) vkcube under Plasma
90 calls per second.
N Before After
0
1
2-9 100% 0% (Ie. all resolved to a single fence)
10-19
20-40
50+
In the case of vkcube all invocations in the 2-9 bucket were actually
just two input fences.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
Cc: Christian König <christian.koenig at amd.com>
Cc: Friedrich Vock <friedrich.vock at gmx.de>
---
drivers/dma-buf/dma-fence-unwrap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index 628af51c81af..75c3e37fd617 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -64,8 +64,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
struct dma_fence_unwrap *iter)
{
+ struct dma_fence *tmp, *signaled, **array;
struct dma_fence_array *result;
- struct dma_fence *tmp, **array;
ktime_t timestamp;
unsigned int i;
size_t count;
@@ -75,6 +75,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
for (i = 0; i < num_fences; ++i) {
dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
if (!dma_fence_is_signaled(tmp)) {
+ signaled = tmp;
++count;
} else {
ktime_t t = dma_fence_timestamp(tmp);
@@ -88,9 +89,14 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
/*
* If we couldn't find a pending fence just return a private signaled
* fence with the timestamp of the last signaled one.
+ *
+ * Or if there was a single unsignaled fence left we can return it
+ * directly and early since that is a major path on many workloads.
*/
if (count == 0)
return dma_fence_allocate_private_stub(timestamp);
+ else if (count == 1)
+ return dma_fence_get(signaled);
array = kmalloc_array(count, sizeof(*array), GFP_KERNEL);
if (!array)
--
2.46.0
More information about the dri-devel
mailing list