[PATCH 2/3] dma-buf: sort fences in dma_fence_unwrap_merge

Thu Nov 7 16:00:33 UTC 2024

On 24/10/2024 13:41, Christian König wrote:
> The merge function initially handled only individual fences and
> arrays which in turn were created by the merge function. This allowed
> to create the new array by a simple merge sort based on the fence
> context number.
> 
> The problem is now that since the addition of timeline sync objects
> userspace can create chain containers in basically any fence context
> order.
> 
> If those are merged together it can happen that we create really
> large arrays since the merge sort algorithm doesn't work any more.
> 
> So put an insert sort behind the merge sort which kicks in when the
> input fences are not in the expected order. This isn't as efficient
> as a heap sort, but has better properties for the most common use
> case.
> 
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>   drivers/dma-buf/dma-fence-unwrap.c | 39 ++++++++++++++++++++++++++----
>   1 file changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
> index 628af51c81af..d9aa280d9ff6 100644
> --- a/drivers/dma-buf/dma-fence-unwrap.c
> +++ b/drivers/dma-buf/dma-fence-unwrap.c
> @@ -106,7 +106,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
>   		fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
>   
>   	count = 0;
> -	do {
> +	while (true) {
>   		unsigned int sel;
>   
>   restart:
> @@ -144,11 +144,40 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
>   			}
>   		}
>   
> -		if (tmp) {
> -			array[count++] = dma_fence_get(tmp);
> -			fences[sel] = dma_fence_unwrap_next(&iter[sel]);
> +		if (!tmp)
> +			break;
> +
> +		/*
> +		 * We could use a binary search here, but since the assumption
> +		 * is that the main input are already sorted dma_fence_arrays
> +		 * just looking from end has a higher chance of finding the
> +		 * right location on the first try
> +		 */
> +
> +		for (i = count; i--;) {
> +			if (likely(array[i]->context < tmp->context))
> +				break;
> +
> +			if (array[i]->context == tmp->context) {
> +				if (dma_fence_is_later(tmp, array[i])) {
> +					dma_fence_put(array[i]);
> +					array[i] = dma_fence_get(tmp);
> +				}
> +				fences[sel] = dma_fence_unwrap_next(&iter[sel]);
> +				goto restart;
> +			}
>   		}
> -	} while (tmp);
> +
> +		++i;
> +		/*
> +		 * Make room for the fence, this should be a nop most of the
> +		 * time.
> +		 */
> +		memcpy(&array[i + 1], &array[i], (count - i) * sizeof(*array));
> +		array[i] = dma_fence_get(tmp);
> +		fences[sel] = dma_fence_unwrap_next(&iter[sel]);
> +		count++;

Having ventured into this function for the first time, I can say that 
this is some smart code which is not easy to grasp. It could definitely 
benefit from a high level comment before the do-while loop to explain 
what it is going to do.

Next and tmp local variable names I also wonder if could be renamed to 
something more descriptive.

And the algorithmic complexity of the end result, given the multiple 
loops and gotos, I have no idea what it could be.

Has a dumb solution been considered like a two-pass with a 
pessimistically allocated fence array been considered? Like:

1) Populate array with all unsignalled unwrapped fences. (O(count))

2) Bog standard include/linux/sort.h by context and seqno. (O(count*log 
(count)))

3) Walk array and squash same context to latest fence. (Before this 
patch that wasn't there, right?). (O(count)) (Overwrite in place, no 
memcpy needed.)

Algorithmic complexity of that would be obvious and code much simpler.

Regards,

Tvrtko

> +	};
>   
>   	if (count == 0) {
>   		tmp = dma_fence_allocate_private_stub(ktime_get());