[PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed Aug 13 11:59:06 UTC 2025
On 13/08/2025 12:10, Christian König wrote:
> On 13.08.25 10:20, Tvrtko Ursulin wrote:
>>
>> On 12/08/2025 15:34, Christian König wrote:
>>> From: Christian König <ckoenig at able.fritz.box>
>>>
>>> We have the re-occurring problem that people try to invent a
>>> DMA-fences implementation which signals fences based on an userspace
>>> IOCTL.
>>>
>>> This is well known as source of hard to track down crashes and is
>>> documented to be an invalid approach. The problem is that it seems
>>> to work during initial testing and only long term tests points out
>>> why this can never work correctly.
>>>
>>> So give at least a warning when people try to signal a fence from
>>> task context and not from interrupts or a work item. This check is
>>> certainly not perfect but better than nothing.
>>
>> I lack context as to why this should be disallowed so strongly (maybe cover letter is missing to better explain it all?)
>
> See here https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#indefinite-dma-fences
>
> I was hoping that this problem is so well known by now that it doesn't need more explanation.
>
> Going to expand the commit message a bit.
I probably got thrown by the mention of crashes. But yes, expanding the
commit and clarifying it is about deadlocks and indefinite fences, or
adding the cover letter would be better thanks!
>> , but at least if feels overly restrictive to for example exclude threads and thread workers.
>
> Good point. Could be that someone is using a pure kernel thread for fence signaling. Going to check for that instead of a worker.
Ok, I am curious how to do it reliably. Non-null current and PF_KTHREAD
and PF_WQ_WORKER not set?
>> Even the fact opportunistic signalling needs to bypass the assert makes it sound like there isn't anything fundamentally wrong with signalling from task context.
>>
>
> Opportunistic signaling can happen from everywhere. But when an implementation tries to signal it from an IOCTL that is certainly invalid.
>
>> The first patch also feels a bit too much if it has no purpose apart from checking the new asserts would otherwise trigger.
>
> The sw_sync code is is only used for testing and debugging. See the Kconfig of it:
>
> A sync object driver that uses a 32bit counter to coordinate
> synchronization. Useful when there is no hardware primitive backing
> the synchronization.
>
> WARNING: improper use of this can result in deadlocking kernel
> drivers from userspace. Intended for test and debug only.
>
But it is adding kernel code for a questionable benefit.
What about calling the non-asserting version instead of adding
complexity? It is kernel code so should be fine.
Because even with the worker sw_sync can still create infinite fences.
Worker just defeats the asserts so I do not see the value in
complicating it.
Regards,
Tvrtko
>>
>>> Signed-off-by: Christian König <ckoenig at able.fritz.box>
>>> ---
>>> drivers/dma-buf/dma-fence.c | 59 +++++++++++++++++++++++++++----------
>>> include/linux/dma-fence.h | 9 ++++--
>>> 2 files changed, 51 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 3f78c56b58dc..2bce620eacac 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -345,33 +345,23 @@ void __dma_fence_might_wait(void)
>>> }
>>> #endif
>>> -
>>> /**
>>> - * dma_fence_signal_timestamp_locked - signal completion of a fence
>>> + * dma_fence_signal_internal - internal signal completion of a fence
>>> * @fence: the fence to signal
>>> * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
>>> *
>>> - * Signal completion for software callbacks on a fence, this will unblock
>>> - * dma_fence_wait() calls and run all the callbacks added with
>>> - * dma_fence_add_callback(). Can be called multiple times, but since a fence
>>> - * can only go from the unsignaled to the signaled state and not back, it will
>>> - * only be effective the first time. Set the timestamp provided as the fence
>>> - * signal timestamp.
>>> - *
>>> - * Unlike dma_fence_signal_timestamp(), this function must be called with
>>> - * &dma_fence.lock held.
>>> + * Internal signal the dma_fence without error checking. Should *NEVER* be used
>>> + * by drivers or external code directly.
>>> *
>>> * Returns 0 on success and a negative error value when @fence has been
>>> * signalled already.
>>> */
>>> -int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>> - ktime_t timestamp)
>>> +int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp)
>>> {
>>> struct dma_fence_cb *cur, *tmp;
>>> struct list_head cb_list;
>>> lockdep_assert_held(fence->lock);
>>> -
>>> if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> &fence->flags)))
>>> return -EINVAL;
>>> @@ -390,7 +380,46 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>> return 0;
>>> }
>>> -EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
>>> +EXPORT_SYMBOL(dma_fence_signal_internal);
>>> +
>>> +/**
>>> + * dma_fence_signal_timestamp_locked - signal completion of a fence
>>> + * @fence: the fence to signal
>>> + * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
>>> + *
>>> + * Signal completion for software callbacks on a fence, this will unblock
>>> + * dma_fence_wait() calls and run all the callbacks added with
>>> + * dma_fence_add_callback(). Can be called multiple times, but since a fence
>>> + * can only go from the unsignaled to the signaled state and not back, it will
>>> + * only be effective the first time. Set the timestamp provided as the fence
>>> + * signal timestamp.
>>> + *
>>> + * Unlike dma_fence_signal_timestamp(), this function must be called with
>>> + * &dma_fence.lock held.
>>> + *
>>> + * Returns 0 on success and a negative error value when @fence has been
>>> + * signalled already.
>>> + */
>>> +int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>> + ktime_t timestamp)
>>> +{
>>> + /*
>>> + * We have the re-occurring problem that people try to invent a
>>> + * DMA-fences implementation which signals fences based on an userspace
>>> + * IOCTL.
>>> + *
>>> + * This is well known as source of hard to track down crashes and is
>>> + * documented to be an invalid approach. The problem is that it seems
>>> + * to work during initial testing and only long term tests points out
>>> + * why this can never work correctly.
>>> + *
>>> + * So give at least a warning when people try to signal a fence from
>>> + * task context and not from interrupts or a work item. This check is
>>> + * certainly not perfect but better than nothing.
>>> + */
>>> + WARN_ON_ONCE(!in_interrupt() && !current_work());
>>> + return dma_fence_signal_internal(fence, timestamp);
>>> +}
>>> /**
>>> * dma_fence_signal_timestamp - signal completion of a fence
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 64639e104110..8dbcd66989b8 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -369,6 +369,7 @@ int dma_fence_signal_locked(struct dma_fence *fence);
>>> int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp);
>>> int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>> ktime_t timestamp);
>>> +int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp);
>>> signed long dma_fence_default_wait(struct dma_fence *fence,
>>> bool intr, signed long timeout);
>>> int dma_fence_add_callback(struct dma_fence *fence,
>>> @@ -422,7 +423,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>> return true;
>>> if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>> - dma_fence_signal_locked(fence);
>>> + dma_fence_signal_internal(fence, ktime_get());
>>> return true;
>>> }
>>> @@ -448,11 +449,15 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>> static inline bool
>>> dma_fence_is_signaled(struct dma_fence *fence)
>>> {
>>> + unsigned long flags;
>>> +
>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return true;
>>> if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>> - dma_fence_signal(fence);
>>> + spin_lock_irqsave(fence->lock, flags);
>>> + dma_fence_signal_internal(fence, ktime_get());
>>> + spin_unlock_irqrestore(fence->lock, flags);
>>> return true;
>>> }
>>>
>>
>
More information about the dri-devel
mailing list