[RFC 0/4] Some (drm_sched_|dma_)fence lifetime issues
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Thu Apr 24 07:07:06 UTC 2025
On 23/04/2025 14:12, Christian König wrote:
> On 4/18/25 18:42, Tvrtko Ursulin wrote:
>> Hi all,
>>
>> Recently I mentioned to Danilo about some fence lifetime issues so here is a
>> rough series, more than anything intended to start the discussion.
>>
>> Most of the problem statement can be found in the first patch but to briefly
>> summarise - because sched fence can outlive the scheduler, we can trivially
>> engineer an use after free with xe and possibly other drivers. All that is
>> needed is to convert a syncobj into a sync file behind drivers back, and I don't
>> see what the driver can do about it.
>
>
> Yeah that topic again :) The problem here is that this is not a bug, it is a feature!
>
> IIRC it was Alex who pointed that issue out on the very first fence patch set, and we already discussed what to do back then.
>
> The problem with grabbing module references for fences is that you get trivially into circle references and so basically always preventing the module from unloading.
Where "always" is only "while there are active objects from that
module", no?
> The decision was made to postpone this and live with the potential use after free on module unload until somebody has time to fix it. Well that was +10 years ago :)
>
> I discussed this with Sima again last year and we came to the conclusion that the easiest way forward would be to decouple the dma_fence implementation from the driver or component issuing the fence.
>
> I then came up with the following steps to allow this:
> 1. Decouple the lock used for protecting the dma_fence callback list from the caller.
> 2. Stop calling enable_signaling with the lock held.
> 3. Nuke all those kmem_cache implementations and force drivers to always allocate fences using kvmalloc().
> 4. Nuke the release callback (or maybe move it directly after signaling) and set fence->ops to NULL after signaling the fence.
>
> I already send patches out for #1 and #2, but don't have enough time to actually finish the work.
>
> If you want take a look at nuking all those kmem_cache implementations for allocating the fence memory. I think that can be completed completely separate to everything else.
So enabling dma fence "revoke" so to say.
Just to check we are on the same page, it is not just about the module
references, but also use after frees which can happen even if module is
still loaded but any memory reachable via dma fence entry points has
been freed.
In that case, as Matt has already asked, if you could dig up your
unfinished work it would be interesting to see.
Regards,
Tvrtko
>> IGT that exploits the problem:
>> https://patchwork.freedesktop.org/patch/642709/?series=146211&rev=2
>>
>> Different flavour of the problem space is if we had a close(drm_fd) in that test
>> before the sleep. In that case we can even unload xe.ko and gpu-sched.ko for
>> even more fun. Last two patches in the series close that gap.
>>
>> But first two patches are just shrinking the race window. They are not proper
>> fixes. This is what I want to discuss since I understand reference counting all
>> the involved objects has been rejected in the past. And since the problem
>> probably expands to all dma fences it certainly isn't easy.
>>
>> To be clear once more - lets not focus on how this does not fix it fully - I am
>> primarily trying to start the conversation.
>>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Danilo Krummrich <dakr at kernel.org>
>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Philipp Stanner <phasta at kernel.org>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>
>> Tvrtko Ursulin (4):
>> sync_file: Weakly paper over one use-after-free resulting race
>> dma-fence: Slightly safer dma_fence_set_deadline
>> drm/sched: Keep module reference while there are active fences
>> drm/xe: Keep module reference while there are active fences
>>
>> drivers/dma-buf/dma-fence.c | 2 +-
>> drivers/dma-buf/sync_file.c | 29 ++++++++++++++++++++-----
>> drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++--
>> drivers/gpu/drm/xe/xe_hw_fence.c | 13 ++++++++++-
>> 4 files changed, 47 insertions(+), 9 deletions(-)
>>
>
More information about the dri-devel
mailing list