[PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
Asahi Lina
lina at asahilina.net
Fri Jul 14 10:07:15 UTC 2023
On 14/07/2023 18.51, Christian König wrote:
> Am 14.07.23 um 11:44 schrieb Asahi Lina:
>> On 14/07/2023 17.43, Christian König wrote:
>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>> A signaled scheduler fence can outlive its scheduler, since fences are
>>>> independently reference counted. Therefore, we can't reference the
>>>> scheduler in the get_timeline_name() implementation.
>>>>
>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>
>>>> Signed-off-by: Asahi Lina <lina at asahilina.net>
>>>> ---
>>>> drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>> drivers/gpu/drm/scheduler/sched_fence.c | 4 +++-
>>>> include/drm/gpu_scheduler.h | 5 +++++
>>>> 3 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>> /*
>>>> * Fence is from the same scheduler, only need to wait for
>>>> - * it to be scheduled
>>>> + * it to be scheduled.
>>>> + *
>>>> + * Note: s_fence->sched could have been freed and reallocated
>>>> + * as another scheduler. This false positive case is okay, as if
>>>> + * the old scheduler was freed all of its jobs must have
>>>> + * signaled their completion fences.
>>>
>>> This is outright nonsense. As long as an entity for a scheduler exists
>>> it is not allowed to free up this scheduler.
>>>
>>> So this function can't be called like this.
>>>
>>>> */
>>>> fence = dma_fence_get(&s_fence->scheduled);
>>>> dma_fence_put(entity->dependency);
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> index ef120475e7c6..06a0eebcca10 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> @@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>>>> static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>> {
>>>> struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>> - return (const char *)fence->sched->name;
>>>> + return (const char *)fence->sched_name;
>>>> }
>>>> static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
>>>> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>> unsigned seq;
>>>> fence->sched = entity->rq->sched;
>>>> + strlcpy(fence->sched_name, entity->rq->sched->name,
>>>> + sizeof(fence->sched_name));
>>>> seq = atomic_inc_return(&entity->fence_seq);
>>>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>> &fence->lock, entity->fence_context, seq);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index e95b4837e5a3..4fa9523bd47d 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -305,6 +305,11 @@ struct drm_sched_fence {
>>>> * @lock: the lock used by the scheduled and the finished fences.
>>>> */
>>>> spinlock_t lock;
>>>> + /**
>>>> + * @sched_name: the name of the scheduler that owns this fence. We
>>>> + * keep a copy here since fences can outlive their scheduler.
>>>> + */
>>>> + char sched_name[16];
>>>
>>> This just mitigates the problem, but doesn't fix it.
>>
>> Could you point out any remaining issues so we can fix them? Right now
>> this absolutely *is* broken and this fixes the breakage I observed. If
>> there are other bugs remaining, I'd like to know what they are so I
>> can fix them.
>>
>>> The real issue is that the hw fence is kept around much longer than
>>> that.
>>
>> As far as I know, the whole point of scheduler fences is to decouple
>> the hw fences from the consumers.
>
> Well yes and no. The decoupling is for the signaling, it's not
> decoupling the lifetime.
When I spoke with Daniel I understood the intent was also to decouple
the lifetime.
>> I already talked with Daniel about this. The current behavior is
>> broken. These fences can live forever. It is broken to require that
>> they outlive the driver that produced them.
>>
>>> Additional to that I'm not willing to increase the scheduler fence size
>>> like that just to decouple them from the scheduler.
>>
>> Did you read my explanation on the cover letter as to how this is just
>> broken right now? We need to fix this. If you have a better suggestion
>> I'll take it. Doing nothing is not an option.
>
> Well this isn't broken at all. This works exactly like intended, you
> just want to use it for something it wasn't made for.
>
> That scheduler fences could be changed to outlive the scheduler which
> issued them is possible, but this is certainly a new requirement.
>
> Especially since we need to grab additional references to make sure that
> the module isn't unloaded in such a case.
Yes, that's a remaining issue. The fences need to grab a module
reference to make sure drm_sched doesn't get unloaded until they're all
really gone. I can add that in v2.
It would also be desirable to drop the hw fence as soon as it signals,
instead of keeping a reference to it forever.
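For that, something along these lines could work (again only a sketch,
not a tested patch; fence->parent and drm_sched_fence_finished() are
the existing drm_sched names, and ordering against other readers of
->parent is glossed over):

void drm_sched_fence_finished(struct drm_sched_fence *fence)
{
	dma_fence_signal(&fence->finished);

	/* The hw fence isn't needed anymore once we've signaled; don't
	 * let it live as long as every consumer of the sched fence. */
	dma_fence_put(fence->parent);
	fence->parent = NULL;
}

Since dma_fence_put() accepts NULL, the existing put of fence->parent in
the release path would then just become a no-op.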
~~ Lina