[PATCH 3/5] drm/sched: Warn if pending list is not empty
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Tue Apr 22 12:07:47 UTC 2025
On 22/04/2025 12:13, Danilo Krummrich wrote:
> On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
>> Question I raised is if there are other drivers which manage to clean up
>> everything correctly (like the mock scheduler does), but trigger that
>> warning. Maybe there are not and maybe mock scheduler is the only false
>> positive.
>
> So far the scheduler simply does not give any guideline on how to address the
> problem, hence every driver simply does something (or nothing, effectively
> ignoring the problem). This is what we want to fix.
>
> The mock scheduler keeps it's own list of pending jobs and on tear down stops
> the scheduler's workqueues, traverses it's own list and eventually frees the
> pending jobs without updating the scheduler's internal pending list.
>
> So yes, it does avoid memory leaks, but it also leaves the schedulers internal
> structures with an invalid state, i.e. the pending list of the scheduler has
> pointers to already freed memory.
>
> What if the drm_sched_fini() starts touching the pending list? Then you'd end up
> with UAF bugs with this implementation. We cannot invalidate the schedulers
> internal structures and yet call scheduler functions - e.g. drm_sched_fini() -
> subsequently.
>
> Hence, the current implementation of the mock scheduler is fundamentally flawed.
> And so would be *every* driver that still has entries within the scheduler's
> pending list.
>
> This is not a false positive, it already caught a real bug -- in the mock
> scheduler.
To avoid furher splitting hairs on whether real bugs need to be able to
manifest or not, lets move past this with a conclusion that there are
two potential things to do here:
First one is to either send separately or include in this series
something like:
diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index f999c8859cf7..7c4df0e890ac 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -300,6 +300,8 @@ void drm_mock_sched_fini(struct drm_mock_scheduler
*sched)
drm_mock_sched_job_complete(job);
spin_unlock_irqrestore(&sched->lock, flags);
+ drm_sched_fini(&sched->base);
+
/*
* Free completed jobs and jobs not yet processed by the DRM
scheduler
* free worker.
@@ -311,8 +313,6 @@ void drm_mock_sched_fini(struct drm_mock_scheduler
*sched)
list_for_each_entry_safe(job, next, &list, link)
mock_sched_free_job(&job->base);
-
- drm_sched_fini(&sched->base);
}
/**
That should satisfy the requirement to "clear" memory about to be freed
and be 100% compliant with drm_sched_fini() kerneldoc (guideline b).
But the new warning from 3/5 here will still be there AFAICT and would
you then agree it is a false positive?
Secondly, the series should modify all drivers (including the unit
tests) which are known to trigger this false positive.
Regards,
Tvrtko
More information about the dri-devel
mailing list