[PATCH 3/5] drm/sched: Warn if pending list is not empty

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Tue Apr 22 13:25:52 UTC 2025


On 22/04/2025 13:00, Philipp Stanner wrote:
> On Tue, 2025-04-22 at 13:13 +0200, Danilo Krummrich wrote:
>> On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
>>> Question I raised is if there are other drivers which manage to
>>> clean up
>>> everything correctly (like the mock scheduler does), but trigger
>>> that
>>> warning. Maybe there are not and maybe mock scheduler is the only
>>> false
>>> positive.
> 
> For clarification:
> 
> I messed up the comment from the cover letter.
> 
> What I did was run the *old* unit tests (v5 IIRC) from Tvrtko that
> still had the memory leaks. Those then trigger the warning print, as is
> expected, since they don't provide fence_context_kill().
> 
> The current unit tests are fine memory-leak-wise.
> 
> IOW, both with Nouveau and the unit tests, everything behaves as
> expected, without issues.

Hmm how? Pending list can be non-empty so it should be possible to hit 
that warn.

Regards,

Tvrtko

>> So far the scheduler simply does not give any guideline on how to
>> address the
>> problem, hence every driver simply does something (or nothing,
>> effectively
>> ignoring the problem). This is what we want to fix.
>>
>> The mock scheduler keeps it's own list of pending jobs and on tear
>> down stops
>> the scheduler's workqueues, traverses it's own list and eventually
>> frees the
>> pending jobs without updating the scheduler's internal pending list.
>>
>> So yes, it does avoid memory leaks, but it also leaves the schedulers
>> internal
>> structures with an invalid state, i.e. the pending list of the
>> scheduler has
>> pointers to already freed memory.
>>
>> What if the drm_sched_fini() starts touching the pending list? Then
>> you'd end up
>> with UAF bugs with this implementation. We cannot invalidate the
>> schedulers
>> internal structures and yet call scheduler functions - e.g.
>> drm_sched_fini() -
>> subsequently.
>>
>> Hence, the current implementation of the mock scheduler is
>> fundamentally flawed.
>> And so would be *every* driver that still has entries within the
>> scheduler's
>> pending list.
>>
>> This is not a false positive, it already caught a real bug -- in the
>> mock
>> scheduler.
>>
>> - Danilo
> 



More information about the dri-devel mailing list