[PATCH v2 4/9] drm/sched: Split free_job into own work item

Danilo Krummrich dakr at redhat.com
Tue Aug 29 01:20:37 UTC 2023


On 8/28/23 20:41, Matthew Brost wrote:
> On Mon, Aug 28, 2023 at 08:04:31PM +0200, Danilo Krummrich wrote:
>> On 8/11/23 04:31, Matthew Brost wrote:
>>> Rather than calling free_job and run_job in the same work item, have a
>>> dedicated work item for each. This aligns with the design and intended
>>> use of work queues.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 137 ++++++++++++++++++-------
>>>    include/drm/gpu_scheduler.h            |   8 +-
>>>    2 files changed, 106 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index cede47afc800..b67469eac179 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1275,7 +1338,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
>>>    void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
>>
>> I was wondering what the scheduler teardown sequence looks like for
>> DRM_SCHED_POLICY_SINGLE_ENTITY and how XE does that.
>>
>> In Nouveau, userspace can ask the kernel to create a channel (or multiple),
>> where each channel represents a ring feeding the firmware scheduler. Userspace
>> can forcefully close channels via either a dedicated IOCTL or by just closing
>> the FD which subsequently closes all channels opened through this FD.
>>
>> When this happens the scheduler needs to be torn down. Without keeping track of
>> things in a driver specific way, the only thing I could really come up with is the
>> following.
>>
>> /* Make sure no more jobs are fetched from the entity. */
>> drm_sched_submit_stop();
>>
>> /* Wait for the channel to be idle, i.e. for jobs in flight to complete. */
>> nouveau_channel_idle();
>>
>> /* Stop the scheduler to free jobs from the pending_list. Ring must be idle at this
>>   * point, otherwise we might leak jobs. Feels more like a workaround to free
>>   * finished jobs.
>>   */
>> drm_sched_stop();
>>
>> /* Free jobs from the entity queue. */
>> drm_sched_entity_fini();
>>
>> /* Probably not even needed in this case. */
>> drm_sched_fini();
>>
>> This doesn't look very straightforward though. I wonder if other drivers feeding
>> firmware schedulers have similar cases. Maybe something like drm_sched_teardown(),
>> which would stop job submission, wait for pending jobs to finish and subsequently
>> free them up, would make sense?
>>
> 
> exec queue == gpu scheduler + entity in Xe
> 
> We kinda invented our own flow with reference counting + use the TDR for
> cleanup.

Thanks for the detailed explanation. In case this needs to be done in a
driver-specific way, I thought about something similar, pretty much the same
reference counting, but instead of using the TDR, let jobs from the entity
just return -ECANCELED from run_job() and also signal pending jobs with the
same error code.
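
A minimal sketch of what that could look like on the driver side; all the
driver-side names (chan_job, chan->killed, chan_job_submit()) are
hypothetical:

struct chan_job {
	struct drm_sched_job base;
	struct nouveau_channel *chan;
};

static struct dma_fence *chan_sched_run_job(struct drm_sched_job *sched_job)
{
	struct chan_job *job = container_of(sched_job, struct chan_job, base);

	/* The channel is being torn down: don't touch the hardware, let the
	 * scheduler propagate -ECANCELED through the job's fences instead.
	 */
	if (job->chan->killed)
		return ERR_PTR(-ECANCELED);

	return chan_job_submit(job); /* normal submission path */
}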

On the other hand, I don't really want the scheduler and job structures to
potentially outlive the channel. Which is why I think it'd be nice to avoid
consuming all the queued up jobs from the entity in the first place, stop the
scheduler with drm_sched_submit_stop(), signal all pending jobs with
-ECANCELED and call the free_job() callbacks right away.

The latter I could probably do in Nouveau as well; however, it kinda feels
wrong to do all of that within the driver.
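
Roughly, I'd imagine something like the sketch below in common code
(drm_sched_teardown() is a made-up name, and the locking against jobs the
hardware is still processing is completely glossed over, so take it as an
illustration only):

/* Hypothetical helper, not an existing API. */
void drm_sched_teardown(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job, *tmp;

	/* Stop pushing jobs from entities to the hardware. */
	drm_sched_submit_stop(sched);

	/* Cancel everything that already went through run_job() but hasn't
	 * signaled yet; real code would need to hold job_list_lock here.
	 */
	list_for_each_entry_safe(job, tmp, &sched->pending_list, list) {
		list_del_init(&job->list);
		dma_fence_set_error(&job->s_fence->finished, -ECANCELED);
		dma_fence_signal(&job->s_fence->finished);
		sched->ops->free_job(job);
	}

	drm_sched_fini(sched);
}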

Also, I was wondering how existing drivers using the GPU scheduler handle
that. It seems like they just rely on the pending_list of the scheduler being
empty once drm_sched_fini() is called. Admittedly, a non-empty pending_list at
that point is pretty unlikely, since drm_sched_fini() is typically called on
driver remove, but I don't see how that's actually ensured. Am I missing
something?
> 
> We have a creation ref for the exec queue plus each job takes a ref to
> the exec queue. On exec queue close [1][2] (whether that be IOCTL or FD
> close) we drop the creation reference and call a vfunc for killing the
> exec queue. The firmware implementation is here [3].
> 
> If you read through it, it just sets the TDR to the minimum value [4]; the
> TDR will kick any running jobs off the hardware and signal the jobs'
> fences. Any jobs waiting on dependencies eventually flush out via
> run_job + TDR for cleanup without going on the hardware. Once the exec
> queue reference count goes to zero after all jobs are flushed out, we
> trigger the exec queue cleanup flow and finally free all memory for the
> exec queue.
> 
> Using the TDR in this way is how we teardown an exec queue for other
> reasons too (user page fault, user job times out, user job hang detected
> by firmware, device reset, etc...).
> 
> This all works rather nicely and is a single code path for all of these
> cases. I'm not sure if this can be made any more generic, nor do I really
> see the need to (at least I don't see Xe needing a generic solution).
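
For my own understanding, schematically that would be something like the
following (not the actual Xe code, all names are made up):

struct exec_queue {
	struct kref refcount;               /* creation ref + one per job */
	struct drm_gpu_scheduler sched;
	void (*kill)(struct exec_queue *q); /* e.g. set the TDR to its minimum */
};

static void exec_queue_release(struct kref *ref)
{
	struct exec_queue *q = container_of(ref, struct exec_queue, refcount);

	drm_sched_fini(&q->sched);
	kfree(q);
}

static void exec_queue_close(struct exec_queue *q)
{
	/* Kick running jobs off the hardware via the TDR; queued jobs flush
	 * out through run_job() + TDR and drop their references as they
	 * complete. The put below removes the creation reference, so the
	 * release callback runs once the last job is gone.
	 */
	q->kill(q);
	kref_put(&q->refcount, exec_queue_release);
}
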
> 
> Hope this helps,
> Matt
> 
> [1] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_exec_queue.c#L911
> [2] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_device.c#L77
> [3] https://gitlab.freedesktop.org/drm/xe/kernel/-/tree/drm-xe-next/drivers/gpu/drm/xe#L1184
> [4] https://gitlab.freedesktop.org/drm/xe/kernel/-/tree/drm-xe-next/drivers/gpu/drm/xe#L789
> 
>> - Danilo
>>


