[PATCH 0/6] Allow to extend the timeout without jobs disappearing

Luben Tuikov luben.tuikov at amd.com
Wed Nov 25 03:17:02 UTC 2020


Hi guys,

This series of patches implements a pending list for
jobs which are in the hardware, and a done list for
tasks which are done and need to be freed.

It implements a second thread, dedicated to freeing
tasks from the done list. The main scheduler thread no
longer frees (cleans up) done tasks by polling the head
of the pending list (drm_sched_get_cleanup_task() is
now gone)--it only pushes tasks down to the GPU. As
tasks complete and call their DRM callback, their
fences are signalled, the tasks are queued to the done
list, and the done thread is woken up to free them. This
can take place concurrently with the main scheduler
thread pushing tasks down to the GPU.
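
As a rough illustration of the hand-off described above, here
is a single-threaded userspace model of the two lists. It is
not code from the patches: every name in it (sched_model,
model_submit, model_complete, model_drain_done) is made up for
illustration, and the locking and thread wake-up that a real
done thread would need are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names come from the patches. */
struct task {
	int id;
	struct task *next;
};

struct task_list {
	struct task *head;
	struct task *tail;
	size_t len;
};

struct sched_model {
	struct task_list pending; /* tasks the hardware still owns */
	struct task_list done;    /* completed tasks waiting to be freed */
};

void list_push_tail(struct task_list *l, struct task *t)
{
	t->next = NULL;
	if (l->tail)
		l->tail->next = t;
	else
		l->head = t;
	l->tail = t;
	l->len++;
}

/* Unlink t from l. Returns 1 if t was on the list, 0 otherwise. */
int list_remove(struct task_list *l, struct task *t)
{
	struct task *prev = NULL, *cur = l->head;

	while (cur && cur != t) {
		prev = cur;
		cur = cur->next;
	}
	if (!cur)
		return 0;
	if (prev)
		prev->next = cur->next;
	else
		l->head = cur->next;
	if (l->tail == cur)
		l->tail = prev;
	cur->next = NULL;
	l->len--;
	return 1;
}

/* Main scheduler thread's job: push a task down to the hardware. */
struct task *model_submit(struct sched_model *s, int id)
{
	struct task *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->id = id;
	list_push_tail(&s->pending, t);
	return t;
}

/* Completion callback: move the task from pending to done. */
void model_complete(struct sched_model *s, struct task *t)
{
	if (list_remove(&s->pending, t))
		list_push_tail(&s->done, t);
}

/* Done thread's job: free everything on the done list. */
size_t model_drain_done(struct sched_model *s)
{
	size_t freed = 0;

	while (s->done.head) {
		struct task *t = s->done.head;

		list_remove(&s->done, t);
		free(t);
		freed++;
	}
	return freed;
}
```

The point of the model is that model_submit() never frees
anything and model_drain_done() never touches the pending
list, which is what lets the two run concurrently once each
list has its own lock.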

When a task times out, the timeout function now returns
a value to DRM. The reason for this is that the GPU
driver has intimate knowledge of the hardware and can
tell DRM what to do: whether to attempt to abort the
task (say, by calling a driver abort function, as the
implementation dictates), or whether the task needs
more time. Note that the task is not moved away from
the pending list unless it is no longer in the GPU.
(The pending list holds tasks which are pending from
DRM's point of view, i.e. the GPU has control over
them; for example, DMA or CUs are still active for the
task.)

The idea really is that what DRM wants to know is
whether the task is in the GPU or not. So now
drm_sched_backend_ops::timedout_job() returns
0 if the task is no longer with the GPU, or 1
if the task needs more time.
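
A minimal sketch of how a caller might act on that return
value, again as a userspace model rather than code from the
series. Only the 0/1 convention comes from the description
above; the macro names, struct tmo_task, driver_timedout()
and handle_timeout() are hypothetical.

```c
#include <stdbool.h>

/* 0/1 convention from the text above; the macro names are hypothetical. */
#define TASK_NOT_IN_GPU      0
#define TASK_NEEDS_MORE_TIME 1

struct tmo_task {
	bool in_hw;           /* model: the hardware still owns the task */
	bool on_pending;      /* model: the task is on the pending list */
	unsigned rearm_count; /* how many times the timeout was extended */
};

typedef int (*timedout_fn)(struct tmo_task *t);

/*
 * Hypothetical driver callback. A real driver would inspect the
 * hardware and possibly try to abort the task; this model only
 * reports whether the hardware still owns it.
 */
int driver_timedout(struct tmo_task *t)
{
	return t->in_hw ? TASK_NEEDS_MORE_TIME : TASK_NOT_IN_GPU;
}

/* Scheduler-side reaction to a timeout, as assumed by this sketch. */
int handle_timeout(struct tmo_task *t, timedout_fn cb)
{
	int status = cb(t);

	if (status == TASK_NEEDS_MORE_TIME)
		t->rearm_count++;      /* extend the timeout; task stays pending */
	else
		t->on_pending = false; /* no longer in the GPU; may leave the list */

	return status;
}
```

In this model a task leaves the pending list only after the
callback reports 0, so extending the timeout never makes the
job disappear.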

Tested up to patch 5. Running with patch 6 seems to
make X/GDM just sleep, and I'm looking into this now.

This series applies to drm-misc-next.

Luben Tuikov (6):
  drm/scheduler: "node" --> "list"
  gpu/drm: ring_mirror_list --> pending_list
  drm/scheduler: Job timeout handler returns status
  drm/scheduler: Essentialize the job done callback
  drm/amdgpu: Don't hardcode thread name length
  drm/sched: Make use of a "done" thread

 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    |   2 +-
 drivers/gpu/drm/scheduler/sched_main.c      | 275 ++++++++++----------
 include/drm/gpu_scheduler.h                 |  43 ++-
 6 files changed, 186 insertions(+), 152 deletions(-)

-- 
2.29.2.154.g7f7ebe054a


