[Bug 109692] deadlock occurs during GPU reset
bugzilla-daemon at freedesktop.org
Thu Feb 21 17:23:00 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=109692
Andrey Grodzovsky <andrey.grodzovsky at amd.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |christian.koenig at amd.com
--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky at amd.com> ---
Adding Christian to consult on a solution to the problem:
The deadlock happens because two paths take the same two locks in opposite
order. On one side, dma_fence_signal takes fence->lock and then
drm_sched_process_job takes sched->job_list_lock. On the other side,
drm_sched_stop takes sched->job_list_lock first and then fence->lock, when it
tries to remove the callback via dma_fence_remove_callback.
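The inversion can be reproduced in user space. This is a minimal model, assuming two pthread mutexes stand in for fence->lock and sched->job_list_lock; the names signal_path, stop_path and abba_would_deadlock are hypothetical. A barrier forces the bad interleaving, and each thread uses pthread_mutex_trylock for its second lock so the program reports the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for fence->lock and sched->job_list_lock. */
static pthread_mutex_t fence_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t job_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t step;

/* Result of each path's attempt to take its second lock. */
static int signal_second_rc = -1;
static int stop_second_rc = -1;

/* Models dma_fence_signal() -> drm_sched_process_job():
 * fence->lock first, then job_list_lock. */
static void *signal_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fence_lock);
    pthread_barrier_wait(&step);            /* both now hold their first lock */
    signal_second_rc = pthread_mutex_trylock(&job_list_lock);
    if (signal_second_rc == 0)
        pthread_mutex_unlock(&job_list_lock);
    pthread_barrier_wait(&step);            /* hold on until both have tried */
    pthread_mutex_unlock(&fence_lock);
    return NULL;
}

/* Models drm_sched_stop() -> dma_fence_remove_callback():
 * job_list_lock first, then fence->lock. */
static void *stop_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&job_list_lock);
    pthread_barrier_wait(&step);
    stop_second_rc = pthread_mutex_trylock(&fence_lock);
    if (stop_second_rc == 0)
        pthread_mutex_unlock(&fence_lock);
    pthread_barrier_wait(&step);
    pthread_mutex_unlock(&job_list_lock);
    return NULL;
}

/* Runs both paths concurrently. Returns 1 if each path found its second
 * lock held by the other, i.e. blocking lock calls would have deadlocked. */
static int abba_would_deadlock(void)
{
    pthread_t a, b;
    int rc = pthread_barrier_init(&step, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, signal_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, stop_path, NULL);
    assert(rc == 0);
    (void)rc; /* unused when NDEBUG is defined */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&step);
    return signal_second_rc == EBUSY && stop_second_rc == EBUSY;
}
```

abba_would_deadlock() returns 1 because each path finds its second lock already held by the other; with blocking lock calls neither thread could ever make progress.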
I see two possible solutions:
1) Change ring_mirror_list to a lockless list (e.g. an RCU list). The problem
is that I don't think there is a readily available implementation of a doubly
linked lockless list, though I didn't search too much...
2) Go back to removing a job from ring_mirror_list in drm_sched_job_finish.
As I see it, for this to work we need to add a wait_queue (signal) to
drm_sched_job that is signaled AFTER the job is removed from ring_mirror_list
in drm_sched_job_finish. Then, in drm_sched_stop, while iterating over
ring_mirror_list, we build a new list of all jobs in progress (those for which
dma_fence_remove_callback returns false) and wait on each of their
wait_queues. Once they have all been signaled, we know those jobs have removed
themselves from ring_mirror_list and we can proceed.
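A minimal user-space sketch of solution 2), assuming a per-job pthread condition variable stands in for the proposed wait_queue and an in_flight flag models dma_fence_remove_callback returning false. All struct and function names are hypothetical, fence->lock is not modeled, and job memory is assumed to stay valid until the stop path finishes waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a job on ring_mirror_list. */
struct job {
    struct job *prev, *next;   /* ring_mirror_list links */
    struct job *wait_next;     /* link in the stop path's private wait list */
    pthread_mutex_t lock;      /* protects 'removed' */
    pthread_cond_t removed_cv; /* stands in for the proposed per-job wait_queue */
    bool removed;              /* set only after the job left the mirror list */
    bool in_flight;            /* models dma_fence_remove_callback() == false */
};

struct sched {
    pthread_mutex_t job_list_lock;
    struct job head;           /* sentinel of the circular mirror list */
};

static void sched_init(struct sched *s)
{
    pthread_mutex_init(&s->job_list_lock, NULL);
    s->head.prev = s->head.next = &s->head;
}

/* Initializes a job and appends it to the mirror list. */
static void job_init(struct sched *s, struct job *j, bool in_flight)
{
    pthread_mutex_init(&j->lock, NULL);
    pthread_cond_init(&j->removed_cv, NULL);
    j->removed = false;
    j->in_flight = in_flight;
    j->wait_next = NULL;
    pthread_mutex_lock(&s->job_list_lock);
    j->prev = s->head.prev;
    j->next = &s->head;
    s->head.prev->next = j;
    s->head.prev = j;
    pthread_mutex_unlock(&s->job_list_lock);
}

/* Models drm_sched_job_finish(): unlink first, THEN wake the waiters. */
static void job_finish(struct sched *s, struct job *j)
{
    pthread_mutex_lock(&s->job_list_lock);
    j->prev->next = j->next;
    j->next->prev = j->prev;
    pthread_mutex_unlock(&s->job_list_lock);

    pthread_mutex_lock(&j->lock);
    j->removed = true;
    pthread_cond_broadcast(&j->removed_cv);
    pthread_mutex_unlock(&j->lock);
}

/* Models drm_sched_stop(): collect in-flight jobs under job_list_lock,
 * drop the lock (job_finish needs it), then wait for each of them. */
static void sched_stop(struct sched *s)
{
    struct job *wait_list = NULL;

    pthread_mutex_lock(&s->job_list_lock);
    for (struct job *j = s->head.next; j != &s->head; j = j->next) {
        if (j->in_flight) {
            j->wait_next = wait_list;
            wait_list = j;
        }
    }
    pthread_mutex_unlock(&s->job_list_lock);

    for (struct job *j = wait_list; j != NULL; j = j->wait_next) {
        pthread_mutex_lock(&j->lock);
        while (!j->removed)
            pthread_cond_wait(&j->removed_cv, &j->lock);
        pthread_mutex_unlock(&j->lock);
    }
    /* Every collected job has now removed itself from the mirror list. */
}

/* Helper so a finishing job can run on another thread. */
struct finish_args { struct sched *s; struct job *j; };

static void *finish_worker(void *arg)
{
    struct finish_args *a = arg;
    job_finish(a->s, a->j);
    return NULL;
}
```

The ordering that matters is that the stop path releases job_list_lock before it waits, because the finishing job needs that lock to unlink itself; waiting while holding it would simply trade one deadlock for another.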
Specific to AMDGPU: to ensure the jobs are not freed in drm_sched_job_finish
while we process them in drm_sched_stop, it's not enough to call
cancel_delayed_work_sync(&sched->work_tdr) for this scheduler alone; we must
also do so for all the other schedulers on the device, and even across the
whole hive in the XGMI use case.
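The AMDGPU point amounts to widening the cancellation loop from one scheduler to every scheduler reachable from the device or hive. This sketch assumes hypothetical hive_model, device_model and sched_model types and a stub cancel_tdr_sync() standing in for cancel_delayed_work_sync(&sched->work_tdr); it does not reflect the actual amdgpu data structures.

```c
#include <stddef.h>

/* Hypothetical model: each scheduler owns a timeout (TDR) work item. */
struct sched_model { int tdr_cancelled; };

struct device_model {
    struct sched_model *scheds;
    size_t num_scheds;
};

struct hive_model {
    struct device_model *devices;
    size_t num_devices;
};

/* Stub standing in for cancel_delayed_work_sync(&sched->work_tdr). */
static void cancel_tdr_sync(struct sched_model *s)
{
    s->tdr_cancelled = 1;
}

/* Cancel timeout handling on every scheduler of every device in the hive,
 * not only on the scheduler whose job timed out. A device outside an XGMI
 * hive can be treated as a hive of one. Returns how many were cancelled. */
static size_t stop_all_tdr(struct hive_model *hive)
{
    size_t n = 0;
    for (size_t d = 0; d < hive->num_devices; d++) {
        struct device_model *dev = &hive->devices[d];
        for (size_t i = 0; i < dev->num_scheds; i++) {
            cancel_tdr_sync(&dev->scheds[i]);
            n++;
        }
    }
    return n;
}
```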
More information about the dri-devel mailing list