[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Guilherme G. Piccoli
gpiccoli at igalia.com
Wed Feb 1 16:53:44 UTC 2023
On 01/02/2023 13:21, Luben Tuikov wrote:
> Hi Guilherme,
>
> Since sched->ready is set to false both directly in amdgpu_ring_fini() and indirectly in
> amdgpu_fence_driver_sw_fini() (which calls drm_sched_fini(), which in turn sets it to false),
> we seem to have two competing policies: "set ready to false to show that _fini() was called"
> and "set ready to false to disable IB submissions".
>
> To that effect, your patch is generally correct, as it covers the case of an early failure
> and unwind from (indirectly) amdgpu_device_init_schedulers().
>
> Please resubmit your patch, but use .ops as Christian suggested: .name is sufficient,
> but .ops is necessary.
>
> On a side-note: in the future we should probably discern between
> "this ring has an initialized and working scheduler" (looking up at DRM), from
> "this ring can take on IBs to send them down to the hardware" (looking down at hardware).
> sched->ready seems to be overloaded with these disparate states, and this is why you need
> to use .ops to guard calling drm_sched_fini().
>
> Regards,
> Luben
Thanks a lot Luben, makes perfect sense!
Also, thanks to everyone who provided feedback here; it was a very interesting
discussion.
Submitted V2:
https://lore.kernel.org/dri-devel/20230201164814.1353383-1-gpiccoli@igalia.com/
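For readers following along, the guard discussed above can be sketched in
userspace roughly as follows. This is only an illustration and not the V2
patch itself: every mock_* name is hypothetical, the structs model just the
two fields under discussion, and the assert() stands in for the oops that an
unmatched fini would trigger in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's ops table. */
struct mock_sched_ops { int unused; };

/* Hypothetical stand-in for the scheduler; only the relevant fields. */
struct mock_sched {
    const struct mock_sched_ops *ops; /* non-NULL only after init ran */
    bool ready;                       /* overloaded: cleared by several paths */
    int fini_calls;                   /* bookkeeping for this sketch only */
};

static const struct mock_sched_ops mock_ops = { 0 };

/* Models scheduler init: installs ops and marks the scheduler ready. */
static void mock_sched_init(struct mock_sched *s)
{
    s->ops = &mock_ops;
    s->ready = true;
}

/* Models scheduler fini: only valid on a scheduler that was initialized. */
static void mock_sched_fini(struct mock_sched *s)
{
    assert(s->ops != NULL); /* stands in for the oops on unmatched fini */
    s->ready = false;
    s->fini_calls++;
}

/* Models ring teardown, which also clears ready without touching ops. */
static void mock_ring_fini(struct mock_sched *s)
{
    s->ready = false;
}

/*
 * Models the fence driver teardown with the guard: fini is skipped when
 * init never ran (ops is NULL). Testing ready instead would be wrong,
 * because ready may already be false for unrelated reasons.
 * Returns true if fini was actually called.
 */
static bool mock_fence_driver_sw_fini(struct mock_sched *s)
{
    if (!s->ops)
        return false;
    mock_sched_fini(s);
    return true;
}
```

In this sketch a zero-initialized mock_sched (the early-failure case) is
skipped by the guard, while a scheduler that went through mock_sched_init()
is still torn down even if mock_ring_fini() already cleared ready.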
Cheers,
Guilherme