[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

Guilherme G. Piccoli gpiccoli at igalia.com
Wed Feb 1 16:53:44 UTC 2023


On 01/02/2023 13:21, Luben Tuikov wrote:
> Hi Guilherme,
> 
> Since setting sched->ready to false, seems to be taking place in, directly amdgpu_ring_fini()
> and in amdgpu_fence_driver_sw_fini() indirectly as that function calls drm_sched_fini()
> which sets it to false, we seem to have two competing policies of,
> "set ready to false to show that _fini() was called, and set to false to disable IB submissions".
> 
> To that effect, your patch is generally correct, as it would be the case of an early failure
> and unroll from (indirectly) amdgpu_device_init_schedulers().
> 
> Please resubmit your patch but using .ops as Christian suggested, as .name is sufficient,
> but .ops is necessary.
> 
> On a side-note: in the future we should probably discern between
> "this ring has an initialized and working scheduler" (looking up at DRM), from
> "this ring can take on IBs to send them down to the hardware" (looking down at hardware).
> Sched->ready seems to be overloaded with these disparate states, and this is why you need
> to use .ops to guard calling drm_sched_fini().
> 
> Regards,
> Luben

Thanks a lot Luben, makes perfect sense!

Also, thanks for everyone that provided feedback here, very interesting
discussion.

Submitted V2:
https://lore.kernel.org/dri-devel/20230201164814.1353383-1-gpiccoli@igalia.com/
Cheers,


Guilherme


More information about the amd-gfx mailing list