lima_bo memory leak after drm_sched job destruction rework
nunes.erico at gmail.com
Fri May 17 19:35:11 UTC 2019
I have recently observed a memory leak issue with lima using
drm-misc-next, which I initially reported here:
It is an easily reproducible memory leak which I was able to bisect to commit:
5918045c4ed4 drm/scheduler: rework job destruction
After some investigation, it seems that after the refactor,
sched->ops->free_job (in lima: lima_sched_free_job) is no longer called.
With some more debugging I found that it is not being called because
the job freeing is now in drm_sched_cleanup_jobs, which for lima
always aborts in the initial "Don't destroy jobs while the timeout
worker is running" condition.
Lima currently defaults to an "infinite" timeout. Setting a 500ms
default timeout like most other drm_sched users do fixed the leak for me.
I can send a patch to set a 500ms timeout and have it probably working
again, but I am wondering now if this is expected behaviour for
drm_sched after the refactor.
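For reference, the patch I have in mind would be roughly the following change to lima's drm_sched_init() call. This is only a sketch: the surrounding identifiers (lima_sched_ops, the hang limit argument, the pipe name) are written from memory and may not match the file exactly:

```c
/* Sketch only: replace the effectively infinite timeout with a
 * finite default so the timeout worker actually gets queued and
 * drm_sched_cleanup_jobs() can make progress.  500ms matches what
 * most other drm_sched users pass. */
long timeout = msecs_to_jiffies(500);

return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
                      lima_job_hang_limit, timeout, name);
```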
In particular I also noticed that drm_sched_suspend_timeout is not
called anywhere. Is it expected that we now rely on a timeout
parameter to cleanup jobs that ran successfully?