lima_bo memory leak after drm_sched job destruction rework

Fri May 17 19:35:11 UTC 2019

Hello,

I have recently observed a memory leak issue with lima using
drm-misc-next, which I initially reported here:
https://gitlab.freedesktop.org/lima/linux/issues/24
It is an easily reproduceable memory leak which I was able to bisect to commit:

5918045c4ed4 drm/scheduler: rework job destruction

After some investigation, it seems that after the refactor,
sched->ops->free_job (in lima: lima_sched_free_job) is no longer
called.
With some more debugging I found that it is not being called because
the job freeing is now in drm_sched_cleanup_jobs, which for lima
always aborts in the initial "Don't destroy jobs while the timeout
worker is running" condition.

Lima currently defaults to an "infinite" timeout. Setting a 500ms
default timeout like most other drm_sched users do fixed the leak for
me.

I can send a patch to set a 500ms timeout and have it probably working
again, but I am wondering now if this is expected behaviour for
drm_sched after the refactor.
In particular I also noticed that drm_sched_suspend_timeout is not
called anywhere. Is it expected that we now rely on a timeout
parameter to cleanup jobs that ran successfully?

Erico