force app kill patch
Liu, Monk
Monk.Liu at amd.com
Wed Apr 18 07:20:19 UTC 2018
*Correction for the scenario*
After we move dma_fence_put(entity->last_scheduled) out of the fini_status check,
there is a potential race in the following scenario:
1. drm_sched_entity_fini(): it exits right after entity->job_queue becomes empty [ but at that time the scheduler has not yet been fast enough to deal with this entity ]
2. drm_sched_entity_cleanup(): it calls dma_fence_put(entity->last_scheduled) [ but at this time entity->last_scheduled actually points to the fence prior to the real last one ]
3. Scheduler_main() now deals with this entity: it calls dma_fence_put(entity->last_scheduled) [ now this fence gets put twice !!! ]
4. Scheduler_main() now calls dma_fence_get() on the *real* last one !
So eventually the real last fence leaks memory and, more critically, the double-put fence causes a NULL pointer access.
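To make the reference counting in steps 1-4 concrete, here is a minimal userspace sketch that replays the interleaving in a single thread. It is not kernel code: toy_fence, toy_entity, toy_fence_get(), toy_fence_put() and replay_race() are hypothetical stand-ins, and the assumption that each fence starts with one reference is mine, not something taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for struct dma_fence: only the refcount is modelled. */
struct toy_fence {
    int refcount;
};

/* Hypothetical stand-in for the scheduler entity. */
struct toy_entity {
    struct toy_fence *last_scheduled;
};

static void toy_fence_get(struct toy_fence *f) { f->refcount++; }
static void toy_fence_put(struct toy_fence *f) { f->refcount--; }

/*
 * Replays steps 1-4 in order. 'prev' is the fence before the real last
 * one, 'real_last' is the fence of the job the scheduler has not yet
 * picked up when fini returns.
 */
static void replay_race(struct toy_fence *prev, struct toy_fence *real_last)
{
    struct toy_entity entity = { .last_scheduled = prev };

    /* Step 1: fini returns because the job queue is empty. The scheduler
     * has not updated last_scheduled yet, so it still points to 'prev'. */
    assert(entity.last_scheduled == prev);

    /* Step 2: cleanup drops the reference held via last_scheduled. */
    toy_fence_put(entity.last_scheduled);

    /* Step 3: the scheduler drops the old last_scheduled again.
     * This is the double put. */
    toy_fence_put(entity.last_scheduled);

    /* Step 4: the scheduler takes a reference on the real last fence
     * and stores it; nobody is left to put it. */
    toy_fence_get(real_last);
    entity.last_scheduled = real_last;
}
```

Starting both fences at refcount 1, replay_race() leaves prev at -1 (one put too many) and real_last at 2 (one reference never dropped). In the kernel the first put that reaches zero would free the fence, so the second put would touch freed memory instead of just producing a negative count.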
/Monk
From: Liu, Monk
Sent: April 18, 2018 15:11
To: Koenig, Christian <Christian.Koenig at amd.com>; Deng, Emily <Emily.Deng at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: force app kill patch
Hi Christian & Emily
I think the v4 fix for “fix force app kill hang” is still not good enough:
First:
In “sched_entity_fini” we only call dma_fence_put(entity->last_scheduled) under the condition “if (entity->fini_status)”, so
there is a memory leak for the case of “entity->fini_status == 0”.
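As a rough illustration of this first point, the sketch below contrasts a put guarded by fini_status with an unconditional one. It is a userspace toy, not the v4 patch: the toy_* types and functions are hypothetical names, and only the refcount and the fini_status flag are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: only the refcount is modelled. */
struct toy_fence {
    int refcount;
};

/* Hypothetical stand-in for the scheduler entity. */
struct toy_entity {
    struct toy_fence *last_scheduled;
    int fini_status;
};

static void toy_fence_put(struct toy_fence *f)
{
    assert(f->refcount > 0);
    f->refcount--;
}

/* Mirrors the guarded put: the reference is dropped only when
 * fini_status is non-zero. */
static void toy_entity_fini_conditional(struct toy_entity *e)
{
    if (e->fini_status) {
        toy_fence_put(e->last_scheduled);
        e->last_scheduled = NULL;
    }
}

/* Drops the reference regardless of fini_status. */
static void toy_entity_fini_unconditional(struct toy_entity *e)
{
    toy_fence_put(e->last_scheduled);
    e->last_scheduled = NULL;
}
```

With fini_status == 0 the conditional variant leaves the fence at its original refcount, which is the leak; the unconditional variant brings it down to zero. The unconditional put only removes the leak and does nothing about ordering against the scheduler thread.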
Second:
If we move dma_fence_put(entity->last_scheduled) out of the condition “if (entity->fini_status)”, the memory leak can be fixed,
but there will be a kernel NULL pointer access. I think the dma_fence_put(entity->last_scheduled) call may actually be executed *not*
on the last scheduled fence of this entity, because it runs without the “thread_park/unpark” pair that makes sure the scheduler is not dealing with this entity.
So with a certain race, here is the scenario:
1. the scheduler is doing the dma_fence_put() on the 1st fence,
2. the scheduler sets entity->last_scheduled to the 1st fence
3. now sched_entity_fini() runs, and it calls dma_fence_put() on entity->last_scheduled
4. now this 1st fence is actually put twice and the real last fence won’t get put as expected
Any ideas?
/Monk