[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini
Tom St Denis
tom.stdenis at amd.com
Mon Oct 2 16:00:36 UTC 2017
Hi Nicolai,
Ping? :-)
Tom
On 28/09/17 11:01 AM, Christian König wrote:
> Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:
>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> Highly concurrent Piglit runs can trigger a race condition where a
>> pending SDMA job on a buffer object is never executed because the
>> corresponding process is killed (perhaps due to a crash). Since the
>> job's fences were never signaled, the buffer object was effectively
>> leaked. Worse, the buffer was stuck wherever it happened to be at the
>> time, possibly in VRAM.
>>
>> The symptom was user-space processes stuck in interruptible waits with
>> kernel stacks like:
>>
>> [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
>> [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
>> [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
>> [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
>> [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
>> [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
>> [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
>> [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
>> [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
>> [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
>> [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
>> [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
>> [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
>> [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>> [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
>> [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
>> [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>> Acked-by: Christian König <christian.koenig at amd.com>
>> ---
>> drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> index 54eb77cffd9b..32a99e980d78 100644
>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
>>  					amd_sched_entity_is_idle(entity));
>>  	amd_sched_rq_remove_entity(rq, entity);
>>  	if (r) {
>>  		struct amd_sched_job *job;
>>  
>>  		/* Park the kernel for a moment to make sure it isn't processing
>>  		 * our entity.
>>  		 */
>>  		kthread_park(sched->thread);
>>  		kthread_unpark(sched->thread);
>> -		while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
>> +		while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
>> +			struct amd_sched_fence *s_fence = job->s_fence;
>> +			amd_sched_fence_scheduled(s_fence);
>
> It would be really nice to have an error code set on s_fence->finished
> before it is signaled; use dma_fence_set_error() for this.
>
> In addition to that, it would be nice to note in the subject line that
> this is a rather important bug fix.
>
> With that fixed, the whole series is Reviewed-by: Christian König
> <christian.koenig at amd.com>.
>
> Regards,
> Christian.
>
>> +			amd_sched_fence_finished(s_fence);
>> +			dma_fence_put(&s_fence->finished);
>>  			sched->ops->free_job(job);
>> +		}
>>  	}
>>  
>>  	kfifo_free(&entity->job_queue);
>>  }
>>  
>>  static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
>>  {
>>  	struct amd_sched_entity *entity =
>>  		container_of(cb, struct amd_sched_entity, cb);
>>  	entity->dependency = NULL;
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
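As an aside for anyone picking this up: below is a sketch of what the
drain loop could look like with Christian's dma_fence_set_error()
suggestion from the quoted mail folded in. The choice of error code is
an assumption on my part; -ESRCH ("no such process") seems natural for
jobs dropped because their owning process died, but the patch author may
prefer something else:

	while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
		struct amd_sched_fence *s_fence = job->s_fence;

		/* Signal the scheduled fence so anything waiting for the
		 * job to be picked up makes progress. */
		amd_sched_fence_scheduled(s_fence);

		/* Mark the finished fence with an error *before* signaling
		 * it, so waiters can tell the job never actually ran.
		 * -ESRCH is an assumed error code, not from the patch. */
		dma_fence_set_error(&s_fence->finished, -ESRCH);
		amd_sched_fence_finished(s_fence);

		/* Drop our reference and let the driver clean up the job. */
		dma_fence_put(&s_fence->finished);
		sched->ops->free_job(job);
	}

Signaling both fences is what releases the reservation_object waiters
shown in the trace above; setting the error first means
dma_fence_get_status() on the finished fence reports the failure instead
of silently claiming success for work that was dropped.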