[PATCH 3/7] drm/amdgpu: resources will freed in job_free

Christian König deathsimple at vodafone.de
Wed Jun 29 09:54:05 UTC 2016


On 29.06.2016 at 11:33, zhoucm1 wrote:
>
>
> On 2016-06-29 17:30, Christian König wrote:
>> On 29.06.2016 at 11:12, zhoucm1 wrote:
>>>
>>>
>>> On 2016-06-29 16:54, Christian König wrote:
>>>> On 29.06.2016 at 10:09, Chunming Zhou wrote:
>>>>> We will re-submit jobs to recover the hw ring after gpu reset.
>>>>>
>>>>> Change-Id: I0f99bd14673ce0e0dbb7b3b6c2b050245824b9ca
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>
>>>> NAK, this can lead to a deadlock in the SA because it isn't 
>>>> informed any more about the fence protecting the IBs.
>>>
>>> I don't follow. The IBs are still in use and we only delay freeing 
>>> them, so where does the deadlock come from?
>>
>> We use the IB test after the GPU reset to test if the reset was 
>> successful. This allocates some space for each IB using the SA.
>>
>> If the resources in the SA aren't freed it will wait for that to 
>> happen and so wait for the hung task to continue.
>
> Why does the IB test need to wait for other IBs' SA bos when it 
> allocates its own SA bo? They seem to have nothing to do with each other.

See the SA code. When it is out of memory it will wait for fences in the 
hashtable to signal; if that still doesn't free enough space it will wait 
for allocations without assigned fences to become available.
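
As a rough illustration of that wait order, here is a toy model. It is not 
the amdgpu SA code: every name in it (toy_sa, toy_fence, TOY_WAIT_FENCE and 
so on) is made up for this sketch, and "waiting" is only reported as a 
return value instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all types and functions here are hypothetical. */
struct toy_fence { bool signaled; };

enum toy_state { TOY_EMPTY, TOY_HELD, TOY_FREED };

struct toy_block {
    size_t size;
    enum toy_state state;
    struct toy_fence *fence;    /* only set once the block was freed with a fence */
};

/* What an allocation attempt would have to do next. */
enum toy_result { TOY_OK, TOY_WAIT_FENCE, TOY_WAIT_UNFENCED };

#define TOY_MAX_BLOCKS 8

struct toy_sa {
    size_t capacity;
    size_t used;
    struct toy_block blocks[TOY_MAX_BLOCKS];
};

static void toy_sa_init(struct toy_sa *sa, size_t capacity)
{
    sa->capacity = capacity;
    sa->used = 0;
    for (int i = 0; i < TOY_MAX_BLOCKS; i++)
        sa->blocks[i] = (struct toy_block){ 0, TOY_EMPTY, NULL };
}

/* Reclaim every freed block whose protecting fence has signaled. */
static void toy_sa_reclaim(struct toy_sa *sa)
{
    for (int i = 0; i < TOY_MAX_BLOCKS; i++) {
        struct toy_block *b = &sa->blocks[i];
        if (b->state == TOY_FREED && b->fence->signaled) {
            sa->used -= b->size;
            b->state = TOY_EMPTY;
        }
    }
}

/* Hand a block back; with a fence it stays reserved until the fence signals. */
static void toy_sa_free(struct toy_sa *sa, int idx, struct toy_fence *fence)
{
    struct toy_block *b = &sa->blocks[idx];
    assert(b->state == TOY_HELD);
    if (!fence) {
        sa->used -= b->size;
        b->state = TOY_EMPTY;
        return;
    }
    b->fence = fence;
    b->state = TOY_FREED;
}

/* Try to allocate; on failure report what the caller would have to wait for. */
static enum toy_result toy_sa_alloc(struct toy_sa *sa, size_t size, int *idx)
{
    bool have_fenced = false;

    toy_sa_reclaim(sa);
    if (sa->used + size <= sa->capacity) {
        for (int i = 0; i < TOY_MAX_BLOCKS; i++) {
            if (sa->blocks[i].state == TOY_EMPTY) {
                sa->blocks[i] = (struct toy_block){ size, TOY_HELD, NULL };
                sa->used += size;
                *idx = i;
                return TOY_OK;
            }
        }
    }
    for (int i = 0; i < TOY_MAX_BLOCKS; i++)
        if (sa->blocks[i].state == TOY_FREED)
            have_fenced = true;
    /* First choice: wait on a known fence. Otherwise wait for somebody to free. */
    return have_fenced ? TOY_WAIT_FENCE : TOY_WAIT_UNFENCED;
}
```

In this model a block that a hung job still holds leaves the next allocation 
in TOY_WAIT_UNFENCED, so there is no fence it could wait on.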

That shouldn't be much of a problem as long as we free the resources as 
early as possible, but when we wait for the submissions to complete we 
will clearly run into problems.

Since we use the scheduler fence for the SA protection we should 
probably move freeing the resources to the point where the job is 
submitted to the scheduler.
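
Very roughly, and only as a sketch with invented names (toy_job, 
toy_job_submit and friends are not the amdgpu or scheduler API), the idea 
would look like this: the job hands its IB memory back at submit time and 
tells the allocator which fence protects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real driver types. */
struct toy_fence { bool signaled; };

struct toy_ib {
    bool owned_by_job;            /* job still holds the allocation */
    struct toy_fence *protect;    /* fence the allocator may wait on */
};

struct toy_job {
    struct toy_ib ib;
    struct toy_fence sched_fence; /* signals when the scheduler is done with the job */
};

static void toy_job_init(struct toy_job *job)
{
    job->ib.owned_by_job = true;
    job->ib.protect = NULL;
    job->sched_fence.signaled = false;
}

/* Hand the IB back to the allocator, protected by the given fence. */
static void toy_job_free_resources(struct toy_job *job, struct toy_fence *f)
{
    assert(f != NULL);
    if (!job->ib.owned_by_job)
        return;                   /* already released, nothing to do */
    job->ib.protect = f;
    job->ib.owned_by_job = false;
}

/* Submit: release right away, using the scheduler fence as protection. */
static struct toy_fence *toy_job_submit(struct toy_job *job)
{
    toy_job_free_resources(job, &job->sched_fence);
    return &job->sched_fence;
}

/* True once the allocator knows a fence it can wait on for this IB. */
static bool toy_ib_visible_to_allocator(const struct toy_ib *ib)
{
    return !ib->owned_by_job && ib->protect != NULL;
}
```

The point of the sketch is only the ordering: after submit the allocator 
already knows the fence, even though the job itself has not run yet.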

Regards,
Christian.

>
>>
>>>
>>>>
>>>> Should also not be necessary if you just want to resubmit the IBs.
>>>
>>> How do we do that if we need to resubmit?
>>
>> Just resubmit that job; that should work fine.
> I will try it.
>
> Thanks,
> David Zhou
>>
>> Christian.
>>
>>>
>>> Thanks,
>>> David Zhou
>>>>
>>>> Christian.
>>>>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +--
>>>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> index b50a845..83771c1 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> @@ -98,7 +98,7 @@ static void amdgpu_job_free_resources(struct amdgpu_job *job)
>>>>>   void amdgpu_job_free_cb(struct amd_sched_job *s_job)
>>>>>   {
>>>>>       struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);
>>>>> -
>>>>> +    amdgpu_job_free_resources(job);
>>>>>       kfree(job);
>>>>>   }
>>>>> @@ -178,7 +178,6 @@ static struct fence *amdgpu_job_run(struct amd_sched_job *sched_job)
>>>>>     err:
>>>>>       job->fence = fence;
>>>>> -    amdgpu_job_free_resources(job);
>>>>>       return fence;
>>>>>   }
>>>>
>>>
>>
>
