[PATCH 00/10] GART table recovery

Thu Aug 18 09:03:06 UTC 2016

On 18.08.2016 at 10:50, zhoucm1 wrote:
>
>
> On 2016-08-04 17:58, Christian König wrote:
>> On 04.08.2016 at 05:35, zhoucm1 wrote:
>>>
>>>
>>> On 2016-08-03 22:01, Christian König wrote:
>>>> Well patch #10 is incorrect. The SA BO will be set to NULL by 
>>>> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
>>>> reference the fence twice.
>>> I see.
>>> But amdgpu_job_free_resources() still shouldn't be called twice, 
>>> right? That's an obvious duplication, although it seems to have no 
>>> effect now. Is there any other reason?
>>
>> It's actually called from a couple of different locations:
>> 1. From the CS path in amdgpu_cs.c as soon as we have a scheduler fence.
>> 2. From the amdgpu_job_submit() path as soon as we have a scheduler 
>> fence.
>> 3. From amdgpu_job_run() after submitting the job to the hardware ring.
>> 4. From amdgpu_job_free(), this is for direct submissions or for 
>> freeing the job when something went wrong.
>>
>> Thinking about it you could be right and we could probably drop the 
>> one in amdgpu_job_run(), because amdgpu_job_submit() should have 
>> already taken care of that. But I'm not 100% sure of that.
>>
>>>
>>>>
>>>> In addition to that, the whole approach here of restoring the GART 
>>>> from the backup using the SDMA won't work either. For the SDMA to 
>>>> work you need the GART to access the ring buffer.
>>>>
>>>> So you run into a chicken-and-egg problem here: for the ring buffer 
>>>> to work you need the GART, and for the GART backup to work you need 
>>>> the ring buffer.
>>> Good catch, ring buffer is a GTT buffer as well.
>>>
>>> Then can we use memcpy to copy GTT to VRAM? Fortunately, the GART 
>>> table lives in a single BO.
>>
>> Yeah, that is what we did with radeon as well. Unfortunately the 
>> double housekeeping costs quite a bit of memory.
>>
>> And actually we have exactly the same information in the TTM MM as 
>> well; we would just need to bind all BOs again.
>>
>> Give me a day or two to double check that. Might be that the solution 
>> is rather simple.
> How about this? Do you have any better idea for it?

Sorry for not answering earlier, but you know humans have only one head 
and two hands to type :)

The information needed to restore the GART is already stored in the 
amdgpu_ttm_tt structures.

We just need to link them together in amdgpu_ttm_backend_bind() and 
unlink them in amdgpu_ttm_backend_unbind() to be able to restore the 
GART table after a reset.
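
To illustrate the linking idea, here is a minimal user-space sketch in C. It 
is not driver code: struct tt_model, gart_table, bound_list and all function 
names below are hypothetical stand-ins, and the real amdgpu_ttm_tt layout and 
GART PTE format are not modeled. It only shows that if every bound entry sits 
on a list, the table can be rebuilt by walking that list, without a shadow 
copy and without the ring buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GART_NUM_PAGES 16u      /* assumed tiny table, for illustration */
#define GART_PTE_VALID 1ull     /* assumed "valid" flag in bit 0 */

/* Hypothetical stand-in for the GART table that is lost on a full reset. */
static uint64_t gart_table[GART_NUM_PAGES];

/* Hypothetical stand-in for amdgpu_ttm_tt: what a bound BO already knows. */
struct tt_model {
    size_t offset;                /* first GART page index */
    size_t num_pages;             /* number of pages bound */
    const uint64_t *dma_addrs;    /* page-aligned DMA address per page */
    struct tt_model *prev, *next; /* link in the list of bound entries */
};

/* Circular list head; empty when it points at itself. */
static struct tt_model bound_list = {
    .prev = &bound_list, .next = &bound_list
};

static void write_ptes(const struct tt_model *tt)
{
    for (size_t i = 0; i < tt->num_pages; i++)
        gart_table[tt->offset + i] = tt->dma_addrs[i] | GART_PTE_VALID;
}

/* Models the bind path: write the PTEs and link the entry. */
int tt_bind(struct tt_model *tt)
{
    if (tt->num_pages > GART_NUM_PAGES ||
        tt->offset > GART_NUM_PAGES - tt->num_pages)
        return -1; /* range does not fit in the table */
    write_ptes(tt);
    tt->prev = bound_list.prev;
    tt->next = &bound_list;
    bound_list.prev->next = tt;
    bound_list.prev = tt;
    return 0;
}

/* Models the unbind path: clear the PTEs and unlink the entry. */
void tt_unbind(struct tt_model *tt)
{
    for (size_t i = 0; i < tt->num_pages; i++)
        gart_table[tt->offset + i] = 0;
    tt->prev->next = tt->next;
    tt->next->prev = tt->prev;
    tt->prev = tt->next = NULL;
}

/* Models a full reset wiping the table contents. */
void gart_lose_contents(void)
{
    memset(gart_table, 0, sizeof(gart_table));
}

/* Rebuild the table from the list; returns the number of entries rewritten. */
size_t gart_restore(void)
{
    size_t n = 0;
    for (struct tt_model *tt = bound_list.next; tt != &bound_list; tt = tt->next) {
        write_ptes(tt);
        n++;
    }
    return n;
}
```

In this model the only extra memory per bound entry is the two list pointers, 
which is the point of reusing the existing bookkeeping instead of keeping a 
second copy of the table. Locking and the actual CPU writes to the table are 
left out on purpose.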

> Or just change the ring buffer to a VRAM BO?

Interesting idea, but considering how much overhead writing to VRAM from 
the CPU has, it clearly wouldn't be a good idea in general.

Regards,
Christian.

>
> Regards,
> David Zhou
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> David Zhou
>>>
>>>>
>>>> We should just restore the GART content from the housekeeping 
>>>> structure instead. Going to evaluate if and how that might be 
>>>> possible.
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 02.08.2016 at 10:00, Chunming Zhou wrote:
>>>>> The GART table is stored in one BO, which must be ready before GART 
>>>>> init, but the shadow BO must be created after the GART is ready, so 
>>>>> they cannot be created at the same time. The shadow BO itself is also 
>>>>> included in the GART table, so the shadow BO needs a synchronization 
>>>>> after device init. After the sync, the contents of the BO and the 
>>>>> shadow BO will be the same, and they will be updated at the same time. 
>>>>> Then we will be able to recover the GART table from the shadow BO on a 
>>>>> full GPU reset.
>>>>>
>>>>> Patch 10 is a fix for a memory leak.
>>>>>
>>>>> Chunming Zhou (10):
>>>>>    drm/amdgpu: make need_backup generic
>>>>>    drm/amdgpu: implement gart late_init/fini
>>>>>    drm/amdgpu: add gart_late_init/fini to gmc V7/8
>>>>>    drm/amdgpu: abstract amdgpu_bo_create_shadow
>>>>>    drm/amdgpu: shadow gart table support
>>>>>    drm/amdgpu: make recover_bo_from_shadow be generic
>>>>>    drm/amdgpu: implement gart recovery
>>>>>    drm/amdgpu: recover gart table first when full reset
>>>>>    drm/amdgpu: sync gart table before initialization completed
>>>>>    drm/amdgpu: fix memory leak of sched fence
>>>>>
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 139 +++++++++++++++++++++++++++++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  80 ++++++++++++++---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   9 ++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     |  50 ++---------
>>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c      |  39 +++++++-
>>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c      |  40 ++++++++-
>>>>>   9 files changed, 304 insertions(+), 66 deletions(-)
>>>>>
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>