[PATCH 00/10] GART table recovery

Thu Aug 4 09:58:59 UTC 2016

On 04.08.2016 at 05:35, zhoucm1 wrote:
>
>
> On 2016-08-03 22:01, Christian König wrote:
>> Well patch #10 is incorrect. The SA BO will be set to NULL by 
>> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
>> reference the fence twice.
> I see.
> But amdgpu_job_free_resources still shouldn't be called twice, right? 
> That's an obvious duplication, although it seems to have no effect 
> now. Is there any other reason?

It's actually called from a couple of different locations:
1. From the CS path in amdgpu_cs.c as soon as we have a scheduler fence.
2. From the amdgpu_job_submit() path as soon as we have a scheduler fence.
3. From amdgpu_job_run() after submitting the job to the hardware ring.
4. From amdgpu_job_free(), this is for direct submissions or for freeing 
the job when something went wrong.

Thinking about it you could be right and we could probably drop the one 
in amdgpu_job_run(), because amdgpu_job_submit() should have already 
taken care of that. But I'm not 100% sure of that.
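
As a rough illustration of why the duplicate call is harmless today, here 
is a minimal standalone sketch of the free-and-clear pattern. All struct 
and function names below are hypothetical stand-ins, not the real amdgpu 
definitions, and the counter only simulates dropping a fence reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only, not the amdgpu types. */
struct sketch_sa_bo {
	int placeholder;
};

struct sketch_job {
	struct sketch_sa_bo *sa_bo; /* NULL once released */
	int fence_puts;             /* counts simulated fence reference drops */
};

/* Release the allocation and clear the caller's pointer. */
void sketch_sa_bo_free(struct sketch_sa_bo **sa_bo, int *fence_puts)
{
	assert(fence_puts != NULL);

	if (sa_bo == NULL || *sa_bo == NULL)
		return; /* already released: nothing to free, no reference to drop */

	(*fence_puts)++;
	free(*sa_bo);
	*sa_bo = NULL;
}

/* Safe to call from several paths; only the first call does any work. */
void sketch_job_free_resources(struct sketch_job *job)
{
	sketch_sa_bo_free(&job->sa_bo, &job->fence_puts);
}
```

Calling sketch_job_free_resources() a second time returns early, so the 
buffer is freed once and the simulated fence reference is dropped once.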

>
>>
>> In addition to that, the whole approach here of restoring the GART 
>> from the backup using the SDMA won't work either. For the SDMA to 
>> work you need the GART to access the ring buffer.
>>
>> So you run into a chicken-and-egg problem here: for the ring buffer 
>> to work you need the GART, and for the GART backup to work you need 
>> the ring buffer.
> Good catch, ring buffer is a GTT buffer as well.
>
> Then can we use memcpy to copy GTT to VRAM? Fortunately, the GART 
> table is only a single BO.

Yeah, that is what we did with radeon as well. Unfortunately the double 
housekeeping costs quite a bit of memory.

And actually we have exactly the same information in the TTM MM as well, 
we would just need to bind all BOs again.
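
To make the idea concrete, here is a minimal standalone sketch of 
restoring a page table from CPU-side housekeeping, so that no ring buffer 
or SDMA is involved. Everything in it is an assumption for illustration: 
the struct layout, the table size, the valid bit and the convention that 
address 0 means "unbound" are hypothetical and do not mirror the radeon 
or amdgpu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GART_PAGES 8      /* hypothetical table size */
#define SKETCH_PTE_VALID  0x1ULL /* hypothetical "entry valid" flag */

/* Hypothetical model: one table the GPU reads, one CPU-side copy. */
struct sketch_gart {
	uint64_t table[SKETCH_GART_PAGES];      /* stand-in for the table in VRAM */
	uint64_t pages_addr[SKETCH_GART_PAGES]; /* housekeeping, 0 means unbound */
};

/* Bind one page: record it in the housekeeping and write the entry. */
void sketch_gart_bind(struct sketch_gart *g, size_t idx, uint64_t dma_addr)
{
	assert(idx < SKETCH_GART_PAGES);
	g->pages_addr[idx] = dma_addr;
	g->table[idx] = dma_addr | SKETCH_PTE_VALID;
}

/* After a reset, rewrite every entry from the CPU-side housekeeping. */
void sketch_gart_restore(struct sketch_gart *g)
{
	for (size_t i = 0; i < SKETCH_GART_PAGES; i++) {
		if (g->pages_addr[i] != 0)
			g->table[i] = g->pages_addr[i] | SKETCH_PTE_VALID;
		else
			g->table[i] = 0;
	}
}
```

The cost is visible in the struct: pages_addr duplicates one address per 
GART page, which is the extra memory mentioned above.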

Give me a day or two to double check that. Might be that the solution is 
rather simple.

Regards,
Christian.

>
> Regards,
> David Zhou
>
>>
>> We should just restore the GART content from the housekeeping 
>> structure instead. Going to evaluate if and how that might be possible.
>
>>
>> Regards,
>> Christian.
>>
>> On 02.08.2016 at 10:00, Chunming Zhou wrote:
>>> The GART table is stored in one BO, which must be ready before GART 
>>> init, but the shadow BO must be created after the GART is ready, so 
>>> the two cannot be created at the same time. The shadow BO itself is 
>>> also included in the GART table, so the shadow BO needs a 
>>> synchronization after device init. After the sync, the contents of 
>>> the BO and the shadow BO will be the same and will be updated at the 
>>> same time. We will then be able to recover the GART table from the 
>>> shadow BO on a full GPU reset.
>>>
>>> Patch 10 is a fix for a memory leak.
>>>
>>> Chunming Zhou (10):
>>>    drm/amdgpu: make need_backup generic
>>>    drm/amdgpu: implement gart late_init/fini
>>>    drm/amdgpu: add gart_late_init/fini to gmc V7/8
>>>    drm/amdgpu: abstract amdgpu_bo_create_shadow
>>>    drm/amdgpu: shadow gart table support
>>>    drm/amdgpu: make recover_bo_from_shadow be generic
>>>    drm/amdgpu: implement gart recovery
>>>    drm/amdgpu: recover gart table first when full reset
>>>    drm/amdgpu: sync gart table before initialization completed
>>>    drm/amdgpu: fix memory leak of sched fence
>>>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 139 +++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  80 ++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   9 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     |  50 ++---------
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c      |  39 +++++++-
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c      |  40 ++++++++-
>>>   9 files changed, 304 insertions(+), 66 deletions(-)
>>>
>>