答复: [PATCH 07/21] drm/amdgpu:fix gart table vram pin

Tue Feb 7 11:23:24 UTC 2017

no, we already make it work, and suspend is totally nonsense for SRIOV,

because when guest side need a GPU reset, hypervisor will do VF flr, so after vf flr amdgpu_suspen() is redundant at all.

those 20 patches serial is verified on KVM/XEN platform, it can recover back from guest hang.

________________________________
发件人: Christian König <deathsimple at vodafone.de>
发送时间: 2017年2月7日 19:12:04
收件人: Liu, Monk; amd-gfx at lists.freedesktop.org
主题: Re: [PATCH 07/21] drm/amdgpu:fix gart table vram pin

> Because gpu_srio_reset (will send patch for this routine later) doesn't call amdgpu_suspend()
That's most likely not a good idea.

Suspend and resume should always be paired, otherwise you run into
exactly those problems and the GART is most likely only the tip of the
iceberg here.

For example you also mess up the reference counting for buffer
containing the UVD and VCE firmware (ok that won't affect SRIOV for now).

Maybe you just want to call hw_init() instead of a resume here?

Regards,
Christian.

Am 06.02.2017 um 16:55 schrieb Liu, Monk:
> I recall why I made this patch
>
> When testing SRIOV gpu reset feature, I it will always waiting and not return if without this patch, with more look into it:
>
> Because gpu_srio_reset (will send patch for this routine later) doesn't call amdgpu_suspend(), so the gart table BO won't get unpin, which lead to driver infinite wait loop  if we pin it again in resume.
>
> For bare-metal case, gpu_reset will call amdgpu_suspend so the gart bo will unpin.
>
> BTW:
> GPU_SRIOV_RESET is invoked after HYPERVISOR call VF_FLR on this vf device, so all IP blocks's suspend routine is not needed at all.
>
> What about:
>>> +   if (adev->gart.table_addr && amdgpu_sriov_vf(adev)) {
>>> +           /* it's a resume call, gart already pin */
>>> +           return 0;
>>> +   }
>
> BR Monk
>
>
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Monday, February 06, 2017 10:31 PM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 07/21] drm/amdgpu:fix gart table vram pin
>
> Hui? We shouldn't need to call this function from a GPU reset, do we really do so?
>
> But even if we call it from GPU reset we certainly should have called the matching unpin function before.
>
> Otherwise we certainly won't be able to resume from the next suspend after a GPU reset.
>
> Regards,
> Christian.
>
> Am 06.02.2017 um 15:25 schrieb Liu, Monk:
>> Emmmm looks like I missed the part of S3 function
>>
>> But if this is from a GPU reset ,  we also shouldn't continue run this
>> function otherwise GPU reset will fail (SRIOV reset test)
>>
>> BR Monk
>>
>> -----Original Message-----
>> From: Christian König [mailto:deathsimple at vodafone.de]
>> Sent: Monday, February 06, 2017 4:14 PM
>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 07/21] drm/amdgpu:fix gart table vram pin
>>
>> A bug NAK on this! amdgpu_gart_table_vram_unpin() must be called during suspend.
>>
>> Otherwise the GART table can be corrupted and we run into a whole bunch of problems.
>>
>> We could add a "BUG_ON(adev->gart.table_addr != NULL);" here to double check that, but just ignoring that something went horrible wrong is clearly the wrong approach.
>>
>> Regards,
>> Christian.
>>
>> Am 04.02.2017 um 11:34 schrieb Monk Liu:
>>> if this call is from resume, shouldn't enter pin logic at all
>>>
>>> Change-Id: I40a5cdc2a716c4c20d2812fd74ece4ea284b6765
>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 5 +++++
>>>     1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> index 964d2a9..5e907f7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> @@ -151,6 +151,11 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
>>>      uint64_t gpu_addr;
>>>      int r;
>>>
>>> +   if (adev->gart.table_addr) {
>>> +           /* it's a resume call, gart already pin */
>>> +           return 0;
>>> +   }
>>> +
>>>      r = amdgpu_bo_reserve(adev->gart.robj, false);
>>>      if (unlikely(r != 0))
>>>              return r;
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170207/37dd56c5/attachment-0001.html>