[PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

Xu, Feifei Feifei.Xu at amd.com
Tue Oct 10 09:25:02 UTC 2023


[AMD Official Use Only - General]

If gfx is not power on, both check will return ahead. The logic will not change.
If gfx is power on early in resume, tlb flush in the IP specific (gmc v11) callback will never be called because it returned ahead in the higher level check in amdgpu_gmc_flush_gpu_tlb() :

if (!down_read_trylock(&adev->reset_domain->sem) && //--> true in gfx11
+!adev->gfx.is_poweron)         //--> (!adev->gfx.is_poweron) = false in gfx11, and the whole if statement will be false, not return ahead.

Thanks,
Feifei

-----Original Message-----
From: Xu, Feifei
Sent: Tuesday, October 10, 2023 10:28 AM
To: Zhang, Hawking <Hawking.Zhang at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Koenig, Christian <Christian.Koenig at amd.com>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

Yes, adev->gfx.is_poweron check will be applied in gmc_v11 callback, which will be called after the generic gmc part: amdgpu_gmc_flush_gpu_tlb() function.
But in commit: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2"), the flush is moved at a higher level amdgpu_gmc_flush_gpu_tlb() function.

Thus the gmc_v11 callback will never be called in the resume because adev->reset_domain->sem not released and returned ahead. Adding a check of adev->gfx.is_poweron will let GFX11 not breaking ahead, like following:

if (!down_read_trylock(&adev->reset_domain->sem) && //--> true in gfx11
+!adev->gfx.is_poweron)         //-->false in gfx11, and the whole if statement will be false, not return ahead. The following gmc v11 callback will be executed later.

Thanks,
Feifei

-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang at amd.com>
Sent: Monday, October 9, 2023 4:58 PM
To: Xu, Feifei <Feifei.Xu at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Koenig, Christian <Christian.Koenig at amd.com>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

[AMD Official Use Only - General]

adev->gfx.is_poweron check should already be applied in IP specific (gmc v11) callback. If gfx is not power on, it does nothing but just returns. I didn't see how it helps resolve the issue if we just move the check from one function to another.

Regards,
Hawking

-----Original Message-----
From: Xu, Feifei <Feifei.Xu at amd.com>
Sent: Monday, October 9, 2023 09:51
To: Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Koenig, Christian <Christian.Koenig at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

[AMD Official Use Only - General]

Hi,

>> Based on your description, the above code should use "||" instead of
>> "&&",
&& is to add more restriction here.  To avoid skipping necessary TLB flush by return.
For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is intrudoced from GFX11), only depends on reset_domain->sem; For Asics = GFX11, !adev->gfx.is_poweron might be false (which gfx might already poweron in the reset), this will make the if () not ture, return will not be executed, thus flush TLB.

>> And after merging code into one line may result in the lock not being released if the lock can be acquired success.
If !adev->gfx.is_poweron is true, the reset_domin->sem will not be down_read_trylock, thus could avoid this deadlock.

Thanks,
Feifei

-----Original Message-----
From: Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
Sent: Sunday, October 8, 2023 9:36 PM
To: Xu, Feifei <Feifei.Xu at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Xu, Feifei <Feifei.Xu at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb


-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Feifei Xu
Sent: Sunday, October 8, 2023 6:07 PM
To: amd-gfx at lists.freedesktop.org
Cc: Xu, Feifei <Feifei.Xu at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb after reset on GFX11.
Gfxhub tlb flush need check if adev->gfx.is_poweron set.

Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2")

Signed-off-by: Feifei Xu <Feifei.Xu at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2f9bb86edd71..048d32edee88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
                /*
                 * A GPU reset should flush all TLBs anyway, so no need to do
                 * this while one is ongoing.
+                * For GFX11, gfxhub flush check if adev->gfx.is_poweron is set.
                 */
-               if (!down_read_trylock(&adev->reset_domain->sem))
+               if (!down_read_trylock(&adev->reset_domain->sem) &&
+!adev->gfx.is_poweron)
                        return;

[Kevin]:
Based on your description, the above code should use "||" instead of "&&", And after merging code into one line may result in the lock not being released if the lock can be acquired success.

Best Regards,
Kevin

                if (adev->gmc.flush_tlb_needs_extra_type_2)
--
2.34.1





More information about the amd-gfx mailing list