[PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
Zhu, James
James.Zhu at amd.com
Wed Nov 3 15:54:45 UTC 2021
[AMD Official Use Only]
Hi Alex,
The following two patches were introduced for stable at vger.kernel.org:
714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume
After commit 970eae15600a883e4ad27dd0757b18871cc983ab
(Merge: 27f4432 3906fe9, BackMerge tag 'v5.15-rc7' into drm-next),
they became redundant and overwrote afd1818.
I saw that you just submitted afd1818 ("[PATCH] drm/amdkfd: fix boot failure when iommu is disabled in Picasso") to stable at vger.kernel.org.
I checked that re-applying afd1818 on current drm-next gives, after the auto-merge, the same result as my patch.
I am wondering whether a future BackMerge of stable into drm-next will correct the current breakage.
Given the above, I am not sure what the proper way to fix this breakage is.
Please let me know your final decision given all this information.
Thanks & Best Regards!
James Zhu
________________________________
From: Alex Deucher <alexdeucher at gmail.com>
Sent: Wednesday, November 3, 2021 11:03 AM
To: Zhu, James <James.Zhu at amd.com>
Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Yifan <Yifan1.Zhang at amd.com>; James Zhu <jzhums at gmail.com>; Ken Moffat <zarniwhoop at ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
Reverting 714d9e4 and f02abeb results in the diff below, which changes more than this patch does. Is that correct, or should I just use your patch?
Alex
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..70540712ff2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
@@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
 	int r;
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		return r;
-
 	r = amdgpu_device_ip_resume_phase1(adev);
 	if (r)
 		return r;
@@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 				dev_warn(tmp_adev->dev, "asic atom init failed!");
 			} else {
 				dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
-				r = amdgpu_amdkfd_resume_iommu(tmp_adev);
-				if (r)
-					goto out;
-
 				r = amdgpu_device_ip_resume_phase1(tmp_adev);
 				if (r)
 					goto out;
On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher at gmail.com> wrote:
On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu at amd.com> wrote:
[AMD Official Use Only]
Hi Alex,
I finally figured out the root cause of this breakage.
Linux 5.14.15 + afd1818 can fix the issue.
I'll do that for stable.
Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.
714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume
afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
286826d drm/amdgpu: init iommu after amdkfd device init
9cec53c drm/amdgpu: move iommu_resume before ip init/resume
So, do we just discard this patch, and revert 714d9e4 and f02abeb?
I'll do that for 5.15+
Thanks for sorting this out.
Alex
Thanks & Best Regards!
James Zhu
________________________________
From: Alex Deucher <alexdeucher at gmail.com>
Sent: Tuesday, November 2, 2021 10:01 PM
To: Zhu, James <James.Zhu at amd.com>
Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Yifan <Yifan1.Zhang at amd.com>; James Zhu <jzhums at gmail.com>; Ken Moffat <zarniwhoop at ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu at amd.com> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in amdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu at amd.com>
Once you get confirmation, please add:
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214859
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1770
Acked-by: Alex Deucher <alexander.deucher at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>  	if (!adev->gmc.xgmi.pending_reset)
>  		amdgpu_amdkfd_device_init(adev);
>
> -	r = amdgpu_amdkfd_resume_iommu(adev);
> -	if (r)
> -		goto init_failed;
> -
>  	amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>
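The duplication that the patch description refers to can be sketched with a toy C model. This is only an illustration under stated assumptions: every `toy_*` name is a hypothetical stand-in, not kernel code, and the model assumes (as the commit message says) that the KFD device init step already resumes the IOMMU itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device; it only counts calls. */
struct toy_dev {
	int iommu_resume_calls;
	bool kfd_initialized;
};

/* Stand-in for amdgpu_amdkfd_resume_iommu(); always succeeds in this model. */
int toy_resume_iommu(struct toy_dev *dev)
{
	dev->iommu_resume_calls++;
	return 0;
}

/*
 * Stand-in for amdgpu_amdkfd_device_init(). Assumption taken from the
 * patch description: the IOMMU resume already happens inside this step.
 */
void toy_kfd_device_init(struct toy_dev *dev)
{
	dev->kfd_initialized = true;
	toy_resume_iommu(dev);
}

/* Ordering before the patch: an explicit resume follows KFD init. */
int toy_ip_init_before(struct toy_dev *dev)
{
	toy_kfd_device_init(dev);
	return toy_resume_iommu(dev);	/* second, duplicated call */
}

/* Ordering after the patch: the explicit call is dropped. */
int toy_ip_init_after(struct toy_dev *dev)
{
	toy_kfd_device_init(dev);
	return 0;
}
```

Calling `toy_ip_init_before()` on a zeroed `struct toy_dev` leaves `iommu_resume_calls` at 2, while `toy_ip_init_after()` leaves it at 1, which is the one-call behaviour the patch aims for in `amdgpu_device_ip_init()`.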
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 381936 bytes
Desc: image.png
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20211103/e24a614f/attachment-0001.png>