[PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init

Zhu, James James.Zhu at amd.com
Mon Oct 4 16:56:45 UTC 2021


[AMD Official Use Only]

Hi Flelix,

I have tested YiFan's patch and pco s3 test can pass.

The original s3 issue always occurred on resume, And YiFan's patches
are only related to initialization.

Hi YiFan,

Reviewed-by: James Zhu <James.Zhu at amd.com> for the series

Tested-by: James Zhu <James.Zhu at amd.com> for the series



Thanks & Best Regards!


James Zhu

________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Monday, October 4, 2021 11:00 AM
To: Zhang, Yifan <Yifan1.Zhang at amd.com>; Zhu, James <James.Zhu at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init

I'm trying to understand what the end result is after James' and your
patches. If I'm reading it correctly, we now initialize IOMMUv2 after
kfd_device_init during initialization, but before kfd_device_init during
resume from S3. Is  that the correct understanding?

My concern is, that we may run into related problems again if the root
cause and the mechanism of the fix are poorly understood. Do you have an
explanation why this different sequence is needed during init and resume?

That said, given that your patches are tested and reviewed by James, you
can add my Acked-by to both patches. The series is

Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>

Thanks,
  Felix


Am 2021-10-04 um 5:42 a.m. schrieb Zhang, Yifan:
>
> [AMD Official Use Only]
>
>
>
> Hi Felix,
>
>
>
> After sync w/ James, we agree that this patch series could fix both
> our problems, and he verified this patch series will not cause
> regression of his previous issue. Do you have more comments regarding
> this patch series ? Thanks.
>
>
>
> BRs,
>
> Yifan
>
>
>
> *From:* Zhu, James <James.Zhu at amd.com>
> *Sent:* Wednesday, September 29, 2021 9:19 PM
> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>; Zhang, Yifan
> <Yifan1.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
> *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init
>
>
>
> [AMD Official Use Only]
>
>
>
> H Felix,
>
>
>
> Since the previous patch can help on PCO suspend/resume hung issue.
> Let me work with YiFan to see if
>
> there is proper way to cover both cases.
>
>
>
> Thanks & Best Regards!
>
>
>
> James Zhu
>
> ------------------------------------------------------------------------
>
> *From:*Kuehling, Felix <Felix.Kuehling at amd.com
> <mailto:Felix.Kuehling at amd.com>>
> *Sent:* Tuesday, September 28, 2021 11:41 AM
> *To:* Zhang, Yifan <Yifan1.Zhang at amd.com
> <mailto:Yifan1.Zhang at amd.com>>; amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>>; Zhu, James <James.Zhu at amd.com
> <mailto:James.Zhu at amd.com>>
> *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device
> init
>
>
>
> [+James]
>
> This basically undoes James's change "drm/amdgpu: move iommu_resume
> before ip init/resume". I assume James made his change for a reason. Can
> you please discuss the issue with him and determine a solution that
> solves both your problem and his?
>
> If James' patch series was a mistake, I'd prefer to revert his patches,
> because his patches complicated the initialization sequence and exposed
> the iommu init sequence in amdgpu.
>
> Thanks,
>   Felix
>
>
> Am 2021-09-28 um 4:28 a.m. schrieb Yifan Zhang:
> > This patch is to fix clinfo failure in Raven/Picasso:
> >
> > Number of platforms: 1
> >   Platform Profile: FULL_PROFILE
> >   Platform Version: OpenCL 2.2 AMD-APP (3364.0)
> >   Platform Name: AMD Accelerated Parallel Processing
> >   Platform Vendor: Advanced Micro Devices, Inc.
> >   Platform Extensions: cl_khr_icd cl_amd_event_callback
> >
> >   Platform Name: AMD Accelerated Parallel Processing Number of
> devices: 0
> >
> > Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com
> <mailto:yifan1.zhang at amd.com>>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 4c8f2f4647c0..89ed9b091386 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2393,10 +2393,6 @@ static int amdgpu_device_ip_init(struct
> amdgpu_device *adev)
> >        if (r)
> >                goto init_failed;
> >
> > -     r = amdgpu_amdkfd_resume_iommu(adev);
> > -     if (r)
> > -             goto init_failed;
> > -
> >        r = amdgpu_device_ip_hw_init_phase1(adev);
> >        if (r)
> >                goto init_failed;
> > @@ -2435,6 +2431,10 @@ static int amdgpu_device_ip_init(struct
> amdgpu_device *adev)
> >        if (!adev->gmc.xgmi.pending_reset)
> >                amdgpu_amdkfd_device_init(adev);
> >
> > +     r = amdgpu_amdkfd_resume_iommu(adev);
> > +     if (r)
> > +             goto init_failed;
> > +
> >        amdgpu_fru_get_product_info(adev);
> >
> >  init_failed:
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20211004/7210cf77/attachment.htm>


More information about the amd-gfx mailing list