[PATCH] drm/amdgpu: properly powergate Polaris12 UVD/VCE on suspend

Quan, Evan Evan.Quan at amd.com
Thu Aug 19 03:02:20 UTC 2021


[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Wednesday, August 18, 2021 5:35 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org; Liu,
> Leo <Leo.Liu at amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Chen, Guchun
> <Guchun.Chen at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12 UVD/VCE
> on suspend
> 
> 
> 
> On 8/18/2021 2:15 PM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> >> Sent: Tuesday, August 17, 2021 6:09 PM
> >> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org;
> >> Liu, Leo <Leo.Liu at amd.com>
> >> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Chen, Guchun
> >> <Guchun.Chen at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
> >> Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12
> UVD/VCE
> >> on suspend
> >>
> >>
> >>
> >> On 8/17/2021 3:13 PM, Quan, Evan wrote:
> >>> [AMD Official Use Only]
> >>>
> >>> +Leo to share his insights
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> >>>> Sent: Tuesday, August 17, 2021 3:28 PM
> >>>> To: Quan, Evan <Evan.Quan at amd.com>; amd-
> gfx at lists.freedesktop.org
> >>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Chen,
> Guchun
> >>>> <Guchun.Chen at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
> >>>> Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12
> >> UVD/VCE
> >>>> on suspend
> >>>>
> >>>>
> >>>>
> >>>> On 8/17/2021 12:10 PM, Evan Quan wrote:
> >>>>> If the powergating of UVD/VCE is in process, wait for its
> >>>>> completion before proceeding(suspending). This can fix some hangs
> >>>>> observed on suspending when UVD/VCE still using(e.g. issue
> >>>>> "pm-suspend" when video is still playing).
> >>>>>
> >>>>> Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed
> >>>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
> >>>>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 5 +++++
> >>>>>     drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 +++++
> >>>>>     2 files changed, 10 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> >>>>> index 4eebf973a065..2fdce572baeb 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> >>>>> @@ -554,6 +554,11 @@ static int uvd_v6_0_suspend(void *handle)
> >>>>>     	int r;
> >>>>>     	struct amdgpu_device *adev = (struct amdgpu_device
> *)handle;
> >>>>>
> >>>>> +	/*
> >>>>> +	 * If the powergating is in process, wait for its completion.
> >>>>> +	 */
> >>>>> +	flush_delayed_work(&adev->uvd.idle_work);
> >>>>> +
> >>>> If running idle is a prerequisite before going to suspend, then
> >>>> something else is missing here.
> >>>>
> >>>> Otherwise, the hang looks more like a pending work launched after
> >>>> hardware is suspended and trying to access hardware. As the
> >>>> hardware is going to be suspended anyway, doesn't it work with
> >>>> cancel_delayed_work_sync - making sure that nothing is going to be
> >>>> launched later to access hardware?
> >>> [Quan, Evan] The reason we chose flush_delayed_work instead of
> >> cancel_delayed_work_sync is we think those operations performed in
> >> idle_work(dpm disablement, powergating) seems needed considering
> the
> >> action is 'suspend'. So, instead of "cancel", maybe waiting for them
> >> completion is more proper.
> >>
> >> But it will do so only if the work is scheduled - so it doesn't seem
> >> to be a prerequisite for suspend. If it was a prerequisite, then the
> >> existing code is missing that (so that it gets done for all cases).
> > [Quan, Evan] Just confirmed that cancel_delayed_work_sync() alone
> cannot work.
> > In fact, our current driver already get cancel_delayed_work_sync()
> called(e.g. in amdgpu_uvd_suspend() on the path of uvd_v6_0_suspend).
> 
> Thanks Evan for checking this further.
> 
> If cancel_delayed_work_sync() is called in amdgpu_uvd_suspend(), then it
> means that originally executing idle_work was not considered as a
> prerequisite for suspending.
> 
> _uvd_suspend() is called "after" uvd_v6_0_hw_fini(adev), that maybe a little
> too late.
> 
> > To get the issue fixed, it has to be either:
> > 1. flush_delayed_work()
> > Or
> > 2. cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce(adev,
> false)
> > (the job performed in idle work)
> 
> At minimum, it proves that disabling dpm is required for suspend.
> 
> > Btw, I do not think flush_delayed_work() is an incomplete fix. Since the
> UVD/VCE idle work is appended on ring->funcs->end_use.
> > So, as long as the UVD/VCE ring used and ended(it will since at least there is
> ring/ib test), there will be a chance to get the work waited and completed.
> 
> I agree that it fixes the issue, only thing is someone should look further to
> provide the right sequence of uvd suspend.
> 
> It doesn't give a root cause/right sequence - it only tells that forcing to
> execute idle_work fixes the problem probably due to an extra delay added
> by increased execution time or it disables DPM or something else.
> Someone should confirm that all of idle_work or a part of it (as dpm
> disable) is required for proper suspend sequence.
> 
> That said, I don't have any objections to this fix either. If there are other
> things, probably fix it this way and move on.
[Quan, Evan] Thanks Lijo. I checked more yesterday.
I found the issue was not seen on sienna cichlid and I checked its related source code(vcn_v3_0.c).
I think I might got some clues what's the right sequence and the reason why adding amdgpu_dpm_enable_uvd/vce(adev,false) can address the issue.
For vcn_v3_0.c, the cleanups performed on suspend:
 - cancel_delayed_work_sync
 - vcn_v3_0_set_powergating_state()
    - There is actually more jobs performed than just powergating. The naming is a little confusing.
    - what it does are: powergating enablement + clockgating enablement + dpm disablement

For uvd_v6_0.c(where the issue was found), the above cleanups are missing. But if "cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce(adev," added, the sequence will be just like above. That might explain why it works.
  - cancel_delayed_work_sync
  - amdgpu_dpm_enable_uvd/vce()
    - Again, the naming is confusing. It actually performs more than just dpm disablement/enablmenet.
    - What it does are: powergating enablement + clockgating enablement  + dpm disablement

So, to me it seems proper to introduce a quick fix by " cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce()". 
A more proper long-term solution may be as the latest VCN(to put all the necessary cleanups in ->set_powergating_state()). But that will affect more than S3. I will leave that to Leo or James who are experts on UVD/VCE/VCN relates.

BR
Evan
> false) 
> 
> Thanks,
> Lijo
> 
> >
> > BR
> > Evan
> >>
> >>>>
> >>>> Then this may be a potential issue for other suspend calls also
> >>>> where work is pending to be launched when hardware is suspended.
> >>> [Quan, Evan] Do you mean we need to check whether there is similar
> >>> issue
> >> for other IPs?
> >>>
> >>
> >> Yes, if there are cases where other IPs may schedule a delayed work
> >> and call hw_fini without cancelling the work.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> BR
> >>> Evan
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>>     	r = uvd_v6_0_hw_fini(adev);
> >>>>>     	if (r)
> >>>>>     		return r;
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> >>>>> index 6d9108fa22e0..f0adecd5ec0b 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> >>>>> @@ -503,6 +503,11 @@ static int vce_v3_0_suspend(void *handle)
> >>>>>     	int r;
> >>>>>     	struct amdgpu_device *adev = (struct amdgpu_device
> *)handle;
> >>>>>
> >>>>> +	/*
> >>>>> +	 * If the powergating is in process, wait for its completion.
> >>>>> +	 */
> >>>>> +	flush_delayed_work(&adev->vce.idle_work);
> >>>>> +
> >>>>>     	r = vce_v3_0_hw_fini(adev);
> >>>>>     	if (r)
> >>>>>     		return r;
> >>>>>


More information about the amd-gfx mailing list