[PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

Alex Deucher alexdeucher at gmail.com
Fri Dec 17 02:02:59 UTC 2021


On Thu, Dec 16, 2021 at 8:43 PM Quan, Evan <Evan.Quan at amd.com> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) in their ->suspend routine which should prevent them from the issue here.
>         if (adev->pm.dpm_enabled)
>                 amdgpu_dpm_enable_uvd(adev, false);
> So, maybe it's a different story with those newer APUs. Can you forward the bug reports to me?

https://gitlab.freedesktop.org/drm/amd/-/issues/1712#note_1192187

Alex

>
> Thanks,
> Evan
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher at gmail.com>
> > Sent: Thursday, December 16, 2021 11:38 PM
> > To: Quan, Evan <Evan.Quan at amd.com>
> > Cc: Zhu, James <James.Zhu at amd.com>; Gong, Curry
> > <Curry.Gong at amd.com>; amd-gfx at lists.freedesktop.org; Deucher,
> > Alexander <Alexander.Deucher at amd.com>; Liu, Leo <Leo.Liu at amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> > FWIW, it looks like all versions of VCN need the same fix.  There have been
> > reports of suspend failing when VCN is in use on other newer APUs as well.
> >
> > Alex
> >
> > On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan <Evan.Quan at amd.com> wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Zhu, James <James.Zhu at amd.com>
> > > Sent: Monday, December 13, 2021 9:39 PM
> > > To: Gong, Curry <Curry.Gong at amd.com>; Zhu, James
> > <James.Zhu at amd.com>;
> > > amd-gfx at lists.freedesktop.org
> > > Cc: Liu, Leo <Leo.Liu at amd.com>; Quan, Evan <Evan.Quan at amd.com>;
> > > Deucher, Alexander <Alexander.Deucher at amd.com>
> > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > > powergating is explicitly enabled
> > >
> > >
> > >
> > > Hi Curry, Evan,
> > >
> > > It seems vcn1.0 power gate sequence are special.
> > >
> > > if the original solution is thread safe, then the following solution will be
> > thread safe also.
> > >
> > > static int vcn_v1_0_hw_fini(void *handle) {
> > >     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > >     cancel_delayed_work_sync(&adev->vcn.idle_work);
> > >
> > >     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> > >         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> > >          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> > > +        if (adev->pm.dpm_enabled)
> > > +            amdgpu_dpm_enable_uvd(adev, false);
> > > +        else
> > > +            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> > >
> > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true,
> > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will
> > become dead code.
> > >
> > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the
> > suspend/resume discussed here). So, it seems quite risky compared with
> > Curry’s original patch.
> > >
> > > Before we can come up with a better idea, I would suggest to land Curry’s
> > original patch as a quick fix.
> > >
> > >
> > >
> > > BR
> > >
> > > Evan
> > >
> > >
> > >     }
> > >
> > > Best Regards!
> > >
> > > James
> > >
> > > On 2021-12-13 3:55 a.m., Gong, Curry wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > > Hi James:
> > >
> > >
> > >
> > > With the following patch, an error will be reported when the driver is
> > > loaded
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device
> > > *adev)
> > >
> > >         else
> > >
> > >                 r = vcn_v1_0_stop_spg_mode(adev);
> > >
> > >
> > >
> > > c
> > >
> > >         return r;
> > >
> > > }
> > >
> > >
> > >
> > >
> > >
> > > $ dmesg
> > >
> > > [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120
> > seconds.
> > >
> > > [  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> > Ubuntu
> > >
> > > [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > >
> > > [  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2
> > flags:0x00004000
> > >
> > > [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
> > >
> > > [  363.181612] Call Trace:
> > >
> > > [  363.181618]  __schedule+0x44c/0x8a0
> > >
> > > [  363.181627]  schedule+0x4f/0xc0
> > >
> > > [  363.181631]  schedule_preempt_disabled+0xe/0x10
> > >
> > > [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
> > >
> > > [  363.181643]  __mutex_lock_slowpath+0x13/0x20
> > >
> > > [  363.181648]  mutex_lock+0x32/0x40
> > >
> > > [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180
> > [amdgpu]
> > >
> > > [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> > >
> > > [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
> > >
> > > [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0
> > > [amdgpu]
> > >
> > > [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
> > >
> > > [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
> > >
> > > [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180
> > [amdgpu]
> > >
> > > [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> > >
> > > [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
> > >
> > > [  363.184667]  process_one_work+0x220/0x3c0
> > >
> > > [  363.184674]  worker_thread+0x4d/0x3f0
> > >
> > > [  363.184677]  ? process_one_work+0x3c0/0x3c0
> > >
> > > [  363.184680]  kthread+0x12b/0x150
> > >
> > > [  363.184685]  ? set_kthread_struct+0x40/0x40
> > >
> > > [  363.184690]  ret_from_fork+0x22/0x30
> > >
> > > [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120
> > seconds.
> > >
> > > [  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> > Ubuntu
> > >
> > > [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > >
> > > [  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2
> > flags:0x00004000
> > >
> > > [  363.184831] Workqueue: events
> > > amdgpu_device_delayed_init_work_handler [amdgpu]
> > >
> > > [  363.185085] Call Trace:
> > >
> > > [  363.185087]  __schedule+0x44c/0x8a0
> > >
> > > [  363.185092]  schedule+0x4f/0xc0
> > >
> > > [  363.185095]  schedule_timeout+0x202/0x290
> > >
> > > [  363.185099]  ? sched_clock_cpu+0x11/0xb0
> > >
> > > [  363.185105]  wait_for_completion+0x94/0x100
> > >
> > > [  363.185110]  __flush_work+0x12a/0x1e0
> > >
> > > [  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
> > >
> > > [  363.185119]  __cancel_work_timer+0x10e/0x190
> > >
> > > [  363.185123]  cancel_delayed_work_sync+0x13/0x20
> > >
> > > [  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
> > >
> > > [  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
> > >
> > > [  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
> > >
> > > [  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
> > >
> > > [  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
> > >
> > > [  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
> > >
> > > [  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
> > >
> > > [  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30
> > > [amdgpu]
> > >
> > > [  363.187206]  process_one_work+0x220/0x3c0
> > >
> > > [  363.187210]  worker_thread+0x4d/0x3f0
> > >
> > > [  363.187214]  ? process_one_work+0x3c0/0x3c0
> > >
> > > [  363.187217]  kthread+0x12b/0x150
> > >
> > > [  363.187221]  ? set_kthread_struct+0x40/0x40
> > >
> > > [  363.187226]  ret_from_fork+0x22/0x30
> > >
> > >
> > >
> > >
> > >
> > > BR
> > >
> > > Curry Gong
> > >
> > > From: Zhu, James <James.Zhu at amd.com>
> > > Sent: Saturday, December 11, 2021 5:07 AM
> > > To: Gong, Curry <Curry.Gong at amd.com>; amd-gfx at lists.freedesktop.org
> > > Cc: Liu, Leo <Leo.Liu at amd.com>; Zhu, James <James.Zhu at amd.com>;
> > Quan,
> > > Evan <Evan.Quan at amd.com>; Deucher, Alexander
> > > <Alexander.Deucher at amd.com>
> > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > > powergating is explicitly enabled
> > >
> > >
> > >
> > > On 2021-12-10 6:41 a.m., chen gong wrote:
> > >
> > > Play a video on the raven (or PCO, raven2) platform, and then do the
> > > S3
> > >
> > > test. When resume, the following error will be reported:
> > >
> > >
> > >
> > > amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > > ring
> > >
> > > vcn_dec test failed (-110)
> > >
> > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
> > IP
> > > block
> > >
> > > <vcn_v1_0> failed -110
> > >
> > > amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> > >
> > > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> > >
> > >
> > >
> > > [why]
> > >
> > > When playing the video: The power state flag of the vcn block is set
> > > to
> > >
> > > POWER_STATE_ON.
> > >
> > >
> > >
> > > When doing suspend: There is no change to the power state flag of the
> > >
> > > vcn block, it is still POWER_STATE_ON.
> > >
> > >
> > >
> > > When doing resume: Need to open the power gate of the vcn block and
> > > set
> > >
> > > the power state flag of the VCN block to POWER_STATE_ON.
> > >
> > > But at this time, the power state flag of the vcn block is already
> > >
> > > POWER_STATE_ON. The power status flag check in the "8f2cdef
> > drm/amd/pm:
> > >
> > > avoid duplicate powergate/ungate setting" patch will return the
> > >
> > > amdgpu_dpm_set_powergating_by_smu function directly.
> > >
> > > As a result, the gate of the power was not opened, causing the
> > >
> > > subsequent ring test to fail.
> > >
> > >
> > >
> > > [how]
> > >
> > > In the suspend function of the vcn block, explicitly change the power
> > >
> > > state flag of the vcn block to POWER_STATE_OFF.
> > >
> > >
> > >
> > > Signed-off-by: chen gong <curry.gong at amd.com>
> > >
> > > ---
> > >
> > >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
> > >
> > >  1 file changed, 7 insertions(+)
> > >
> > >
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > index d54d720..d73676b 100644
> > >
> > > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> > >
> > >  {
> > >
> > >   int r;
> > >
> > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > > + bool cancel_success;
> > >
> > > +
> > >
> > > + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
> > >
> > > [JZ] Can you refer to vcn_v3_0_stop , and add
> > amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?
> > >
> > > See if it also can help.
> > >
> > >
> > >
> > > + if (cancel_success) {
> > >
> > > +        if (adev->pm.dpm_enabled)
> > >
> > > +                amdgpu_dpm_enable_uvd(adev, false);
> > >
> > > + }
> > >
> > >
> > >
> > >   r = vcn_v1_0_hw_fini(adev);
> > >
> > >   if (r)


More information about the amd-gfx mailing list