<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> The saved data come from the engine cache. It is the engine's runtime state before suspend, and it might be different after the engine has been powered off. Regards, Leo <div class="moz-cite-prefix">On 2021-05-17 2:11 p.m., Zhu, James wrote: </div> <blockquote type="cite" cite="mid:DM5PR12MB25173E8B288010950417C2E2E42D9@DM5PR12MB2517.namprd12.prod.outlook.com"> <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style> [AMD Official Use Only - Internal Distribution Only] <div> <div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"> save_bo doesn't need vcn to be ungated; it just keeps data in memory. </div> <div> <div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"> </div> <div id="Signature"> <div> <div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Arial,Helvetica,sans-serif"> Thanks & Best Regards! James Zhu </div> </div> </div> </div> <hr style="display:inline-block;width:98%" tabindex="-1"> <div id="divRplyFwdMsg" dir="ltr">From: Liu, Leo <a class="moz-txt-link-rfc2396E" href="mailto:Leo.Liu@amd.com"><Leo.Liu@amd.com></a> Sent: Monday, May 17, 2021 2:07 PM To: Zhu, James <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a>; Zhu, James <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a> <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a> Subject: Re: [PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend <div> </div> </div> <div> Definitely, we need to move cancel_delayed_work_sync to before the power gate. Should "save_bo" be step 4, before the power gate? 
Regards, Leo <div class="x_moz-cite-prefix">On 2021-05-17 1:59 p.m., James Zhu wrote: </div> <blockquote type="cite"> Then we forgot the proposal I provided before. I think the sequence below may fix the race condition issue that we are facing. 1. stop scheduling new jobs for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; ring = &adev->vcn.inst[i].ring_dec; ring->sched.ready = false; for (j = 0; j < adev->vcn.num_enc_rings; ++j) { ring = &adev->vcn.inst[i].ring_enc[j]; ring->sched.ready = false; } } 2. cancel_delayed_work_sync(&adev->vcn.idle_work); 3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); 4. amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE); 5. saved_bo Best Regards! James <div class="x_moz-cite-prefix">On 2021-05-17 1:43 p.m., Leo Liu wrote: </div> <blockquote type="cite"> On 2021-05-17 12:54 p.m., James Zhu wrote: <blockquote type="cite">I am wondering: if there are still some jobs kept in the queue, we may be lucky and see </blockquote> Yes, it's possible. In this case the delayed handler is set, so cancelling once is enough. <blockquote type="cite"> UVD_POWER_STATUS report done, but afterwards the fw starts a new job that is listed in the queue. To handle this situation perfectly, we need to add a mechanism to suspend the fw first. </blockquote> I think that should be handled by the sequence from vcn_v3_0_stop_dpg_mode(). <blockquote type="cite"> In another case, if we are unlucky and the vcn fw hangs at that time, UVD_POWER_STATUS always stays busy. Then we need to force power gating of the vcn hw after waiting for a certain time. </blockquote> Yep, we still need to gate VCN power after a certain timeout. Regards, Leo <blockquote type="cite"> Best Regards! 
James On 2021-05-17 12:34 p.m., Leo Liu wrote: <blockquote type="cite"> On 2021-05-17 11:52 a.m., James Zhu wrote: <blockquote type="cite">During vcn suspend, stop the rings from receiving new requests, and try to wait for all vcn jobs to finish gracefully. v2: Force power gating of the vcn hardware after a few waiting retries. Signed-off-by: James Zhu <a class="x_moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com" moz-do-not-send="true"> <James.Zhu@amd.com></a> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 2016459..9f3a6e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device *adev) { unsigned size; void *ptr; + int retry_max = 6; int i; - cancel_delayed_work_sync(&adev->vcn.idle_work); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->vcn.harvest_config & (1 << i)) + continue; + ring = &adev->vcn.inst[i].ring_dec; + ring->sched.ready = false; + + for (j = 0; j < adev->vcn.num_enc_rings; ++j) { + ring = &adev->vcn.inst[i].ring_enc[j]; + ring->sched.ready = false; + } + } + + while (retry_max-- && cancel_delayed_work_sync(&adev->vcn.idle_work)) + mdelay(5); </blockquote> I think it's possible to have one pending job not yet processed by VCN when the suspend sequence gets here, but it shouldn't be more than one. cancel_delayed_work_sync will probably return false after the first call, so calling cancel_delayed_work_sync once should be enough here. We probably need to wait longer in: SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); to make sure the unprocessed job gets done. 
Regards, Leo <blockquote type="cite">+ if (!retry_max && !amdgpu_sriov_vf(adev)) { + if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) { + dev_warn(adev->dev, "Forced powering gate vcn hardware!"); + vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + } + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) </blockquote> </blockquote> </blockquote> </blockquote> </blockquote> </div> </div> </blockquote> </body> </html>