<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body>
<p>In that case, please disregard the proposal I sent earlier. I think the sequence below may fix the race condition we are facing:</p>
<ol>
<li>Stop scheduling new jobs:
<pre>for (i = 0; i &lt; adev->vcn.num_vcn_inst; ++i) {
	if (adev->vcn.harvest_config &amp; (1 &lt;&lt; i))
		continue;
	ring = &amp;adev->vcn.inst[i].ring_dec;
	ring->sched.ready = false;
	for (j = 0; j &lt; adev->vcn.num_enc_rings; ++j) {
		ring = &amp;adev->vcn.inst[i].ring_enc[j];
		ring->sched.ready = false;
	}
}</pre></li>
<li>Cancel the idle work:
<pre>cancel_delayed_work_sync(&amp;adev->vcn.idle_work);</pre></li>
<li>Wait on the power status register:
<pre>SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);</pre></li>
<li>Power gate VCN:
<pre>amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE);</pre></li>
<li>Save the VCPU BO contents to <code>saved_bo</code>.</li>
</ol>
<p>Best Regards!<br>
James</p>
<div class="moz-cite-prefix">On 2021-05-17 1:43 p.m., Leo Liu wrote: </div>
<blockquote type="cite" cite="mid:3c6a0bf3-b4b4-0a93-573e-fd9ae02f16a8@amd.com">
On 2021-05-17 12:54 p.m., James Zhu wrote:
<blockquote type="cite">I am wondering about the case where some jobs are still in the queue. We may be lucky and see</blockquote>
Yes, it's possible. In that case the delayed handler is set, so cancelling it once is enough.
<blockquote type="cite">UVD_POWER_STATUS report done, but afterwards the FW starts a new job from the queue. To handle this situation properly, we need to add a mechanism to suspend the FW first.</blockquote>
I think that should be handled by the sequence in vcn_v3_0_stop_dpg_mode().
<blockquote type="cite">In another case, if we are unlucky and the VCN FW hangs at that point, UVD_POWER_STATUS stays busy indefinitely. Then we need to force power gating of the VCN HW after waiting for a certain time.</blockquote>
Yep, we still need to gate VCN power after a certain timeout.<br>
<br>
Regards,<br>
Leo
<blockquote type="cite">Best Regards!<br>
James<br>
<br>
On 2021-05-17 12:34 p.m., Leo Liu wrote:
<blockquote type="cite">
On 2021-05-17 11:52 a.m., James Zhu wrote:
<blockquote type="cite">During VCN suspend, stop the rings from accepting new requests, and try to wait for all VCN jobs to finish gracefully.<br>
<br>
v2: Force power gating of the VCN hardware after a few waiting retries.<br>
<br>
Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com">&lt;James.Zhu@amd.com&gt;</a>
<pre>---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 2016459..9f3a6e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device *adev)
 {
 	unsigned size;
 	void *ptr;
+	int retry_max = 6;
 	int i;
 
-	cancel_delayed_work_sync(&amp;adev->vcn.idle_work);
+	for (i = 0; i &lt; adev->vcn.num_vcn_inst; ++i) {
+		if (adev->vcn.harvest_config &amp; (1 &lt;&lt; i))
+			continue;
+		ring = &amp;adev->vcn.inst[i].ring_dec;
+		ring->sched.ready = false;
+
+		for (j = 0; j &lt; adev->vcn.num_enc_rings; ++j) {
+			ring = &amp;adev->vcn.inst[i].ring_enc[j];
+			ring->sched.ready = false;
+		}
+	}
+
+	while (retry_max-- &amp;&amp; cancel_delayed_work_sync(&amp;adev->vcn.idle_work))
+		mdelay(5);
</pre></blockquote>
I think it's possible for one pending job to be left unprocessed by VCN when the suspend sequence gets here, but there shouldn't be more than one. cancel_delayed_work_sync() will probably return false after the first call, so calling it once should be enough here.<br>
<br>
We probably need to wait longer at:
<pre>SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);</pre>
to make sure the unprocessed job gets done.<br>
<br>
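<p>To illustrate the bounded wait and the forced-gating fallback discussed in this thread, here is a minimal user-space C sketch. It is not driver code: <code>struct fake_vcn</code>, <code>fake_vcn_idle()</code>, <code>wait_idle_then_gate()</code> and <code>counter_after_exhaustion()</code> are hypothetical stand-ins for the real register poll and power-gating call. The sketch records a timeout in a separate flag. It also reproduces the counter pattern from the patch: with <code>while (retry_max-- &amp;&amp; ...)</code>, the post-decrement leaves <code>retry_max</code> at -1 when every retry fails, so a later <code>if (!retry_max)</code> check does not see the timeout.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one VCN instance; not a real amdgpu type. */
struct fake_vcn {
	int busy_polls; /* polls left before firmware reports idle; < 0 = hung */
	bool gated;     /* set once the (simulated) block is power gated */
};

/* Hypothetical stand-in for polling mmUVD_POWER_STATUS. */
static bool fake_vcn_idle(struct fake_vcn *v)
{
	if (v->busy_polls < 0)
		return false; /* hung firmware never goes idle */
	if (v->busy_polls == 0)
		return true;
	v->busy_polls--;
	return false;
}

/*
 * Poll for idle at most max_polls times, then gate unconditionally.
 * The timeout is tracked with an explicit flag instead of inspecting
 * the loop counter afterwards. Returns true if the block went idle.
 */
static bool wait_idle_then_gate(struct fake_vcn *v, int max_polls)
{
	bool idle = false;

	assert(max_polls > 0);
	for (int n = 0; n < max_polls && !idle; n++) {
		idle = fake_vcn_idle(v);
		/* A kernel driver would sleep between polls here. */
	}
	if (!idle)
		fprintf(stderr, "VCN did not go idle; forcing power gate\n");
	v->gated = true;
	return idle;
}

/*
 * Same counter pattern as the patch, with a condition that never turns
 * false: the post-decrement leaves retry_max at -1 on exhaustion.
 */
static int counter_after_exhaustion(int retry_max)
{
	bool still_pending = true;

	while (retry_max-- && still_pending)
		;
	return retry_max;
}
```

<p>Gating in both cases and only logging when the wait timed out keeps the hung-firmware case bounded while letting a single leftover job drain normally.</p>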
Regards,<br>
Leo
<blockquote type="cite"><pre>+	if (!retry_max &amp;&amp; !amdgpu_sriov_vf(adev)) {
+		if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
+			dev_warn(adev->dev, "Forced powering gate vcn hardware!");
+			vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
+		}
+	}
 
 	for (i = 0; i &lt; adev->vcn.num_vcn_inst; ++i) {
 		if (adev->vcn.harvest_config &amp; (1 &lt;&lt; i))
</pre></blockquote> </blockquote> </blockquote> </blockquote> </body> </html>