<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Definitely, we need to move cancel_delayed_work_sync to before
the power gate.<br>
</p>
<p>Should "save_bo" be step 4, before the power gate?<br>
</p>
<p>Regards,<br>
</p>
<p>Leo</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-05-17 1:59 p.m., James Zhu
wrote:<br>
</div>
<blockquote type="cite" cite="mid:8659d46e-58b6-d090-f95f-2ad9d4533f0e@amd.com">
<p>Then let's forget the proposal I provided before.</p>
<p>I think the sequence below may fix the race condition issue
that we are facing.<br>
</p>
<p>1. stop scheduling new jobs <br>
</p>
<p> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {<br>
if (adev->vcn.harvest_config & (1 << i))<br>
continue;<br>
<br>
ring = &adev->vcn.inst[i].ring_dec;<br>
ring->sched.ready = false;<br>
<br>
for (j = 0; j < adev->vcn.num_enc_rings; ++j) {<br>
ring = &adev->vcn.inst[i].ring_enc[j];<br>
ring->sched.ready = false;<br>
}<br>
}</p>
<p>2. cancel_delayed_work_sync(&adev->vcn.idle_work);</p>
<p>3. <font size="2"><span style="font-size:11pt;">SOC15_WAIT_ON_RREG(VCN,
inst_idx, mmUVD_POWER_STATUS, 1,<br>
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);</span></font></p>
<p><font size="2"><span style="font-size:11pt;">4. </span></font>amdgpu_device_ip_set_powergating_state(adev,
AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE);</p>
<p>5. saved_bo</p>
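<p>To make the intended ordering concrete, here is a minimal userspace
model of the five steps above. All names (mock_vcn, mock_ring, the
boolean flags) are illustrative stand-ins for the real amdgpu
structures and registers, not the driver's actual API; the real code
polls mmUVD_POWER_STATUS and calls
amdgpu_device_ip_set_powergating_state instead of flipping flags.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the proposed suspend ordering.  Step numbers
 * match the sequence above.  Every field here is a hypothetical
 * stand-in for a real amdgpu structure member or register. */

#define MAX_RINGS 4

struct mock_ring {
	bool sched_ready;         /* stands in for ring->sched.ready */
};

struct mock_vcn {
	struct mock_ring rings[MAX_RINGS];
	int num_rings;
	bool idle_work_pending;   /* stands in for adev->vcn.idle_work */
	bool power_status_idle;   /* stands in for mmUVD_POWER_STATUS  */
	bool power_gated;         /* stands in for AMD_PG_STATE_GATE   */
	bool bo_saved;            /* stands in for saved_bo            */
};

/* Returns true iff the whole sequence completed in order. */
static bool mock_vcn_suspend(struct mock_vcn *vcn)
{
	int i;

	/* 1. stop scheduling new jobs on all rings */
	for (i = 0; i < vcn->num_rings; ++i)
		vcn->rings[i].sched_ready = false;

	/* 2. cancel the delayed idle work so it cannot race with us */
	vcn->idle_work_pending = false;

	/* 3. wait for the FW to report idle (the real code polls the
	 * power status register with SOC15_WAIT_ON_RREG) */
	if (!vcn->power_status_idle)
		return false;

	/* 4. gate power only after the FW is idle */
	vcn->power_gated = true;

	/* 5. save the firmware BO last, with the block powered down */
	vcn->bo_saved = true;
	return true;
}
```

<p>The point of the ordering is that steps 1 and 2 stop new work from
arriving before step 3 waits, so the idle check cannot be invalidated
by a job queued afterwards.</p>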
<p>Best Regards!</p>
<p>James<br>
</p>
<div class="moz-cite-prefix">On 2021-05-17 1:43 p.m., Leo Liu
wrote:<br>
</div>
<blockquote type="cite" cite="mid:3c6a0bf3-b4b4-0a93-573e-fd9ae02f16a8@amd.com"> <br>
On 2021-05-17 12:54 p.m., James Zhu wrote: <br>
<blockquote type="cite">I am wondering if there are still some
jobs kept in the queue, it is lucky to check <br>
</blockquote>
<br>
Yes, it's possible; in this case the delayed handler is set, so
cancelling it once is enough. <br>
<br>
<br>
<blockquote type="cite"> <br>
UVD_POWER_STATUS done, but afterwards the FW starts a new job
that was listed in the queue. <br>
<br>
To handle this situation perfectly, we need to add a mechanism
to suspend the FW first. <br>
</blockquote>
<br>
I think that should be handled by the sequence from
vcn_v3_0_stop_dpg_mode(). <br>
<br>
<br>
<blockquote type="cite"> <br>
In another case, if we are unlucky and the VCN FW hangs at that
time, UVD_POWER_STATUS <br>
<br>
always stays busy. Then we need to force power gating the VCN
HW after waiting for a certain time. <br>
</blockquote>
<br>
Yep, we still need to gate VCN power after a certain timeout. <br>
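The "force gate after a timeout" fallback can be sketched as a bounded
polling loop. This is a userspace model only: poll_idle() stands in
for reading mmUVD_POWER_STATUS, fw_model is a made-up test double, and
the retry count is illustrative, not a value the driver mandates. <br>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test double: the FW reports busy for a fixed number
 * of polls, then goes idle (or never, if the count is large). */
struct fw_model {
	int busy_polls_left;
};

static bool poll_idle(struct fw_model *fw)
{
	if (fw->busy_polls_left > 0) {
		fw->busy_polls_left--;
		return false;
	}
	return true;
}

/* Returns 0 if the FW went idle within retry_max polls (normal power
 * gate), or 1 if we gave up and must force the power gate. */
static int wait_idle_or_force_gate(struct fw_model *fw, int retry_max)
{
	while (retry_max--) {
		if (poll_idle(fw))
			return 0;
		/* the real driver would delay here between polls,
		 * e.g. with mdelay() */
	}
	return 1; /* hung FW: force power gating as a last resort */
}
```

Bounding the wait this way means a hung FW costs only the retry budget
instead of blocking suspend forever. <br>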
<br>
<br>
Regards, <br>
<br>
Leo <br>
<br>
<br>
<br>
<blockquote type="cite"> <br>
Best Regards! <br>
<br>
James <br>
<br>
On 2021-05-17 12:34 p.m., Leo Liu wrote: <br>
<blockquote type="cite"> <br>
On 2021-05-17 11:52 a.m., James Zhu wrote: <br>
<blockquote type="cite">During VCN suspend, stop the rings from
receiving new requests, <br>
and try to wait for all VCN jobs to finish gracefully. <br>
<br>
v2: Force power gating the VCN hardware after a few waiting
retries. <br>
<br>
Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com" moz-do-not-send="true"><James.Zhu@amd.com></a>
<br>
--- <br>
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22
+++++++++++++++++++++- <br>
1 file changed, 21 insertions(+), 1 deletion(-) <br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
index 2016459..9f3a6e7 100644 <br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
@@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct
amdgpu_device *adev) <br>
{ <br>
unsigned size; <br>
void *ptr; <br>
+ int retry_max = 6; <br>
int i; <br>
-
cancel_delayed_work_sync(&adev->vcn.idle_work); <br>
+ for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
<br>
+ if (adev->vcn.harvest_config & (1 <<
i)) <br>
+ continue; <br>
+ ring = &adev->vcn.inst[i].ring_dec; <br>
+ ring->sched.ready = false; <br>
+ <br>
+ for (j = 0; j < adev->vcn.num_enc_rings;
++j) { <br>
+ ring = &adev->vcn.inst[i].ring_enc[j];
<br>
+ ring->sched.ready = false; <br>
+ } <br>
+ } <br>
+ <br>
+ while (retry_max-- &&
cancel_delayed_work_sync(&adev->vcn.idle_work)) <br>
+ mdelay(5); <br>
</blockquote>
<br>
I think it's possible to have one pending job unprocessed
by VCN when the suspend sequence gets here, but it
shouldn't be more than one; cancel_delayed_work_sync
will probably return false after the first time, so calling
cancel_delayed_work_sync once should be enough here. We
probably need to wait longer in: <br>
<br>
SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, <br>
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); <br>
<br>
to make sure the unprocessed job gets done. <br>
<br>
<br>
Regards, <br>
<br>
Leo <br>
<br>
<br>
<blockquote type="cite">+ if (!retry_max &&
!amdgpu_sriov_vf(adev)) { <br>
+ if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) { <br>
+ dev_warn(adev->dev, "Forced powering gate
vcn hardware!"); <br>
+ vcn_v3_0_set_powergating_state(adev,
AMD_PG_STATE_GATE); <br>
+ } <br>
+ } <br>
for (i = 0; i < adev->vcn.num_vcn_inst; ++i)
{ <br>
if (adev->vcn.harvest_config & (1
<< i)) <br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>