[PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend
James Zhu
jamesz at amd.com
Mon May 17 17:59:33 UTC 2021
Then let's forget the proposal I provided before.
I think the sequence below may fix the race condition issue that we are
facing (a rough sketch combining the steps follows the list):
1. Stop scheduling new jobs:

   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
           if (adev->vcn.harvest_config & (1 << i))
                   continue;
           ring = &adev->vcn.inst[i].ring_dec;
           ring->sched.ready = false;
           for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
                   ring = &adev->vcn.inst[i].ring_enc[j];
                   ring->sched.ready = false;
           }
   }

2. cancel_delayed_work_sync(&adev->vcn.idle_work);

3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
                      UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);

4. amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN,
                                          AMD_PG_STATE_GATE);
5. Save the VCN BO contents (the existing saved_bo step in
   amdgpu_vcn_suspend).
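Putting the five steps together, a minimal sketch of how they could sit
in amdgpu_vcn_suspend() (illustrative only: declarations are gathered at
the top, error handling and the existing saved_bo copy are abbreviated,
and header/file placement is ignored):

    int amdgpu_vcn_suspend(struct amdgpu_device *adev)
    {
            struct amdgpu_ring *ring;
            int i, j;

            /* 1. Stop the scheduler from pushing new jobs to the rings. */
            for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
                    if (adev->vcn.harvest_config & (1 << i))
                            continue;
                    ring = &adev->vcn.inst[i].ring_dec;
                    ring->sched.ready = false;
                    for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
                            ring = &adev->vcn.inst[i].ring_enc[j];
                            ring->sched.ready = false;
                    }
            }

            /* 2. Make sure the idle worker is not running and cannot
             * re-arm itself afterwards. */
            cancel_delayed_work_sync(&adev->vcn.idle_work);

            /* 3. Wait for the firmware to drain the jobs that were
             * already submitted. */
            for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
                    if (adev->vcn.harvest_config & (1 << i))
                            continue;
                    SOC15_WAIT_ON_RREG(VCN, i, mmUVD_POWER_STATUS, 1,
                                       UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
            }

            /* 4. Gate VCN power. */
            amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN,
                                                   AMD_PG_STATE_GATE);

            /* 5. Save the VCN BO contents (existing saved_bo logic). */
            /* ... unchanged remainder of amdgpu_vcn_suspend() ... */

            return 0;
    }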
Best Regards!
James
On 2021-05-17 1:43 p.m., Leo Liu wrote:
>
> On 2021-05-17 12:54 p.m., James Zhu wrote:
>> I am wondering about the case where some jobs are still kept in the
>> queue: if we are lucky, we see
>
> Yes, it's possible; in this case the delayed handler is set, so
> cancelling once is enough.
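For context, cancel_delayed_work_sync() (see its kerneldoc in
kernel/workqueue.c) waits for a running handler to finish, prevents the
work from re-queueing itself during the cancel, and returns true only if
the work was still pending, so a single call is sufficient once nothing
can schedule new idle work. A minimal illustration:

    /* Returns true iff idle_work was pending when cancelled; any
     * handler that was mid-flight has completed by the time this
     * returns. */
    if (cancel_delayed_work_sync(&adev->vcn.idle_work))
            dev_dbg(adev->dev, "cancelled pending VCN idle work\n");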
>
>
>>
>> UVD_POWER_STATUS report done, but afterwards the firmware starts a
>> new job that was waiting in the queue.
>>
>> To handle this situation properly, we need to add a mechanism to
>> suspend the firmware first.
>
> I think that should be handled by the sequence from
> vcn_v3_0_stop_dpg_mode().
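For reference, that sequence waits for the firmware power status and for
the ring read pointers to catch up with the write pointers before
disabling dynamic power gating. A paraphrased sketch (from memory, not a
verbatim copy of vcn_v3_0.c; only one ring wait is shown):

    static int vcn_v3_0_stop_dpg_mode_sketch(struct amdgpu_device *adev,
                                             int inst_idx)
    {
            uint32_t tmp;

            /* Wait for the power status to report "done". */
            SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
                               UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);

            /* Wait for the read pointer to reach the write pointer,
             * i.e. for queued work on the ring to drain. */
            tmp = RREG32_SOC15(VCN, inst_idx, mmUVD_RB_WPTR);
            SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_RB_RPTR, tmp, 0xFFFFFFFF);

            /* Disable dynamic power gating mode. */
            WREG32_P(SOC15_REG_OFFSET(VCN, inst_idx, mmUVD_POWER_STATUS), 0,
                     ~UVD_POWER_STATUS__UVD_PG_MODE_MASK);

            return 0;
    }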
>
>
>>
>> Another case: if we are unlucky and the VCN firmware hangs at that
>> time, UVD_POWER_STATUS always stays busy. Then we need to force
>> power-gating of the VCN hardware after waiting a certain amount of
>> time.
>
> Yep, we still need to gate VCN power after a certain timeout.
>
>
> Regards,
>
> Leo
>
>
>
>>
>> Best Regards!
>>
>> James
>>
>> On 2021-05-17 12:34 p.m., Leo Liu wrote:
>>>
>>> On 2021-05-17 11:52 a.m., James Zhu wrote:
>>>> During VCN suspend, stop the rings from receiving new requests,
>>>> and try to wait for all VCN jobs to finish gracefully.
>>>>
>>>> v2: Force power-gating of the VCN hardware after a few waiting
>>>> retries.
>>>>
>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 +++++++++++++++++++++-
>>>> 1 file changed, 21 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> index 2016459..9f3a6e7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device *adev)
>>>> {
>>>>      unsigned size;
>>>>      void *ptr;
>>>> +    int retry_max = 6;
>>>>      int i;
>>>> -    cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>> +    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>> +        if (adev->vcn.harvest_config & (1 << i))
>>>> +            continue;
>>>> +        ring = &adev->vcn.inst[i].ring_dec;
>>>> +        ring->sched.ready = false;
>>>> +
>>>> +        for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>>> +            ring = &adev->vcn.inst[i].ring_enc[j];
>>>> +            ring->sched.ready = false;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    while (retry_max-- && cancel_delayed_work_sync(&adev->vcn.idle_work))
>>>> +        mdelay(5);
>>>
>>> I think it's possible to have one pending job left unprocessed by
>>> the VCN when the suspend sequence gets here, but it shouldn't be more
>>> than one; cancel_delayed_work_sync probably returns false after the
>>> first call, so calling cancel_delayed_work_sync once should be enough
>>> here. We probably need to wait longer in:
>>>
>>> SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>>                    UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>
>>> to make sure the unprocessed job gets done.
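If the wait needs to be longer than the adev->usec_timeout-bounded
window that SOC15_WAIT_ON_RREG() uses, one option is an explicit retry
loop; a hypothetical sketch (helper name, retry count, and delay are
made up for illustration):

    static int vcn_wait_power_status_gated(struct amdgpu_device *adev,
                                           int inst_idx)
    {
            int retries = 100;

            while (retries--) {
                    uint32_t status = RREG32_SOC15(VCN, inst_idx,
                                                   mmUVD_POWER_STATUS);

                    /* 1 is the expected "done" value used in the
                     * waits discussed above. */
                    if ((status & UVD_POWER_STATUS__UVD_POWER_STATUS_MASK) == 1)
                            return 0;
                    msleep(5);
            }
            return -ETIMEDOUT;
    }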
>>>
>>>
>>> Regards,
>>>
>>> Leo
>>>
>>>
>>>> +    if (!retry_max && !amdgpu_sriov_vf(adev)) {
>>>> +        if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
>>>> +            dev_warn(adev->dev, "Forced powering gate vcn hardware!");
>>>> +            vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>> +        }
>>>> +    }
>>>>      for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>          if (adev->vcn.harvest_config & (1 << i))