[PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend

Mon May 17 18:07:57 UTC 2021

Definitely, we need to move cancel_delayed_work_sync moved to before 
power gate.

Should "save_bo" be step 4 before power gate ?

Regards,

Leo

On 2021-05-17 1:59 p.m., James Zhu wrote:
>
> Then we forgot the proposal I provided before.
>
> I think the below seq may fixed the race condition issue that we are 
> facing.
>
> 1. stop scheduling new jobs
>
>     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>         if (adev->vcn.harvest_config & (1 << i))
>             continue;
>
>         ring = &adev->vcn.inst[i].ring_dec;
>         ring->sched.ready = false;
>
>         for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>             ring = &adev->vcn.inst[i].ring_enc[j];
>             ring->sched.ready = false;
>         }
>     }
>
> 2.    cancel_delayed_work_sync(&adev->vcn.idle_work);
>
> 3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>          UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>
> 4. amdgpu_device_ip_set_powergating_state(adev, 
> AMD_IP_BLOCK_TYPE_VCN,   AMD_PG_STATE_GATE);
>
> 5.  saved_bo
>
> Best Regards!
>
> James
>
> On 2021-05-17 1:43 p.m., Leo Liu wrote:
>>
>> On 2021-05-17 12:54 p.m., James Zhu wrote:
>>> I am wondering if there are still some jobs kept in the queue, it is 
>>> lucky to check
>>
>> Yes it's possible, in this case delayed handler is set, so cancelling 
>> once is enough.
>>
>>
>>>
>>> UVD_POWER_STATUS done, but after, fw start a new job that list in 
>>> the queue.
>>>
>>> To handle this situation perfectly, we need add mechanism to suspend 
>>> fw first.
>>
>> I think that should be handled by the sequence from 
>> vcn_v3_0_stop_dpg_mode().
>>
>>
>>>
>>> Another case, if it is unlucky, that  vcn fw hung at that time, 
>>> UVD_POWER_STATUS
>>>
>>> always keeps busy.   then it needs force powering gate the vcn hw 
>>> after certain time waiting.
>>
>> Yep, we still need to gate VCN power after certain timeout.
>>
>>
>> Regards,
>>
>> Leo
>>
>>
>>
>>>
>>> Best Regards!
>>>
>>> James
>>>
>>> On 2021-05-17 12:34 p.m., Leo Liu wrote:
>>>>
>>>> On 2021-05-17 11:52 a.m., James Zhu wrote:
>>>>> During vcn suspends, stop ring continue to receive new requests,
>>>>> and try to wait for all vcn jobs to finish gracefully.
>>>>>
>>>>> v2: Forced powering gate vcn hardware after few wainting retry.
>>>>>
>>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 +++++++++++++++++++++-
>>>>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>> index 2016459..9f3a6e7 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>> @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device 
>>>>> *adev)
>>>>>   {
>>>>>       unsigned size;
>>>>>       void *ptr;
>>>>> +    int retry_max = 6;
>>>>>       int i;
>>>>>   - cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>> +    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>> +        if (adev->vcn.harvest_config & (1 << i))
>>>>> +            continue;
>>>>> +        ring = &adev->vcn.inst[i].ring_dec;
>>>>> +        ring->sched.ready = false;
>>>>> +
>>>>> +        for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>>>> +            ring = &adev->vcn.inst[i].ring_enc[j];
>>>>> +            ring->sched.ready = false;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    while (retry_max-- && 
>>>>> cancel_delayed_work_sync(&adev->vcn.idle_work))
>>>>> +        mdelay(5);
>>>>
>>>> I think it's possible to have one pending job unprocessed with VCN 
>>>> when suspend sequence getting here, but it shouldn't be more than 
>>>> one, cancel_delayed_work_sync probably return false after the first 
>>>> time, so calling cancel_delayed_work_sync once should be enough 
>>>> here. we probably need to wait longer from:
>>>>
>>>> SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>>>         UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>
>>>> to make sure the unprocessed job get done.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Leo
>>>>
>>>>
>>>>> +    if (!retry_max && !amdgpu_sriov_vf(adev)) {
>>>>> +        if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
>>>>> +            dev_warn(adev->dev, "Forced powering gate vcn 
>>>>> hardware!");
>>>>> +            vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>> +        }
>>>>> +    }
>>>>>         for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>           if (adev->vcn.harvest_config & (1 << i))
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20210517/ca8115af/attachment-0001.htm>