[PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend

Mon May 17 18:19:03 UTC 2021

To be accurate, the Bo is mapped to engine cache window, and the runtime 
of engine stacks, so we should save it before the poweroff.

On 2021-05-17 2:15 p.m., Leo Liu wrote:
>
> The saved data are from the engine cache, it's the runtime of engine 
> before suspend, it might be different after you have the engine 
> powered off.
>
>
> Regards,
>
> Leo
>
>
>
> On 2021-05-17 2:11 p.m., Zhu, James wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>> save_bo needn't ungate vcn,  it just keeps data in memory.
>>
>> Thanks & Best Regards!
>>
>>
>> James Zhu
>>
>> ------------------------------------------------------------------------
>> *From:* Liu, Leo <Leo.Liu at amd.com>
>> *Sent:* Monday, May 17, 2021 2:07 PM
>> *To:* Zhu, James <James.Zhu at amd.com>; Zhu, James <James.Zhu at amd.com>; 
>> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend
>>
>> Definitely, we need to move cancel_delayed_work_sync moved to before 
>> power gate.
>>
>> Should "save_bo" be step 4 before power gate ?
>>
>> Regards,
>>
>> Leo
>>
>>
>> On 2021-05-17 1:59 p.m., James Zhu wrote:
>>>
>>> Then we forgot the proposal I provided before.
>>>
>>> I think the below seq may fixed the race condition issue that we are 
>>> facing.
>>>
>>> 1. stop scheduling new jobs
>>>
>>>     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>         if (adev->vcn.harvest_config & (1 << i))
>>>             continue;
>>>
>>>         ring = &adev->vcn.inst[i].ring_dec;
>>>         ring->sched.ready = false;
>>>
>>>         for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>>             ring = &adev->vcn.inst[i].ring_enc[j];
>>>             ring->sched.ready = false;
>>>         }
>>>     }
>>>
>>> 2. cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>
>>> 3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>>          UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>
>>> 4. amdgpu_device_ip_set_powergating_state(adev, 
>>> AMD_IP_BLOCK_TYPE_VCN,   AMD_PG_STATE_GATE);
>>>
>>> 5.  saved_bo
>>>
>>> Best Regards!
>>>
>>> James
>>>
>>> On 2021-05-17 1:43 p.m., Leo Liu wrote:
>>>>
>>>> On 2021-05-17 12:54 p.m., James Zhu wrote:
>>>>> I am wondering if there are still some jobs kept in the queue, it 
>>>>> is lucky to check
>>>>
>>>> Yes it's possible, in this case delayed handler is set, so 
>>>> cancelling once is enough.
>>>>
>>>>
>>>>>
>>>>> UVD_POWER_STATUS done, but after, fw start a new job that list in 
>>>>> the queue.
>>>>>
>>>>> To handle this situation perfectly, we need add mechanism to 
>>>>> suspend fw first.
>>>>
>>>> I think that should be handled by the sequence from 
>>>> vcn_v3_0_stop_dpg_mode().
>>>>
>>>>
>>>>>
>>>>> Another case, if it is unlucky, that  vcn fw hung at that time, 
>>>>> UVD_POWER_STATUS
>>>>>
>>>>> always keeps busy.   then it needs force powering gate the vcn hw 
>>>>> after certain time waiting.
>>>>
>>>> Yep, we still need to gate VCN power after certain timeout.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Leo
>>>>
>>>>
>>>>
>>>>>
>>>>> Best Regards!
>>>>>
>>>>> James
>>>>>
>>>>> On 2021-05-17 12:34 p.m., Leo Liu wrote:
>>>>>>
>>>>>> On 2021-05-17 11:52 a.m., James Zhu wrote:
>>>>>>> During vcn suspends, stop ring continue to receive new requests,
>>>>>>> and try to wait for all vcn jobs to finish gracefully.
>>>>>>>
>>>>>>> v2: Forced powering gate vcn hardware after few wainting retry.
>>>>>>>
>>>>>>> Signed-off-by: James Zhu <James.Zhu at amd.com> 
>>>>>>> <mailto:James.Zhu at amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 
>>>>>>> +++++++++++++++++++++-
>>>>>>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> index 2016459..9f3a6e7 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device 
>>>>>>> *adev)
>>>>>>>   {
>>>>>>>       unsigned size;
>>>>>>>       void *ptr;
>>>>>>> +    int retry_max = 6;
>>>>>>>       int i;
>>>>>>>   - cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>> +    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>> +        if (adev->vcn.harvest_config & (1 << i))
>>>>>>> +            continue;
>>>>>>> +        ring = &adev->vcn.inst[i].ring_dec;
>>>>>>> +        ring->sched.ready = false;
>>>>>>> +
>>>>>>> +        for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>>>>>> +            ring = &adev->vcn.inst[i].ring_enc[j];
>>>>>>> +            ring->sched.ready = false;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    while (retry_max-- && 
>>>>>>> cancel_delayed_work_sync(&adev->vcn.idle_work))
>>>>>>> +        mdelay(5);
>>>>>>
>>>>>> I think it's possible to have one pending job unprocessed with 
>>>>>> VCN when suspend sequence getting here, but it shouldn't be more 
>>>>>> than one, cancel_delayed_work_sync probably return false after 
>>>>>> the first time, so calling cancel_delayed_work_sync once should 
>>>>>> be enough here. we probably need to wait longer from:
>>>>>>
>>>>>> SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>>>>>         UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>
>>>>>> to make sure the unprocessed job get done.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Leo
>>>>>>
>>>>>>
>>>>>>> +    if (!retry_max && !amdgpu_sriov_vf(adev)) {
>>>>>>> +        if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
>>>>>>> +            dev_warn(adev->dev, "Forced powering gate vcn 
>>>>>>> hardware!");
>>>>>>> +            vcn_v3_0_set_powergating_state(adev, 
>>>>>>> AMD_PG_STATE_GATE);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>         for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>           if (adev->vcn.harvest_config & (1 << i))
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20210517/b0c08b56/attachment.htm>