[PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend
Leo Liu
leo.liu at amd.com
Mon May 17 18:19:03 UTC 2021
To be accurate, the Bo is mapped to engine cache window, and the runtime
of engine stacks, so we should save it before the poweroff.
On 2021-05-17 2:15 p.m., Leo Liu wrote:
>
> The saved data are from the engine cache, it's the runtime of engine
> before suspend, it might be different after you have the engine
> powered off.
>
>
> Regards,
>
> Leo
>
>
>
> On 2021-05-17 2:11 p.m., Zhu, James wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>> save_bo needn't ungate vcn, it just keeps data in memory.
>>
>> Thanks & Best Regards!
>>
>>
>> James Zhu
>>
>> ------------------------------------------------------------------------
>> *From:* Liu, Leo <Leo.Liu at amd.com>
>> *Sent:* Monday, May 17, 2021 2:07 PM
>> *To:* Zhu, James <James.Zhu at amd.com>; Zhu, James <James.Zhu at amd.com>;
>> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH v2 1/2] drm/amdgpu: enhance amdgpu_vcn_suspend
>>
>> Definitely, we need to move cancel_delayed_work_sync moved to before
>> power gate.
>>
>> Should "save_bo" be step 4 before power gate ?
>>
>> Regards,
>>
>> Leo
>>
>>
>> On 2021-05-17 1:59 p.m., James Zhu wrote:
>>>
>>> Then we forgot the proposal I provided before.
>>>
>>> I think the below seq may fixed the race condition issue that we are
>>> facing.
>>>
>>> 1. stop scheduling new jobs
>>>
>>> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>> if (adev->vcn.harvest_config & (1 << i))
>>> continue;
>>>
>>> ring = &adev->vcn.inst[i].ring_dec;
>>> ring->sched.ready = false;
>>>
>>> for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>> ring = &adev->vcn.inst[i].ring_enc[j];
>>> ring->sched.ready = false;
>>> }
>>> }
>>>
>>> 2. cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>
>>> 3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>> UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>
>>> 4. amdgpu_device_ip_set_powergating_state(adev,
>>> AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE);
>>>
>>> 5. saved_bo
>>>
>>> Best Regards!
>>>
>>> James
>>>
>>> On 2021-05-17 1:43 p.m., Leo Liu wrote:
>>>>
>>>> On 2021-05-17 12:54 p.m., James Zhu wrote:
>>>>> I am wondering if there are still some jobs kept in the queue, it
>>>>> is lucky to check
>>>>
>>>> Yes it's possible, in this case delayed handler is set, so
>>>> cancelling once is enough.
>>>>
>>>>
>>>>>
>>>>> UVD_POWER_STATUS done, but after, fw start a new job that list in
>>>>> the queue.
>>>>>
>>>>> To handle this situation perfectly, we need add mechanism to
>>>>> suspend fw first.
>>>>
>>>> I think that should be handled by the sequence from
>>>> vcn_v3_0_stop_dpg_mode().
>>>>
>>>>
>>>>>
>>>>> Another case, if it is unlucky, that vcn fw hung at that time,
>>>>> UVD_POWER_STATUS
>>>>>
>>>>> always keeps busy. then it needs force powering gate the vcn hw
>>>>> after certain time waiting.
>>>>
>>>> Yep, we still need to gate VCN power after certain timeout.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Leo
>>>>
>>>>
>>>>
>>>>>
>>>>> Best Regards!
>>>>>
>>>>> James
>>>>>
>>>>> On 2021-05-17 12:34 p.m., Leo Liu wrote:
>>>>>>
>>>>>> On 2021-05-17 11:52 a.m., James Zhu wrote:
>>>>>>> During vcn suspends, stop ring continue to receive new requests,
>>>>>>> and try to wait for all vcn jobs to finish gracefully.
>>>>>>>
>>>>>>> v2: Forced powering gate vcn hardware after few wainting retry.
>>>>>>>
>>>>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>>>>> <mailto:James.Zhu at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22
>>>>>>> +++++++++++++++++++++-
>>>>>>> 1 file changed, 21 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> index 2016459..9f3a6e7 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>> @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device
>>>>>>> *adev)
>>>>>>> {
>>>>>>> unsigned size;
>>>>>>> void *ptr;
>>>>>>> + int retry_max = 6;
>>>>>>> int i;
>>>>>>> - cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>> + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>> + if (adev->vcn.harvest_config & (1 << i))
>>>>>>> + continue;
>>>>>>> + ring = &adev->vcn.inst[i].ring_dec;
>>>>>>> + ring->sched.ready = false;
>>>>>>> +
>>>>>>> + for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
>>>>>>> + ring = &adev->vcn.inst[i].ring_enc[j];
>>>>>>> + ring->sched.ready = false;
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> + while (retry_max-- &&
>>>>>>> cancel_delayed_work_sync(&adev->vcn.idle_work))
>>>>>>> + mdelay(5);
>>>>>>
>>>>>> I think it's possible to have one pending job unprocessed with
>>>>>> VCN when suspend sequence getting here, but it shouldn't be more
>>>>>> than one, cancel_delayed_work_sync probably return false after
>>>>>> the first time, so calling cancel_delayed_work_sync once should
>>>>>> be enough here. we probably need to wait longer from:
>>>>>>
>>>>>> SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
>>>>>> UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>
>>>>>> to make sure the unprocessed job get done.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Leo
>>>>>>
>>>>>>
>>>>>>> + if (!retry_max && !amdgpu_sriov_vf(adev)) {
>>>>>>> + if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
>>>>>>> + dev_warn(adev->dev, "Forced powering gate vcn
>>>>>>> hardware!");
>>>>>>> + vcn_v3_0_set_powergating_state(adev,
>>>>>>> AMD_PG_STATE_GATE);
>>>>>>> + }
>>>>>>> + }
>>>>>>> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>> if (adev->vcn.harvest_config & (1 << i))
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20210517/b0c08b56/attachment.htm>
More information about the amd-gfx
mailing list