[PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
Christian König
ckoenig.leichtzumerken at gmail.com
Tue May 18 18:06:11 UTC 2021
On 18.05.21 at 19:04, James Zhu wrote:
>
> On 2021-05-18 12:36 p.m., Christian König wrote:
>> On 18.05.21 at 17:59, James Zhu wrote:
>>>
>>> On 2021-05-18 11:54 a.m., Christian König wrote:
>>>>
>>>>
>>>> On 18.05.21 at 17:45, James Zhu wrote:
>>>>>
>>>>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>>>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>>>>> Add cancel_delayed_work_sync before setting the power gating
>>>>>>> state to avoid a race condition when power gating.
>>>>>>>
>>>>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> index 0c1beef..6c5c083 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>>>> static int vcn_v1_0_hw_fini(void *handle)
>>>>>>> {
>>>>>>> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>>>>> + struct amdgpu_ring *ring;
>>>>>>> + int i;
>>>>>>> +
>>>>>>> + ring = &adev->vcn.inst->ring_dec;
>>>>>>> + ring->sched.ready = false;
>>>>>>> +
>>>>>>> + for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>> + ring = &adev->vcn.inst->ring_enc[i];
>>>>>>> + ring->sched.ready = false;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ring = &adev->jpeg.inst->ring_dec;
>>>>>>> + ring->sched.ready = false;
>>>>>>
>>>>>> Thinking more about that, this is a really big NAK. The scheduler
>>>>>> threads must stay ready during a reset.
>>>>>>
>>>>>> This is controlled by the upper layer and shouldn't be messed
>>>>>> with in the hardware specific backend at all.
>>>>>
>>>>>> [JZ] I ported this from current vcn3 hw_fini. Just want to make
>>>>>> sure that no more new jobs will be scheduled after suspend
>>>>>> process starts.
>>>>> It may be redundant, since the scheduler may already be suspended. I
>>>>> can remove those if you are sure there are no side effects.
>>>>
>>>> Well, we *must* remove those. This flag controls if the hardware
>>>> engine can be used for command submission and is only set to
>>>> true/false during initial driver load.
>>>>
>>>> If you change it to false during hw_fini the engine won't work
>>>> correctly any more after GPU reset or resume.
>>> [JZ] If I recall correctly, hw_init will be called every time
>>> after GPU reset or suspend/resume.
>>
>> Yes that's correct.
>>
>> But before that and during GPU reset the ready flag is then false for
>> a short period of time which would result in userspace applications
>> crashing when they try to submit something.
> [JZ] The application should handle a failed submission without
> crashing. Maybe the driver should return -EAGAIN to ask the application
> to submit the job later when the GPU is under reset/suspend-resume.
No, by design the driver should always be able to accept jobs, except
for the case when the hardware is unrecoverably broken.
This is how we have implemented userspace already.
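
The rule above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not amdgpu code: every name in it (model_ring, model_submit, model_reset_begin and so on) is hypothetical. It shows a ready flag that stays true across a reset, so submissions are still accepted and are simply executed once the hardware is back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_DEPTH 8

/* Hypothetical userspace model; none of these names are amdgpu symbols. */
struct model_ring {
	bool ready;       /* set at driver load, cleared only if the HW is dead */
	bool hw_running;  /* false while the engine is in reset or suspended */
	int queue[MODEL_QUEUE_DEPTH];
	size_t queued;    /* jobs accepted but not yet executed */
	size_t executed;  /* jobs the modelled hardware has processed */
};

/* Accept a job whenever the ring is ready, even if the hardware is down. */
static int model_submit(struct model_ring *ring, int job)
{
	if (!ring->ready)
		return -1; /* hard failure: the engine is unusable */
	if (ring->queued == MODEL_QUEUE_DEPTH)
		return -2; /* queue full, unrelated to reset handling */
	ring->queue[ring->queued++] = job;
	return 0;
}

/* Execute everything that is queued; a no-op while the hardware is down. */
static void model_run_queue(struct model_ring *ring)
{
	if (!ring->hw_running)
		return;
	ring->executed += ring->queued;
	ring->queued = 0;
}

/* Reset start: the hardware stops, but the ready flag is left alone. */
static void model_reset_begin(struct model_ring *ring)
{
	assert(ring->ready);
	ring->hw_running = false;
}

/* Reset end: the hardware is re-initialized and the backlog is processed. */
static void model_reset_end(struct model_ring *ring)
{
	ring->hw_running = true;
	model_run_queue(ring);
}
```

In this model a job submitted between model_reset_begin() and model_reset_end() is queued rather than rejected. Only clearing ready, which the model reserves for broken hardware, makes model_submit() fail.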
>> The flag essentially says that userspace can submit jobs to the
>> scheduler. Processing of those jobs is of course only started after
>> the hardware is re-initialized, but pushing jobs down the pipe is
>> still perfectly valid in that situation.
> [JZ] I am wondering whether it is required to stop scheduling new
> jobs before the BOs are saved.
Yes, that is guaranteed. The hardware backend doesn't need to worry
about this in hw_fini(); otherwise we have a bug.
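
As a rough illustration of what that leaves for hw_fini(), here is a userspace model of the quoted patch with the sched.ready writes dropped. It is a sketch, not a kernel patch: model_vcn and the model_* helpers are hypothetical stand-ins for the amdgpu device state, cancel_delayed_work_sync(), the mmUVD_STATUS read and vcn_v1_0_set_powergating_state().

```c
#include <assert.h>
#include <stdbool.h>

enum model_pg_state { MODEL_PG_UNGATE, MODEL_PG_GATE };

/* Hypothetical stand-in for the parts of amdgpu_device used in hw_fini. */
struct model_vcn {
	bool sched_ready;        /* owned by the upper layer, never touched here */
	bool idle_work_pending;  /* models the delayed idle work item */
	bool dpg_supported;      /* models AMD_PG_SUPPORT_VCN_DPG in pg_flags */
	unsigned int status_reg; /* models the value read from mmUVD_STATUS */
	enum model_pg_state cur_state;
	int gate_calls;          /* counts how often gating was requested */
};

/* Models cancel_delayed_work_sync(): no idle work remains afterwards. */
static void model_cancel_idle_work(struct model_vcn *vcn)
{
	vcn->idle_work_pending = false;
}

/* Models the power gating call; gating also clears the status register. */
static void model_set_powergating(struct model_vcn *vcn,
				  enum model_pg_state state)
{
	vcn->cur_state = state;
	if (state == MODEL_PG_GATE) {
		vcn->gate_calls++;
		vcn->status_reg = 0;
	}
}

/* hw_fini model: cancel the idle work first, then gate if needed. */
static int model_hw_fini(struct model_vcn *vcn)
{
	model_cancel_idle_work(vcn);

	if (vcn->dpg_supported ||
	    (vcn->cur_state != MODEL_PG_GATE && vcn->status_reg))
		model_set_powergating(vcn, MODEL_PG_GATE);

	return 0;
}
```

The model never writes sched_ready, so whatever the upper layer set at load time is still in place after model_hw_fini() returns.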
Christian.
>>
>> Christian.
>>
>>>>
>>>> If you have any idea how to document that fact then please speak
>>>> up, because we have had this problem a couple of times now.
>>>>
>>>> I just sent out a patch fixing various other occurrences of that.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>>> I've removed all of those a couple of years ago.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> +
>>>>>>> + cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>> if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>>>>> - RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>>>>> + (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>>>>> + RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>>>> vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>>>> + }
>>>>>>> return 0;
>>>>>>> }
>>>>>>
>>>>
>>