[PATCH 1/4] drm/amdgpu/vcn: fix race condition issue for vcn start
Christian König
christian.koenig at amd.com
Wed Mar 4 08:53:42 UTC 2020
On 03.03.20 at 23:48, James Zhu wrote:
>
> On 2020-03-03 2:03 p.m., James Zhu wrote:
>>
>> On 2020-03-03 1:44 p.m., Christian König wrote:
>>> On 03.03.20 at 19:16, James Zhu wrote:
>>>> Fix race condition issue when multiple vcn starts are called.
>>>>
>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> index f96464e..aa7663f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>> @@ -63,6 +63,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
>>>> int i, r;
>>>> INIT_DELAYED_WORK(&adev->vcn.idle_work,
>>>> amdgpu_vcn_idle_work_handler);
>>>> + mutex_init(&adev->vcn.vcn_pg_lock);
>>>> switch (adev->asic_type) {
>>>> case CHIP_RAVEN:
>>>> @@ -210,6 +211,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev)
>>>> }
>>>> release_firmware(adev->vcn.fw);
>>>> + mutex_destroy(&adev->vcn.vcn_pg_lock);
>>>> return 0;
>>>> }
>>>> @@ -321,6 +323,7 @@ void amdgpu_vcn_ring_begin_use(struct
>>>> amdgpu_ring *ring)
>>>> struct amdgpu_device *adev = ring->adev;
>>>> bool set_clocks =
>>>> !cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>> + mutex_lock(&adev->vcn.vcn_pg_lock);
>>>
>>> That still won't work correctly here.
>>>
>>> The whole idea of the cancel_delayed_work_sync() and
>>> schedule_delayed_work() dance is that you have exactly one user of
>>> that. If you have multiple rings that whole thing won't work correctly.
>>>
>>> To fix this you need to call mutex_lock() before
>>> cancel_delayed_work_sync() and schedule_delayed_work() before
>>> mutex_unlock().
>>
>> A big lock definitely works. I am trying to use as small a lock as
>> possible here. The shared resources that need protection here are the
>> power gating process and the DPG mode switch process.
>>
>> If we move mutex_unlock() to before schedule_delayed_work(), I am
>> wondering what other resources would still need protection.
>
> By the way, cancel_delayed_work_sync() is itself safe to call from
> multiple threads, so I didn't put it inside the protected area.
Yeah, that's correct, but it still won't work correctly :)
See, the problem is that cancel_delayed_work_sync() returns true only
for the first caller, because that caller actually canceled the delayed
work.
For all the others it returns false, and those would then think that
they need to ungate the power.
The only solution I see is to either put both the
cancel_delayed_work_sync() and schedule_delayed_work() calls under the
same mutex protection, or to use an atomic or other counter to track
concurrent processing.
> Power gating is shared by all VCN IP instances and different rings, so
> it needs to be put into the protected area.
>
> Each ring's jobs are already serialized by the scheduler, so they
> don't need to be put into this protected area.
Yes, those should work as expected.
Regards,
Christian.
>
>>
>> Thanks!
>>
>> James
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> if (set_clocks) {
>>>> amdgpu_gfx_off_ctrl(adev, false);
>>>> amdgpu_device_ip_set_powergating_state(adev,
>>>> AMD_IP_BLOCK_TYPE_VCN,
>>>> @@ -345,6 +348,7 @@ void amdgpu_vcn_ring_begin_use(struct
>>>> amdgpu_ring *ring)
>>>> adev->vcn.pause_dpg_mode(adev, ring->me, &new_state);
>>>> }
>>>> + mutex_unlock(&adev->vcn.vcn_pg_lock);
>>>> }
>>>> void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>> index 6fe0573..2ae110d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>> @@ -200,6 +200,7 @@ struct amdgpu_vcn {
>>>> struct drm_gpu_scheduler
>>>> *vcn_dec_sched[AMDGPU_MAX_VCN_INSTANCES];
>>>> uint32_t num_vcn_enc_sched;
>>>> uint32_t num_vcn_dec_sched;
>>>> + struct mutex vcn_pg_lock;
>>>> unsigned harvest_config;
>>>> int (*pause_dpg_mode)(struct amdgpu_device *adev,
>>>