[PATCH v2 1/4] drm/amdgpu/vcn: fix race condition issue for vcn start

James Zhu jamesz at amd.com
Thu Mar 5 14:34:27 UTC 2020


On 2020-03-05 6:27 a.m., Christian König wrote:
> Am 05.03.20 um 12:25 schrieb Christian König:
>> Am 04.03.20 um 17:34 schrieb James Zhu:
>>> Fix race condition issue when multiple vcn starts are called.
>>>
>>> v2: Removed checking the return value of cancel_delayed_work_sync()
>>> to prevent possible races here.
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>
>> Reviewed-by: Christian König <christian.koenig at amd.com>
>
> One thing worth noting is that in theory you could run into the issue 
> that one ring restarts the timer while another ring is still preparing 
> the engine for usage.

Yes, you are right, The current timeout setting is 1 sec to guarantee 
one dec/enc can be finished for all formats.  the preparing process 
should nuch less than this setting.

otherwise there are some other bugs that needs to be fixed.

By the way, maybe we shouldn't just rely on fence check to determine if 
any job is left. We can maintain a dec/enc submission count(patch 2 
already added for enc). how do you think?

Best Regards!

James

>
> So the timeout should be large enough to guarantee that this never 
> causes problems.
>
> Regards,
> Christian.
>
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 15 +++++++++------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>>>   2 files changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> index f96464e..8a8406b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> @@ -63,6 +63,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
>>>       int i, r;
>>>         INIT_DELAYED_WORK(&adev->vcn.idle_work, 
>>> amdgpu_vcn_idle_work_handler);
>>> +    mutex_init(&adev->vcn.vcn_pg_lock);
>>>         switch (adev->asic_type) {
>>>       case CHIP_RAVEN:
>>> @@ -210,6 +211,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev)
>>>       }
>>>         release_firmware(adev->vcn.fw);
>>> +    mutex_destroy(&adev->vcn.vcn_pg_lock);
>>>         return 0;
>>>   }
>>> @@ -319,13 +321,13 @@ static void 
>>> amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>   void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>   {
>>>       struct amdgpu_device *adev = ring->adev;
>>> -    bool set_clocks = !cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>   -    if (set_clocks) {
>>> -        amdgpu_gfx_off_ctrl(adev, false);
>>> -        amdgpu_device_ip_set_powergating_state(adev, 
>>> AMD_IP_BLOCK_TYPE_VCN,
>>> -               AMD_PG_STATE_UNGATE);
>>> -    }
>>> +    cancel_delayed_work_sync(&adev->vcn.idle_work);
>>> +
>>> +    mutex_lock(&adev->vcn.vcn_pg_lock);
>>> +    amdgpu_gfx_off_ctrl(adev, false);
>>> +    amdgpu_device_ip_set_powergating_state(adev, 
>>> AMD_IP_BLOCK_TYPE_VCN,
>>> +           AMD_PG_STATE_UNGATE);
>>>         if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>>>           struct dpg_pause_state new_state;
>>> @@ -345,6 +347,7 @@ void amdgpu_vcn_ring_begin_use(struct 
>>> amdgpu_ring *ring)
>>>             adev->vcn.pause_dpg_mode(adev, ring->me, &new_state);
>>>       }
>>> +    mutex_unlock(&adev->vcn.vcn_pg_lock);
>>>   }
>>>     void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> index 6fe0573..2ae110d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> @@ -200,6 +200,7 @@ struct amdgpu_vcn {
>>>       struct drm_gpu_scheduler 
>>> *vcn_dec_sched[AMDGPU_MAX_VCN_INSTANCES];
>>>       uint32_t         num_vcn_enc_sched;
>>>       uint32_t         num_vcn_dec_sched;
>>> +    struct mutex         vcn_pg_lock;
>>>         unsigned    harvest_config;
>>>       int (*pause_dpg_mode)(struct amdgpu_device *adev,
>>
>


More information about the amd-gfx mailing list