[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

David Wu davidwu2 at amd.com
Thu Aug 14 19:18:39 UTC 2025


On 2025-08-14 14:00, Alex Deucher wrote:
> On Thu, Aug 14, 2025 at 12:44 PM David Wu<davidwu2 at amd.com> wrote:
>>
>> On 2025-08-14 12:01, Alex Deucher wrote:
>>> On Thu, Aug 14, 2025 at 11:35 AM David Wu<davidwu2 at amd.com> wrote:
>>>> On 2025-08-14 08:56, Alex Deucher wrote:
>>>>> On Wed, Aug 13, 2025 at 7:06 PM Wu, David<davidwu2 at amd.com> wrote:
>>>>>> On 8/13/2025 6:11 PM, Alex Deucher wrote:
>>>>>>> On Wed, Aug 13, 2025 at 5:47 PM Wu, David<davidwu2 at amd.com> wrote:
>>>>>>>> On 8/13/2025 5:03 PM, Alex Deucher wrote:
>>>>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
>>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>>>>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>>>>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>>>>>>>>>> Hi Alex, Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> I see David's concern but his suggestion yet wont solve the problem,
>>>>>>>>>>>> neither the current form , reason :-
>>>>>>>>>>>>
>>>>>>>>>>>> The emitted fence count and total submission count are fast transients
>>>>>>>>>>>> which frequently become 0 in between video decodes (between jobs) even
>>>>>>>>>>>> with the atomics and locks there can be a switch of video power profile,
>>>>>>>>>>>> in the current form of patch that window is minimized, but still can
>>>>>>>>>>>> happen if stress tested. But power state of any instance becoming zero
>>>>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
>>>>>>>>>> Consider this situation, inst0 and inst1 actively decoding, inst0 decode
>>>>>>>>>> completes, delayed idle work starts.
>>>>>>>>>> inst0 idle handler can read 0 total fences and 0 total submission count,
>>>>>>>>>> even if inst1 is actively decoding,
>>>>>>>>>> that's between the jobs,
>>>>>>>>>>        - as begin_use increaments vcn.total_submission_cnt and end_use
>>>>>>>>>> decreaments vcn.total_submission_cnt that can be 0.
>>>>>>>>>>        - if outstanding fences are cleared and no new emitted fence, between
>>>>>>>>>> jobs , can be 0.
>>>>>>>>>>        - both of the above conditions do not mean video decode is complete on
>>>>>>>>>> inst1, it is actively decoding.
>>>>>>>>> How can there be active decoding without an outstanding fence?  In
>>>>>>>>> that case, total_fences (fences from both instances) would be non-0.
>>>>>>>> I think it should be non-0.
>>>>>>>> I do see a hiccup possible - i.e the power switching from ON to OFF then
>>>>>>>> ON in the
>>>>>>>> middle of decoding, i.e inst0 idle handler turns it off then inst1 turns
>>>>>>>> it on.
>>>>>>> How would that happen? As long as there submission cnt is non-0 and
>>>>>>> there are outstanding fences on any instance, the video profile will
>>>>>>> stay active.
>>>>>> there could be no jobs but it doesn't timeout yet and new jobs will come in
>>>>>> any ms - note all fences are done at this time. The idle handler sees no
>>>>>> fences
>>>>>> and no jobs so it turns off the power - but just ms later a new job is
>>>>>> submitted
>>>>>> from the same decode session which could be mpv player as it does not
>>>>>> need to
>>>>>> submit jobs without delays. This will turn on the power.
>>>>> I'm not following.  Every submission will start with begin_use().
>>>> yes - it does - it can power on vcn but this happens in the middle of a
>>>> decode session which has 10s timeout to call its own idle handler - in fact
>>>> the other instance's idle handler will power off vcn because it does not
>>>> know it needs
>>>> to wait until the decoding session times out.
>>> I don't follow.  If there are no outstanding fences, there is no
>>> reason to not power down the VCN instance and disable the video
>>> profile. If there are still outstanding fences, then the VCN instance
>>> those fences are associated with will stay on and the video profile
>>> will stay enabled.  If the engine hangs and eventually gets reset, the
>>> fence will be signalled and then there will be no outstanding fences
>>> so the idle handler will eventually disable the power profile.  The
>>> idle handler will keep getting rescheduled as long as there is still
>>> oustanding work.
>> inst0 and inst1:
>> inst0 sends jobA, then ends jobA and no more job submitted in 500ms and
>> job queue is empty - at this point  inst1's idle handler sees no
>> outstanding fences/jobs
>> then power off.  However inst0 starts to submit job after 500ms - inst0'
>> idle handler
>> has not started/scheduled to run but inst1's has finished already which
>> does not know inst0 has not timed out or called its own idle handler.
>> This violates the
>> logic for idle handler's timeout condition. (i.e 10s timeout designed
>> but timed out in 500ms)
>> all this means it powered down too early for inst0.
> I still don't follow.  Here's a sample flow.  Job comes in on inst 0
> and then slightly later on inst 1.
>
> Inst 0 job submission
> Inst 0 calls begin_use().  This cancels the current inst worker
> thread.  It enables the video profile and ungates the instance.
> IBs and fence packets get submitted to instance 0 of the engine
> Inst 0 calls end_use().  This schedules the worker thread for
> VCN_IDLE_TIMEOUT jiffies in the future.
>
> Inst 1 job submission:
> Inst 1 calls begin_use().  This cancels the current inst worker
> thread.  It sees the video profile is enabled and ungates the
> instance.
> IBs and fence packets get submitted to instance 1 of the engine
> Inst 1 calls end_use().  This schedules the worker thread for
> VCN_IDLE_TIMEOUT jiffies in the future.
>
> inst 0 work hander runs.  Sees outstanding fences on inst 0; skips
> powergating inst 0, skips disabling video profile. Schedules the
> worker thread for VCN_IDLE_TIMEOUT jiffies in the future.
>
> inst 0 IB completes and fence signals
>
> inst 1 IB completes and fence signals
>
> inst 1 work hander runs.  Sees no outstanding fences on inst 1.
> powergates inst 1.  Check if there are any outstanding fences on other
> instances.  Sees the no fences from inst 0 so disables the video
> profile.
now there are jobs coming from inst0, so inst 0 idle handler won't run.
> inst 0 work hander runs.  Sees no outstanding fences on inst 0.
> powergates inst 0.  Check if there are any outstanding fences on other
> instances.  Sees the no fences from inst 1, sees that video profile is
> already disabled.
inst 0 work handler runs? could or could not - right? depends on if 
there are more jobs for inst0 and also
if VCN_IDLE_TIMEOUT jiffies has passed for inst0. There is possibly a 
point in the sequence that inst0
stops submit jobs but its idle handler has not run yet. Should we wait 
until all instances have finished their idle handlers?
if not then we will run into a power OFF(by inst1) -> ON(by inst0) for 
the active instance(inst0, expected ON). (active - I mean
those have not timed out in VCN_IDLE_TIMEOUT jiffies and still can 
submit jobs in any time.)
if this could happen then we powered off too early. (I did not say we 
cannot do it but it is not expected)
> You can insert additional job submissions anywhere you want in the timeline.
>
> Alex
>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250814/3c56a8ea/attachment.htm>


More information about the amd-gfx mailing list