[RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround
Christian König
christian.koenig at amd.com
Wed Jul 16 13:00:39 UTC 2025
On 16.07.25 14:51, Tvrtko Ursulin wrote:
>>>>>> be disabled once GFX/SDMA is no longer active. In this particular
>>>>>> case there was a race condition somewhere in the internal handshaking
>>>>>> with SDMA which led to SDMA missing doorbells sometimes and not
>>>>>> executing the job even if there was work in the ring.
>>>>>
>>>>> Thank you, that is more or less what I assumed.
>>>>>
>>>>> But in this case there should be no harm in holding GFXOFF disabled
>>>>> until the job completes (like this patch)? Only a win to avoid the SMU
>>>>> communication latencies while the unit is powered on anyway.
>>>>
>>>> The extra latency is only on the CPU side, once the
>>>> amdgpu_ring_commit() is called the SDMA engine is already working.
>>>
>>> It is on the CPU side but can create bubbles in the pipeline, no? Is
>>> there no scope with AMD to have GFX and SDMA jobs depend on each other?
>>> Because, as said, I've seen some high latencies from the GFXOFF disable
>>> calls.
>>
>> The SDMA job is already executing at that point. The allow gfxoff
>> message to the firmware shouldn't come until later because it's
>> handled by a delayed work thread from end_use(). If you have multiple
>> submissions to SDMA within the delay window, the begin_use() and
>> end_use() will just be ref count handling and won't actually talk to
>> the firmware.
>
> I followed up with testing a bunch more games, and as it turns out, Cyberpunk 2077 is the only one which has this submission pattern where the default GFX_OFF_DELAY_ENABLE is regularly defeated.
>
> There, around 1.2 times per second the SDMA submissions miss that 100ms hysteresis and cause a CPU latency over 100us (I only measured when >100us and ignored the rest). Average latency is ~400us and max is ~2ms. So IMHO quite bad.
What exactly does Cyberpunk do to hit that? Are those SDMA page table updates, clears or userspace submissions?
>
> And the vast majority of those latencies come from the SMU request. Only very rarely does someone hit the mutex contention path.
>
> So that was the motivation for the RFC. I suppose I could have also proposed to increase the hysteresis, but holding GFXOFF disabled for the duration of the job sounded preferable for power consumption.
>
> Anyway, given I only found that Cyberpunk 2077 suffers from this, I guess it maybe isn't too interesting to upstream for you guys. Then again it is limited to a specific old SKU so maybe it should not be that controversial either? Only that Christian NAKed tying it to job lifetime. So I don't know, AMD's call.
Well, what you could do is check whether we could simplify the SMU and/or adjust GFX_OFF_DELAY_ENABLE.
On the other hand, why does it help to keep GFXOFF disabled while running the SDMA job?
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
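
The begin_use()/end_use() behaviour discussed above (ref counting, plus a delayed "allow GFXOFF" that only fires once the hysteresis expires) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the amdgpu implementation: the struct, the model_* function names and the simulated millisecond clock are all hypothetical, and only the 100ms delay is taken from the thread.

```c
#include <stdbool.h>

/* Hysteresis from the thread (100ms); the name is hypothetical. */
#define MODEL_GFX_OFF_DELAY_MS 100

/* Hypothetical model state, not a real amdgpu structure. */
struct gfxoff_model {
	unsigned int req_count;    /* users currently holding GFXOFF disabled */
	bool gfxoff_allowed;       /* true while firmware may power gate */
	bool work_pending;         /* delayed "allow GFXOFF" work is queued */
	long work_deadline_ms;     /* simulated time at which that work fires */
	unsigned int disable_msgs; /* begin_use() calls that had to ask firmware */
};

/* Start from an idle state where GFXOFF is allowed. */
void model_init(struct gfxoff_model *m)
{
	m->req_count = 0;
	m->gfxoff_allowed = true;
	m->work_pending = false;
	m->work_deadline_ms = 0;
	m->disable_msgs = 0;
}

/* Run the delayed work if its deadline has passed by now_ms. */
void model_advance(struct gfxoff_model *m, long now_ms)
{
	if (m->work_pending && now_ms >= m->work_deadline_ms) {
		m->work_pending = false;
		if (m->req_count == 0)
			m->gfxoff_allowed = true;
	}
}

/* Take a reference; only talk to firmware if GFXOFF was re-allowed. */
void model_begin_use(struct gfxoff_model *m, long now_ms)
{
	model_advance(m, now_ms);
	m->req_count++;
	m->work_pending = false;       /* cancel a not-yet-fired delayed work */
	if (m->gfxoff_allowed) {
		m->gfxoff_allowed = false;
		m->disable_msgs++;     /* slow path: firmware round trip */
	}
}

/* Drop a reference; the last user queues the delayed re-enable. */
void model_end_use(struct gfxoff_model *m, long now_ms)
{
	model_advance(m, now_ms);
	if (m->req_count > 0 && --m->req_count == 0) {
		m->work_pending = true;
		m->work_deadline_ms = now_ms + MODEL_GFX_OFF_DELAY_MS;
	}
}

/* One short submission at now_ms; returns 1 if it hit the slow path. */
unsigned int model_submit(struct gfxoff_model *m, long now_ms)
{
	unsigned int before = m->disable_msgs;

	model_begin_use(m, now_ms);
	model_end_use(m, now_ms);
	return m->disable_msgs - before;
}
```

In this model, submissions at 0, 50, 120 and 400ms result in two slow-path firmware requests: the first one, and the one at 400ms that arrives after the 100ms window opened at 120ms has expired. The two in between are pure ref count handling, which is the case described for multiple submissions inside the delay window.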