[RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround
Christian König
christian.koenig at amd.com
Wed Jul 16 14:58:59 UTC 2025
On 16.07.25 16:06, Tvrtko Ursulin wrote:
>
> On 16/07/2025 14:00, Christian König wrote:
>> On 16.07.25 14:51, Tvrtko Ursulin wrote:
>>>>>>>> be disabled once GFX/SDMA is no longer active. In this particular
>>>>>>>> case there was a race condition somewhere in the internal handshaking
>>>>>>>> with SDMA which led to SDMA missing doorbells sometimes and not
>>>>>>>> executing the job even if there was work in the ring.
>>>>>>>
>>>>>>> Thank you, more or less what I assumed.
>>>>>>>
>>>>>>> But in this case there should be no harm in holding GFXOFF disabled
>>>>>>> until the job completes (like this patch)? Only a win to avoid the SMU
>>>>>>> communication latencies while unit is powered on anyway.
>>>>>>
>>>>>> The extra latency is only on the CPU side, once the
>>>>>> amdgpu_ring_commit() is called the SDMA engine is already working.
>>>>>
>>>>> It is on the CPU side but can create bubbles in the pipeline, no? Is
>>>>> there no scope with AMD to have GFX and SDMA jobs depend on each other?
>>>>> Because, as said, I've seen some high latencies from the GFXOFF disable
>>>>> calls.
>>>>
>>>> The SDMA job is already executing at that point. The allow gfxoff
>>>> message to the firmware shouldn't come until later because it's
>>>> handled by a delayed work thread from end_use(). If you have multiple
>>>> submissions to SDMA within the delay window, the begin_use() and
>>>> end_use() will just be ref count handling and won't actually talk to
>>>> the firmware.
>>>
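The ref-count and delayed-work behaviour described in the quoted paragraph above can be modelled with a small sketch. This is a toy simulation, not amdgpu code: the class, its fields and `simulate()` are hypothetical names, time is a plain integer in milliseconds, and `end_use()` is assumed to run right after submission rather than at job completion. Only the 100ms window is taken from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DELAY_MS = 100  # hysteresis window; the thread mentions a 100ms default


@dataclass
class GfxoffModel:
    """Toy model of ref-counted GFXOFF control (hypothetical, not amdgpu symbols)."""
    req_count: int = 0                    # outstanding begin_use() holders
    gfxoff_allowed: bool = True           # what the simulated firmware was last told
    allow_deadline: Optional[int] = None  # when the queued "allow" fires, if any
    fw_messages: int = 0                  # simulated (slow) firmware round trips

    def _run_delayed_work(self, now: int) -> None:
        # Fire the delayed "allow GFXOFF" work if its deadline has passed.
        if self.allow_deadline is not None and now >= self.allow_deadline:
            self.allow_deadline = None
            self.gfxoff_allowed = True
            self.fw_messages += 1

    def begin_use(self, now: int) -> bool:
        """Return True if this call had to talk to the firmware (slow path)."""
        self._run_delayed_work(now)
        self.allow_deadline = None  # cancel any still-pending delayed work
        self.req_count += 1
        if self.gfxoff_allowed:
            self.gfxoff_allowed = False
            self.fw_messages += 1
            return True
        return False  # pure ref-count handling, no firmware message

    def end_use(self, now: int) -> None:
        self.req_count -= 1
        if self.req_count == 0:
            # Do not allow GFXOFF immediately; queue it behind the hysteresis.
            self.allow_deadline = now + DELAY_MS


def simulate(submit_times_ms: List[int]) -> Tuple[List[int], int]:
    """Return (submission times that hit the slow path, total firmware messages)."""
    model = GfxoffModel()
    slow = []
    for t in submit_times_ms:
        if model.begin_use(t):
            slow.append(t)
        model.end_use(t)
    return slow, model.fw_messages


if __name__ == "__main__":
    print(simulate([0, 50, 120, 400]))
```

In this model, submissions spaced closer than the window only touch the ref count, while a gap longer than the window costs two firmware messages: the delayed "allow", then the next "disallow" on the submission path.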
>>> I followed up with testing a bunch more games and, as it turns out, Cyberpunk 2077 is the only one which has this submission pattern where the default GFX_OFF_DELAY_ENABLE is regularly defeated.
>>>
>>> There, around 1.2 times per second the SDMA submissions miss that 100ms hysteresis and cause a CPU latency over 100us (I only measured when >100us and ignored the rest). Average latency is ~400us and max is ~2ms. So IMHO quite bad.
>>
>> What exactly does Cyberpunk do to hit that? Are those SDMA page table updates, clears or userspace submissions?
>
> I will have to look into that to provide an answer.
If it's some kernel work we could consider using the lightweight DMA instead, but we have never fully exposed that yet.
>
>>> And the vast majority of those latencies come from the SMU request. Only very rarely someone hits the mutex contention path.
>>>
>>> So that was the motivation for the RFC. I suppose I could have also proposed to increase the hysteresis, but holding GFXOFF disabled for the duration of the job sounded preferable for power consumption.
>>>
>>> Anyway, given I only found that Cyberpunk 2077 suffers from this, I guess it maybe isn't too interesting to upstream for you guys. Then again it is limited to a specific old SKU, so maybe it should not be that controversial either? Only that Christian NAKed tying it to job lifetime. So I don't know, AMD's call.
>>
>> Well, what you could do is take a look at whether we could simplify the SMU and/or adjust GFX_OFF_DELAY_ENABLE.
>
> SMU stuff, as far as I can follow it, ends up with simply sending some messages to the firmware. So I am not sure what and how could be optimised there.
Well how does the SMU wait for the HW to complete the request? IIRC the SMU interface is not really made to be used like this.
It could be that we can improve quite a bit there.
> Increasing GFX_OFF_DELAY_ENABLE would work, if large enough, but I think it could be bad for power usage, depending on the workload.
>
>> On the other hand why does it help to keep GFXOFF disabled while running the SDMA job?
>
> Only because I tied it to both GFX and SDMA.
Got it. Yeah that is not really something we should do.
>
> RFC does this:
>
> 1) Marks SDMA as "needs GFXOFF workaround".
> 2) Propagates "needs GFXOFF workaround" to adev if any active ring has it set.
> 3) If adev has it set, it grabs an extra GFXOFF disable for GFX, COMPUTE and SDMA submissions, and marks those jobs as "hold GFXOFF".
> 4) Releases the GFXOFF when marked jobs are "completed" (well freed, since completion is IRQ context so hard).
>
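The four steps quoted above can be sketched as a small model. This illustrates the bookkeeping as described in the list only, not the actual patch: every name here (`Ring`, `Device`, `Job`, `submit`, `free_job`, the ring kind strings) is hypothetical, and the real GFXOFF calls are replaced by a plain counter.

```python
from dataclasses import dataclass
from typing import List

# Ring kinds that take the extra GFXOFF disable in step 3 (assumed labels).
HOLDING_KINDS = {"gfx", "compute", "sdma"}


@dataclass
class Ring:
    kind: str
    needs_gfxoff_workaround: bool = False  # step 1: set for the affected SDMA ring


@dataclass
class Device:
    rings: List[Ring]
    gfxoff_disable_refs: int = 0  # stands in for the real GFXOFF disable ref count

    @property
    def needs_gfxoff_workaround(self) -> bool:
        # Step 2: the device inherits the flag if any active ring has it set.
        return any(r.needs_gfxoff_workaround for r in self.rings)


@dataclass
class Job:
    ring: Ring
    holds_gfxoff: bool = False


def submit(dev: Device, ring: Ring) -> Job:
    job = Job(ring)
    # Step 3: take an extra GFXOFF disable and mark the job as holding it.
    if dev.needs_gfxoff_workaround and ring.kind in HOLDING_KINDS:
        dev.gfxoff_disable_refs += 1
        job.holds_gfxoff = True
    return job


def free_job(dev: Device, job: Job) -> None:
    # Step 4: drop the hold when the job is freed (not at completion time).
    if job.holds_gfxoff:
        job.holds_gfxoff = False
        dev.gfxoff_disable_refs -= 1


if __name__ == "__main__":
    sdma = Ring("sdma", needs_gfxoff_workaround=True)
    gfx = Ring("gfx")
    dev = Device([sdma, gfx])
    job = submit(dev, gfx)
    print(dev.gfxoff_disable_refs)
    free_job(dev, job)
    print(dev.gfxoff_disable_refs)
```

Note that in this model a GFX job holds GFXOFF disabled purely because an SDMA ring on the same device carries the flag, which is the coupling between GFX and SDMA that the reply objects to.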
> AFAIU from what Alex said, the parts of the chip handling GFX and SDMA (not sure about compute) are under the same "power gating domain" (right name?).
Correct, yes, and both "power gating domain" and "clock gating domain" are the right terms to use for this.
> What would you suggest to log power use during the game? Something like once per second or so?
For the game the power difference is probably so small that it isn't measurable.
The real issue is things like battery life on laptops, where you only make a GFX submission every few milliseconds and GFXOFF clock gates or even power gates the whole block in between.
This is all done inside the GPU because the extra round trip to the kernel driver on the CPU takes too long and draws too much extra power in the long term.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>