[RFC v3 06/14] drm/sched: Implement RR via FIFO
Michel Dänzer
michel.daenzer at mailbox.org
Wed Apr 2 13:22:04 UTC 2025
On 2025-04-02 14:00, Philipp Stanner wrote:
> On Wed, 2025-04-02 at 12:58 +0200, Michel Dänzer wrote:
>> On 2025-04-02 12:46, Philipp Stanner wrote:
>>> On Mon, 2025-03-31 at 21:16 +0100, Tvrtko Ursulin wrote:
>>>> Round-robin being the non-default policy and unclear how much it is
>>>> used, we can notice that it can be implemented using the FIFO data
>>>> structures if we only invent a fake submit timestamp which is
>>>> monotonically increasing inside drm_sched_rq instances.
>>>>
>>>> So instead of remembering which was the last entity the scheduler
>>>> worker picked, we can bump the picked one to the bottom of the tree,
>>>> achieving the same round-robin behaviour.
>>>>
>>>> Advantage is that we can consolidate to a single code path and remove
>>>> a bunch of code. Downside is round-robin mode now needs to lock on the
>>>> job pop path but that should not be visible.
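
For illustration, the round-robin-via-FIFO idea in the patch description can be sketched roughly as follows. This is only a toy model under stated assumptions, not the patch's code: a binary heap stands in for the scheduler's tree, the names RunQueue, push_job and pop_job are hypothetical, and the locking mentioned for the job pop path is left out.

```python
import heapq
import itertools


class RunQueue:
    """Toy stand-in for a run queue (hypothetical, not the kernel API)."""

    def __init__(self):
        self._heap = []                   # (fake_ts, entity), smallest first
        self._clock = itertools.count()   # per-queue monotonic fake timestamp
        self._jobs = {}                   # entity -> list of pending jobs

    def push_job(self, entity, job):
        pending = self._jobs.setdefault(entity, [])
        if not pending:
            # Entity becomes runnable: it enters at the bottom of the order.
            heapq.heappush(self._heap, (next(self._clock), entity))
        pending.append(job)

    def pop_job(self):
        if not self._heap:
            return None
        # FIFO selection: take the entity with the smallest timestamp.
        _, entity = heapq.heappop(self._heap)
        pending = self._jobs[entity]
        job = pending.pop(0)
        if pending:
            # Bump the picked entity to the bottom with a fresh fake
            # timestamp; this is what yields round-robin behaviour.
            heapq.heappush(self._heap, (next(self._clock), entity))
        return entity, job
```

Because the fake timestamp only ever grows within one queue, the entity that was just picked always sorts behind every other runnable entity, so no separate "last picked entity" state is needed.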
>>>
>>> Why did you decide to do it that way and then later remove RR & FIFO
>>> altogether in patch 10, basically?
>>>
>>> I think the far cleaner way for our development-process would be a
>>> separate patch(-series) that *removes* RR completely. Advantages are:
>>>
>>> 1. It should be relatively easy to do
>>> 2. It would simplify the existing code base independently of what
>>>    happens with your RFC series here
>>> 3. Before changing everyone's scheduling policy to a completely new,
>>>    deadline-based one, we could first be sure for a few release
>>>    cycles that everyone is now on FIFO, establishing common ground.
>>> 4. We could CC every- and anyone who might use RR or might know
>>>    someone who does
>>> 5. If it turns out we screwed up and someone really relies on RR, it
>>>    would be easy to revert.
>>>
>>> I am not aware of any RR users and have, in past discussions, never
>>> heard of any. So removing it is more tempting for the above reasons.
>>
>> https://gitlab.freedesktop.org/drm/amd/-/issues/2516 has a bunch of
>> RR users...
>
> Right, there's a number of people complaining about the regression. But
> what I'm interested in is: how did it evolve since then? Are there
> distributions that set the module parameter? Does Steam do it? Or is it
> individual users who work around the problem that way?
I know only of the latter.
> https://gitlab.freedesktop.org/drm/amd/-/issues/2516#note_2679509
>
> ^ this comment for example seems to indicate that on newer Wayland
> versions part of the problem has vanished?
That's about using the Wine wayland driver (which uses the Wayland protocol directly) instead of the x11 driver (which uses the X11 protocol via Xwayland). Xwayland not being involved can avoid at least some of the issues (in particular, the scenario I described in https://gitlab.freedesktop.org/drm/amd/-/issues/2516#note_2119750 can't happen then). That doesn't solve the issues when Xwayland is involved though, just avoids them.
--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer
https://redhat.com \ Libre software enthusiast