[PATCH v1] drm/xe: Rework throttle ABI
Raag Jadav
raag.jadav at intel.com
Fri Oct 25 19:04:14 UTC 2024
On Fri, Oct 25, 2024 at 10:03:41AM -0700, Matt Roper wrote:
> On Fri, Oct 25, 2024 at 02:52:38PM +0530, Raag Jadav wrote:
> > Current implementation adds multiple sysfs entries for getting GT
> > throttle status and its reasons, which forces the user to read multiple
> > entries for evaluating the result. Since output of each entry is based
> > on the same underlying hardware register and considering the access type
> > of this register is RO/v, the value of this register can change at any
> > given point, even between subsequent sysfs reads. This makes current
> > implementation fundamentally flawed which can produce inconsistent results.
> >
> > Rework throttle ABI and introduce throttle_status attribute which will
> > provide throttle status through a oneshot register read, making it
> > relatively less error prone. The new ABI will provide throttle reasons
> > in a string based list based on the respective bits which are set in the
> > hardware. Empty output means no bits are set and hence no throttling.
>
> But the old ABI is already released, and we have platforms with
> force_probe lifted already. Presumably userspace software is already
> using the existing interface (otherwise it never would have been allowed
> to land upstream), so that means we can't change it in incompatible ways
> anymore; we're locked into supporting the current interface forever on
> these platforms.
>
> We can change the ABI for _future_ platforms if it makes sense, but it's
> too late to make compatibility-breaking changes for LNL and BMG.
It landed upstream pretty recently AFAICT, at least in xe.
So maybe worth reconsidering.
> >
> > $ cat /sys/devices/.../tile0/gt0/freq0/throttle_status
> > prochot
> > thermal
> > ratl
> > thermalert
> > tdc
> > pl4
> > pl1
> > pl2
> >
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2810
>
> I'm not sure what this ticket is about, but it was already closed
> several weeks ago due to no longer reproducing.
I'm sure we all have our "it works on my machine" moment :D
> The ABI change here doesn't seem to be directly related to this ticket.
We can always create new tickets, and that's not the point.
The point is we're stuck with an unreliable ABI.
Side note: The logs in the ticket may help connect the dots with commit message.
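
For illustration only, here is a minimal userspace sketch of how a consumer
might use the proposed throttle_status attribute. It assumes the output
format from the commit message (one reason name per line, empty output
meaning no throttling). The helper names are hypothetical, and the sysfs
path is left to the caller since it depends on the device.

```python
def parse_throttle_status(text):
    """Turn one snapshot of throttle_status into a set of reason names.

    Assumes one reason per line, as in the commit message example.
    Blank lines are ignored, so empty output yields an empty set.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def read_throttle_status(path):
    """Read the attribute once and parse it (hypothetical helper).

    `path` is the caller-supplied location of throttle_status. Status
    and reasons come from the same read, so they cannot disagree the
    way two separate sysfs reads can.
    """
    with open(path) as f:
        return parse_throttle_status(f.read())


def is_throttled(reasons):
    """Empty set means no bits were set, hence no throttling."""
    return bool(reasons)
```

With the current ABI the same check needs one read per entry, and the
register can change between those reads.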
Raag