[Intel-gfx] [PATCH] drm/i915/hwmon: Use 0 to designate disabled PL1 power limit
Dixit, Ashutosh
ashutosh.dixit at intel.com
Fri Mar 31 02:17:49 UTC 2023
On Thu, 30 Mar 2023 08:44:34 -0700, Rodrigo Vivi wrote:
>
> On Wed, Mar 29, 2023 at 10:50:09PM -0700, Dixit, Ashutosh wrote:
> > On Tue, 28 Mar 2023 16:35:43 -0700, Ashutosh Dixit wrote:
> > >
> > > On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
> > > that the PL1 limit is always enabled and therefore did not have a notion of
> > > a disabled PL1 limit. This results in erroneous PL1 limit values when the
> > > PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit
> > > was previously shown as 0 which means a low PL1 limit whereas the limit
> > > being disabled actually implies a high effective PL1 limit value.
> > >
> > > To get round this problem, the PL1 limit uapi is expanded to include a
> > > special value 0 to designate a disabled PL1 limit.
> >
> > This patch is another attempt to show when the PL1 power limit is disabled
> > and to disable it when it needs to. Previous abandoned attempts to do this
> > are [1] and [2].
> >
> > The preferred way to do this was [2] but that was NAK'd by hwmon folks (see
> > [2]). That is why here we fall back on the approach in [1].
>
> I still don't get it, but let's move on...
>
> >
> > This patch is identical to [1] except that the value used to disable the
> > PL1 limit has been changed to 0 (from -1 in [1]) as was suggested in [2]
> > (both -1 and 0 seem ok for the purpose).
> >
> > > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
> >
> > The link between this patch and these pretty serious bugs might not be
> > immediately clear so here's an explanation:
> >
> > * Because on ATSM the PL1 power limit is disabled on power up and there
> > were no means to enable it, in 6fd3d8bf89fc we implemented the means to
> > enable the limit when the PL1 hwmon entry (power1_max) was written to.
> >
> > * Now there is an IGT igt at i915_hwmon@hwmon_write which (a) reads orig value
> > from all hwmon sysfs (b) does a bunch of random writes and finally (c)
> > restores the orig value read. On ATSM since the orig value was 0, when
> > the IGT restores the 0 value, the PL1 limit is now enabled with a value
> > of 0.
> >
> > * PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to
> > 100 MHz. This causes GuC FW load and several IGT's to start timing out
> > and gives rise the above (and even more) bugs about GuC FW load timing
> > out.
>
> I believe these 3 bullets are key information that deserves to be in
> the commit message itself.
Done in v2.
>
> With that there,
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
Thanks.
--
Ashutosh
>
>
> >
> > * After this patch, writing 0 would disable the PL1 limit instead of
> > enabling it, avoiding the freq drop issue above, and resolving this Intel
> > CI issue.
> >
> > Thanks.
> > --
> > Ashutosh
> >
> > [1] https://patchwork.freedesktop.org/patch/522612/?series=113972&rev=1
> > [2] https://patchwork.freedesktop.org/patch/522652/?series=113984&rev=1
More information about the Intel-gfx
mailing list