[PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete

Dixit, Ashutosh ashutosh.dixit at intel.com
Thu Apr 20 16:26:28 UTC 2023


On Thu, 20 Apr 2023 08:43:52 -0700, Rodrigo Vivi wrote:
>

Hi Rodrigo,

> On Thu, Apr 20, 2023 at 08:57:24AM +0100, Tvrtko Ursulin wrote:
> >
> > On 19/04/2023 23:10, Dixit, Ashutosh wrote:
> > > On Wed, 19 Apr 2023 06:21:27 -0700, Tvrtko Ursulin wrote:
> > > >
> > >
> > > Hi Tvrtko,
> > >
> > > > On 10/04/2023 23:35, Ashutosh Dixit wrote:
> > > > > Instead of erroring out when GuC reset is in progress, block waiting for
> > > > > GuC reset to complete which is a more reasonable uapi behavior.
> > > > >
> > > > > v2: Avoid race between wake_up_all and waiting for wakeup (Rodrigo)
> > > > >
> > > > > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/i915/i915_hwmon.c | 38 +++++++++++++++++++++++++++----
> > > > >    1 file changed, 33 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> > > > > index 9ab8971679fe3..8471a667dfc71 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_hwmon.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> > > > > @@ -51,6 +51,7 @@ struct hwm_drvdata {
> > > > >	char name[12];
> > > > >	int gt_n;
> > > > >	bool reset_in_progress;
> > > > > +	wait_queue_head_t waitq;
> > > > >    };
> > > > >      struct i915_hwmon {
> > > > > @@ -395,16 +396,41 @@ hwm_power_max_read(struct hwm_drvdata *ddat, long *val)
> > > > >    static int
> > > > >    hwm_power_max_write(struct hwm_drvdata *ddat, long val)
> > > > >    {
> > > > > +#define GUC_RESET_TIMEOUT msecs_to_jiffies(2000)
> > > > > +
> > > > > +	int ret = 0, timeout = GUC_RESET_TIMEOUT;
> > > >
> > > > Patch looks good to me
> > >
> > > Great, thanks :)
> > >
> > > > apart that I am not sure what is the purpose of the timeout? This is just
> > > > the sysfs write path or has more callers?
> > >
> > > It is just the sysfs path, but the sysfs is accessed also by the oneAPI
> > > stack (Level 0). In the initial version I also didn't have the timeout
> > > thinking that the app can send a signal to the blocked thread to unblock
> > > it. I introduced the timeout after Rodrigo brought it up and I am now
> > > thinking maybe it's better to have the timeout in the driver since the app
> > > has no knowledge of how long GuC resets can take. But I can remove it if
> > > you think it's not needed.
> >
> > Maybe I am missing something but I don't get why we would need to provide a
> > timeout facility in sysfs? If the library writes here to configure something
> > it already has to expect a blocking write by the nature of a a write(2) and
> > sysfs contract. It can take long for any reason so I hope we are not
> > guaranteeing some latency number to someone? Or the concern is just about
> > things getting stuck? In which case I think Ctrl-C is the answer because
> > ETIME is not even listed as an errno for write(2).

Hmm, good point.

> I suggested the timeout on the other version because of that race,
> which is fixed now with this approach. It is probably better to remove
> it now to avoid confusions. I'm sorry about that.

No problem, I've removed the timeout in the latest version.

Thanks for the R-b.

Ashutosh


More information about the dri-devel mailing list