[PATCH v2 2/2] drm/i915: Fix gt reset with GuC submission is disabled
Andi Shyti
andi.shyti at linux.intel.com
Tue Apr 23 14:42:51 UTC 2024
Hi Nirmoy,
> > > Currently intel_gt_reset() kills the GuC and then resets requested
> > > engines. This is problematic because there is a dedicated CSB FIFO
> > > which only GuC can access and if that FIFO fills up, the hardware
> > > will block on the next context switch until there is space, which means
> > > the system is effectively hung. If an engine is reset whilst actively
> > > executing a context, a CSB entry will be sent to say that the context
> > > has gone idle. Thus if reset happens on a very busy system then
> > > killing GuC before killing the engines will lead to deadlock because
> > > of filled up CSB FIFO.
> > is this a fix?
>
> I went quite far back in the commit logs, and it appears to me that we've
> always been using the current reset flow.
>
> I believe we don't perform a GT reset immediately after sending a number of
> requests, which is what the current failed test is doing.
>
> So, I don't think there will be any visible impact on the user with the
> current flow.
Agreed, good thinking here. We often abuse the Fixes tag.
Thanks,
Andi