[Intel-gfx] [PATCH v2 2/3] drm/i915/guc: Close deregister-context race against CT-loss
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Mon Aug 28 21:06:43 UTC 2023
Additional update from the most recent testing.
Relying solely on guc_lrc_desc_unpin getting a failure from deregister_context
as the means of identifying that we are in the "deregister-context-vs-suspend-late" race
is too late a point to handle this safely. This is because one of the
first things destroyed_worker_func does is take a gt pm wakeref, which triggers
the gt_unpark function that runs a whole bunch of other flows, including triggering more
workers and taking additional refs. That said, it's best to not even call
deregister_destroyed_contexts from the worker when !intel_guc_is_ready (ct-is-disabled).
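
A rough sketch of that guard, written as a standalone userspace model and not
the actual i915 code. The struct and the mock_* names are hypothetical
stand-ins; only the ordering (check readiness before taking the wakeref) is
the point being illustrated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GuC state; not the real struct intel_guc. */
struct mock_guc_state {
	bool ready;            /* models what intel_guc_is_ready() would report */
	int pending_destroyed; /* contexts still waiting to be deregistered */
	int wakeref_gets;      /* how many times the worker took a gt pm wakeref */
};

/* Models the destroyed-contexts worker with the early bail-out. */
static bool mock_destroyed_worker(struct mock_guc_state *guc)
{
	assert(guc->pending_destroyed >= 0);

	/* CT is disabled: leave the contexts queued, take no wakeref, no unpark. */
	if (!guc->ready)
		return false;

	guc->wakeref_gets++;        /* stands in for taking the gt pm wakeref */
	guc->pending_destroyed = 0; /* stands in for deregister_destroyed_contexts */
	return true;
}
```

With ready == false the worker returns without touching wakeref_gets, so
nothing that depends on gt_unpark gets kicked off during suspend-late.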
...alan
On Fri, 2023-08-25 at 11:54 -0700, Teres Alexis, Alan Previn wrote:
> just a follow up note-to-self:
>
> On Tue, 2023-08-15 at 12:08 -0700, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> > > On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > > >
> [snip]
>
> in guc_submission_send_busy_loop, we are incrementing the following
> that needs to be decremented if the function fails.
>
> atomic_inc(&guc->outstanding_submission_g2h);
>
> also, it seems that even with this unroll design - we are still
> leaking a wakeref elsewhere. this is despite a cleaner redesign of
> flows in function "guc_lrc_desc_unpin"
> (discussed earlier that wasn't very readable).
>
> will re-rev today but will probably need more follow ups
> tracking that one more leaking gt-wakeref (one in thousands-cycles)
> but at least now we are not hanging mid-suspend.. we bail from suspend
> with useful kernel messages.
>
>
>
>
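
For the counter unroll mentioned in the quoted note, a minimal userspace
sketch using C11 atomics, not the kernel atomic_t API. Only the counter name
outstanding_submission_g2h comes from the thread; the struct, the mock_*
functions and the send_result field are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in; only the counter name is taken from the thread. */
struct mock_guc_counter {
	atomic_int outstanding_submission_g2h;
	int send_result; /* value the mocked send returns: 0 or a negative error */
};

/* Stands in for the actual CT send; here it just reports a canned result. */
static int mock_send(struct mock_guc_counter *guc)
{
	return guc->send_result;
}

/* Increment before sending, and undo the increment if the send fails. */
static int mock_send_busy_loop(struct mock_guc_counter *guc)
{
	int err;

	atomic_fetch_add(&guc->outstanding_submission_g2h, 1);

	err = mock_send(guc);
	if (err) {
		/* No G2H reply will arrive for a failed send, so drop the count. */
		int prev = atomic_fetch_sub(&guc->outstanding_submission_g2h, 1);

		assert(prev > 0);
	}

	return err;
}
```

After a failed send the counter is back at its previous value, so anything
waiting for outstanding G2H replies to drain is not left waiting on a reply
that will never come.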