[PATCH] drm/xe/pf: Move VFs reprovisioning to worker

Summers, Stuart stuart.summers at intel.com
Mon Jan 27 18:28:24 UTC 2025


On Mon, 2025-01-27 at 19:05 +0100, Michal Wajdeczko wrote:
> 
> 
> On 27.01.2025 18:07, Summers, Stuart wrote:
> > On Sat, 2025-01-25 at 22:55 +0100, Michal Wajdeczko wrote:
> > > Since the GuC is reset during GT reset, we need to re-send the
> > > entire SR-IOV provisioning configuration to the GuC. But since
> > > this whole configuration is protected by the PF master mutex and
> > > we can't avoid making allocations under this mutex (like during
> > > LMEM provisioning), we can't do this reprovisioning from the
> > > gt-reset path if we want to be reclaim-safe. Move VFs
> > > reprovisioning to an async worker that we will start from the
> > > gt-reset path.
> > 
> > Admittedly I don't fully understand the PF restart flow here from
> > userspace. Is there some race condition we need to check for,
> > where the GuC completes its base configuration before the PF
> > config comes through? Is it possible we could either get into a
> > deadlock between the native init and the PF init, or start
> > running content on some engines in native mode before the PF
> > init completes?
> 
> Even if, due to a race, we start running PF content on engines
> before we finish the GuC reconfiguration from native to SRIOV mode,
> that content may just run a little longer than it would have before
> the reset, due to the initial "infinity" execution quantum or
> preemption timeout settings, which in SRIOV mode were likely
> reconfigured to smaller values.
> 
> Also, any race with new provisioning requests from user space
> should be harmless, since during a PF restart we will resend the
> whole SRIOV configuration, including any changes made between the
> GT reset and the PF restart.
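
For anyone skimming, the shape of the change as I read it is roughly
the sketch below. All names here are mine and hypothetical, not
necessarily what the patch uses; the point is just that the config
resend runs from a worker, where taking the PF master mutex and
allocating is reclaim-safe, rather than from the gt-reset path
itself:

	#include <linux/kernel.h>
	#include <linux/mutex.h>
	#include <linux/workqueue.h>

	struct pf_ctx {
		struct mutex master_lock;	/* protects provisioning config */
		struct work_struct restart_work;
	};

	static void pf_restart_func(struct work_struct *w)
	{
		struct pf_ctx *pf = container_of(w, struct pf_ctx,
						 restart_work);

		mutex_lock(&pf->master_lock);
		/*
		 * Resend the whole SR-IOV provisioning config to the
		 * GuC here. This may allocate (e.g. for LMEM
		 * provisioning), which is why it can't run from the
		 * gt-reset path itself.
		 */
		mutex_unlock(&pf->master_lock);
	}

	static void pf_init(struct pf_ctx *pf)
	{
		mutex_init(&pf->master_lock);
		INIT_WORK(&pf->restart_work, pf_restart_func);
	}

	/* called from the gt-reset path instead of reprovisioning inline */
	static void pf_queue_restart(struct pf_ctx *pf)
	{
		queue_work(system_unbound_wq, &pf->restart_work);
	}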

Ok thinking out loud here...

So let's say we have an application that has a submission in flight,
stuck in the DRM scheduler for some reason. Then we get a GT reset.
Native mode is configured first per the update here. The DRM
scheduler restarts and submits the workload to the GuC, and the GuC
then submits it to HW. While the workload is running in HW, PF mode
is configured in the GuC. The workload that was running in HW then
completes. Before the CSB for that completion reaches the GuC,
though, the PF config completes and the KMD replays the workload
that just completed in HW. The GuC will check the ring tail in
memory, see that the head and tail match, and therefore won't submit
the new workload.
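
To convince myself about that last step, here is a toy model (my
illustration, not actual GuC code) of why the replay is a no-op: a
resubmission of an already-completed context finds the ring head
caught up with the tail, so there is no new work to execute.

	#include <linux/types.h>

	struct toy_ring {
		u32 head;	/* advanced by HW as it consumes commands */
		u32 tail;	/* advanced by the KMD as it writes commands */
	};

	static bool toy_ring_has_work(const struct toy_ring *ring)
	{
		return ring->head != ring->tail;	/* replay: head == tail */
	}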

If userspace was waiting on a memory update for job completion (or a
semaphore), it should go through either way, and there should be no
harm in having a second workload there even if it was submitted.
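
In other words, the userspace-visible wait is level-triggered, e.g.
something like this toy sketch (mine, not real UMD code): it tests
whether the seqno has advanced past the job's value, so a duplicate
completion write from a replayed workload changes nothing.

	#include <linux/types.h>

	static bool toy_job_done(const volatile u32 *seqno_mem,
				 u32 job_seqno)
	{
		/* wrap-safe "seqno_mem >= job_seqno" comparison */
		return (s32)(*seqno_mem - job_seqno) >= 0;
	}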

But yeah, this makes sense to me, no issue here.

Reviewed-by: Stuart Summers <stuart.summers at intel.com>

> 
> - Michal


