[PATCH v4 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Philipp Stanner
phasta at mailbox.org
Tue Jul 8 09:47:02 UTC 2025
On Tue, 2025-07-08 at 00:22 -0700, Matthew Brost wrote:
> On Mon, Jul 07, 2025 at 11:46:36AM -0300, Maíra Canal wrote:
> > Xe can skip the reset if TDR has fired before the free job worker and can
> > also re-arm the timeout timer in some scenarios. Instead of manipulating
> > scheduler's internals, inform the scheduler that the job did not actually
> > timeout and no reset was performed through the new status code
> > DRM_GPU_SCHED_STAT_NO_HANG.
> >
> > Note that, in the first case, there is no need to restart submission if it
> > hasn't been stopped.
> >
> > Signed-off-by: Maíra Canal <mcanal at igalia.com>
>
> I'm fairly certain this is correct. However, Intel's CI didn't run with
> your latest series. Can you resubmit and ensure a clean CI run before
> merging?

How can someone who's not at Intel ensure that?

P.

> CI can be a bit flaky; if you get some failures, ping me and
> I'll let you know if they're related to this patch.
>
> With clean CI:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
> > 1 file changed, 3 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 9c7e445b9ea7ce7e3610eadca023e6d810e683e9..f6289eeffd852e40b33d0e455d9bcc21a4fb1467 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1078,12 +1078,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> >  	 * list so job can be freed and kick scheduler ensuring free job is not
> >  	 * lost.
> >  	 */
> > -	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
> > -		xe_sched_add_pending_job(sched, job);
> > -		xe_sched_submission_start(sched);
> > -
> > -		return DRM_GPU_SCHED_STAT_RESET;
> > -	}
> > +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
> > +		return DRM_GPU_SCHED_STAT_NO_HANG;
> >
> >  	/* Kill the run_job entry point */
> >  	xe_sched_submission_stop(sched);
> > @@ -1261,10 +1257,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> >  	 * but there is not currently an easy way to do in DRM scheduler. With
> >  	 * some thought, do this in a follow up.
> >  	 */
> > -	xe_sched_add_pending_job(sched, job);
> >  	xe_sched_submission_start(sched);
> > -
> > -	return DRM_GPU_SCHED_STAT_RESET;
> > +	return DRM_GPU_SCHED_STAT_NO_HANG;
> >  }
> >
> > static void __guc_exec_queue_fini_async(struct work_struct *w)
> >
> > --
> > 2.50.0
> >
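
For readers following the quoted patch, here is a minimal user-space sketch of the control flow it describes: the driver's timeout callback reports "no hang" when the job's fence has already signaled, rather than re-queuing the job itself. Everything prefixed mock_ or MOCK_ is hypothetical and only mirrors names from the diff; none of it is the real DRM scheduler API. That the scheduler core puts the job back on its pending list when it sees the no-hang status is an assumption, inferred from the removed xe_sched_add_pending_job() calls.

```c
#include <stdbool.h>

/* Mock of the two status codes named in the patch; values are arbitrary. */
enum mock_sched_stat {
	MOCK_STAT_RESET,
	MOCK_STAT_NO_HANG,
};

/* Hypothetical stand-in for a scheduler job. */
struct mock_job {
	bool fence_signaled;	/* job completed before the timeout handler ran */
	bool on_pending_list;	/* job is tracked by the (mock) scheduler core */
};

/* Hypothetical driver callback: decide whether the timeout was a real hang. */
enum mock_sched_stat mock_timedout_job(struct mock_job *job)
{
	/* Job already finished: nothing to reset, just tell the core. */
	if (job->fence_signaled)
		return MOCK_STAT_NO_HANG;

	/* A real driver would stop submission and reset the hardware here. */
	return MOCK_STAT_RESET;
}

/*
 * Hypothetical scheduler core: takes the job off the pending list, asks the
 * driver what happened, and (assumption) re-queues the job on NO_HANG so the
 * driver does not have to touch the list itself.
 */
void mock_sched_handle_timeout(struct mock_job *job)
{
	job->on_pending_list = false;

	if (mock_timedout_job(job) == MOCK_STAT_NO_HANG)
		job->on_pending_list = true;
}
```

In this sketch, a job whose fence has signaled ends up back on the pending list after mock_sched_handle_timeout(), while a job that really hung stays off it and is left to the reset path.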