[PATCH] drm/sched: Only start TDR in drm_sched_job_begin on first job
Matthew Brost
matthew.brost at intel.com
Wed Aug 14 05:44:43 UTC 2024
On Thu, Jul 25, 2024 at 02:50:54PM +0000, Matthew Brost wrote:
> On Thu, Jul 25, 2024 at 09:42:08AM +0200, Christian König wrote:
> > Am 25.07.24 um 01:44 schrieb Matthew Brost:
> > > Only start the TDR in drm_sched_job_begin when the first job is added to
> > > the pending list; if the pending list is already non-empty, the TDR has
> > > already been started. Restarting the TDR is problematic because it extends
> > > the TDR period for an already running job, potentially delaying dma-fence
> > > signaling for a very long time under a continuous stream of jobs.
> >
> > Mhm, that should be unnecessary. drm_sched_start_timeout() should only start
> > the timeout, but never re-start it.
> >
>
> That function only checks that the pending list is not empty, so it does
> indeed restart it. That is the correct behavior for some of the callers,
> e.g. drm_sched_tdr_queue_imm and drm_sched_get_finished_job.
>
> IMO best to fix this here.
>
> Also, FWIW, on Xe I wrote a test which submitted a never-ending spinner,
> then submitted a job every second on the same queue in a loop, and
> observed that the spinner did not get canceled for a long time. After this
> patch, the spinner correctly timed out after 5 seconds (our default TDR period).
>
> Matt
Ping Christian. Any response to the above?
This looks like a pretty clear problem, and I would like to resolve it.
Matt
>
> > Could be that this isn't working properly.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Cc: Christian König <christian.koenig at amd.com>
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 7e90c9f95611..feeeb9dbeb86 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -540,7 +540,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> > >  	spin_lock(&sched->job_list_lock);
> > >  	list_add_tail(&s_job->list, &sched->pending_list);
> > > -	drm_sched_start_timeout(sched);
> > > +	if (list_is_singular(&sched->pending_list))
> > > +		drm_sched_start_timeout(sched);
> > >  	spin_unlock(&sched->job_list_lock);
> > >  }
> >
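For illustration only, here is a minimal user-space sketch of the behavior
discussed above. It is not kernel code: struct sim_sched and every sim_*
function are hypothetical names made up for this model, time is counted in
abstract ticks, and the model assumes that starting the timeout always moves
the deadline to "now + period", which is the restart behavior described in
this thread. It compares re-arming on every job with arming only when the
pending list goes from empty to one entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct drm_gpu_scheduler. */
struct sim_sched {
	int pending;	/* number of jobs on the modeled pending list */
	long deadline;	/* tick at which the modeled TDR would fire */
	bool armed;	/* whether the modeled TDR is currently armed */
};

/* Assumption: arming always moves the deadline, i.e. it restarts the timer. */
static void sim_start_timeout(struct sim_sched *s, long now, long period)
{
	if (s->pending > 0) {
		s->deadline = now + period;
		s->armed = true;
	}
}

/* only_first == true models the patched drm_sched_job_begin(). */
static void sim_job_begin(struct sim_sched *s, long now, long period,
			  bool only_first)
{
	s->pending++;
	if (!only_first || s->pending == 1)
		sim_start_timeout(s, now, period);
}

/*
 * Submit a job that never completes at tick 0, then one more job every
 * `interval` ticks. Returns the tick at which the modeled TDR fires, or -1
 * if it does not fire within `horizon` ticks.
 */
static long sim_first_timeout(long period, long interval, long horizon,
			      bool only_first)
{
	struct sim_sched s = { 0 };
	long t;

	assert(period > 0 && interval > 0);

	for (t = 0; t <= horizon; t++) {
		if (s.armed && t >= s.deadline)
			return t;
		if (t % interval == 0)
			sim_job_begin(&s, t, period, only_first);
	}
	return -1;
}
```

With a period of 5 ticks and a new job every tick, the re-arming variant
never fires within the horizon, while the first-job-only variant fires at
tick 5. If jobs arrive less often than the period, both variants fire at
tick 5, so the difference only shows up under a continuous stream of jobs.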