[PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Philipp Stanner
pstanner at redhat.com
Fri Nov 15 16:07:04 UTC 2024
On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote:
> Am 15.11.24 um 14:55 schrieb Philipp Stanner:
>
> > [SNIP]
> > >
> > > A good bunch of the problems we have here are caused by abusing
> > > the job as state for executing the submission on the hardware.
> > >
> > > The original design idea of the scheduler instead was to only
> > > have the job as intermediate state between submission and picking
> > > what to execute next. E.g. the scheduler in that view is just a
> > > pipeline where you push jobs in on one end and jobs come out on
> > > the other end.
> > >
> >
> > So let's get a bit more precise about this:
> > 1. Driver enqueues with drm_sched_job_arm()
> > 2. job ends up in pending_list
> > 3. Sooner or later scheduler calls run_job()
> > 4. Job is pushed to hardware
> > 5. Fence gets signaled
> > 6. ???
> >
> > What would the "come out on the other end" part you describe look
> > like?
> >
> > How would jobs get removed from pending_list and, accordingly, how
> > would we avoid leaks?
> >
>
> Let me describe the full solution a bit further down.
>
>
>
> >
> > >
> > > In that approach the object which represents the hardware
> > > execution is the dma_fence instead of the job. And the dma_fence
> > > has a well defined lifetime, error handling, etc...
> > >
> > > So when we go back to this original approach it would mean that
> > > we don't need to wait for any cleanup because the scheduler
> > > wouldn't be involved in the execution of the jobs on the hardware
> > > any more.
> > >
> >
> > It would be involved (to speak precisely) in the sense of the
> > scheduler still being the one who pushes the job onto the hardware,
> > agreed?
> >
>
> Yes, exactly.
>
> See in the original design the "job" wasn't even a defined
> structure, but rather just a void*.
>
> So basically what the driver did was telling the scheduler: here I
> have a bunch of different void*, please tell me which one to run
> first.
>
> And apart from being this identifier this void* had absolutely no
> meaning for the scheduler.
Interesting..
>
> > >
> > > The worst thing that could happen is that the driver messes
> > > things up and still has a not yet executed job in an entity,
> > >
> >
> > I can't fully follow.
> >
> > So in your mind, the driver would personally set the dma_fence
> > callback and thereby be informed about the job being completed,
> > right?
> >
>
> No. The run_job callback would still return the hw fence so that the
> scheduler can install the callback and so gets informed when the next
> job could be run.
>
>
> >
> > But you wouldn't want to aim for getting rid of struct
> > drm_sched_job completely, or would you?
> >
>
> No, the drm_sched_job structure was added to aid the single producer
> single consumer queue and so made it easier to put the jobs into a
> container.
>
>
> In my mind the full solution for running jobs looks like this:
>
> 1. Driver enqueues with drm_sched_job_arm()
> 2. job ends up in pending_list
> 3. Sooner or later scheduler calls run_job()
> 4. In return the scheduler gets a dma_fence representing the
> resulting hardware operation.
> 5. This fence is put into a container to keep around what the hw is
> actually executing.
> 6. At some point the fence gets signaled, informing the scheduler
> that the next job can be pushed if enough credits are now available.
>
> There is no free_job callback any more because after run_job is
> called the scheduler is done with the job, and only the dma_fence
> which represents the actual HW operation is the object the
> scheduler now works with.
>
> We would still need something like a kill_job callback which is used
> when an entity is released without flushing all jobs (see
> drm_sched_entity_kill()), but that is then only used for a specific
> corner case.
Can't we just limit the scheduler's responsibility to telling the
driver that it has to flush, and if it doesn't it's a bug?
> Blocking for the cleanup in drm_sched_fini() then becomes a trivial
> dma_fence_wait() on the remaining unsignaled HW fences and eventually
> on the latest killed fence.
But that results in exactly the same situation as my waitque solution,
doesn't it?
Blocking infinitely in drm_sched_fini():
If the driver doesn't guarantee that all fences will get signaled, then
wait_event() in the waitque solution will block forever. The same with
dma_fence_wait().
Or are you aiming at an interruptible dma_fence_wait(), thereby not
blocking SIGKILL?
That then would still result in leaks. So basically the same situation
as with an interruptible drm_sched_flush() function.
(Although I agree that would probably be more elegant)
>
> The problem with this approach is that the idea of re-submitting
> jobs in a GPU reset by the scheduler is then basically dead. But to
> be honest that never worked correctly.
>
> See the deprecated warning I already put on
> drm_sched_resubmit_jobs(). The job lifetime is not really well
> defined because of this, see the free_guilty hack as well.
drm_sched_resubmit_jobs() is being used by 5 drivers currently. So if
we go for this approach we have to port them first. That doesn't sound
trivial to me.
P.
>
> Regards,
> Christian.
>
>
> >
> >
> >
> > Grüße,
> > P.
> >
> >
>