[PATCH 1/2] drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv

Boris Brezillon boris.brezillon at collabora.com
Wed Feb 5 14:01:34 UTC 2020


On Wed, 5 Feb 2020 13:39:21 +0000
Steven Price <steven.price at arm.com> wrote:

> On 04/02/2020 14:35, Boris Brezillon wrote:
> > Jobs can be in-flight when the file descriptor is closed (either because
> > the process did not terminate properly, or because it didn't wait for
> > all GPU jobs to be finished), and apparently panfrost_job_close() does
> > not cancel already running jobs. Let's refcount the MMU context object
> > so it's lifetime is no longer bound to the FD lifetime and running jobs
> > can finish properly without generating spurious page faults.  
> 
> Is there any good reason not to just make panfrost_job_close() kill off
> any running jobs?

Nope, I just didn't know how to do that without stopping all other jobs
(should have looked at how mali_kbase is doing that before posting this
patch :)).

> I'm not sure what the benefit is of allowing the jobs
> to still run after the file descriptor has closed.

None that I can think of.

> 
> In particular this could cause problems when(/if) Panfrost starts trying
> to deal with "compute" work loads that might have long runtimes. It's
> quite possible to produce a job which never (naturally) exits, currently
> we have a simplistic timeout which kills anything which doesn't complete
> promptly. However there is nothing conceptually wrong with a job which
> takes seconds (or even minutes) to complete.

Absolutely. That was also one of my concerns.

> The hardware has support
> for task switching ('soft stopping') between jobs so this can be done to
> prevent blocking other applications.

Okay. I guess it's implemented in mali_kbase. I'll have a look.

> 
> If panfrost_job_close() doesn't kill the jobs then removing the timeouts
> could lead to the situation where there is an 'infinite' job with no
> owner and no way of killing it off. Which doesn't seem like a great
> feature ;)

Didn't know you were planning to remove the timeouts.

> 
> Another approach could be simply to silence the page fault output in
> this case - switching the address space to UNMAPPED is actually an
> effective way of killing jobs - at some point I think this was a
> workaround to a hardware bug, but IIRC that was unreleased hardware :)

Okay. I'll check how it's done in mali_kbase.

Thanks for the feedback.

Boris


More information about the dri-devel mailing list