[Intel-gfx] [PATCH 01/11] drm/i915/gem: Make context persistence optional

Chris Wilson chris at chris-wilson.co.uk
Fri Oct 25 21:29:28 UTC 2019


Quoting Jason Ekstrand (2019-10-25 19:22:04)
> On Thu, Oct 24, 2019 at 6:40 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
>     Our existing behaviour is to allow contexts and their GPU requests to
>     persist past the point of closure until the requests are complete. This
>     allows clients to operate in a 'fire-and-forget' manner where they can
>     set up a rendering pipeline, hand it over to the display server, and
>     immediately exit. As the rendering pipeline is kept alive until
>     completion, the display server (or other consumer) can use the results
>     in the future and present them to the user.
> 
>     However, not all clients want this persistent behaviour and would prefer
>     that the contexts are cleaned up immediately upon closure. This ensures
>     that when clients are run without hangchecking, any GPU hang is
>     terminated with the process and does not continue to hog resources.
> 
>     By defining a context property to allow clients to control persistence
>     explicitly, we can remove the blanket advice to disable hangchecking
>     that seems to be far too prevalent.
> 
> 
> Just to be clear, when you say "disable hangchecking" do you mean disabling it
> for all processes via a kernel parameter at boot time or a sysfs entry or
> similar?  Or is there some mechanism whereby a context can request no hang
> checking?

They are being told to use the module parameter i915.enable_hangcheck=0
to globally disable hangchecking. This is what we are trying to wean
them off, while still allowing indefinitely long compute kernels. The softer
hangcheck targets a narrower case: if you block scheduling or preemption of
higher priority work, you are forcibly removed from the GPU. However, even
that is too much for some workloads, where they really do expect to
permanently hog the GPU. (All I can say is that they better be dedicated
systems because if you demand interactivity on top of disabling
preemption...)
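
For reference, the current value of that module parameter can be read back
through sysfs. This is a minimal sketch, assuming the usual
/sys/module/<module>/parameters layout and that the parameter is a boolean
printed as Y or N. The helper name read_bool_param is made up for
illustration, and reading the file may require root.

```c
#include <assert.h>
#include <stdio.h>

/* Location of the parameter discussed above, assuming the i915 module is
 * loaded. Access may be restricted to root. */
#define I915_HANGCHECK_PARAM "/sys/module/i915/parameters/enable_hangcheck"

/*
 * Hypothetical helper: read a boolean module parameter file.
 * Returns 1 if it reports enabled (Y/y/1), 0 if disabled (N/n/0),
 * and -1 if the file cannot be opened or its content is not recognised.
 */
int read_bool_param(const char *path)
{
    FILE *f;
    int c;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;

    /* Only the first character matters for a boolean parameter. */
    c = fgetc(f);
    fclose(f);

    if (c == 'Y' || c == 'y' || c == '1')
        return 1;
    if (c == 'N' || c == 'n' || c == '0')
        return 0;
    return -1;
}
```

Calling read_bool_param(I915_HANGCHECK_PARAM) on a machine booted with
i915.enable_hangcheck=0 would be expected to return 0. A return of -1 only
means the file could not be read, not that hangchecking is off.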

>     The default behaviour for new contexts is the legacy persistence mode.
>     New clients will have to opt out for immediate cleanup on context
>     closure. If the hangchecking modparam is disabled, so is persistent
>     context support -- all contexts will be terminated on closure.
> 
> 
> What happens to fences when the context is cancelled?  Is it the same behavior
> as we have today for when a GPU hang is detected and a context is banned?

Yes. The incomplete fence statuses are set to -EIO -- it is the very same
mechanism used to remove this context's future work from the GPU as is
used for banning.
-Chris
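
The closure rules discussed in this exchange can be summarised as a small
model. This is only a sketch of the policy as described in the mail, not the
i915 implementation: the toy_* types and functions are hypothetical, and the
only behaviour taken from the thread is that a context keeps running after
closure only if it is persistent and hangchecking is enabled, and that
otherwise its incomplete fences get the status -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the driver's structures. */
struct toy_fence {
    bool signaled;  /* the request had already completed */
    int error;      /* 0, or a negative errno such as -EIO */
};

struct toy_context {
    bool persistent;           /* legacy default: true */
    struct toy_fence *fences;  /* outstanding work for this context */
    size_t nfences;
};

/* With hangchecking disabled, persistence is off regardless of the
 * per-context setting. */
bool toy_survives_close(const struct toy_context *ctx, bool hangcheck_enabled)
{
    assert(ctx != NULL);
    return ctx->persistent && hangcheck_enabled;
}

/*
 * Model of closing a context. Returns how many incomplete fences were
 * marked with -EIO. Completed fences are left untouched.
 */
size_t toy_context_close(struct toy_context *ctx, bool hangcheck_enabled)
{
    size_t cancelled = 0;
    size_t i;

    if (toy_survives_close(ctx, hangcheck_enabled))
        return 0;  /* fire-and-forget: the work runs to completion */

    for (i = 0; i < ctx->nfences; i++) {
        struct toy_fence *f = &ctx->fences[i];

        if (!f->signaled) {
            f->error = -EIO;  /* same status as for a banned context */
            cancelled++;
        }
    }
    return cancelled;
}
```

In this model a consumer waiting on one of the cancelled fences would see
-EIO, which is the same thing it would see today after a hang leads to the
context being banned.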

