[PATCH v3] drm/panfrost: Fix job timeout handling

Thu Oct 8 10:04:15 UTC 2020

On 02/10/2020 13:25, Boris Brezillon wrote:
> If two or more jobs end up timing out concurrently, only one of them
> (the one attached to the scheduler acquiring the lock) is fully handled.
> The others remain in a dangling state where they are no longer part of
> the scheduling queue but still block something in the scheduler, leading
> to repeated timeouts when new jobs are queued.
> 
> Let's make sure all bad jobs are properly handled by the thread
> acquiring the lock.
> 
> v3:
> - Add Steven's R-b
> - Don't take the sched_lock when stopping the schedulers
> 
> v2:
> - Fix the subject prefix
> - Stop the scheduler before returning from panfrost_job_timedout()
> - Call cancel_delayed_work_sync() after drm_sched_stop() to make sure
>    no timeout handlers are in flight when we reset the GPU (Steven Price)
> - Make sure we release the reset lock before restarting the
>    schedulers (Steven Price)
> 
> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
> Cc: <stable at vger.kernel.org>
> Signed-off-by: Boris Brezillon <boris.brezillon at collabora.com>
> Reviewed-by: Steven Price <steven.price at arm.com>

Applied to drm-misc-next, thanks!

Steve