[PATCH v4] drm/panfrost: Move the GPU reset bits outside the timeout handler
Boris Brezillon
boris.brezillon at collabora.com
Thu Nov 5 13:39:53 UTC 2020
On Thu, 5 Nov 2020 13:27:04 +0000
Steven Price <steven.price at arm.com> wrote:
> > + old_status = atomic_xchg(&queue->status,
> > + PANFROST_QUEUE_STATUS_STOPPED);
> > + WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE &&
> > + old_status != PANFROST_QUEUE_STATUS_STOPPED);
> > + if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
> > + goto out;
>
> NIT: It's slightly cleaner if you swap the above lines, i.e.:
>
> if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
> goto out;
> WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);
I agree.
>
> > +
> > + drm_sched_stop(&queue->sched, bad);
> > + if (bad)
> > + drm_sched_increase_karma(bad);
> > +
> > + stopped = true;
> > +
> > + /*
> > + * Set the timeout to max so the timer doesn't get started
> > + * when we return from the timeout handler (restored in
> > + * panfrost_scheduler_start()).
> > + */
> > + queue->sched.timeout = MAX_SCHEDULE_TIMEOUT;
> > +
> > +out:
> > mutex_unlock(&queue->lock);
> >
> > return stopped;
> > }
> >
> > +static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
> > +{
> > + enum panfrost_queue_status old_status;
> > +
> > + mutex_lock(&queue->lock);
> > + old_status = atomic_xchg(&queue->status,
> > + PANFROST_QUEUE_STATUS_STARTING);
> > + if (WARN_ON(old_status != PANFROST_QUEUE_STATUS_STOPPED))
> > + goto out;
>
> The error handling isn't great here - in this case the queue status is
> left in _STATUS_STARTING, which at best would lead to another WARN_ON
> being hit, but also has the effect of ignoring job faults. Probably the
> timeout would eventually get things back to normal.
>
> Obviously this situation will never occur™, but we can do better either
> by continuing with the normal logic below, or even better replacing
> atomic_xchg() with an atomic_cmpxchg() (so leave the status alone if not
> _STOPPED). Both seem like better error recovery options to me. But keep
> the WARN_ON because something has clearly gone wrong if this happens.
The second approach doesn't unblock things if we end up with
old_status != STOPPED and the queue is really stopped (which shouldn't
happen, unless we have a problem in our state machine). I think I'll
go for the first option and restart the queue unconditionally (I'm
keeping the WARN_ON(), of course).
More information about the dri-devel
mailing list