[PATCH v2 2/2] drm/panfrost: Queue jobs on the hardware
Boris Brezillon
boris.brezillon at collabora.com
Thu Jun 24 09:56:32 UTC 2021
On Thu, 24 Jun 2021 10:23:51 +0100
Steven Price <steven.price at arm.com> wrote:
> > static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
> > {
> > - int j;
> > + struct panfrost_job *done[NUM_JOB_SLOTS][2] = {};
> > + struct panfrost_job *failed[NUM_JOB_SLOTS] = {};
> > + u32 js_state, js_events = 0;
> > + unsigned int i, j;
> >
> > - dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status);
> > + while (status) {
> > + for (j = 0; j < NUM_JOB_SLOTS; j++) {
> > + if (status & JOB_INT_MASK_DONE(j)) {
> > + if (done[j][0]) {
> > + done[j][1] = panfrost_dequeue_job(pfdev, j);
> > + WARN_ON(!done[j][1]);
> > + } else {
> > + done[j][0] = panfrost_dequeue_job(pfdev, j);
> > + WARN_ON(!done[j][0]);
>
> NIT: I'd be tempted to move this WARN_ON into panfrost_dequeue_job() as
> it's relevant for any call to the function.
Makes sense. I'll move those WARN_ON()s.
>
> > + }
> > + }
> >
> > - for (j = 0; status; j++) {
> > - u32 mask = MK_JS_MASK(j);
> > + if (status & JOB_INT_MASK_ERR(j)) {
> > + /* Cancel the next submission. Will be submitted
> > + * after we're done handling this failure if
> > + * there's no reset pending.
> > + */
> > + job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
> > + failed[j] = panfrost_dequeue_job(pfdev, j);
> > + }
> > + }
> >
> > - if (!(status & mask))
> > + /* JS_STATE is sampled when JOB_INT_CLEAR is written.
> > + * For each BIT(slot) or BIT(slot + 16) bit written to
> > + * JOB_INT_CLEAR, the corresponding bits in JS_STATE
> > + * (BIT(slot) and BIT(slot + 16)) are updated, but this
> > + * is racy. If we only have one job done at the time we
> > + * read JOB_INT_RAWSTAT but the second job fails before we
> > + * clear the status, we end up with a status containing
> > + * only the DONE bit and consider both jobs as DONE since
> > + * JS_STATE reports both NEXT and CURRENT as inactive.
> > + * To prevent that, let's repeat this clear+read steps
> > + * until status is 0.
> > + */
> > + job_write(pfdev, JOB_INT_CLEAR, status);
> > + js_state = job_read(pfdev, JOB_INT_JS_STATE);
>
> This seems a bit dodgy. The spec says that JOB_INT_JS_STATE[1] is
> updated only for the job slots which have bits set in the JOB_INT_CLEAR.
> So there's potentially two problems:
>
> * The spec makes no gaurentee about the values of the bits for other
> slots. But we're not masking off those bits.
>
> * If we loop (e.g. because the other slot finishes while handling the
> first interrupt) then we may lose the state for the first slot.
>
> I'm not sure what the actual hardware returns in the bits which are
> unrelated to the previous JOB_INT_CLEAR - kbase is careful only to
> consider the bits relating to the slot it's currently dealing with.
Hm, I see. How about something like that?
struct panfrost_job *done[NUM_JOB_SLOTS][2] = {};
struct panfrost_job *failed[NUM_JOB_SLOTS] = {};
u32 js_state = 0, js_events = 0;
unsigned int i, j;
while (status) {
u32 js_state_mask = 0;
for (j = 0; j < NUM_JOB_SLOTS; j++) {
if (status & MK_JS_MASK(j))
js_state_mask |= MK_JS_MASK(j);
if (status & JOB_INT_MASK_DONE(j)) {
if (done[j][0]) {
done[j][1] = panfrost_dequeue_job(pfdev, j);
WARN_ON(!done[j][1]);
} else {
done[j][0] = panfrost_dequeue_job(pfdev, j);
WARN_ON(!done[j][0]);
}
}
if (status & JOB_INT_MASK_ERR(j)) {
/* Cancel the next submission. Will be submitted
* after we're done handling this failure if
* there's no reset pending.
*/
job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
failed[j] = panfrost_dequeue_job(pfdev, j);
}
}
/* JS_STATE is sampled when JOB_INT_CLEAR is written.
* For each BIT(slot) or BIT(slot + 16) bit written to
* JOB_INT_CLEAR, the corresponding bits in JS_STATE
* (BIT(slot) and BIT(slot + 16)) are updated, but this
* is racy. If we only have one job done at the time we
* read JOB_INT_RAWSTAT but the second job fails before we
* clear the status, we end up with a status containing
* only the DONE bit and consider both jobs as DONE since
* JS_STATE reports both NEXT and CURRENT as inactive.
* To prevent that, let's repeat this clear+read steps
* until status is 0.
*/
job_write(pfdev, JOB_INT_CLEAR, status);
js_state &= ~js_state_mask;
js_state |= job_read(pfdev, JOB_INT_JS_STATE) & js_state_mask;
js_events |= status;
status = job_read(pfdev, JOB_INT_RAWSTAT);
}
>
> [1] Actually JOB_IRQ_JS_STATE - I sometimes think we should rename to
> match the spec, I keep searching the docs for the wrong name ;)
I can add a patch doing that :-).
> >
> > - status &= ~mask;
> > + for (j = 0; j < NUM_JOB_SLOTS; j++) {
> > + if (!(js_events & MK_JS_MASK(j)))
> > + continue;
> > +
> > + if (!failed[j] || !pfdev->jobs[j][0])
> > + continue;
> > +
> > + if (pfdev->jobs[j][0]->jc == 0) {
> > + /* The job was cancelled, signal the fence now */
> > + struct panfrost_job *canceled = panfrost_dequeue_job(pfdev, j);
> > +
> > + panfrost_mmu_as_put(pfdev, canceled->file_priv->mmu);
> > + panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
> > +
> > + dma_fence_set_error(canceled->done_fence, -ECANCELED);
> > + dma_fence_signal_locked(canceled->done_fence);
> > + pm_runtime_put_autosuspend(pfdev->dev);
> > + } else if (!atomic_read(&pfdev->reset.pending)) {
> > + /* Resume the job we stopped if no reset is pending */
> > + job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_START);
>
> If I'm following correctly this is resubmitting a soft-stopped job. But
> I haven't seen where the new JC pointer is written.
Not exactly. It's submitting a job that was queued but not started.
When we see a fault, we write NOP to COMMAND_NEXT so the next job is
not started when the interrupt is cleared. And here we're just
requeuing it.
More information about the dri-devel
mailing list