[Intel-gfx] [PATCH v4 07/38] drm/i915: Start of GPU scheduler
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Fri Feb 19 10:13:43 UTC 2016
Hi,
Adding Daniel as CC to comment below.
On to, 2016-02-18 at 14:22 +0000, John Harrison wrote:
> On 20/01/2016 13:18, Joonas Lahtinen wrote:
> > On Mon, 2016-01-11 at 18:42 +0000, John.C.Harrison at Intel.com wrote:
> > > From: John Harrison <John.C.Harrison at Intel.com>
> > >
<SNIP>
> > >
> > > + this = node->saved_objects + i;
> > > +
> > > + for (j = 0; j < test->num_objs; j++) {
> > > + that = test->saved_objects + j;
> > > +
> > > + if (this->obj != that->obj)
> > > + continue;
> > How about VMAs? There might be multiple mappings to an object; wouldn't
> > it be enough to depend on the required VMA instead of the whole object?
> The object is what we get coming in from user land through the IOCTL. So
> why make things more complicated? If there are multiple VMAs referring
> to the same object then we can't just track an individual VMA as that
> would lose the dependency on all the other VMAs. Just because the
> object is mapped to someone else's address space doesn't mean that this
> batch buffer can't overwrite data they are reading.
>
Right, makes sense.
> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > + struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > + struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > + struct intel_engine_cs *ring = qe->params.ring;
> > > + struct i915_scheduler_queue_entry *node;
> > > + struct i915_scheduler_queue_entry *test;
> > > + unsigned long flags;
> > > + bool not_flying;
> > > + int i, r;
> > > + int incomplete = 0;
> > > +
> > > + WARN_ON(!scheduler);
> > > +
> > This kind of situation should be a BUG_ON, because scheduler
> > being NULL is literally going to cause an OOPS in the next dereference,
> > which is going to happen unconditionally. A WARN followed by an OOPS is
> > exactly what BUG_ON should be used to avoid. But this should be removed
> > anyway after scheduler is made a data member of dev_priv.
>
> The WARNs were originally BUGs but Daniel Vetter had the opposite
> demand. His view was that the driver should never BUG under any
> circumstances. A WARN followed by an oops is better than a BUG because
> maybe it won't actually oops.
>
WARN_ON is better than BUG_ON when there won't be an immediate OOPS.
But if you're doing a NULL pointer dereference, like here when scheduler
is NULL, it is 100% certain to either OOPS or cause horrible
undefined behaviour (on a non-x86 platform).
Something like:
	if (WARN_ON(!a))
		return -ENODEV;
could be what Daniel meant (if the error is propagated up and somehow
dealt with), but:
	WARN_ON(!a);
	a->foo = bar;
should simply be BUG_ON(!a), because otherwise it'll just end up being
a WARNING + OOPS right after each other.
>
> >
> > > + if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
> > I assume this is going to be addressed in a future commit. Could have
> > been introduced in this patch, too.
> >
> > > + int ret;
> > > +
> > > + scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> > > + ret = dev_priv->gt.execbuf_final(&qe->params);
> > > + scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> > > +
> > The kerneldoc should mention the locking requirements of this function.
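To be explicit about what I mean, just a line or two in the kerneldoc, e.g.
(the actual requirement needs to be filled in, I'm only guessing here):
	/**
	 * i915_scheduler_queue_execbuffer - queue a batch buffer for submission
	 * @qe: queue entry describing the batch; caller retains ownership
	 *
	 * Must be called with struct_mutex held (adjust to whatever actually
	 * applies, e.g. the scheduler spinlock).
	 */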
> >
> > > + /*
> > > + * Don't do any clean up on failure because the caller will
> > > + * do it all anyway.
> > > + */
> > > + if (ret)
> > > + return ret;
> > > +
> > > + /* Free everything that is owned by the QE structure: */
> > > + kfree(qe->params.cliprects);
> > > + if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
> > > + i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
> > > +
> > > + return 0;
> > The above piece of code looks like its own function, so it should
> > probably be one.
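To be a bit more concrete, something along these lines (the helper name is
purely an example, not from the patch):
	static void i915_scheduler_release_qe_contents(struct i915_scheduler_queue_entry *qe)
	{
		/* Free everything that is owned by the QE structure. */
		kfree(qe->params.cliprects);
		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
	}
The direct submission path would then just call that before returning 0.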
> >
> > > + }
> > > +
> > > + node = kmalloc(sizeof(*node), GFP_KERNEL);
> > > + if (!node)
> > > + return -ENOMEM;
> > > +
> > > + *node = *qe;
> > Any reason we can't simply move ownership of qe? If we can't, I'd
> > rather make a clone function.
>
> The qe pointer passed in is a reference to a stack local object in the
> execbuff code path. Thus ownership cannot be transferred. Doing it this
> way keeps the execbuff code nice and simple, and all the dynamic memory
> management and list tracking is self-contained within the scheduler.
>
I would indicate this with a const qualifier on the function parameter.
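Roughly (just a sketch; I have not checked whether the rest of the call
chain, execbuf_final() and friends, is const-clean):
	int i915_scheduler_queue_execbuffer(const struct i915_scheduler_queue_entry *qe);
That way the "caller keeps ownership, the scheduler only takes a copy"
contract is visible right in the prototype.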
> > > +
> > > +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
> > > +{
> > > + struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> > > + struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > + struct intel_engine_cs *ring;
> > > +
> > > + WARN_ON(!scheduler);
> > > + WARN_ON(!node);
> > > + WARN_ON(node->status != i915_sqs_popped);
> > Other states had their I915_SQS_IS_* macro; why don't some?
> The purpose of the macro is to allow the combining of individual states
> into classes. E.g. dead and complete can both be considered complete for
> the majority of cases. Only in certain situations do you need to know
> that it really was dead. Hence most places that don't really care just
> use the merged macros, whereas places like this that do care use the
> explicit enum value.
>
Right, just asking because having a macro might increase consistency.
Also here we have WARN_ON(!node) and then the next line dereferences
node, which again is surely better off as BUG_ON.
Although it's not unreasonable to expect a function to OOPS if you pass
it a NULL pointer. I think only the higher calling level should have an
if (WARN_ON(!node)) return ... or a BUG_ON construct before calling this
function.
Only calls coming from userland need to be treated with such care that
anything can be passed; internal functions don't. We don't have
BUG_ON(!dev) or BUG_ON(!dev_priv) around either.
> > > + /*
> > > + * In the case where the system is idle, starting 'min_seqno' from a big
> > > + * number will cause all nodes to be removed as they are now back to
> > > + * being in-order. However, this will be a problem if the last one to
> > > + * complete was actually out-of-order as the ring seqno value will be
> > > + * lower than one or more completed buffers. Thus code looking for the
> > > + * completion of said buffers will wait forever.
> > > + * Instead, use the hardware seqno as the starting point. This means
> > > + * that some buffers might be kept around even in a completely idle
> > > + * system but it should guarantee that no-one ever gets confused when
> > > + * waiting for buffer completion.
> > > + */
> > > + min_seqno = ring->get_seqno(ring, true);
> > > +
> > > + list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > + if (I915_SQS_IS_QUEUED(node))
> > > + queued++;
> > > + else if (I915_SQS_IS_FLYING(node))
> > > + flying++;
> > > + else if (I915_SQS_IS_COMPLETE(node))
> > > + continue;
> > > +
> > > + if (node->params.request->seqno == 0)
> > > + continue;
> > > +
> > > + if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> > > + min_seqno = node->params.request->seqno;
> > > + }
> > Couldn't these values be kept cached, instead of counting them on each
> > call?
> The 'queued' and 'flying' totals could be kept cached, but min_seqno is
> dependent upon the state of the hardware so needs to be recalculated. Given
> that, calculating the totals here is trivial and avoids having
> extra code elsewhere to keep them up to date.
>
Ok.
Btw, for_each_scheduler_node or such would be a great macro. Those are
very much preferred for readability.
$ fgrep for_each_ drivers/gpu/drm/i915/* | wc -l
525
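Something along these lines would do (just a sketch, name and exact
placement up to you):
	#define for_each_scheduler_node(node, scheduler, id) \
		list_for_each_entry((node), &(scheduler)->node_queue[(id)], link)
Then the loops in this patch become for_each_scheduler_node(node, scheduler,
ring->id) and the intent is obvious at a glance.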
> >
> > > +
> > > + INIT_LIST_HEAD(&remove);
> > > + list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> > > + /*
> > > + * Only remove completed nodes which have a lower seqno than
> > > + * all pending nodes. While there is the possibility of the
> > > + * ring's seqno counting backwards, all higher buffers must
> > > + * be remembered so that the 'i915_seqno_passed()' test can
> > > + * report that they have in fact passed.
> > > + *
> > > + * NB: This is not true for 'dead' nodes. The GPU reset causes
> > > + * the software seqno to restart from its initial value. Thus
> > > + * the dead nodes must be removed even though their seqno values
> > > + * are potentially vastly greater than the current ring seqno.
> > > + */
> > > + if (!I915_SQS_IS_COMPLETE(node))
> > > + continue;
> > > +
> > > + if (node->status != i915_sqs_dead) {
> > > + if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> > > + (node->params.request->seqno != min_seqno))
> > > + continue;
> > > + }
> > > +
> > > + list_del(&node->link);
> > > + list_add(&node->link, &remove);
> > > +
> > > + /* Strip the dependency info while the mutex is still locked */
> > > + i915_scheduler_remove_dependent(scheduler, node);
> > > +
> > > + continue;
> > > + }
<SNIP>
> > > + do {
> > > + WARN_ON(!node);
> > > + WARN_ON(node->params.ring != ring);
> > > + WARN_ON(node->status != i915_sqs_popped);
> > > + count++;
> > > +
> > > + /*
> > > + * The call to pop above will have removed the node from the
> > > + * list. So add it back in and mark it as in flight.
> > > + */
> > > + i915_scheduler_fly_node(node);
> > Why do we want to pull an object out of the list inside the spinlock and
> > push it back immediately in our critical code path? Seems like a waste
> > for no obvious gain at this point. Why don't we rather just select an
> > entry and modify it in place, if it's going to stay in the same queue
> > anyway?
> The list order is significant. The element must be moved to the front to
> keep the submitted items in submission order. Doing it this way also
> keeps the code nicely partitioned and easier to understand/maintain.
> Plus, there is a plan to optimise the code by splitting the one single
> list into three separate ones - queued, flying, complete. If/when that
> happens, the element will have to be removed from one list and added to
> another.
Ok.
>
> >
> > > +
> > > + scheduler->flags[ring->id] |= i915_sf_submitting;
> > > + spin_unlock_irqrestore(&scheduler->lock, flags);
> > > + ret = dev_priv->gt.execbuf_final(&node->params);
> > > + spin_lock_irqsave(&scheduler->lock, flags);
> > > + scheduler->flags[ring->id] &= ~i915_sf_submitting;
> > > +
> > > + if (ret) {
> > > + int requeue = 1;
> > Multipurpose variable, not really a good idea. And as commented
> > further down, it should not exist at all.
> >
> > > +
> > > + /*
> > > + * Oh dear! Either the node is broken or the ring is
> > > + * busy. So need to kill the node or requeue it and try
> > > + * again later as appropriate.
> > > + */
> > > +
> > > + switch (-ret) {
> > > + case ENODEV:
> > > + case ENOENT:
> > > + /* Fatal errors. Kill the node. */
> > > + requeue = -1;
> > > + break;
> > "break" indent is wrong.
> >
> > > +
> > > + case EAGAIN:
> > > + case EBUSY:
> > > + case EIO:
> > > + case ENOMEM:
> > > + case ERESTARTSYS:
> > > + case EINTR:
> > > + /* Supposedly recoverable errors. */
> > > + break;
> > > +
> > > + default:
> > > + /*
> > > + * Assume the error is recoverable and hope
> > > + * for the best.
> > > + */
> > > + DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
> > > + ring->name, ret);
> > There's a MISSING_CASE macro; it should be used here.
> >
> > > + break;
> > > + }
> > > +
> > Just move the code below this point into the switch; there's no point
> > having a switch to categorize your options and then doing a bunch of
> > ifs to execute code that could be in the switch.
> One of the 'if' paths is to break out of the while loop. Can't do that
> from inside the switch.
I do think the code could still be simplified, even if it involved a
carefully placed goto.
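Something roughly like this is what I have in mind (the labels are
hypothetical, and it also picks up the MISSING_CASE suggestion from above):
	switch (-ret) {
	case ENODEV:
	case ENOENT:
		/* Fatal errors. Kill the node. */
		goto kill_node;

	case EAGAIN:
	case EBUSY:
	case EIO:
	case ENOMEM:
	case ERESTARTSYS:
	case EINTR:
		/* Supposedly recoverable errors, try again later. */
		goto requeue_node;

	default:
		/* Assume the error is recoverable and hope for the best. */
		MISSING_CASE(-ret);
		goto requeue_node;
	}
The dead-node check and the break out of the submission loop can then live
under the requeue_node label.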
>
> > > + /*
> > > + * Check that the watchdog/reset code has not nuked
> > > + * the node while we weren't looking:
> > > + */
> > > + if (node->status == i915_sqs_dead)
> > > + requeue = 0;
Btw, I just noticed: should there be some locking of the node here? Is it
handled gracefully if the status is changed right after this check, or do
we have a race condition?
>
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation