[Intel-gfx] [PATCH v5 06/35] drm/i915: Start of GPU scheduler

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Fri Feb 26 09:13:30 UTC 2016


Hi,

The answers below are reasonable, so v6 should be the next revision.

Regards, Joonas

On Fri, 2016-02-19 at 17:03 +0000, John Harrison wrote:
> On 19/02/2016 13:03, Joonas Lahtinen wrote:
> > 
> > Hi,
> > 
> > Now the code is in reviewable chunks, excellent!
> > 
> > I've added my comments below. A few repeats from last round, but now
> > with more questions about the logic itself.
> > 
> > On Thu, 2016-02-18 at 14:26 +0000, John.C.Harrison at Intel.com wrote:
> > > 
> > > From: John Harrison <John.C.Harrison at Intel.com>
> > > 
> > > Initial creation of scheduler source files. Note that this patch
> > > implements most of the scheduler functionality but does not hook it in
> > > to the driver yet. It also leaves the scheduler code in 'pass through'
> > > mode so that even when it is hooked in, it will not actually do very
> > > much. This allows the hooks to be added one at a time in bite size
> > > chunks and only when the scheduler is finally enabled at the end does
> > > anything start happening.
> > > 
> > > The general theory of operation is that when batch buffers are
> > > submitted to the driver, the execbuffer() code packages up all the
> > > information required to execute the batch buffer at a later time. This
> > > package is given over to the scheduler which adds it to an internal
> > > node list. The scheduler also scans the list of objects associated
> > > with the batch buffer and compares them against the objects already in
> > > use by other buffers in the node list. If matches are found then the
> > > new batch buffer node is marked as being dependent upon the matching
> > > node. The same is done for the context object. The scheduler also
> > > bumps up the priority of such matching nodes on the grounds that the
> > > more dependencies a given batch buffer has the more important it is
> > > likely to be.
> > > 
> > > The scheduler aims to have a given (tuneable) number of batch buffers
> > > in flight on the hardware at any given time. If fewer than this are
> > > currently executing when a new node is queued, then the node is passed
> > > straight through to the submit function. Otherwise it is simply added
> > > to the queue and the driver returns back to user land.
> > > 
> > > The scheduler is notified when each batch buffer completes and updates
> > > its internal tracking accordingly. At the end of the completion
> > > interrupt processing, if any scheduler tracked batches were processed,
> > > the scheduler's deferred worker thread is woken up. This can do more
> > > involved processing such as actually removing completed nodes from the
> > > queue and freeing up the resources associated with them (internal
> > > memory allocations, DRM object references, context reference, etc.).
> > > The work handler also checks the in flight count and calls the
> > > submission code if a new slot has appeared.
> > > 
> > > When the scheduler's submit code is called, it scans the queued node
> > > list for the highest priority node that has no unmet dependencies.
> > > Note that the dependency calculation is complex as it must take
> > > inter-ring dependencies and potential preemptions into account. Note
> > > also that in the future this will be extended to include external
> > > dependencies such as the Android Native Sync file descriptors and/or
> > > the Linux dma-buf synchronisation scheme.
> > > 
> > > If a suitable node is found then it is sent to execbuf_final() for
> > > submission to the hardware. The in flight count is then re-checked and
> > > a new node popped from the list if appropriate. All nodes that are not
> > > submitted have their priority bumped. This ensures that low priority
> > > tasks do not get starved out by busy higher priority ones - everything
> > > will eventually get its turn to run.
> > > 
> > > Note that this patch does not implement pre-emptive scheduling. Only
> > > basic scheduling by re-ordering batch buffer submission is currently
> > > implemented. Pre-emption of actively executing batch buffers comes in
> > > the next patch series.
> > > 
> > > v2: Changed priority levels to +/-1023 due to feedback from Chris
> > > Wilson.
> > > 
> > > Removed redundant index from scheduler node.
> > > 
> > > Changed time stamps to use jiffies instead of raw monotonic. This
> > > provides lower resolution but improved compatibility with other i915
> > > code.
> > > 
> > > Major re-write of completion tracking code due to struct fence
> > > conversion. The scheduler no longer has its own private IRQ handler
> > > but just lets the existing request code handle completion events.
> > > Instead, the scheduler now hooks into the request notify code to be
> > > told when a request has completed.
> > > 
> > > Reduced driver mutex locking scope. Removal of scheduler nodes no
> > > longer grabs the mutex lock.
> > > 
> > > v3: Refactor of dependency generation to make the code more readable.
> > > Also added in read-read optimisation support - i.e., don't treat a
> > > shared read-only buffer as being a dependency.
> > > 
> > > Allowed the killing of queued nodes rather than only flying ones.
> > > 
> > > v4: Updated the commit message to better reflect the current state of
> > > the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct array
> > > memory allocator function (kmalloc_array instead of kmalloc).
> > > Corrected the format of some comments. Wrapped some lines differently
> > > to keep the style checker happy.
> > > 
> > > Fixed a WARN_ON when killing nodes. The dependency removal code checks
> > > that nodes being destroyed do not have any outstanding dependencies
> > > (which would imply they should not have been executed yet). In the
> > > case of nodes being destroyed, e.g. due to context banning, then this
> > > might well be the case - they have not been executed and do indeed
> > > have outstanding dependencies.
> > > 
> > > Re-instated the code to disable interrupts when not in use. The
> > > underlying problem causing broken IRQ reference counts seems to have
> > > been fixed now.
> > > 
> > > v5: Shuffled various functions around to remove forward declarations
> > > as apparently these are frowned upon. Removed lots of white space as
> > > apparently having easy to read code is also frowned upon. Split the
> > > direct submission scheduler bypass code out into a separate function.
> > > Squashed down the i915_scheduler.c sections of various patches into
> > > this patch. Thus the later patches simply hook in existing code into
> > > various parts of the driver rather than adding the code as well. Added
> > > documentation to various functions. Re-worked the submit function in
> > > terms of mutex locking, error handling and exit paths. Split the
> > > delayed work handler function in half. Made use of the kernel 'clamp'
> > > macro. [Joonas Lahtinen]
> > > 
> > > Added runtime PM calls as these must be done at the top level before
> > > acquiring the driver mutex lock. [Chris Wilson]
> > > 
> > > Removed some obsolete debug code that had been forgotten about.
> > > 
> > > Moved more clean up code into the 'i915_scheduler_clean_node()'
> > > function rather than replicating it in multiple places.
> > > 
> > > Used lighter weight spinlocks.
> > > 
> > > For: VIZ-1587
> > > Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/Makefile         |   1 +
> > >   drivers/gpu/drm/i915/i915_drv.h       |   6 +
> > >   drivers/gpu/drm/i915/i915_gem.c       |   5 +
> > >   drivers/gpu/drm/i915/i915_scheduler.c | 874 ++++++++++++++++++++++++++++++++++
> > >   drivers/gpu/drm/i915/i915_scheduler.h |  95 ++++
> > >   5 files changed, 981 insertions(+)
> > >   create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
> > >   create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > > index 15398c5..79cb38b 100644
> > > --- a/drivers/gpu/drm/i915/Makefile
> > > +++ b/drivers/gpu/drm/i915/Makefile
> > > @@ -10,6 +10,7 @@ ccflags-y := -Werror
> > >   i915-y := i915_drv.o \
> > >   	  i915_irq.o \
> > >   	  i915_params.o \
> > > +	  i915_scheduler.o \
> > >             i915_suspend.o \
> > >   	  i915_sysfs.o \
> > >   	  intel_csr.o \
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index f4487b9..03add1a 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1703,6 +1703,8 @@ struct i915_execbuffer_params {
> > >   	struct drm_i915_gem_request     *request;
> > >   };
> > >   
> > > +struct i915_scheduler;
> > > +
> > >   /* used in computing the new watermarks state */
> > >   struct intel_wm_config {
> > >   	unsigned int num_pipes_active;
> > > @@ -1968,6 +1970,8 @@ struct drm_i915_private {
> > >   
> > >   	struct i915_runtime_pm pm;
> > >   
> > > +	struct i915_scheduler *scheduler;
> > > +
> > I think we should have i915.enable_scheduler parameter (just like
> > i915.enable_execlists) and make this a data member, not pointer.
> > 
> > Then we can go forward and have i915.enable_scheduler_preempt and so
> > on to control things at boot time.
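> > 
> > Untested sketch of what I have in mind (assuming the usual i915_params
> > plumbing; the exact permissions and description are placeholders):
> > 
> > 	/* i915_params.h: add an 'int enable_scheduler;' member to
> > 	 * struct i915_params, then in i915_params.c: */
> > 	module_param_named_unsafe(enable_scheduler, i915.enable_scheduler,
> > 				  int, 0400);
> > 	MODULE_PARM_DESC(enable_scheduler,
> > 		"Enable the GPU scheduler (0 = disabled, 1 = enabled)");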
> There are. The enable_scheduler parameter comes in as patch #21.
> Including it from the start means modifying i915_params.[ch] in this
> patch as well. And then you are getting into one big patch that contains
> everything and is unreviewable due to its sheer size. The
> enable_preemption parameter comes in with the pre-emption code. No point
> in adding that until there is something for it to control.
> 
> 
> > 
> > 
> > > 
> > >   	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> > >   	struct {
> > >   		int (*execbuf_submit)(struct i915_execbuffer_params *params,
> > > @@ -2290,6 +2294,8 @@ struct drm_i915_gem_request {
> > >   	/** process identifier submitting this request */
> > >   	struct pid *pid;
> > >   
> > > +	struct i915_scheduler_queue_entry	*scheduler_qe;
> > > +
> > Funny whitespace.
> > 
> > > 
> > >   	/**
> > >   	 * The ELSP only accepts two elements at a time, so we queue
> > >   	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index dfe43ea..7d9aa24 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -32,6 +32,7 @@
> > >   #include "i915_vgpu.h"
> > >   #include "i915_trace.h"
> > >   #include "intel_drv.h"
> > > +#include "i915_scheduler.h"
> > >   #include
> > >   #include
> > >   #include
> > > @@ -5315,6 +5316,10 @@ int i915_gem_init(struct drm_device *dev)
> > >   	 */
> > >   	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > >   
> > > +	ret = i915_scheduler_init(dev);
> > > +	if (ret)
> > > +		goto out_unlock;
> > > +
> > >   	ret = i915_gem_init_userptr(dev);
> > >   	if (ret)
> > >   		goto out_unlock;
> > > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > > new file mode 100644
> > > index 0000000..fc23ee7
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > > @@ -0,0 +1,874 @@
> > > +/*
> > > + * Copyright (c) 2014 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice (including the next
> > > + * paragraph) shall be included in all copies or substantial portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + *
> > > + */
> > > +
> > > +#include "i915_drv.h"
> > > +#include "intel_drv.h"
> > > +#include "i915_scheduler.h"
> > > +
> > > +/**
> > > + * i915_scheduler_is_enabled - Returns true if the scheduler is enabled.
> > > + * @dev: DRM device
> > > + */
> > > +bool i915_scheduler_is_enabled(struct drm_device *dev)
> > > +{
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +
> > > +	return dev_priv->scheduler != NULL;
> > > +}
> > i915.enable_scheduler would be used instead.
> It is when it arrives.
> 
> > 
> > 
> > > 
> > > +
> > > +/**
> > > + * i915_scheduler_init - Initialise the scheduler.
> > > + * @dev: DRM device
> > > + * Returns zero on success or -ENOMEM if memory allocations fail.
> > > + */
> > > +int i915_scheduler_init(struct drm_device *dev)
> > > +{
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	int r;
> > > +
> > This protection -->
> > 
> > > 
> > > +	if (scheduler)
> > > +		return 0;
> > > +
> > > +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> > > +	if (!scheduler)
> > > +		return -ENOMEM;
> > > +
> > Is not needed if we have a data member and i915.enable_scheduler.
> Making the scheduler structure an object rather than a pointer basically
> means moving the whole of i915_scheduler.h into i915_drv.h. There is too
> much interconnectedness between the various structures. I would say that
> keeping the code nicely partitioned and readable is worth the one
> dynamic allocation at driver load time.
> 
> > 
> > 
> > > 
> > > +	spin_lock_init(&scheduler->lock);
> > > +
> > > +	for (r = 0; r < I915_NUM_RINGS; r++)
> > > +		INIT_LIST_HEAD(&scheduler->node_queue[r]);
> > > +
> > > +	/* Default tuning values: */
> > > +	scheduler->priority_level_min     = -1023;
> > > +	scheduler->priority_level_max     = 1023;
> > > +	scheduler->priority_level_bump    = 50;
> > > +	scheduler->priority_level_preempt = 900;
> > > +	scheduler->min_flying             = 2;
> > > +
> > > +	dev_priv->scheduler = scheduler;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * Add a popped node back in to the queue. For example, because the ring was
> > > + * hung when execfinal() was called and thus the ring submission needs to be
> > > + * retried later.
> > > + */
> > > +static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	WARN_ON(!node);
> > Rather remove this completely as it is an internal function, or use
> > BUG_ON, because the next line will lead straight to an OOPS after the
> > warning. I'll pass on mentioning the rest of the obvious WARN_ON vs
> > BUG_ON cases.
> Yeah, most of the WARN/BUG_ONs are probably unnecessary now. They were
> scattered liberally around during initial development when strange and
> unexpected things could happen. Now that the code paths are pretty well
> established they aren't really much use. I'll reduce them down to just
> the ones that still make sense.
> 
> 
> > 
> > 
> > > 
> > > +	WARN_ON(!I915_SQS_IS_FLYING(node));
> > > +
> > > +	/* Seqno will be reassigned on relaunch */
> > > +	node->params.request->seqno = 0;
> > > +	node->status = i915_sqs_queued;
> > > +}
> > > +
> > > +/*
> > > + * Give up on a node completely. For example, because it is causing the
> > > + * ring to hang or is using some resource that no longer exists.
> > > + */
> > > +static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	WARN_ON(!node);
> > > +	WARN_ON(I915_SQS_IS_COMPLETE(node));
> > > +
> > > +	node->status = i915_sqs_dead;
> > > +}
> > > +
> > > +/* Mark a node as in flight on the hardware. */
> > As for documentation, from below, I assume the scheduler lock must be
> > held and nodes can only be manipulated when the scheduler lock is held?
> > It's good to add a comment describing the expected locking. Or maybe
> > add an assert_scheduler_lock_held() helper and sprinkle it around the
> > functions, similar to assert_rpm_wakelock_held(), which is
> > self-documenting.
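> > 
> > Something like this (untested sketch):
> > 
> > 	static inline void
> > 	assert_scheduler_lock_held(struct i915_scheduler *scheduler)
> > 	{
> > 		lockdep_assert_held(&scheduler->lock);
> > 	}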
> > 
> > > 
> > > +static int i915_scheduler_node_fly(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> > If node is NULL, this would already OOPS; I do not think it's
> > unreasonable to expect node not to be NULL. The below WARN_ON check is
> > never reached.
> > 
> > > 
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +	WARN_ON(!node);
> > > +	WARN_ON(node->status != i915_sqs_popped);
> > > +
> > > +	ring = node->params.ring;
> > > +
> > This and similar assignments can be squashed to the declaration line,
> > as this is basically an alias.
> > 
> > > 
> > > +	/*
> > > +	 * Add the node (which should currently be in state popped) to the
> > > +	 * front of the queue. This ensures that flying nodes are always held
> > > +	 * in hardware submission order.
> > > +	 */
> > > +	list_add(&node->link, &scheduler->node_queue[ring->id]);
> > > +
> > > +	node->status = i915_sqs_flying;
> > > +
> > > +	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> > > +		bool success = true;
> > > +
> > > +		success = ring->irq_get(ring);
> > > +		if (success)
> > > +			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
> > > +		else
> > > +			return -EINVAL;
> > Shouldn't the node be cleaned up from node_queue in case of an
> > error?
> Not really. The batch buffer will still be executed regardless of
> whether interrupts could be enabled or not. The return code is actually
> ignored, so this should really be a void function, I guess. Without
> interrupts, completion might take a while to be noticed but stuff should
> still basically work.
> 
> 
> > 
> > 
> > Also, I think the above could be written in a much more compact form
> > (because it looks like something where more logic will be added later).
> > It makes it easier to write and visualize the error paths when reading
> > if there are no nested if's.
> > 
> > I won't mention the error paths of the functions below; I expect the
> > following guideline to be adhered to:
> > 
> > 	if (scheduler->flags[ring->id] & I915_SF_INT_ENABLED)
> > 		goto out;
> > 
> > 	if (!ring->irq_get(ring))
> > 		goto err_irq;
> > 
> > 	if (!...)
> > 		goto err_foo;
> > 
> > 	scheduler->flags[ring->id] |= I915_SF_INT_ENABLED;
> > out:
> > 	return 0;
> > err_foo:
> > 	foobar();
> > err_irq:
> > 	list_remove();
> > 	return -EINVAL;
> > 
> > > 
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
> > > +					    struct intel_engine_cs *ring)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	uint32_t flying = 0;
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
> > +1 for the for_each_scheduler_node(...)
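> > 
> > E.g. (untested):
> > 
> > 	#define for_each_scheduler_node(node, scheduler, id) \
> > 		list_for_each_entry((node), &(scheduler)->node_queue[(id)], link)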
> > 
> > > 
> > > +		if (I915_SQS_IS_FLYING(node))
> > > +			flying++;
> > > +
> > > +	return flying;
> > > +}
> > > +
> > > +static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int i;
> > > +
> > > +	/*
> > > +	 * Ensure circular dependencies don't cause problems and that a bump
> > > +	 * by object usage only bumps each using buffer once:
> > > +	 */
> > > +	for (i = 0; i < I915_NUM_RINGS; i++) {
> > > +		list_for_each_entry(node, &scheduler->node_queue[i], link)
> > > +			node->bumped = false;
> > > +	}
> > > +}
> > > +
> > > +static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
> > > +				struct i915_scheduler_queue_entry *target,
> > Is there a specific reason why struct i915_scheduler_queue_entry is not
> > referred to just as "node" but "qe", and here something else; do "node"
> > and "qe" have a special semantic meaning?
> Not sure what you mean. The use of 'node' is because almost everywhere
> an i915_scheduler_queue_entry object is being used, it is as a node in
> a list that is being iterated over. The only 'qe' variable is in
> 'i915_scheduler_queue_execbuffer[_bypass]' and there it is because the
> qe is being passed in from outside and is copied to a local 'node' that
> is then added to the list. I think the use of 'target' here is largely
> historical. The subsequent array search was originally a list search and
> hence had a 'node' that was used as the iterator. Thus the qe passed in
> was called something else to avoid conflicts, and 'target' seemed
> appropriate.
> 
> > 
> > > 
> > > +				uint32_t bump)
> > > +{
> > > +	uint32_t new_priority;
> > > +	int i, count;
> > > +
> > > +	if (target->priority >= scheduler->priority_level_max)
> > > +		return 1;
> > So if one node reaches maximum priority, the dependencies are expected
> > to be maxed out already?
> Yes. Dependencies should always be of equal or higher priority to the
> dependee. When a new dependency is created, the priority of the dependee
> is added on to ensure that something of high priority can never be stuck
> waiting on something of low priority. Also, it means that a batch buffer
> that lots of other buffers are waiting on will be bumped to higher and
> higher priority the more waiters there are, and thus is more likely to
> be executed sooner and free them all up.
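> 
> For example, queuing a new batch at priority 200 with one dependency
> bumps that dependency (and, transitively, each of its dependencies) by
> 200, clamped to the 1023 maximum.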
> 
> 
> > 
> > > 
> > > +
> > > +	if (target->bumped)
> > > +		return 0;
> > > +
> > > +	new_priority = target->priority + bump;
> > > +	if ((new_priority <= target->priority) ||
> > > +	    (new_priority > scheduler->priority_level_max))
> > > +		target->priority = scheduler->priority_level_max;
> > > +	else
> > > +		target->priority = new_priority;
> > > +
> > > +	count = 1;
> > > +	target->bumped = true;
> > > +
> > > +	for (i = 0; i < target->num_deps; i++) {
> > > +		if (!target->dep_list[i])
> > > +			continue;
> > > +
> > > +		if (target->dep_list[i]->bumped)
> > > +			continue;
> > > +
> > > +		count += i915_scheduler_priority_bump(scheduler,
> > > +						      target->dep_list[i],
> > > +						      bump);
> > > +	}
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +/*
> > > + * Nodes are considered valid dependencies if they are queued on any ring or
> > > + * if they are in flight on a different ring. In flight on the same ring is no
> > > + * longer interesting for non-preemptive nodes as the ring serialises execution.
> > > + * For pre-empting nodes, all in flight dependencies are valid as they must not
> > > + * be jumped by the act of pre-empting.
> > > + *
> > > + * Anything that is neither queued nor flying is uninteresting.
> > > + */
> > > +static inline bool i915_scheduler_is_dependency_valid(
> > > +			struct i915_scheduler_queue_entry *node, uint32_t idx)
> > > +{
> > > +	struct i915_scheduler_queue_entry *dep;
> > > +
> > > +	dep = node->dep_list[idx];
> > > +	if (!dep)
> > > +		return false;
> > > +
> > > +	if (I915_SQS_IS_QUEUED(dep))
> > > +		return true;
> > > +
> > > +	if (I915_SQS_IS_FLYING(dep)) {
> > > +		if (node->params.ring != dep->params.ring)
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
> > > +				struct i915_scheduler_queue_entry **pop_node)
> > > +{
> > > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *best = NULL;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int ret;
> > > +	int i;
> > > +	bool any_queued = false;
> > > +	bool has_local, has_remote, only_remote = false;
> > > +
> > > +	*pop_node = NULL;
> > > +	ret = -ENODATA;
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (!I915_SQS_IS_QUEUED(node))
> > > +			continue;
> > > +		any_queued = true;
> > > +
> > > +		has_local  = false;
> > > +		has_remote = false;
> > > +		for (i = 0; i < node->num_deps; i++) {
> > > +			if (!i915_scheduler_is_dependency_valid(node, i))
> > > +				continue;
> > > +
> > > +			if (node->dep_list[i]->params.ring == node->params.ring)
> > > +				has_local = true;
> > > +			else
> > > +				has_remote = true;
> > > +		}
> > > +
> > > +		if (has_remote && !has_local)
> > > +			only_remote = true;
> > > +
> > > +		if (!has_local && !has_remote) {
> > > +			if (!best ||
> > > +			    (node->priority > best->priority))
> > > +				best = node;
> > > +		}
> > > +	}
> > > +
> > > +	if (best) {
> > > +		list_del(&best->link);
> > > +
> > > +		INIT_LIST_HEAD(&best->link);
> > > +		best->status  = i915_sqs_popped;
> > > +
> > > +		ret = 0;
> > > +	} else {
> > > +		/* Can only get here if:
> > > +		 * (a) there are no buffers in the queue
> > > +		 * (b) all queued buffers are dependent on other buffers
> > > +		 *     e.g. on a buffer that is in flight on a different ring
> > > +		 */
> > > +		if (only_remote) {
> > > +			/* The only dependent buffers are on another ring. */
> > > +			ret = -EAGAIN;
> > > +		} else if (any_queued) {
> > > +			/* It seems that something has gone horribly wrong! */
> > > +			DRM_ERROR("Broken dependency tracking on ring %d!\n",
> > > +				  (int) ring->id);
> > Would this condition be worthy of a WARN_ONCE(), because it will keep
> > repeating as the situation persists?
> > 
> > > 
> > > +		}
> > > +	}
> > > +
> > > +	*pop_node = best;
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * NB: The driver mutex lock must be held before calling this function. It is
> > > + * only really required during the actual back end submission call. However,
> > > + * attempting to acquire a mutex while holding a spin lock is a Bad Idea.
> > > + * And releasing the one before acquiring the other leads to other code
> > > + * being run and interfering.
> > > + */
> > > +static int i915_scheduler_submit(struct intel_engine_cs *ring)
> > > +{
> > > +	struct drm_device *dev = ring->dev;
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int ret, count = 0, flying;
> > > +
> > > +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +
> > > +	WARN_ON(scheduler->flags[ring->id] & i915_sf_submitting);
> > > +	scheduler->flags[ring->id] |= i915_sf_submitting;
> > > +
> > > +	/* First time around, complain if anything unexpected occurs: */
> > > +	ret = i915_scheduler_pop_from_queue_locked(ring, &node);
> > > +	if (ret)
> > > +		goto error;
> > > +
> > > +	do {
> > > +		WARN_ON(!node);
> > > +		WARN_ON(node->params.ring != ring);
> > > +		WARN_ON(node->status != i915_sqs_popped);
> > > +		count++;
> > > +
> > > +		/*
> > > +		 * The call to pop above will have removed the node from the
> > > +		 * list. So add it back in and mark it as in flight.
> > > +		 */
> > > +		i915_scheduler_node_fly(node);
> > > +
> > > +		spin_unlock_irq(&scheduler->lock);
> > > +		ret = dev_priv->gt.execbuf_final(&node->params);
> > > +		spin_lock_irq(&scheduler->lock);
> > > +
> > > +		/*
> > > +		 * Handle failed submission but first check that the
> > > +		 * watchdog/reset code has not nuked the node while we
> > > +		 * weren't looking:
> > > +		 */
> > > +		if (ret && (node->status != i915_sqs_dead)) {
> > > +			bool requeue = true;
> > > +
> > > +			/*
> > > +			 * Oh dear! Either the node is broken or the ring is
> > > +			 * busy. So need to kill the node or requeue it and try
> > > +			 * again later as appropriate.
> > > +			 */
> > > +
> > > +			switch (-ret) {
> > > +			case ENODEV:
> > > +			case ENOENT:
> > > +				/* Fatal errors. Kill the node. */
> > > +				requeue = false;
> > > +				i915_scheduler_node_kill(node);
> > > +			break;
> > These break indents still.
> > 
> > > 
> > > +
> > > +			case EAGAIN:
> > > +			case EBUSY:
> > > +			case EIO:
> > > +			case ENOMEM:
> > > +			case ERESTARTSYS:
> > > +			case EINTR:
> > > +				/* Supposedly recoverable errors. */
> > > +			break;
> > > +
> > > +			default:
> > > +				/*
> > > +				 * Assume the error is recoverable and hope
> > > +				 * for the best.
> > > +				 */
> > > +				MISSING_CASE(-ret);
> > > +			break;
> > > +			}
> > > +
> > > +			if (requeue) {
> > > +				i915_scheduler_node_requeue(node);
> > > +				/*
> > > +				 * No point spinning if the ring is currently
> > > +				 * unavailable so just give up and come back
> > > +				 * later.
> > > +				 */
> > > +				break;
> > > +			}
> > > +		}
> > > +
> > > +		/* Keep launching until the sky is sufficiently full. */
> > > +		flying = i915_scheduler_count_flying(scheduler, ring);
> > > +		if (flying >= scheduler->min_flying)
> > > +			break;
> > > +
> > > +		/* Grab another node and go round again... */
> > > +		ret = i915_scheduler_pop_from_queue_locked(ring, &node);
> > > +	} while (ret == 0);
> > > +
> > > +	/* Don't complain about not being able to submit extra entries */
> > > +	if (ret == -ENODATA)
> > > +		ret = 0;
> > > +
> > > +	/*
> > > +	 * Bump the priority of everything that was not submitted to prevent
> > > +	 * starvation of low priority tasks by a spamming high priority task.
> > > +	 */
> > This, this calls for an I-G-T test to make sure we're effective.
> Yeah, IGTs are under development.
> 
> > 
> > > 
> > > +	i915_scheduler_priority_bump_clear(scheduler);
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (!I915_SQS_IS_QUEUED(node))
> > > +			continue;
> > > +
> > > +		i915_scheduler_priority_bump(scheduler, node,
> > > +					     scheduler->priority_level_bump);
> > > +	}
> > > +
> > > +	/* On success, return the number of buffers submitted. */
> > > +	if (ret == 0)
> > > +		ret = count;
> > > +
> > > +error:
> > > +	scheduler->flags[ring->id] &= ~i915_sf_submitting;
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +static void i915_generate_dependencies(struct i915_scheduler *scheduler,
> > > +				       struct i915_scheduler_queue_entry *node,
> > > +				       uint32_t ring)
> > > +{
> > > +	struct i915_scheduler_obj_entry *this, *that;
> > > +	struct i915_scheduler_queue_entry *test;
> > > +	int i, j;
> > > +	bool found;
> > > +
> > > +	list_for_each_entry(test, &scheduler->node_queue[ring], link) {
> > > +		if (I915_SQS_IS_COMPLETE(test))
> > > +			continue;
> > > +
> > > +		/*
> > > +		 * Batches on the same ring for the same
> > > +		 * context must be kept in order.
> > > +		 */
> > > +		found = (node->params.ctx == test->params.ctx) &&
> > > +			(node->params.ring == test->params.ring);
> > > +
> > > +		/*
> > > +		 * Batches working on the same objects must
> > > +		 * be kept in order.
> > > +		 */
> > > +		for (i = 0; (i < node->num_objs) && !found; i++) {
> > > +			this = node->saved_objects + i;
> > node->objs and node->num_objs would be a better naming match. num_objs
> > and saved_objects make me think we have a bug here, indexing the wrong
> > array.
> > 
> > > 
> > > +
> > > +			for (j = 0; j < test->num_objs; j++) {
> > > +				that = test->saved_objects + j;
> > > +
> > > +				if (this->obj != that->obj)
> > > +					continue;
> > > +
> > > +				/* Only need to worry about writes */
> > > +				if (this->read_only && that->read_only)
> > > +					continue;
> > > +
> > > +				found = true;
> > > +				break;
> > > +			}
> > > +		}
> > > +
> > > +		if (found) {
> > > +			node->dep_list[node->num_deps] = test;
> > > +			node->num_deps++;
> > > +		}
> > > +	}
> > > +}
> > > +
> > > +static int i915_scheduler_queue_execbuffer_bypass(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	int ret;
> > > +
> > > +	scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> > > +	ret = dev_priv->gt.execbuf_final(&qe->params);
> > > +	scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> > > +
> > The above code makes me think of locking again; maybe document it at
> > the top of this function.
> No need to acquire the spinlock at this point. If the scheduler is in
> bypass mode then there is no internal state to protect. The driver mutex
> lock will be held because there is only one code path to get here, which
> is the execbuff IOCTL. The whole point of bypass mode is that it is a
> single contiguous execution path from IOCTL entry all the way through to
> hardware submission. A mutex check could be added but it seems
> unnecessary given the limited calling scope.
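> 
> (If one were wanted, it would just be the same check that
> i915_scheduler_submit() already does:
> 
> 	WARN_ON(!mutex_is_locked(&qe->params.dev->struct_mutex));
> 
> but that seems overkill for a single, well known caller.)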
> 
> > 
> > 
> > > 
> > > +	/*
> > > +	 * Don't do any clean up on failure because the caller will
> > > +	 * do it all anyway.
> > > +	 */
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	/* Need to release any resources held by the node: */
> > > +	i915_scheduler_clean_node(qe);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_queue_execbuffer - Submit a batch buffer request to the
> > > + * scheduler.
> > > + * @qe: The batch buffer request to be queued.
> > > + * The expectation is the qe passed in is a local stack variable. This
> > > + * function will copy its contents into a freshly allocated list node. The
> > > + * new node takes ownership of said contents so the original qe should simply
> > > + * be discarded and not cleaned up (i.e. don't free memory it points to or
> > > + * dereference objects it holds). The node is added to the scheduler's queue
> > > + * and the batch buffer will be submitted to the hardware at some future
> > > + * point in time (which may be immediately, before returning or may be quite
> > > + * a lot later).
> > > + */
> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring = qe->params.ring;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	struct i915_scheduler_queue_entry *test;
> > > +	bool not_flying;
> > > +	int i, r;
> > > +	int incomplete = 0;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +
> > > +	if (1/*!i915.enable_scheduler*/)
> > > +		return i915_scheduler_queue_execbuffer_bypass(qe);
> > > +
> > I'm thinking, should we branch already before calling a function named
> > i915_scheduler_queue_execbuffer if the scheduler is disabled?
> That would require putting if(scheduler) logic in random points
> throughout the driver. Doing it this way, the driver as a whole has a
> single execution path, which is via the scheduler. It is then up to the
> scheduler itself to decide whether to run in bypass mode, simple
> re-ordering mode, full pre-emption mode or any other mode we feel a need
> to add.
> 
> 
> > 
> > > 
> > > +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> > > +	if (!node)
> > > +		return -ENOMEM;
> > > +
> > > +	*node = *qe;
> > Now that I have read the added documentation for the function, maybe we
> > should at least zero out qe to avoid future problems?
> The execbuff success path hands the qe contents over and pretty much
> immediately drops all the way back out of the IOCTL. The failure path
> does the full cleanup of de-allocating all the qe stuff it had
> previously allocated. IMO there doesn't seem to be much need to zero out
> the structure at this point.
> 
> > 
> > 
> > > 
> > > +	INIT_LIST_HEAD(&node->link);
> > > +	node->status = i915_sqs_queued;
> > > +	node->stamp  = jiffies;
> > > +	i915_gem_request_reference(node->params.request);
> > > +
> > > +	WARN_ON(node->params.request->scheduler_qe);
> > > +	node->params.request->scheduler_qe = node;
> > > +
> > > +	/* Need to determine the number of incomplete entries in the list as
> > > +	 * that will be the maximum size of the dependency list.
> > > +	 *
> > > +	 * Note that the allocation must not be made with the spinlock acquired
> > > +	 * as kmalloc can sleep. However, the unlock/relock is safe because no
> > > +	 * new entries can be queued up during the unlock as the i915 driver
> > > +	 * mutex is still held. Entries could be removed from the list but that
> > > +	 * just means the dep_list will be over-allocated which is fine.
> > > +	 */
> > > +	spin_lock_irq(&scheduler->lock);
> > -->
> > 
> > > 
> > > +	for (r = 0; r < I915_NUM_RINGS; r++) {
> > > +		list_for_each_entry(test, &scheduler->node_queue[r], link) {
> > > +			if (I915_SQS_IS_COMPLETE(test))
> > > +				continue;
> > > +
> > > +			incomplete++;
> > > +		}
> > > +	}
> > > +
> > <-- This counting structure, I still think it would be a good idea to
> > make it a static helper.
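> > 
> > Rough sketch of the helper (untested, name is just a suggestion):
> > 
> > 	static int i915_scheduler_count_incomplete(struct i915_scheduler *scheduler)
> > 	{
> > 		struct i915_scheduler_queue_entry *node;
> > 		int r, incomplete = 0;
> > 
> > 		for (r = 0; r < I915_NUM_RINGS; r++)
> > 			list_for_each_entry(node, &scheduler->node_queue[r], link)
> > 				if (!I915_SQS_IS_COMPLETE(node))
> > 					incomplete++;
> > 
> > 		return incomplete;
> > 	}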
> > 
> > > 
> > > +	/* Temporarily unlock to allocate memory: */
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +	if (incomplete) {
> > > +		node->dep_list = kmalloc_array(incomplete,
> > > +					       sizeof(*node->dep_list),
> > > +					       GFP_KERNEL);
> > > +		if (!node->dep_list) {
> > > +			kfree(node);
> > > +			return -ENOMEM;
> > > +		}
> > > +	} else
> > > +		node->dep_list = NULL;
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +	node->num_deps = 0;
> > > +
> > > +	if (node->dep_list) {
> > > +		for (r = 0; r < I915_NUM_RINGS; r++)
> > > +			i915_generate_dependencies(scheduler, node, r);
> > > +
> > > +		WARN_ON(node->num_deps > incomplete);
> > > +	}
> > > +
> > > +	node->priority = clamp(node->priority,
> > > +			       scheduler->priority_level_min,
> > > +			       scheduler->priority_level_max);
> > > +
> > > +	if ((node->priority > 0) && node->num_deps) {
> > > +		i915_scheduler_priority_bump_clear(scheduler);
> > > +
> > > +		for (i = 0; i < node->num_deps; i++)
> > > +			i915_scheduler_priority_bump(scheduler,
> > > +					node->dep_list[i], node->priority);
> > If node is posted with maximum priority to begin with, wouldn't the
> > priority propagation stop after first level dependencies due to the
> > check in the beginning of priority_bump function?
> The recursion will stop whenever it encounters a node already at maximum
> priority. However, it is not starting from the newly submitted node but
> from that node's dependencies. If they are already at max priority then
> their dependencies must also be, and so on down the chain, and no
> further processing is required. On the other hand, if they are not then
> they will be bumped and their dependencies checked, and so on down the
> chain.
> 
> > 
> > > 
> > > +
> > > +	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
> > > +
> > > +	not_flying = i915_scheduler_count_flying(scheduler, ring) <
> > > +						 scheduler->min_flying;
> > > +
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +
> > > +	if (not_flying)
> > > +		i915_scheduler_submit(ring);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_notify_request - Notify the scheduler that the given
> > > + * request has completed on the hardware.
> > > + * @req: Request structure which has completed
> > > + * @preempt: Did it complete pre-emptively?
> > > + * A sequence number has popped out of the hardware and the request handling
> > > + * code has mapped it back to a request and will mark that request complete.
> > > + * It also calls this function to notify the scheduler about the completion
> > > + * so the scheduler's node can be updated appropriately.
> > > + * Returns true if the request is scheduler managed, false if not. The return
> > > + * value is combined for all freshly completed requests and if any were true
> > > + * then i915_scheduler_wakeup() is called so the scheduler can do further
> > > + * processing (submit more work) at the end.
> > > + */
> > > +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req)
> > > +{
> > > +	struct drm_i915_private *dev_priv = to_i915(req->ring->dev);
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node = req->scheduler_qe;
> > > +	unsigned long flags;
> > > +
> > > +	if (!node)
> > > +		return false;
> > > +
> > > +	spin_lock_irqsave(&scheduler->lock, flags);
> > > +
> > > +	WARN_ON(!I915_SQS_IS_FLYING(node));
> > > +
> > > +	/* Node was in flight so mark it as complete. */
> > > +	if (req->cancelled)
> > > +		node->status = i915_sqs_dead;
> > > +	else
> > > +		node->status = i915_sqs_complete;
> > > +
> > > +	spin_unlock_irqrestore(&scheduler->lock, flags);
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> > > +				struct i915_scheduler_queue_entry *remove)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int i, r;
> > > +	int count = 0;
> > > +
> > > +	/*
> > > +	 * Ensure that a node is not being removed which is still dependent
> > > +	 * upon other (not completed) work. If that happens, it implies
> > > +	 * something has gone very wrong with the dependency tracking! Note
> > > +	 * that there is no need to worry if this node has been explicitly
> > > +	 * killed for some reason - it might be being killed before it got
> > > +	 * sent to the hardware.
> > > +	 */
> > > +	if (remove->status != i915_sqs_dead) {
> > > +		for (i = 0; i < remove->num_deps; i++)
> > > +			if ((remove->dep_list[i]) &&
> > > +			    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
> > > +				count++;
> > > +		WARN_ON(count);
> > > +	}
> > > +
> > > +	/*
> > > +	 * Remove this node from the dependency lists of any other node which
> > > +	 * might be waiting on it.
> > > +	 */
> > > +	for (r = 0; r < I915_NUM_RINGS; r++) {
> > > +		list_for_each_entry(node, &scheduler->node_queue[r], link) {
> > > +			for (i = 0; i < node->num_deps; i++) {
> > > +				if (node->dep_list[i] != remove)
> > > +					continue;
> > > +
> > > +				node->dep_list[i] = NULL;
> > > +			}
> > > +		}
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_wakeup - wake the scheduler's worker thread
> > > + * @dev: DRM device
> > > + * Called at the end of seqno interrupt processing if any request has
> > > + * completed that corresponds to a scheduler node.
> > > + */
> > > +void i915_scheduler_wakeup(struct drm_device *dev)
> > > +{
> > > +	/* XXX: Need to call i915_scheduler_remove() via work handler. */
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_clean_node - free up any allocations/references
> > > + * associated with the given scheduler queue entry.
> > > + * @node: Queue entry structure which is complete
> > > + * After a given batch buffer completes on the hardware, all the information
> > > + * required to resubmit it is no longer required. However, the node entry
> > > + * itself might still be required for tracking purposes for a while longer.
> > > + * This function should be called as soon as the node is known to be complete
> > > + * so that these resources may be freed even though the node itself might
> > > + * hang around.
> > > + */
> > > +void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	if (!I915_SQS_IS_COMPLETE(node)) {
> > > +		WARN(!node->params.request->cancelled,
> > > +		     "Cleaning active node: %d!\n", node-
> > > >status);
> > > +		return;
> > > +	}
> > > +
> > > +	if (node->params.batch_obj) {
> > > +		/*
> > > +		 * The batch buffer must be unpinned before it is unreferenced
> > > +		 * otherwise the unpin fails with a missing vma!?
> > > +		 */
> > > +		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
> > > +			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
> > > +
> > This code seems a little bit out of place.
> Not sure where else you could put it. The code itself is just a copy of
> the existing clean up code that was already in the execbuff code path.
> The cleanup can no longer be done within the execbuff call due to the
> batch buffer execution being arbitrarily delayed by the scheduler. Thus
> it is down to the scheduler to do all the necessary cleanup when it
> knows that the batch buffer is no longer required.
> 
> > 
> > 
> > > 
> > > +		node->params.batch_obj = NULL;
> > > +	}
> > > +
> > > +	/* And anything else owned by the node: */
> > > +	if (node->params.cliprects) {
> > > +		kfree(node->params.cliprects);
> > > +		node->params.cliprects = NULL;
> > > +	}
> > > +}
> > > +
> > The below function is quite hacky; I understood this will be mitigated
> > by per-ctx sequence numbering in the future?
> To be honest, I think it is already obsolete due to the previous work of
> converting from seqno tests everywhere to testing request structures.
> The point of the code below was to cope with seqno values popping out of
> the hardware in random order due to the re-ordering of batch buffers
> prior to execution. Whereas the struct request conversion allowed the
> seqno to be late allocated and thus kept in order.
> 
> Unfortunately, pre-emption confuses matters again. Although the
> intention is that seqno values will still be kept in order, it's just
> that a given request might have multiple different seqno values assigned
> to it throughout its life. The pre-emption code hasn't quite settled
> down yet though, especially with interesting new hardware planned for
> future chips. Thus I don't really want to rip out all of this code quite
> yet, just in case we do still need it.
> 
> As you say, per-ctx seqnos would definitely have an effect here as well.
> So yes, long term the intention is to remove all this nastiness. Short
> term, it seems best to keep it around until proven unnecessary.
> 
> > 
> > 
> > > 
> > > +static bool i915_scheduler_remove(struct i915_scheduler *scheduler,
> > > +				  struct intel_engine_cs *ring,
> > > +				  struct list_head *remove)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node, *node_next;
> > > +	int flying = 0, queued = 0;
> > > +	bool do_submit;
> > > +	uint32_t min_seqno;
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +
> > > +	/*
> > > +	 * In the case where the system is idle, starting 'min_seqno' from a big
> > > +	 * number will cause all nodes to be removed as they are now back to
> > > +	 * being in-order. However, this will be a problem if the last one to
> > > +	 * complete was actually out-of-order as the ring seqno value will be
> > > +	 * lower than one or more completed buffers. Thus code looking for the
> > > +	 * completion of said buffers will wait forever.
> > > +	 * Instead, use the hardware seqno as the starting point. This means
> > > +	 * that some buffers might be kept around even in a completely idle
> > > +	 * system but it should guarantee that no-one ever gets confused when
> > > +	 * waiting for buffer completion.
> > > +	 */
> > > +	min_seqno = ring->get_seqno(ring, true);
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (I915_SQS_IS_QUEUED(node))
> > > +			queued++;
> > > +		else if (I915_SQS_IS_FLYING(node))
> > > +			flying++;
> > > +		else if (I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->params.request->seqno == 0)
> > > +			continue;
> > > +
> > > +		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> > > +			min_seqno = node->params.request->seqno;
> > > +	}
> > > +
> > > +	INIT_LIST_HEAD(remove);
> > > +	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> > > +		/*
> > > +		 * Only remove completed nodes which have a lower seqno than
> > > +		 * all pending nodes. While there is the possibility of the
> > > +		 * ring's seqno counting backwards, all higher buffers must
> > > +		 * be remembered so that the 'i915_seqno_passed()' test can
> > > +		 * report that they have in fact passed.
> > > +		 *
> > > +		 * NB: This is not true for 'dead' nodes. The GPU reset causes
> > > +		 * the software seqno to restart from its initial value. Thus
> > > +		 * the dead nodes must be removed even though their seqno values
> > > +		 * are potentially vastly greater than the current ring seqno.
> > > +		 */
> > > +		if (!I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->status != i915_sqs_dead) {
> > > +			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> > > +			    (node->params.request->seqno != min_seqno))
> > > +				continue;
> > > +		}
> > > +
> > > +		list_del(&node->link);
> > > +		list_add(&node->link, remove);
> > > +
> > > +		/* Strip the dependency info while the mutex is still locked */
> > > +		i915_scheduler_remove_dependent(scheduler, node);
> > > +
> > > +		continue;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Release the interrupt reference count if there are no longer any
> > > +	 * nodes to worry about.
> > > +	 */
> > > +	if (!flying && !queued &&
> > > +	    (scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> > > +		ring->irq_put(ring);
> > > +		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
> > Maybe have this piece of code as a helper? To enable interrupts and
> > disable them, maybe even a reference count?
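> > 
> > I.e. something like (untested, name is just a suggestion):
> > 
> > 	static void i915_scheduler_irq_disable(struct i915_scheduler *scheduler,
> > 					       struct intel_engine_cs *ring)
> > 	{
> > 		if (scheduler->flags[ring->id] & i915_sf_interrupts_enabled) {
> > 			ring->irq_put(ring);
> > 			scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
> > 		}
> > 	}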
> Not sure how much you could abstract out into a helper other than the
> two lines of irq_get|put + flag set|clear. The flying|queued test is
> particular to this one instance. Also, there is no need to reference
> count as the irq code does that already.
> 
> > 
> > 
> > > 
> > > +	}
> > > +
> > > +	/* Launch more packets now? */
> > > +	do_submit = (queued > 0) && (flying < scheduler->min_flying);
> > > +
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +
> > > +	return do_submit;
> > > +}
> > > +
> > > +void i915_scheduler_process_work(struct intel_engine_cs *ring)
> > > +{
> > > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	bool do_submit;
> > > +	struct list_head remove;
> > > +
> > > +	if (list_empty(&scheduler->node_queue[ring->id]))
> > > +		return;
> > > +
> > > +	/* Remove completed nodes. */
> > > +	do_submit = i915_scheduler_remove(scheduler, ring, &remove);
> > > +
> > > +	if (!do_submit && list_empty(&remove))
> > > +		return;
> > > +
> > > +	/* Need to grab the pm lock outside of the mutex lock */
> > > +	if (do_submit)
> > > +		intel_runtime_pm_get(dev_priv);
> > > +
> > > +	mutex_lock(&ring->dev->struct_mutex);
> > > +
> > > +	if (do_submit)
> > > +		i915_scheduler_submit(ring);
> > > +
> > > +	while (!list_empty(&remove)) {
> > > +		node = list_first_entry(&remove, typeof(*node), link);
> > > +		list_del(&node->link);
> > > +
> > > +		/* Free up all the DRM references */
> > > +		i915_scheduler_clean_node(node);
> > > +
> > > +		/* And anything else owned by the node: */
> > > +		node->params.request->scheduler_qe = NULL;
> > > +		i915_gem_request_unreference(node->params.request);
> > > +		kfree(node->dep_list);
> > > +		kfree(node);
> > Maybe have a separate helper for freeing a node; it would make this
> > much cleaner.
> There is a helper for freeing up the contents of the node - see
> i915_scheduler_clean_node() above. However, that function is called from
> multiple places and this is the only instance where the node itself is
> actually freed. The other callers either need to keep the node around
> for a while longer (i.e. until this piece of code can run) or don't own
> the node because it is stack allocated (in the bypass case). So the
> 'anything else' section cannot be moved into the existing clean function
> and any secondary helper would be single usage and only four lines long.
> It doesn't seem worth it.
> 
> > 
> > 
> > > 
> > > +	}
> > > +
> > > +	mutex_unlock(&ring->dev->struct_mutex);
> > > +
> > > +	if (do_submit)
> > > +		intel_runtime_pm_put(dev_priv);
> > > +}
> > > diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> > > new file mode 100644
> > > index 0000000..415fec8
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> > > @@ -0,0 +1,95 @@
> > > +/*
> > > + * Copyright (c) 2014 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice (including the next
> > > + * paragraph) shall be included in all copies or substantial portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + *
> > > + */
> > > +
> > > +#ifndef _I915_SCHEDULER_H_
> > > +#define _I915_SCHEDULER_H_
> > > +
> > > +enum i915_scheduler_queue_status {
> > > +	/* Limbo: */
> > > +	i915_sqs_none = 0,
> > > +	/* Not yet submitted to hardware: */
> > > +	i915_sqs_queued,
> > > +	/* Popped from queue, ready to fly: */
> > > +	i915_sqs_popped,
> > > +	/* Sent to hardware for processing: */
> > > +	i915_sqs_flying,
> > > +	/* Finished processing on the hardware: */
> > > +	i915_sqs_complete,
> > > +	/* Killed by watchdog or catastrophic submission failure: */
> > > +	i915_sqs_dead,
> > > +	/* Limit value for use with arrays/loops */
> > > +	i915_sqs_MAX
> > > +};
> > > +
> > To uppercase.
> Grrr. Linux kernel style guide is daft.
> 
> > 
> > 
> > > 
> > > +#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
> > > +#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
> > > +#define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
> > > +					 ((node)->status == i915_sqs_dead))
> > > +
> > > +struct i915_scheduler_obj_entry {
> > > +	struct drm_i915_gem_object          *obj;
> > > +	bool                                read_only;
> > > +};
> > > +
> > > +struct i915_scheduler_queue_entry {
> > > +	struct i915_execbuffer_params       params;
> > > +	/* -1023 = lowest priority, 0 = default, 1023 = highest */
> > > +	int32_t                             priority;
> > > +	struct i915_scheduler_obj_entry     *saved_objects;
> > > +	int                                 num_objs;
> > > +	bool                                bumped;
> > > +	struct i915_scheduler_queue_entry   **dep_list;
> > > +	int                                 num_deps;
> > > +	enum i915_scheduler_queue_status    status;
> > > +	unsigned long                       stamp;
> > > +	struct list_head                    link;
> > Above whitespace is still incorrect and I could really use
> > comments.
> > 
> > > 
> > > +};
> > > +
> > > +struct i915_scheduler {
> > > +	struct list_head    node_queue[I915_NUM_RINGS];
> > > +	uint32_t            flags[I915_NUM_RINGS];
> > > +	spinlock_t          lock;
> > > +
> > > +	/* Tuning parameters: */
> > > +	int32_t             priority_level_min;
> > > +	int32_t             priority_level_max;
> > > +	int32_t             priority_level_bump;
> > > +	int32_t             priority_level_preempt;
> > > +	uint32_t            min_flying;
> > > +};
> > > +
> > > +/* Flag bits for i915_scheduler::flags */
> > > +enum {
> > > +	i915_sf_interrupts_enabled  = (1 << 0),
> > > +	i915_sf_submitting          = (1 << 1),
> > > +};
> > These must be I915_UPPERCASE_SOMETHING by convention.
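> > 
> > I.e.:
> > 
> > 	enum {
> > 		I915_SF_INTERRUPTS_ENABLED  = (1 << 0),
> > 		I915_SF_SUBMITTING          = (1 << 1),
> > 	};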
> > 
> > > 
> > > +
> > > +bool i915_scheduler_is_enabled(struct drm_device *dev);
> > > +int i915_scheduler_init(struct drm_device *dev);
> > > +void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
> > > +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
> > > +void i915_scheduler_wakeup(struct drm_device *dev);
> > > +
> > > +#endif  /* _I915_SCHEDULER_H_ */
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


