[Intel-gfx] [PATCH v4 07/38] drm/i915: Start of GPU scheduler
John Harrison
John.C.Harrison at Intel.com
Thu Feb 18 14:22:56 UTC 2016
On 20/01/2016 13:18, Joonas Lahtinen wrote:
> Hi,
>
> Comments below this pre text.
>
> Many of the comments are related to the indent and style of the code.
> That stuff is important to fix for future maintainability. In order for
> the future review to be more effective, I'd like to next see a v5 of
> the series where the code quality concerns have been addressed, patches
> squashed to be actual reviewable chunks and appropriate kerneldoc being
> added.
>
> To give an idea of proper slicing of patches, first produce a no-op
> scheduler, adding the extra function calls where needed and still
> keeping the scheduling completely linear. Second patch could introduce
> out of order submitting, third one priority bumping, fourth pre-empting
> and so on. That way, each patch extends the functionality and is itself
> already mergeable. That way I've been able to go through and understand
> the existing code, and I can actually review (other than just nag about
> indent and coding style) if the changes are appropriate to bring in the
> functionality desired.
>
> In the current split, for me or anyone who did not participate writing
> the code, it is otherwise too confusing to try to guess what future
> changes might make each piece of code make sense, and which will be
> redundant in the future too. There is no value in splitting code into
> chunks that are not themselves functional.
>
> Regards, Joonas
>
> On Mon, 2016-01-11 at 18:42 +0000, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> Initial creation of scheduler source files. Note that this patch
>> implements most of the scheduler functionality but does not hook it in
>> to the driver yet. It also leaves the scheduler code in 'pass through'
>> mode so that even when it is hooked in, it will not actually do very
>> much. This allows the hooks to be added one at a time in bite size
>> chunks and only when the scheduler is finally enabled at the end does
>> anything start happening.
>>
>> The general theory of operation is that when batch buffers are
>> submitted to the driver, the execbuffer() code packages up all the
>> information required to execute the batch buffer at a later time. This
>> package is given over to the scheduler which adds it to an internal
>> node list. The scheduler also scans the list of objects associated
>> with the batch buffer and compares them against the objects already in
>> use by other buffers in the node list. If matches are found then the
>> new batch buffer node is marked as being dependent upon the matching
>> node. The same is done for the context object. The scheduler also
>> bumps up the priority of such matching nodes on the grounds that the
>> more dependencies a given batch buffer has the more important it is
>> likely to be.
>>
>> The scheduler aims to have a given (tuneable) number of batch buffers
>> in flight on the hardware at any given time. If fewer than this are
>> currently executing when a new node is queued, then the node is passed
>> straight through to the submit function. Otherwise it is simply added
>> to the queue and the driver returns back to user land.
>>
>> The scheduler is notified when each batch buffer completes and updates
>> its internal tracking accordingly. At the end of the completion
>> interrupt processing, if any scheduler tracked batches were processed,
>> the scheduler's deferred worker thread is woken up. This can do more
>> involved processing such as actually removing completed nodes from the
>> queue and freeing up the resources associated with them (internal
>> memory allocations, DRM object references, context reference, etc.).
>> The work handler also checks the in flight count and calls the
>> submission code if a new slot has appeared.
>>
>> When the scheduler's submit code is called, it scans the queued node
>> list for the highest priority node that has no unmet dependencies.
>> Note that the dependency calculation is complex as it must take
>> inter-ring dependencies and potential preemptions into account. Note
>> also that in the future this will be extended to include external
>> dependencies such as the Android Native Sync file descriptors and/or
>> the Linux dma-buf synchronisation scheme.
>>
>> If a suitable node is found then it is sent to execbuf_final() for
>> submission to the hardware. The in flight count is then re-checked and
>> a new node popped from the list if appropriate.
>>
>> Note that this patch does not implement pre-emptive scheduling. Only
>> basic scheduling by re-ordering batch buffer submission is currently
>> implemented. Pre-emption of actively executing batch buffers comes in
>> the next patch series.
>>
>> v2: Changed priority levels to +/-1023 due to feedback from Chris
>> Wilson.
>>
>> Removed redundant index from scheduler node.
>>
>> Changed time stamps to use jiffies instead of raw monotonic. This
>> provides lower resolution but improved compatibility with other i915
>> code.
>>
>> Major re-write of completion tracking code due to struct fence
>> conversion. The scheduler no longer has its own private IRQ handler
>> but just lets the existing request code handle completion events.
>> Instead, the scheduler now hooks into the request notify code to be
>> told when a request has completed.
>>
>> Reduced driver mutex locking scope. Removal of scheduler nodes no
>> longer grabs the mutex lock.
>>
>> v3: Refactor of dependency generation to make the code more readable.
>> Also added in read-read optimisation support - i.e., don't treat a
>> shared read-only buffer as being a dependency.
>>
>> Allowed the killing of queued nodes rather than only flying ones.
>>
>> v4: Updated the commit message to better reflect the current state of
>> the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct array
>> memory allocator function (kmalloc_array instead of kmalloc).
>> Corrected the format of some comments. Wrapped some lines differently
>> to keep the style checker happy.
>>
>> Fixed a WARN_ON when killing nodes. The dependency removal code checks
>> that nodes being destroyed do not have any outstanding dependencies
>> (which would imply they should not have been executed yet). In the
>> case of nodes being destroyed, e.g. due to context banning, then this
>> might well be the case - they have not been executed and do indeed
>> have outstanding dependencies.
>>
>> Re-instated the code to disable interrupts when not in use. The
>> underlying problem causing broken IRQ reference counts seems to have
>> been fixed now.
>>
>> For: VIZ-1587
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> ---
>> drivers/gpu/drm/i915/Makefile | 1 +
>> drivers/gpu/drm/i915/i915_drv.h | 4 +
>> drivers/gpu/drm/i915/i915_gem.c | 5 +
>> drivers/gpu/drm/i915/i915_scheduler.c | 797 ++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/i915/i915_scheduler.h | 91 ++++
>> 5 files changed, 898 insertions(+)
>> create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>> create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>> index 15398c5..79cb38b 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -10,6 +10,7 @@ ccflags-y := -Werror
>> i915-y := i915_drv.o \
>> i915_irq.o \
>> i915_params.o \
>> + i915_scheduler.o \
>> i915_suspend.o \
>> i915_sysfs.o \
>> intel_csr.o \
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 10dffdd..38f423b 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1695,6 +1695,8 @@ struct i915_execbuffer_params {
>> struct drm_i915_gem_request *request;
>> };
>>
>> +struct i915_scheduler;
>> +
> Rather add "i915_scheduler.h" include at the top and eliminate circular
> include dependencies. This is needed for the next comment.
>
>> /* used in computing the new watermarks state */
>> struct intel_wm_config {
>> unsigned int num_pipes_active;
>> @@ -1947,6 +1949,8 @@ struct drm_i915_private {
>>
>> struct i915_runtime_pm pm;
>>
>> + struct i915_scheduler *scheduler;
>> +
> As the scheduler is going to be enabled on all platforms to an extent,
> no point in making it a pointer. Just making it member like "pm" is the
> best, this also requires the above include change.
>
>> /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
>> struct {
>> int (*execbuf_submit)(struct i915_execbuffer_params *params,
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index cff3768..47aa85b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -37,6 +37,7 @@
>> #include
>> #include
>> #include
>> +#include "i915_scheduler.h"
>>
> This should go before any <linux/...> not to mask missing includes from
> the header itself, so correct place would be right after #include
> "i915_trace.h"
>
>> #define RQ_BUG_ON(expr)
>>
>> @@ -5242,6 +5243,10 @@ int i915_gem_init(struct drm_device *dev)
>> */
>> intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>>
>> + ret = i915_scheduler_init(dev);
>> + if (ret)
>> + goto out_unlock;
>> +
>> ret = i915_gem_init_userptr(dev);
>> if (ret)
>> goto out_unlock;
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>> new file mode 100644
>> index 0000000..8cb9063
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>> @@ -0,0 +1,797 @@
>> +/*
>> + * Copyright (c) 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "i915_drv.h"
>> +#include "intel_drv.h"
>> +#include "i915_scheduler.h"
> Again, this include should be the top one, should not require any
> includes before it.
>
>> +
>> +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
>> +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
>> + struct i915_scheduler_queue_entry *remove);
>> +static int i915_scheduler_submit(struct intel_engine_cs *ring,
>> + bool is_locked);
>> +static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
>> + struct intel_engine_cs *ring);
>> +static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
>> +static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
>> + struct i915_scheduler_queue_entry *target,
>> + uint32_t bump);
> Do not indent the function names like this, it becomes unmaintainable
> and messy very fast if somebody adds a new function with a more complex
> return type, which is a very likely thing to happen.
>
> What I would do is move all the helper functions here at the top and
> order them so that the forward declarations are not needed, at least
> i915_scheduler_fly_node is not used before its definition.
>
>> +
>> +int i915_scheduler_init(struct drm_device *dev)
>> +{
>> + struct drm_i915_private *dev_priv = dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + int r;
> Again, this indent is a no-go. I'll not mention it on further
> functions, assume it to be fixed for next revision.
>
>> +
>> + if (scheduler)
>> + return 0;
>> +
>> + scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
>> + if (!scheduler)
>> + return -ENOMEM;
>> +
>> + spin_lock_init(&scheduler->lock);
>> +
>> + for (r = 0; r < I915_NUM_RINGS; r++)
>> + INIT_LIST_HEAD(&scheduler->node_queue[r]);
>> +
>> + /* Default tuning values: */
>> + scheduler->priority_level_min = -1023;
>> + scheduler->priority_level_max = 1023;
>> + scheduler->priority_level_preempt = 900;
>> + scheduler->min_flying = 2;
> This kind of indent is tolerable because it is a contained code block,
> but not needed either.
>
>> +
>> + dev_priv->scheduler = scheduler;
>> +
>> + return 0;
>> +}
>> +
>> +static void i915_generate_dependencies(struct i915_scheduler *scheduler,
>> + struct i915_scheduler_queue_entry *node,
>> + uint32_t ring)
>> +{
>> + struct i915_scheduler_obj_entry *this, *that;
>> + struct i915_scheduler_queue_entry *test;
>> + int i, j;
>> + bool found;
>> +
>> + list_for_each_entry(test, &scheduler->node_queue[ring], link) {
>> + if (I915_SQS_IS_COMPLETE(test))
>> + continue;
>> +
>> + /*
>> + * Batches on the same ring for the same
>> + * context must be kept in order.
>> + */
>> + found = (node->params.ctx == test->params.ctx) &&
>> + (node->params.ring == test->params.ring);
>> +
>> + if (found)
>> + goto depends;
>> +
> This is not needed.
Yeah, this code block has been re-worked a number of times with things
being added and/or removed. I guess it had gotten a bit messy by now.
>
>> + /*
>> + * Batches working on the same objects must
>> + * be kept in order.
>> + */
>> + for (i = 0; (i < node->num_objs) && !found; i++) {
> As the test is here already ---^
>
>> + this = node->saved_objects + i;
>> +
>> + for (j = 0; j < test->num_objs; j++) {
>> + that = test->saved_objects + j;
>> +
>> + if (this->obj != that->obj)
>> + continue;
> How about VMAs? There might be multiple mappings to an object, isn't it
> enough to depend on the required VMA instead of the whole object?
The object is what we get coming in from user land through the IOCTL. So
why make things more complicated? If there are multiple VMAs referring
to the same object then we can't just track an individual VMA as that
would lose the dependency on all the other VMAs. Just because the
object is mapped to someone else's address space doesn't mean that this
batch buffer can't overwrite data they are reading.
>
>> +
>> + /* Only need to worry about writes */
>> + if (this->read_only && that->read_only)
>> + continue;
>> +
>> + found = true;
>> + break;
>> + }
>> + }
>> +
> The following block is not needed.
>
>> + if (!found)
>> + continue;
>> +
>> +depends:
> Rather like this, in order to avoid a goto label;
> if (found) {
> ...
>
>
>> + node->dep_list[node->num_deps] = test;
>> + node->num_deps++;
>> + }
>> +}
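For reference, a minimal user-space sketch of how the ordering test might look with neither the goto nor the 'found' flag. The types and names here (obj_entry, queue_entry, node_depends_on) are cut-down stand-ins invented for the example and are not the driver's structures; it only illustrates the two rules in the loop above (same context on the same ring, or a shared object with at least one writer).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the scheduler structures. */
struct obj_entry {
	const void *obj;	/* identity of the buffer object */
	bool read_only;		/* true if this batch only reads it */
};

struct queue_entry {
	const void *ctx;	/* context the batch belongs to */
	int ring;		/* ring the batch is queued on */
	const struct obj_entry *objs;
	size_t num_objs;
};

/* Returns true if 'node' must be ordered after 'test'. */
static bool node_depends_on(const struct queue_entry *node,
			    const struct queue_entry *test)
{
	size_t i, j;

	/* Batches on the same ring for the same context stay in order. */
	if (node->ctx == test->ctx && node->ring == test->ring)
		return true;

	/* Batches sharing an object stay in order unless both only read. */
	for (i = 0; i < node->num_objs; i++) {
		for (j = 0; j < test->num_objs; j++) {
			if (node->objs[i].obj != test->objs[j].obj)
				continue;
			if (node->objs[i].read_only && test->objs[j].read_only)
				continue;
			return true;
		}
	}

	return false;
}
```

With a helper of this shape the caller would reduce to "if the helper says yes, append to dep_list", which is roughly what the suggested if (found) { ... } form achieves.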
>> +
> Please add a brief kerneldoc above each function in the header, it's
> required. Adding it to non-trivial inline helper functions too will
> make reviewing much easier.
>
>> +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
>> +{
>> + struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + struct intel_engine_cs *ring = qe->params.ring;
>> + struct i915_scheduler_queue_entry *node;
>> + struct i915_scheduler_queue_entry *test;
>> + unsigned long flags;
>> + bool not_flying;
>> + int i, r;
>> + int incomplete = 0;
>> +
>> + WARN_ON(!scheduler);
>> +
> This kind of situation should be a BUG_ON, because scheduler
> being zero is literally going to cause an OOPS in the next dereference
> which is going to happen unconditionally. WARN + OOPS is kind of what
> BUG_ON should be used to avoid. But this should be removed anyway after
> scheduler is made a data member of dev_priv.
The WARNs were originally BUGs but Daniel Vetter had the opposite
demand. His view was the driver should never BUG under any
circumstances. A WARN followed by an oops is better than a BUG because
maybe it won't actually oops.
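One possible middle ground is to warn and then bail out, so that neither a BUG nor a guaranteed oops follows. The sketch below is user-space only: MOCK_WARN_ON, mock_scheduler and queue_checked are made-up names for illustration. It relies on the fact that the kernel's WARN_ON() evaluates to the truth value of its condition, which the mock imitates.

```c
#include <errno.h>
#include <stdio.h>

/*
 * User-space stand-in for the kernel's WARN_ON(). Like the real macro it
 * evaluates to non-zero when the condition is true, so it can sit inside
 * an if statement.
 */
#define MOCK_WARN_ON(cond) \
	(!!(cond) && (fprintf(stderr, "WARN: %s\n", #cond), 1))

/* Hypothetical, cut-down scheduler object. */
struct mock_scheduler {
	int min_flying;
};

/* Warn and return an error rather than dereferencing a NULL pointer. */
static int queue_checked(struct mock_scheduler *scheduler)
{
	if (MOCK_WARN_ON(!scheduler))
		return -EINVAL;

	return scheduler->min_flying;
}
```

Whether an early return is actually safe at each call site would need checking case by case; this only shows the shape of the pattern.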
>
>> + if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
> I assume this is going to be addressed in a future commit. Could have
> been introduced in this patch, too.
>
>> + int ret;
>> +
>> + scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
>> + ret = dev_priv->gt.execbuf_final(&qe->params);
>> + scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
>> +
> The kerneldoc should mention locking requirements of this function.
>
>> + /*
>> + * Don't do any clean up on failure because the caller will
>> + * do it all anyway.
>> + */
>> + if (ret)
>> + return ret;
>> +
>> + /* Free everything that is owned by the QE structure: */
>> + kfree(qe->params.cliprects);
>> + if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
>> + i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
>> +
>> + return 0;
> Above piece of code looks like its own function, so it should probably
> be one.
>
>> + }
>> +
>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>> + if (!node)
>> + return -ENOMEM;
>> +
>> + *node = *qe;
> Any reason we can't simply move ownership of qe? If not, I'd rather
> make a clone function
The qe pointer passed in is a reference to a stack local object in the
execbuff code path. Thus ownership cannot be transferred. Doing it this
way keeps the execbuff code nice and simple and all the dynamic memory
management and list tracking is self contained within the scheduler.
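If a clone function is preferred, it could look something like the following user-space sketch. The mock_* types are invented for the example and carry only a few representative fields; malloc() stands in for kmalloc(). The point is simply that the caller's stack object is copied by value and the heap copy is then owned entirely by the scheduler.

```c
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for the real structures. */
struct mock_params {
	int ring_id;
	unsigned int dispatch_flags;
};

struct mock_queue_entry {
	struct mock_params params;
	int status;
	struct mock_queue_entry **dep_list;
	int num_deps;
};

/*
 * Copy a caller-owned (typically stack) entry into a heap node. The
 * dependency fields are reset because they are filled in later by the
 * scheduler, not inherited from the caller.
 */
static struct mock_queue_entry *
mock_queue_entry_clone(const struct mock_queue_entry *qe)
{
	struct mock_queue_entry *node = malloc(sizeof(*node));

	if (!node)
		return NULL;

	*node = *qe;		/* structure copy, as in the patch */
	node->dep_list = NULL;
	node->num_deps = 0;

	return node;
}
```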
>
>> + INIT_LIST_HEAD(&node->link);
>> + node->status = i915_sqs_queued;
>> + node->stamp = jiffies;
>> + i915_gem_request_reference(node->params.request);
>> +
>> + /* Need to determine the number of incomplete entries in the list as
>> + * that will be the maximum size of the dependency list.
>> + *
>> + * Note that the allocation must not be made with the spinlock acquired
>> + * as kmalloc can sleep. However, the unlock/relock is safe because no
>> + * new entries can be queued up during the unlock as the i915 driver
>> + * mutex is still held. Entries could be removed from the list but that
>> + * just means the dep_list will be over-allocated which is fine.
>> + */
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> + for (r = 0; r < I915_NUM_RINGS; r++) {
>> + list_for_each_entry(test, &scheduler->node_queue[r], link) {
>> + if (I915_SQS_IS_COMPLETE(test))
>> + continue;
>> +
>> + incomplete++;
>> + }
>> + }
>> +
>> + /* Temporarily unlock to allocate memory: */
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
> I'd make the above piece of code a helper, these stats are to be
> counted for debugfs anyway, too?
The debugfs code counts all states, not just complete, and requires a
scheduler stats object. Also, the above explanation about locking is not
applicable in the debugfs case, only here. And it makes much more sense
to me to keep the comment and the code all together in one very obvious
and understandable block than splitting it up.
>
>> + if (incomplete) {
>> + node->dep_list = kmalloc_array(incomplete,
>> + sizeof(*node->dep_list),
>> + GFP_KERNEL);
>> + if (!node->dep_list) {
>> + kfree(node);
>> + return -ENOMEM;
>> + }
>> + } else
>> + node->dep_list = NULL;
>> +
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> + node->num_deps = 0;
>> +
>> + if (node->dep_list) {
>> + for (r = 0; r < I915_NUM_RINGS; r++)
>> + i915_generate_dependencies(scheduler, node, r);
>> +
>> + WARN_ON(node->num_deps > incomplete);
> Wouldn't this condition again lead to a crash? If so, should be BUG_ON
> to cause that crash as early as possible. WARN_ON is only good if there
> is a way of coping with the situation and no imminent system crash is
> bound to happen.
>
>> + }
>> +
>> + if (node->priority > scheduler->priority_level_max)
>> + node->priority = scheduler->priority_level_max;
>> + else if (node->priority < scheduler->priority_level_min)
>> + node->priority = scheduler->priority_level_min;
>> +
> There is clamp_val macro in linux/kernel.h .
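For anyone following along outside the kernel tree, the if/else pair below is just a clamp to the [priority_level_min, priority_level_max] range. A stand-alone sketch of the same operation (clamp_priority is a made-up helper name; in the driver the existing clamp_val() macro would be used instead):

```c
#include <stdint.h>

/* Limit 'prio' to the inclusive range [lo, hi]; assumes lo <= hi. */
static int32_t clamp_priority(int32_t prio, int32_t lo, int32_t hi)
{
	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}
```

With the default tuning values from i915_scheduler_init() this maps anything above 1023 to 1023 and anything below -1023 to -1023.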
>
>> + if ((node->priority > 0) && node->num_deps) {
>> + i915_scheduler_priority_bump_clear(scheduler);
>> +
>> + for (i = 0; i < node->num_deps; i++)
>> + i915_scheduler_priority_bump(scheduler,
>> + node->dep_list[i], node->priority);
>> + }
>> +
>> + list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
>> +
>> + not_flying = i915_scheduler_count_flying(scheduler, ring) <
>> + scheduler->min_flying;
>> +
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> +
>> + if (not_flying)
>> + i915_scheduler_submit(ring, true);
>> +
>> + return 0;
>> +}
>> +
>> +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
>> +{
>> + struct drm_i915_private *dev_priv = node->params.dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + struct intel_engine_cs *ring;
>> +
>> + WARN_ON(!scheduler);
>> + WARN_ON(!node);
>> + WARN_ON(node->status != i915_sqs_popped);
> Other states had their I915_SQS_IS_* macro, why some don't?
The purpose of the macro is to allow the combining of individual states
into classes. E.g. dead and complete can both be considered complete for
the majority of cases. Only in certain situations do you need to know
that it really was dead. Hence most places that don't really care just
use the merged macros, whereas places like this that do care use the
explicit enum value.
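As a rough illustration of the idea (not the actual header contents), the class tests could be written like this. The enum below reuses the state names seen in this patch with a mock_ prefix, and the helper names are invented; the real definitions are in i915_scheduler.h.

```c
#include <stdbool.h>

/* Hypothetical mirror of the node states used in this patch. */
enum mock_sqs {
	mock_sqs_queued,
	mock_sqs_popped,
	mock_sqs_flying,
	mock_sqs_complete,
	mock_sqs_dead,
};

/* 'Complete' as a class: finished normally or killed. */
static inline bool mock_sqs_is_complete(enum mock_sqs s)
{
	return s == mock_sqs_complete || s == mock_sqs_dead;
}

/* Classes with a single member today can still grow later. */
static inline bool mock_sqs_is_flying(enum mock_sqs s)
{
	return s == mock_sqs_flying;
}
```

Code that needs to tell dead from complete compares against the enum value directly, which is what the WARN_ON on i915_sqs_popped above is doing.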
>
>> +
>> + ring = node->params.ring;
>> +
>> + /*
>> + * Add the node (which should currently be in state popped) to the
>> + * front of the queue. This ensures that flying nodes are always held
>> + * in hardware submission order.
>> + */
>> + list_add(&node->link, &scheduler->node_queue[ring->id]);
>> +
>> + node->status = i915_sqs_flying;
>> +
>> + if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
>> + bool success = true;
>> +
>> + success = ring->irq_get(ring);
>> + if (success)
>> + scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
>> + else
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Nodes are considered valid dependencies if they are queued on any ring or
>> + * if they are in flight on a different ring. In flight on the same ring is no
>> + * longer interesting for non-preemptive nodes as the ring serialises execution.
>> + * For pre-empting nodes, all in flight dependencies are valid as they must not
>> + * be jumped by the act of pre-empting.
>> + *
>> + * Anything that is neither queued nor flying is uninteresting.
>> + */
>> +static inline bool i915_scheduler_is_dependency_valid(
>> + struct i915_scheduler_queue_entry *node, uint32_t idx)
>> +{
>> + struct i915_scheduler_queue_entry *dep;
>> +
>> + dep = node->dep_list[idx];
>> + if (!dep)
>> + return false;
>> +
>> + if (I915_SQS_IS_QUEUED(dep))
>> + return true;
>> +
>> + if (I915_SQS_IS_FLYING(dep)) {
>> + if (node->params.ring != dep->params.ring)
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> +static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
>> + struct intel_engine_cs *ring)
>> +{
>> + struct i915_scheduler_queue_entry *node;
>> + uint32_t flying = 0;
>> +
>> + list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
> This loop keeps popping up, it could use a define similar to the ones
> in i915_drv.h ;
>
> #define for_each_hpd_pin(__pin) \
>
>> + if (I915_SQS_IS_FLYING(node))
>> + flying++;
>> +
>> + return flying;
>> +}
>> +
>> +/*
>> + * Add a popped node back in to the queue. For example, because the ring was
>> + * hung when execbuf_final() was called and thus the ring submission needs to be
>> + * retried later.
>> + */
>> +static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
>> +{
>> + WARN_ON(!node);
> if (WARN_ON(!node))
> return;
>
> Or rather no check at all, it's going to crash anyway even in the
> calling function if there's NULL, and it's an internal function. Such
> checks are only relevant for values coming from userspace.
>
>> + WARN_ON(!I915_SQS_IS_FLYING(node));
>> +
>> + /* Seqno will be reassigned on relaunch */
>> + node->params.request->seqno = 0;
>> + node->status = i915_sqs_queued;
>> +}
>> +
>> +/*
>> + * Give up on a node completely. For example, because it is causing the
>> + * ring to hang or is using some resource that no longer exists.
>> + */
>> +static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
>> +{
>> + WARN_ON(!node);
>> + WARN_ON(I915_SQS_IS_COMPLETE(node));
>> +
>> + node->status = i915_sqs_dead;
>> +}
>> +
>> +/*
>> + * A sequence number has popped out of the hardware and the request handling
>> + * code has mapped it back to a request and will mark that request complete.
>> + * It also calls this function to notify the scheduler about the completion
>> + * so the scheduler's node can be updated appropriately.
>> + * Returns true if the request is scheduler managed, false if not.
>> + */
>> +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req)
>> +{
>> + struct drm_i915_private *dev_priv = to_i915(req->ring->dev);
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + /* XXX: Need to map back from request to node */
>> + struct i915_scheduler_queue_entry *node = NULL;
>> + unsigned long flags;
>> +
>> + if (!node)
>> + return false;
> I'm not so sure that slicing the series down to the extent that functions
> are impossible to review was a good idea. The idea is to slice things down
> to reviewable pieces. It's hard to predict, or to keep looking forward in
> the series for, what is going to come.
>
>> +
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> +
>> + WARN_ON(!I915_SQS_IS_FLYING(node));
>> +
>> + /* Node was in flight so mark it as complete. */
>> + if (req->cancelled)
>> + node->status = i915_sqs_dead;
>> + else
>> + node->status = i915_sqs_complete;
>> +
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> +
>> + /*
>> + * XXX: If the in-flight list is now empty then new work should be
>> + * submitted. However, this function is called from interrupt context
>> + * and thus cannot acquire mutex locks and other such things that are
>> + * necessary for fresh submission.
>> + */
>> +
>> + return true;
>> +}
>> +
>> +int i915_scheduler_remove(struct intel_engine_cs *ring)
>> +{
>> + struct drm_i915_private *dev_priv = ring->dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + struct i915_scheduler_queue_entry *node, *node_next;
>> + unsigned long flags;
>> + int flying = 0, queued = 0;
>> + int ret = 0;
>> + bool do_submit;
>> + uint32_t min_seqno;
>> + struct list_head remove;
>> +
>> + if (list_empty(&scheduler->node_queue[ring->id]))
>> + return 0;
>> +
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> +
>> + /* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
>> +
> This should not be here at all.
Oops. Old debug code. Hadn't spotted it was still there.
>
>> + /*
>> + * In the case where the system is idle, starting 'min_seqno' from a big
>> + * number will cause all nodes to be removed as they are now back to
>> + * being in-order. However, this will be a problem if the last one to
>> + * complete was actually out-of-order as the ring seqno value will be
>> + * lower than one or more completed buffers. Thus code looking for the
>> + * completion of said buffers will wait forever.
>> + * Instead, use the hardware seqno as the starting point. This means
>> + * that some buffers might be kept around even in a completely idle
>> + * system but it should guarantee that no-one ever gets confused when
>> + * waiting for buffer completion.
>> + */
>> + min_seqno = ring->get_seqno(ring, true);
>> +
>> + list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
>> + if (I915_SQS_IS_QUEUED(node))
>> + queued++;
>> + else if (I915_SQS_IS_FLYING(node))
>> + flying++;
>> + else if (I915_SQS_IS_COMPLETE(node))
>> + continue;
>> +
>> + if (node->params.request->seqno == 0)
>> + continue;
>> +
>> + if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
>> + min_seqno = node->params.request->seqno;
>> + }
> Couldn't these values be kept cached, instead of counting them at each
> function?
The 'queued' and flying totals could be kept cached but min_seqno is
dependent upon the state of the hardware so needs to be recalculated. In
which case calculating the totals here is trivial and avoids having
extra code elsewhere to keep them up to date.
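To make the min_seqno scan concrete, here is a small user-space sketch. seqno_passed() follows the usual wrap-safe signed-difference comparison that, as far as I know, i915_seqno_passed() also uses; find_min_seqno() and its array argument are invented for the example and stand in for walking the queued and flying nodes in the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if seq1 is at or after seq2, allowing for 32 bit wrap-around.
 * Relies on the usual two's complement conversion to int32_t.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Start from the hardware seqno and lower it to the oldest seqno still
 * pending. A seqno of zero means 'not yet assigned' and is skipped.
 */
static uint32_t find_min_seqno(uint32_t hw_seqno,
			       const uint32_t *pending, size_t count)
{
	uint32_t min_seqno = hw_seqno;
	size_t i;

	for (i = 0; i < count; i++) {
		if (pending[i] == 0)
			continue;

		if (!seqno_passed(pending[i], min_seqno))
			min_seqno = pending[i];
	}

	return min_seqno;
}
```

Because the starting point comes from the hardware on every call, there is nothing here that could usefully be cached between calls.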
>
>> +
>> + INIT_LIST_HEAD(&remove);
>> + list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
>> + /*
>> + * Only remove completed nodes which have a lower seqno than
>> + * all pending nodes. While there is the possibility of the
>> + * ring's seqno counting backwards, all higher buffers must
>> + * be remembered so that the 'i915_seqno_passed()' test can
>> + * report that they have in fact passed.
>> + *
>> + * NB: This is not true for 'dead' nodes. The GPU reset causes
>> + * the software seqno to restart from its initial value. Thus
>> + * the dead nodes must be removed even though their seqno values
>> + * are potentially vastly greater than the current ring seqno.
>> + */
>> + if (!I915_SQS_IS_COMPLETE(node))
>> + continue;
>> +
>> + if (node->status != i915_sqs_dead) {
>> + if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
>> + (node->params.request->seqno != min_seqno))
>> + continue;
>> + }
>> +
>> + list_del(&node->link);
>> + list_add(&node->link, &remove);
>> +
>> + /* Strip the dependency info while the mutex is still locked */
>> + i915_scheduler_remove_dependent(scheduler, node);
>> +
>> + continue;
>> + }
>> +
>> + /*
>> + * Release the interrupt reference count if there are no longer any
>> + * nodes to worry about.
>> + */
>> + if (!flying && !queued &&
>> + (scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
>> + ring->irq_put(ring);
>> + scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
>> + }
>> +
>> + /* Launch more packets now? */
>> + do_submit = (queued > 0) && (flying < scheduler->min_flying);
>> +
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> +
>> + if (!do_submit && list_empty(&remove))
>> + return ret;
>> +
>> + mutex_lock(&ring->dev->struct_mutex);
>> +
>> + if (do_submit)
>> + ret = i915_scheduler_submit(ring, true);
> Confusing to have this at a remove function. Function naming needs to
> be reconsidered or moved out from here.
>
>> +
>> + while (!list_empty(&remove)) {
>> + node = list_first_entry(&remove, typeof(*node), link);
>> + list_del(&node->link);
>> +
>> + /*
>> + * The batch buffer must be unpinned before it is unreferenced
>> + * otherwise the unpin fails with a missing vma!?
>> + */
>> + if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
>> + i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
>> +
>> + /* Free everything that is owned by the node: */
>> + i915_gem_request_unreference(node->params.request);
>> + kfree(node->params.cliprects);
>> + kfree(node->dep_list);
>> + kfree(node);
>> + }
>> +
>> + mutex_unlock(&ring->dev->struct_mutex);
>> +
>> + return ret;
>> +}
>> +
>> +static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
>> +{
>> + struct i915_scheduler_queue_entry *node;
>> + int i;
>> +
>> + /*
>> + * Ensure circular dependencies don't cause problems and that a bump
>> + * by object usage only bumps each using buffer once:
>> + */
>> + for (i = 0; i < I915_NUM_RINGS; i++) {
>> + list_for_each_entry(node, &scheduler->node_queue[i], link)
>> + node->bumped = false;
>> + }
>> +}
>> +
>> +static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
>> + struct i915_scheduler_queue_entry *target,
>> + uint32_t bump)
>> +{
>> + uint32_t new_priority;
>> + int i, count;
>> +
>> + if (target->priority >= scheduler->priority_level_max)
>> + return 1;
>> +
>> + if (target->bumped)
>> + return 0;
>> +
>> + new_priority = target->priority + bump;
>> + if ((new_priority <= target->priority) ||
>> + (new_priority > scheduler->priority_level_max))
>> + target->priority = scheduler->priority_level_max;
>> + else
>> + target->priority = new_priority;
>> +
>> + count = 1;
>> + target->bumped = true;
>> +
>> + for (i = 0; i < target->num_deps; i++) {
>> + if (!target->dep_list[i])
>> + continue;
>> +
>> + if (target->dep_list[i]->bumped)
>> + continue;
>> +
>> + count += i915_scheduler_priority_bump(scheduler,
>> + target->dep_list[i],
>> + bump);
>> + }
>> +
>> + return count;
>> +}
>> +
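For anyone following the bump logic above, here is a userspace sketch of the "visit each node once" idea that the bumped flag implements. It is not a transcription of the patch: the sk_* names and SK_PRIORITY_MAX are made up for the example, bump is assumed to be non-negative, and the early return for nodes already at maximum priority is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PRIORITY_MAX 1023	/* hypothetical limit for this sketch */

struct sk_node {
	int32_t priority;
	bool bumped;		/* set once the node has been visited */
	struct sk_node **deps;	/* entries may be NULL */
	size_t num_deps;
};

/* Reset the visited flags before starting a new bump pass. */
static void sk_bump_clear(struct sk_node **nodes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		nodes[i]->bumped = false;
}

/*
 * Raise the priority of target and of everything it depends on.
 * Returns the number of nodes that were actually bumped. The bumped
 * flag stops the recursion from looping on circular dependencies.
 */
static int sk_bump(struct sk_node *target, int32_t bump)
{
	int count = 1;
	size_t i;

	assert(bump >= 0);

	if (target->bumped)
		return 0;

	/* Saturate at the maximum rather than overflowing. */
	if (bump > SK_PRIORITY_MAX - target->priority)
		target->priority = SK_PRIORITY_MAX;
	else
		target->priority += bump;

	target->bumped = true;

	for (i = 0; i < target->num_deps; i++) {
		if (!target->deps[i])
			continue;
		count += sk_bump(target->deps[i], bump);
	}

	return count;
}
```

With two nodes that depend on each other, a single call bumps each exactly once and then stops.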
>> +static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
>> + struct i915_scheduler_queue_entry **pop_node,
>> + unsigned long *flags)
>> +{
>> + struct drm_i915_private *dev_priv = ring->dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + struct i915_scheduler_queue_entry *best;
>> + struct i915_scheduler_queue_entry *node;
>> + int ret;
>> + int i;
>> + bool any_queued;
>> + bool has_local, has_remote, only_remote;
>> +
>> + *pop_node = NULL;
>> + ret = -ENODATA;
>> +
>> + any_queued = false;
>> + only_remote = false;
>> + best = NULL;
> These should just be initialized in-place. But looking at the code
> forward.
>
>> +
>> + list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
>> + if (!I915_SQS_IS_QUEUED(node))
>> + continue;
>> + any_queued = true;
>> +
>> + has_local = false;
>> + has_remote = false;
>> + for (i = 0; i < node->num_deps; i++) {
>> + if (!i915_scheduler_is_dependency_valid(node, i))
>> + continue;
>> +
>> + if (node->dep_list[i]->params.ring == node->params.ring)
>> + has_local = true;
>> + else
>> + has_remote = true;
>> + }
>> +
>> + if (has_remote && !has_local)
>> + only_remote = true;
>> +
>> + if (!has_local && !has_remote) {
>> + if (!best ||
>> + (node->priority > best->priority))
>> + best = node;
>> + }
>> + }
>> +
>> + if (best) {
>> + list_del(&best->link);
>> +
>> + INIT_LIST_HEAD(&best->link);
>> + best->status = i915_sqs_popped;
>> +
>> + ret = 0;
>> + } else {
>> + /* Can only get here if:
>> + * (a) there are no buffers in the queue
>> + * (b) all queued buffers are dependent on other buffers
>> + * e.g. on a buffer that is in flight on a different ring
>> + */
>> + if (only_remote) {
>> + /* The only dependent buffers are on another ring. */
>> + ret = -EAGAIN;
>> + } else if (any_queued) {
>> + /* It seems that something has gone horribly wrong! */
>> + DRM_ERROR("Broken dependency tracking on ring %d!\n",
>> + (int) ring->id);
>> + }
>> + }
>> +
>> + /* i915_scheduler_dump_queue_pop(ring, best); */
>> +
>> + *pop_node = best;
>> + return ret;
>> +}
>> +
>> +static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
>> +{
>> + struct drm_device *dev = ring->dev;
>> + struct drm_i915_private *dev_priv = dev->dev_private;
>> + struct i915_scheduler *scheduler = dev_priv->scheduler;
>> + struct i915_scheduler_queue_entry *node;
>> + unsigned long flags;
>> + int ret = 0, count = 0;
>> +
>> + if (!was_locked) {
>> + ret = i915_mutex_lock_interruptible(dev);
>> + if (ret)
>> + return ret;
>> + }
>> +
> I don't really fancy this construct. Should be moved outside of this
> function for proper lockdep tracking.
>
>> + WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>> +
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> +
>> + /* First time around, complain if anything unexpected occurs: */
>> + ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
>> + if (ret) {
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> +
>> + if (!was_locked)
>> + mutex_unlock(&dev->struct_mutex);
>> +
>> + return ret;
>> + }
>> +
> Dropping the was_locked stuff, this should become a proper goto error
> label. e.g. out_unlock
>
>> + do {
>> + WARN_ON(!node);
>> + WARN_ON(node->params.ring != ring);
>> + WARN_ON(node->status != i915_sqs_popped);
>> + count++;
>> +
>> + /*
>> + * The call to pop above will have removed the node from the
>> + * list. So add it back in and mark it as in flight.
>> + */
>> + i915_scheduler_fly_node(node);
> Why do we want to pull an object out of the list inside the spin lock
> and push it back immediately in our critical code path? Seems like a
> waste for no obvious gain at this point. Why don't we just select an
> entry and modify it in-place, if it's going to stay in the same queue
> anyway?
The list order is significant. The element must be moved to the front to
keep the submitted items in submission order. Doing it this way also
keeps the code nicely partitioned and easier to understand/maintain.
Plus, there is a plan to optimise the code by splitting the one single
list into three separate ones - queued, flying, complete. If/when that
happens, the element will have to be removed from one list and added to
another.
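
To make the ordering point concrete, here is a small userspace sketch. It assumes that "flying" a node amounts to re-adding it at the head of the same list, which is how I described it above. The sk_* types and helpers are stand-ins invented for this example, not the kernel list API or the scheduler code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct sk_list {
	struct sk_list *prev, *next;
};

enum sk_status { SK_QUEUED, SK_POPPED, SK_FLYING };

struct sk_node {
	struct sk_list link;	/* first member, so a link pointer is a node pointer */
	enum sk_status status;
	int id;
};

static void sk_list_init(struct sk_list *head)
{
	head->prev = head->next = head;
}

static void sk_list_del(struct sk_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	sk_list_init(entry);
}

static void sk_list_add_head(struct sk_list *entry, struct sk_list *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void sk_list_add_tail(struct sk_list *entry, struct sk_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Step 1: unlink the chosen node from the queue. */
static void sk_pop(struct sk_node *node)
{
	assert(node->status == SK_QUEUED);
	sk_list_del(&node->link);
	node->status = SK_POPPED;
}

/* Step 2: put it back at the front and mark it as in flight. */
static void sk_fly(struct sk_node *node, struct sk_list *head)
{
	assert(node->status == SK_POPPED);
	sk_list_add_head(&node->link, head);
	node->status = SK_FLYING;
}

/* Id of the first node on the list, or -1 if the list is empty. */
static int sk_first_id(const struct sk_list *head)
{
	if (head->next == head)
		return -1;
	return ((const struct sk_node *)head->next)->id;
}
```

If the single list is later split into queued/flying/complete lists, only the head argument passed in step 2 would need to change.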
>
>> +
>> + scheduler->flags[ring->id] |= i915_sf_submitting;
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> + ret = dev_priv->gt.execbuf_final(&node->params);
>> + spin_lock_irqsave(&scheduler->lock, flags);
>> + scheduler->flags[ring->id] &= ~i915_sf_submitting;
>> +
>> + if (ret) {
>> + int requeue = 1;
> Multipurpose variable, not really a good idea. And as commented
> further, should not exist at all.
>
>> +
>> + /*
>> + * Oh dear! Either the node is broken or the ring is
>> + * busy. So need to kill the node or requeue it and try
>> + * again later as appropriate.
>> + */
>> +
>> + switch (-ret) {
>> + case ENODEV:
>> + case ENOENT:
>> + /* Fatal errors. Kill the node. */
>> + requeue = -1;
>> + break;
> "break" indent is wrong.
>
>> +
>> + case EAGAIN:
>> + case EBUSY:
>> + case EIO:
>> + case ENOMEM:
>> + case ERESTARTSYS:
>> + case EINTR:
>> + /* Supposedly recoverable errors. */
>> + break;
>> +
>> + default:
>> + /*
>> + * Assume the error is recoverable and hope
>> + * for the best.
>> + */
>> + DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
>> + ring->name, ret);
> There's MISSING_CASE macro, should use it.
>
>> + break;
>> + }
>> +
> Just move the code below this point into the switch; there is no point
> having a switch to categorize your options and then doing a bunch of
> ifs to execute code that could be in the switch.
One of the 'if' paths has to break out of the do/while loop. That can't
be done from inside the switch, where a 'break' only leaves the switch.
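
A standalone illustration of that C rule, in case it helps. This is a userspace sketch with made-up names (sk_submit_all and its parameters are hypothetical), and the error classification is simplified compared to the patch.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Walk an array of submission results. Fatal errors are counted in
 * *killed and the loop carries on; any other error stops the loop.
 * Returns how many entries were consumed before stopping.
 */
static size_t sk_submit_all(const int *results, size_t n, size_t *killed)
{
	size_t i;

	*killed = 0;

	for (i = 0; i < n; i++) {
		int stop = 0;

		switch (-results[i]) {
		case 0:
			break;		/* leaves the switch only */

		case ENODEV:
		case ENOENT:
			(*killed)++;
			break;		/* leaves the switch only */

		default:
			stop = 1;
			break;		/* still only leaves the switch */
		}

		if (stop)
			break;		/* this one leaves the for loop */
	}

	return i;
}
```

The flag set inside the switch and tested after it is the same shape as the 'requeue' variable in the patch.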
>> + /*
>> + * Check that the watchdog/reset code has not nuked
>> + * the node while we weren't looking:
>> + */
>> + if (node->status == i915_sqs_dead)
>> + requeue = 0;
>> +
>> + if (requeue == 1) {
>> + i915_scheduler_node_requeue(node);
>> + /*
>> + * No point spinning if the ring is currently
>> + * unavailable so just give up and come back
>> + * later.
>> + */
>> + break;
>> + } else if (requeue == -1)
>> + i915_scheduler_node_kill(node);
>> + }
>> +
> Ending here, this actual submission of a single node could go to its
> own helper function, these functions now become too long to follow,
> although they really are not doing anything complicated.
>
>> + /* Keep launching until the sky is sufficiently full. */
>> + if (i915_scheduler_count_flying(scheduler, ring) >=
>> + scheduler->min_flying)
>> + break;
>> +
>> + ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
>> + } while (ret == 0);
>> +
>> + spin_unlock_irqrestore(&scheduler->lock, flags);
>> +
>> + if (!was_locked)
>> + mutex_unlock(&dev->struct_mutex);
>> +
>> + /* Don't complain about not being able to submit extra entries */
>> + if (ret == -ENODATA)
>> + ret = 0;
>> +
>> + return (ret < 0) ? ret : count;
> This is a combined error and success path. Keeping to the usual
> convention of kernel drivers is preferred:
>
> if (ret != -ENODATA)
> goto out_foo;
>
> return count;
>
> out_unlock:
> spin_unlock(...);
> out_foo:
> return ret;
>
>
>
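For reference, this is roughly how I read the suggested shape once the lock handling is included. It is a userspace sketch only: the sk_* names are hypothetical, a counter stands in for the real lock, and the array of pop results replaces the actual queue.

```c
#include <errno.h>

/* Stand-in for a real lock so the unlock paths can be checked. */
struct sk_lock {
	int held;
};

static void sk_lock_acquire(struct sk_lock *lock)
{
	lock->held++;
}

static void sk_lock_release(struct sk_lock *lock)
{
	lock->held--;
}

/*
 * Consume pop results until the queue reports -ENODATA (empty) or a
 * real error. Returns the number of nodes submitted on success and a
 * negative errno on failure, with separate exit paths for each.
 */
static int sk_submit(struct sk_lock *lock, const int *pop_results, int n)
{
	int ret = 0, count = 0, i;

	sk_lock_acquire(lock);

	for (i = 0; i < n; i++) {
		ret = pop_results[i];
		if (ret == -ENODATA)
			break;		/* nothing left to submit: not an error */
		if (ret)
			goto out_unlock;
		count++;
	}

	sk_lock_release(lock);
	return count;

out_unlock:
	sk_lock_release(lock);
	return ret;
}
```

The success path returns the count directly, and every failure funnels through a single label that undoes the locking.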
>> +}
>> +
>> +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
>> + struct i915_scheduler_queue_entry *remove)
>> +{
>> + struct i915_scheduler_queue_entry *node;
>> + int i, r;
>> + int count = 0;
>> +
>> + /*
>> + * Ensure that a node is not being removed which is still dependent
>> + * upon other (not completed) work. If that happens, it implies
>> + * something has gone very wrong with the dependency tracking! Note
>> + * that there is no need to worry if this node has been explicitly
>> + * killed for some reason - it might be being killed before it got
>> + * sent to the hardware.
>> + */
>> + if (remove->status != i915_sqs_dead) {
>> + for (i = 0; i < remove->num_deps; i++)
>> + if ((remove->dep_list[i]) &&
>> + (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
>> + count++;
>> + WARN_ON(count);
>> + }
>> +
>> + /*
>> + * Remove this node from the dependency lists of any other node which
>> + * might be waiting on it.
>> + */
>> + for (r = 0; r < I915_NUM_RINGS; r++) {
>> + list_for_each_entry(node, &scheduler->node_queue[r], link) {
>> + for (i = 0; i < node->num_deps; i++) {
>> + if (node->dep_list[i] != remove)
>> + continue;
>> +
>> + node->dep_list[i] = NULL;
>> + }
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
>> new file mode 100644
>> index 0000000..00dc7f3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
>> @@ -0,0 +1,91 @@
>> +/*
>> + * Copyright (c) 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef _I915_SCHEDULER_H_
>> +#define _I915_SCHEDULER_H_
>> +
>> +enum i915_scheduler_queue_status {
>> + /* Limbo: */
>> + i915_sqs_none = 0,
>> + /* Not yet submitted to hardware: */
>> + i915_sqs_queued,
>> + /* Popped from queue, ready to fly: */
>> + i915_sqs_popped,
>> + /* Sent to hardware for processing: */
>> + i915_sqs_flying,
>> + /* Finished processing on the hardware: */
>> + i915_sqs_complete,
>> + /* Killed by watchdog or catastrophic submission failure: */
>> + i915_sqs_dead,
>> + /* Limit value for use with arrays/loops */
>> + i915_sqs_MAX
>> +};
> These should be UPPERCASE_FOR_ENUM_VALUES. See i915_drv.h for samples.
>
>> +
>> +#define I915_SQS_IS_QUEUED(node) (((node)->status == i915_sqs_queued))
>> +#define I915_SQS_IS_FLYING(node) (((node)->status == i915_sqs_flying))
>> +#define I915_SQS_IS_COMPLETE(node) (((node)->status == i915_sqs_complete) || \
>> + ((node)->status == i915_sqs_dead))
>> +
> Might be slightly confusing that name is IS_COMPLETE and there is
> actual COMPLETE value. Rather have the test like IS_DONE and then test
> for COMPLETE or DEAD, no confusion. Also, some states have their IS_*
> macro and others don't, is there going to be more?
>
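A sketch of how the two naming comments might combine, with uppercase enum values and an IS_DONE-style test that covers both terminal states. The SK_ prefix and the helper names are placeholders for this example, not a proposal for the final identifiers, and I have used static inline functions rather than macros purely for type checking.

```c
#include <stdbool.h>

enum sk_queue_status {
	SK_SQS_NONE = 0,	/* Limbo */
	SK_SQS_QUEUED,		/* Not yet submitted to hardware */
	SK_SQS_POPPED,		/* Popped from queue, ready to fly */
	SK_SQS_FLYING,		/* Sent to hardware for processing */
	SK_SQS_COMPLETE,	/* Finished processing on the hardware */
	SK_SQS_DEAD,		/* Killed by watchdog or submission failure */
	SK_SQS_MAX		/* Limit value for use with arrays/loops */
};

struct sk_node {
	enum sk_queue_status status;
};

static inline bool sk_node_is_queued(const struct sk_node *node)
{
	return node->status == SK_SQS_QUEUED;
}

static inline bool sk_node_is_flying(const struct sk_node *node)
{
	return node->status == SK_SQS_FLYING;
}

/* "Done" means no further work is pending: completed or killed. */
static inline bool sk_node_is_done(const struct sk_node *node)
{
	return node->status == SK_SQS_COMPLETE ||
	       node->status == SK_SQS_DEAD;
}
```

Naming the combined test "done" keeps it distinct from the COMPLETE state itself.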
>> +struct i915_scheduler_obj_entry {
>> + struct drm_i915_gem_object *obj;
>> + bool read_only;
>> +};
>> +
>> +struct i915_scheduler_queue_entry {
>> + struct i915_execbuffer_params params;
>> + /* -1023 = lowest priority, 0 = default, 1023 = highest */
>> + int32_t priority;
>> + struct i915_scheduler_obj_entry *saved_objects;
>> + int num_objs;
>> + bool bumped;
>> + struct i915_scheduler_queue_entry **dep_list;
>> + int num_deps;
>> + enum i915_scheduler_queue_status status;
>> + unsigned long stamp;
>> + struct list_head link;
>> +};
>> +
>> +struct i915_scheduler {
>> + struct list_head node_queue[I915_NUM_RINGS];
>> + uint32_t flags[I915_NUM_RINGS];
>> + spinlock_t lock;
>> +
>> + /* Tuning parameters: */
>> + int32_t priority_level_min;
>> + int32_t priority_level_max;
>> + int32_t priority_level_preempt;
>> + uint32_t min_flying;
>> +};
>> +
>> +/* Flag bits for i915_scheduler::flags */
>> +enum {
>> + i915_sf_interrupts_enabled = (1 << 0),
>> + i915_sf_submitting = (1 << 1),
> Again, should be uppercase. Also, enums to the beginning of file.
>
>> +};
>> +
>> +int i915_scheduler_init(struct drm_device *dev);
>> +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
>> +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
>> +
>> +#endif /* _I915_SCHEDULER_H_ */