[Intel-gfx] [PATCH] drm/i915: Keep all engine locks across scheduling
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Mar 27 11:39:38 UTC 2017
On 27/03/2017 11:31, Chris Wilson wrote:
> On Mon, Mar 27, 2017 at 11:11:47AM +0100, Tvrtko Ursulin wrote:
>>
>> On 26/03/2017 09:46, Chris Wilson wrote:
>>> Unlocking is dangerous. In this case we combine an early update to the
>>> out-of-queue request, because we know that it will be inserted into the
>>> correct FIFO priority-ordered slot when it becomes ready in the future.
>>> However, given sufficient enthusiasm, it may become ready as we are
>>> continuing to reschedule, and so may gazump the FIFO if we have since
>>> dropped its spinlock. The result is that it may be executed too early,
>>> before its dependees.
>>>
>>> Fixes: 20311bd35060 ("drm/i915/scheduler: Execute requests in order of priorities")
>>> Testcase: igt/gem_exec_whisper
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Cc: <stable at vger.kernel.org> # v4.10+
>>> ---
>>> drivers/gpu/drm/i915/intel_lrc.c | 54 +++++++++++++++++++++++++++-------------
>>> 1 file changed, 37 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index dd0e9d587852..3fdabba0a32d 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -658,30 +658,47 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
>>> spin_unlock_irqrestore(&engine->timeline->lock, flags);
>>> }
>>>
>>> -static struct intel_engine_cs *
>>> -pt_lock_engine(struct i915_priotree *pt, struct intel_engine_cs *locked)
>>> +static inline struct intel_engine_cs *
>>> +pt_lock_engine(struct i915_priotree *pt, unsigned long *locked)
>>> {
>>> - struct intel_engine_cs *engine;
>>> -
>>> - engine = container_of(pt,
>>> - struct drm_i915_gem_request,
>>> - priotree)->engine;
>>> - if (engine != locked) {
>>> - if (locked)
>>> - spin_unlock_irq(&locked->timeline->lock);
>>> - spin_lock_irq(&engine->timeline->lock);
>>> - }
>>> + struct intel_engine_cs *engine =
>>> + container_of(pt, struct drm_i915_gem_request, priotree)->engine;
>>> +
>>> + /* Locking the engines in a random order will rightfully trigger a
>>> + * spasm in lockdep. However, we can ignore lockdep (by marking each
>>> + * as a separate nesting) so long as we never nest the
>>> + * engine->timeline->lock elsewhere. Also the number of nesting
>>> + * subclasses is severely limited (7) which is going to cause an
>>> + * issue at some point.
>>> + * BUILD_BUG_ON(I915_NUM_ENGINES >= MAX_LOCKDEP_SUBCLASSES);
>>
>> Let's bite the bullet and not hide this BUILD_BUG_ON in a comment. :I
>
> The code would continue to work nevertheless, just lockdep would
> eventually give up. I like it slightly better than taking either a
> global spinlock for engine->execlists_queue insertion, or taking the
> spinlock on every engine for scheduling. How often will we reschedule
> across engines? Not sure.
I think counting on "doesn't happen often" and "it still works" falls
short of your high standards! ;) So a global execlist_queue lock if it
must be...
>>> + */
>>> + if (!__test_and_set_bit(engine->id, locked))
>>> + spin_lock_nested(&engine->timeline->lock,
>>> + hweight_long(*locked));
>>>
>>> return engine;
>>> }
>>>
>>> +static void
>>> +unlock_engines(struct drm_i915_private *i915, unsigned long locked)
>>> +{
>>> + struct intel_engine_cs *engine;
>>> + unsigned long tmp;
>>> +
>>> + for_each_engine_masked(engine, i915, locked, tmp)
>>> + spin_unlock(&engine->timeline->lock);
>>> +}
>>> +
>>> static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> {
>>> - struct intel_engine_cs *engine = NULL;
>>> + struct intel_engine_cs *engine;
>>> struct i915_dependency *dep, *p;
>>> struct i915_dependency stack;
>>> + unsigned long locked = 0;
>>> LIST_HEAD(dfs);
>>>
>>> + BUILD_BUG_ON(I915_NUM_ENGINES > BITS_PER_LONG);
>>> +
>>> if (prio <= READ_ONCE(request->priotree.priority))
>>> return;
>>>
>>> @@ -691,6 +708,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> stack.signaler = &request->priotree;
>>> list_add(&stack.dfs_link, &dfs);
>>>
>>> + GEM_BUG_ON(irqs_disabled());
>>> + local_irq_disable();
>>> +
>>
>> Why not just irqsave/restore? It sounds too low level for this
>> position in the flow. If it is just an optimisation, it would need a
>> comment I think.
>
> It was because we are not taking and releasing the spinlock inside the
> same block, so it felt dangerous. Who would hold the irqflags?
Hm yes, it cannot be made elegant.
>>> /* Recursively bump all dependent priorities to match the new request.
>>> *
>>> * A naive approach would be to use recursion:
>>> @@ -719,7 +739,7 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> if (!RB_EMPTY_NODE(&pt->node))
>>> continue;
>>>
>>> - engine = pt_lock_engine(pt, engine);
>>> + engine = pt_lock_engine(pt, &locked);
>>>
>>> /* If it is not already in the rbtree, we can update the
>>> * priority inplace and skip over it (and its dependencies)
>>> @@ -737,7 +757,7 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>>
>>> INIT_LIST_HEAD(&dep->dfs_link);
>>>
>>> - engine = pt_lock_engine(pt, engine);
>>> + engine = pt_lock_engine(pt, &locked);
>>>
>>> if (prio <= pt->priority)
>>> continue;
>>> @@ -750,8 +770,8 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> engine->execlist_first = &pt->node;
>>> }
>>>
>>> - if (engine)
>>> - spin_unlock_irq(&engine->timeline->lock);
>>> + unlock_engines(request->i915, locked);
>>> + local_irq_enable();
>>>
>>> /* XXX Do we need to preempt to make room for us and our deps? */
>>> }
>>>
>>
>> I am trying to think whether removing the skip on requests not in
>> the execution tree would work and help any.
>
> It's dangerous due to the duplicate branches in the dependency graph that
> we are resolving to generate the topological ordering. We need a way to
> do a mark-and-sweep whilst also ensuring that we end up with the correct
> order. I'm open to (better :) suggestions.
>
>> Or whether the above scheme is completely safe, or whether we would
>> need to atomically lock all engines whose requests will be touched.
>> Especially since the code only deals with adjusting priorities, I
>> don't immediately see how it can cause out-of-order execution.
>
> An interrupt leads to submit_request, which wants to insert a request
> into the execlist_queue rbtree, while ->schedule() is also trying to
> manipulate the rbtree (and, in this case, elements currently outside
> of the rbtree). Our insertion into the rbtree is FIFO so that we don't
> reorder equivalent-priority dependencies during ->schedule(). Hence,
> if we mark an out-of-rbtree request as higher priority before
> inserting all of its dependencies into the tree, and the submit_notify
> occurs, it will insert the request into the tree before we get to
> insert its dependencies, reordering them.
Ok I get the general idea. I don't have any better suggestions at the
moment than trying the global lock. Luckily you have just removed one
atomic from the irq handler so one step forward, two steps back. :)
Regards,
Tvrtko