[Intel-gfx] [PATCH v3 1/6] drm/i915: Fix request locking during error capture & debugfs dump
John Harrison
john.c.harrison at intel.com
Fri Jan 20 23:06:02 UTC 2023
On 1/19/2023 07:16, Andy Shevchenko wrote:
> On Wed, Jan 18, 2023 at 10:49:55PM -0800, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> When GuC support was added to error capture, the locking around the
>> request object was broken. Fix it up.
>>
>> The context based search manages the spinlocking around the search
>> internally. So it needs to grab the reference count internally as
>> well. The execlist only request based search relies on external
>> locking, so it needs an external reference count but within the
>> spinlock not outside it.
>>
>> The only other caller of the context based search is the code for
>> dumping engine state to debugfs. That code wasn't previously getting
>> an explicit reference at all as it does everything while holding the
>> execlist specific spinlock. So, that needs updaing as well as that
>> spinlock doesn't help when using GuC submission. Rather than trying to
>> conditionally get/put depending on submission model, just change it to
>> always do the get/put.
>>
>> In addition, intel_guc_find_hung_context() was not acquiring the
>> correct spinlock before searching the request list. So fix that up
>> too. While at it, add some extra whitespace padding for readability.
> ...
>
>> + found = false;
>> + spin_lock(&ce->guc_state.lock);
>> list_for_each_entry(rq, &ce->guc_state.requests, sched.link) {
>> if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
>> continue;
>>
>> + found = true;
>> + break;
>> + }
> This can be combined to (see also below)
>
> list_for_each_entry(rq, &ce->guc_state.requests, sched.link) {
> if (i915_test_request_state(rq) == I915_REQUEST_ACTIVE)
> break;
> }
>
>> + spin_unlock(&ce->guc_state.lock);
> Instead of 'found' you can check the current entry pointer
>
> if (!list_entry_is_head(...))
>
> And because requests can only be messed up with the guc_state itself, I think
> you don't need to perform the above check under spinlock, so it's safe.
I'm not following the argument as to why it is safe to test a guc_state
owned list outside of holding the guc_state spinlock.
I also think that having an explicit 'found' flag makes the code more
readable and immediately obvious as to what is going on. For the sake of
one bool (which the compiler would optimise out anyway), I don't think
it is worth the obfuscation of behaviour and the risk of "I think this
will work".
John.
>
>> + if (found) {
>> intel_engine_set_hung_context(engine, ce);
More information about the Intel-gfx
mailing list