[Intel-gfx] [RFC 01/11] drm/i915: Early exit from semaphore_waits_for for execlist mode.
Tomas Elf
tomas.elf at intel.com
Tue Jun 9 04:02:56 PDT 2015
On 08/06/2015 18:36, Chris Wilson wrote:
> On Mon, Jun 08, 2015 at 06:03:19PM +0100, Tomas Elf wrote:
>> When submitting semaphores in execlist mode the hang checker crashes in this
>> function because it is only runnable in ring submission mode. The reason this
>> is of particular interest to the TDR patch series is because we use semaphores
>> as a means to induce hangs during testing (which is the recommended way to
>> induce hangs for gen8+). It's not clear how this is supposed to work in
>> execlist mode since:
>>
>> 1. This function requires a ring buffer.
>>
>> 2. Retrieving a ring buffer in execlist mode requires us to retrieve the
>> corresponding context, which we get from a request.
>>
>> 3. Retrieving a request from the hang checker is not straightforward since that
>> requires us to grab the struct_mutex in order to synchronize against the
>> request retirement thread.
>>
>> 4. Grabbing the struct_mutex from the hang checker is not something we will do,
>> since it puts us at risk of deadlock: a hung thread might already be holding
>> the struct_mutex.
>>
>> Therefore it's not obvious how we're supposed to deal with this. For now, we're
>> doing an early exit from this function, which avoids any kernel panic situation
>> when running our own internal TDR ULT.
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index 46bcbff..40c44fc 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -2698,6 +2698,26 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno)
>>  	u64 offset = 0;
>>  	int i, backwards;
>>
>> +	/*
>> +	 * This function does not support execlist mode - any attempt to
>> +	 * proceed further into this function will result in a kernel panic
>> +	 * when dereferencing ring->buffer, which is not set up in execlist
>> +	 * mode.
>> +	 *
>> +	 * The correct way of doing it would be to derive the currently
>> +	 * executing ring buffer from the current context, which is derived
>> +	 * from the currently running request. Unfortunately, to get the
>> +	 * current request we would have to grab the struct_mutex before doing
>> +	 * anything else, which would be ill-advised since some other thread
>> +	 * might have grabbed it already and managed to hang itself, causing
>> +	 * the hang checker to deadlock.
>> +	 *
>> +	 * Therefore, this function does not support execlist mode in its
>> +	 * current form. Just return NULL and move on.
>> +	 */
>> +	if (i915.enable_execlists)
>> +		return NULL;
>
> if (ring->buffer == NULL)
> 	return NULL;
>
> if (i915.enable_execlists) is a layering violation that needs to be
> eradicated.
> -Chris
>
Fair enough.
Thanks,
Tomas
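
For reference, the suggested check can be sketched outside the driver roughly as below. This is only an illustration under stated assumptions: struct sketch_engine, struct sketch_ringbuffer and sketch_semaphore_waits_for() are hypothetical stand-ins, not the real i915 types or the actual patch. The point it tries to show is that the early exit tests the resource the function is about to dereference (the ring buffer pointer) rather than a global submission-mode flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a legacy ring buffer; not the real i915 type. */
struct sketch_ringbuffer {
	uint32_t *virtual_start;
	size_t size;
};

/* Hypothetical stand-in for an engine; buffer is NULL when no ring is set up. */
struct sketch_engine {
	struct sketch_ringbuffer *buffer;
};

/*
 * Bail out early when there is no ring buffer to inspect. The check is on
 * the pointer that would otherwise be dereferenced, so the function does not
 * need to know which submission mode is active. As a placeholder for the
 * real lookup, the engine itself is returned when a buffer is present.
 */
struct sketch_engine *sketch_semaphore_waits_for(struct sketch_engine *engine)
{
	if (engine->buffer == NULL)
		return NULL;

	/* The real function would scan engine->buffer here. */
	return engine;
}

/* Small self-check covering both branches of the guard. */
void sketch_self_check(void)
{
	uint32_t words[4] = { 0 };
	struct sketch_ringbuffer rb = { .virtual_start = words, .size = sizeof(words) };
	struct sketch_engine without_buffer = { .buffer = NULL };
	struct sketch_engine with_buffer = { .buffer = &rb };

	assert(sketch_semaphore_waits_for(&without_buffer) == NULL);
	assert(sketch_semaphore_waits_for(&with_buffer) == &with_buffer);
}
```

With this shape the guard keeps working for any future configuration that leaves the buffer pointer unset, which is presumably why the mode-flag test was called a layering violation.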