[Intel-gfx] [RFC 01/11] drm/i915: Early exit from semaphore_waits_for for execlist mode.
Daniel Vetter
daniel at ffwll.ch
Wed Jun 17 04:43:34 PDT 2015
On Tue, Jun 16, 2015 at 04:46:05PM +0100, Tomas Elf wrote:
> On 16/06/2015 14:44, Daniel Vetter wrote:
> >On Mon, Jun 08, 2015 at 06:03:19PM +0100, Tomas Elf wrote:
> >>When submitting semaphores in execlist mode the hang checker crashes in this
> >>function because it is only runnable in ring submission mode. The reason this
> >>is of particular interest to the TDR patch series is because we use semaphores
> >>as a mean to induce hangs during testing (which is the recommended way to
> >>induce hangs for gen8+). It's not clear how this is supposed to work in
> >>execlist mode since:
> >>
> >>1. This function requires a ring buffer.
> >>
> >>2. Retrieving a ring buffer in execlist mode requires us to retrieve the
> >>corresponding context, which we get from a request.
> >>
> >>3. Retieving a request from the hang checker is not straight-forward since that
> >>requires us to grab the struct_mutex in order to synchronize against the
> >>request retirement thread.
> >>
> >>4. Grabbing the struct_mutex from the hang checker is nothing that we will do
> >>since that puts us at risk of deadlock since a hung thread might be holding the
> >>struct_mutex already.
> >>
> >>Therefore it's not obvious how we're supposed to deal with this. For now, we're
> >>doing an early exit from this function, which avoids any kernel panic situation
> >>when running our own internal TDR ULT.
> >>
> >>Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> >
> >We should have a Testcase: line here which mentions the igt testcase which
> >provoke this bug. Or we need to fill this gap asap.
> >-Daniel
>
> You know this better than I do: Is there an IGT test that submits a
> semaphore in execlist mode? Because that's all you need to do to reproduce
> this. We could certainly add one if there is none like that already.
Not that I know of, so needs to be added. But it's a security fix and
should go to -fixes+cc:stable since any kind of userspace could Oops the
kernel with this.
-Daniel
>
> Thanks,
> Tomas
>
> >
> >>---
> >> drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++++
> >> 1 file changed, 20 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >>index 46bcbff..40c44fc 100644
> >>--- a/drivers/gpu/drm/i915/i915_irq.c
> >>+++ b/drivers/gpu/drm/i915/i915_irq.c
> >>@@ -2698,6 +2698,26 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno)
> >> u64 offset = 0;
> >> int i, backwards;
> >>
> >>+ /*
> >>+ * This function does not support execlist mode - any attempt to
> >>+ * proceed further into this function will result in a kernel panic
> >>+ * when dereferencing ring->buffer, which is not set up in execlist
> >>+ * mode.
> >>+ *
> >>+ * The correct way of doing it would be to derive the currently
> >>+ * executing ring buffer from the current context, which is derived
> >>+ * from the currently running request. Unfortunately, to get the
> >>+ * current request we would have to grab the struct_mutex before doing
> >>+ * anything else, which would be ill-advised since some other thread
> >>+ * might have grabbed it already and managed to hang itself, causing
> >>+ * the hang checker to deadlock.
> >>+ *
> >>+ * Therefore, this function does not support execlist mode in its
> >>+ * current form. Just return NULL and move on.
> >>+ */
> >>+ if (i915.enable_execlists)
> >>+ return NULL;
> >>+
> >> ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
> >> if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
> >> return NULL;
> >>--
> >>1.7.9.5
> >>
> >>_______________________________________________
> >>Intel-gfx mailing list
> >>Intel-gfx at lists.freedesktop.org
> >>http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the Intel-gfx
mailing list