[Intel-gfx] [PATCH v3 17/28] drm/i915: Convert 'trace_irq' to use requests rather than seqnos
Daniel Vetter
daniel at ffwll.ch
Wed Nov 26 19:23:36 CET 2014
On Wed, Nov 26, 2014 at 02:58:49PM +0000, John Harrison wrote:
> On 26/11/2014 14:42, Daniel Vetter wrote:
> >On Wed, Nov 26, 2014 at 3:12 PM, John Harrison
> ><John.C.Harrison at intel.com> wrote:
> >>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>>>b/drivers/gpu/drm/i915/i915_drv.h
> >>>>index 8bfdac6..831fae2 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>@@ -3130,4 +3130,11 @@ wait_remaining_ms_from_jiffies(unsigned long
> >>>>timestamp_jiffies, int to_wait_ms)
> >>>> }
> >>>> }
> >>>> +static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
> >>>>+ struct drm_i915_gem_request *req)
> >>>>+{
> >>>>+ if (ring->trace_irq_req == NULL && ring->irq_get(ring))
> >>>>+ i915_gem_request_assign(&ring->trace_irq_req, req);
> >>>This looks a bit suspiciuos. Thus far ring->trace_irq_req was essentially
> >>>ot protected at all by anything. But that was nothing troublesome since we
> >>>didn't hang a real resource of it.
> >>>
> >>>But now there's a refcounted request in that pointer, which means if we
> >>>race we leak. I'll skip this patch for now.
> >>>-Daniel
> >>
> >>Race how? The assignment only ever occurs from inside execbuffer submission
> >>at which point the driver mutex lock is held. Therefore it is very
> >>definitely protected. Not doing the reference count means that there is now
> >>the possibility of a dangling pointer and thus the possibility of going bang
> >>with a kernel oops.
> >Hm, ->trace_irq_seqno is indeed always touched from the a calling
> >context with dev->struct_mutex held. Somehow I've misrembered that
> >since the realtime/tracing folks are really freaked out about what
> >we're doing here. But from that pov your patch doesn't really make
> >things worse, so I'll pull it in.
> >
> >Btw I don't see the oops really without this patch. What would blow up?
> >-Daniel
>
> The sole access (and clear to null) of the trace pointer is done from retire
> requests after the requests have been retired. Thus the request structure
> could have just been freed immediately before it is used. The code could be
> re-ordered to be safer but I'm not entirely sure what the trace pointer is
> for or what it might potentially be used for in the future. With the
> reference counting, the ordering is irrelevant. If the pointer exists then
> it is safe to use.
>
> The point is that anywhere that keeps a copy of a request pointer really
> should reference count that copy. Otherwise there is the possibility that
> the pointer could become stale. Either now or with future code changes. If
> the copy is always done with the request_assign() function then the pointer
> is guaranteed safe for all time.
Oh, I guess you've misunderstood what I've done. Ive dropped the entire
patch here, not just the refcounting. Dropping the refcounting alone is
obviously oops-worthy.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list