[Intel-gfx] [PATCH v3 17/28] drm/i915: Convert 'trace_irq' to use requests rather than seqnos

John Harrison John.C.Harrison at Intel.com
Wed Nov 26 15:58:49 CET 2014


On 26/11/2014 14:42, Daniel Vetter wrote:
> On Wed, Nov 26, 2014 at 3:12 PM, John Harrison
> <John.C.Harrison at intel.com> wrote:
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 8bfdac6..831fae2 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -3130,4 +3130,11 @@ wait_remaining_ms_from_jiffies(unsigned long
>>>> timestamp_jiffies, int to_wait_ms)
>>>>          }
>>>>    }
>>>>    +static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
>>>> +                                     struct drm_i915_gem_request *req)
>>>> +{
>>>> +       if (ring->trace_irq_req == NULL && ring->irq_get(ring))
>>>> +               i915_gem_request_assign(&ring->trace_irq_req, req);
>>> This looks a bit suspiciuos. Thus far ring->trace_irq_req was essentially
>>> ot protected at all by anything. But that was nothing troublesome since we
>>> didn't hang a real resource of it.
>>>
>>> But now there's a refcounted request in that pointer, which means if we
>>> race we leak. I'll skip this patch for now.
>>> -Daniel
>>
>> Race how? The assignment only ever occurs from inside execbuffer submission
>> at which point the driver mutex lock is held. Therefore it is very
>> definitely protected. Not doing the reference count means that there is now
>> the possibility of a dangling pointer and thus the possibility of going bang
>> with a kernel oops.
> Hm, ->trace_irq_seqno is indeed always touched from the a calling
> context with dev->struct_mutex held. Somehow I've misrembered that
> since the realtime/tracing folks are really freaked out about what
> we're doing here. But from that pov your patch doesn't really make
> things worse, so I'll pull it in.
>
> Btw I don't see the oops really without this patch. What would blow up?
> -Daniel

The sole access (and clear to null) of the trace pointer is done from 
retire requests after the requests have been retired. Thus the request 
structure could have just been freed immediately before it is used. The 
code could be re-ordered to be safer but I'm not entirely sure what the 
trace pointer is for or what it might potentially be used for in the 
future. With the reference counting, the ordering is irrelevant. If the 
pointer exists then it is safe to use.

The point is that anywhere that keeps a copy of a request pointer really 
should reference count that copy. Otherwise there is the possibility 
that the pointer could become stale. Either now or with future code 
changes. If the copy is always done with the request_assign() function 
then the pointer is guaranteed safe for all time.




More information about the Intel-gfx mailing list