[Intel-gfx] [PATCH] drm/i915/request: fix early tracepoints

Matthew Auld matthew.auld at intel.com
Thu Sep 9 16:16:55 UTC 2021


On 08/09/2021 18:38, Daniel Vetter wrote:
> On Fri, Sep 03, 2021 at 12:24:05PM +0100, Matthew Auld wrote:
>> Currently we blow up in trace_dma_fence_init, when calling into
>> get_driver_name or get_timeline_name, since both the engine and context
>> might be NULL(or contain some garbage address) in the case of newly
>> allocated slab objects via the request ctor. Note that we also use
>> SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately
>> freed, but delay freeing the underlying page by an RCU grace period.
>> With this scheme requests can be re-allocated, at the same time as they
>> are also being read by some lockless RCU lookup mechanism.
>>
>> One possible fix, since we don't yet have a fully initialised request
>> when in the ctor, is just setting the context/engine as NULL and adding
>> some extra handling in get_driver_name etc. And since the ctor is only
>> called for new slab objects(i.e allocate new page and call the ctor for
>> each object) it's safe to reset the context/engine prior to calling into
>> dma_fence_init, since we can be certain that no one is doing an RCU
>> lookup which might depend on peeking at the engine/context, like in
>> active_engine(), since the object can't yet be externally visible.
>>
>> In the recycled case(which might also be externally visible) the request
>> refcount always transitions from 0->1 after we set the context/engine
>> etc, which should ensure it's valid to dereference the engine for
>> example, when doing an RCU list-walk, so long as we can also increment
>> the refcount first. If the refcount is already zero, then the request is
>> considered complete/released.  If it's non-zero, then the request might
>> be in the process of being re-allocated, or potentially still in flight,
>> however after successfully incrementing the refcount, it's possible to
>> carefully inspect the request state, to determine if the request is
>> still what we were looking for. Note that all externally visible
>> requests returned to the cache must have zero refcount.
> 
> The commit message here is a bit confusing, since you start out with
> describing a solution that you're not actually implementing it. I usually
> do this by putting alternate solutions at the bottom, starting with "An
> alternate solution would be ..." or so.
> 
> And then closing with why we don't do that, here it would be that we do
> no longer have a need for these partially set up i915_requests, and
> therefore just reverting that complication is the simplest solution.
> 
>> An alternative fix then is to instead move the dma_fence_init out from
>> the request ctor. Originally this was how it was done, but it was moved
>> in:
>>
>> commit 855e39e65cfc33a73724f1cc644ffc5754864a20
>> Author: Chris Wilson <chris at chris-wilson.co.uk>
>> Date:   Mon Feb 3 09:41:48 2020 +0000
>>
>>      drm/i915: Initialise basic fence before acquiring seqno
>>
>> where it looks like intel_timeline_get_seqno() relied on some of the
>> rq->fence state, but that is no longer the case since:
>>
>> commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc
>> Author: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Date:   Tue Mar 23 16:49:50 2021 +0100
>>
>>      drm/i915: Do not share hwsp across contexts any more, v8.
>>
>> intel_timeline_get_seqno() could also be cleaned up slightly by dropping
>> the request argument.
>>
>> Moving dma_fence_init back out of the ctor, should ensure we have enough
>> of the request initialised in case of trace_dma_fence_init.
>> Functionally this should be the same, and is effectively what we were
>> already open coding before, except now we also assign the fence->lock
>> and fence->ops, but since these are invariant for recycled
>> requests(which might be externally visible), and will therefore already
>> hold the same value, it shouldn't matter. We still leave the
>> spin_lock_init() in the ctor, since we can't re-init the rq->lock in
>> case it is already held.
> 
> Holding rq->lock without having a full reference to it sounds like really
> bad taste. I think it would be good to have a (kerneldoc) comment next to
> i915_request.lock about this, with a FIXME. But separate patch.

Sorry, I guess I just meant that we can't blindly move the lock_init() 
i.e if for some unknown reason we moved it out from the ctor then it 
needs to go before the ref transitions from 0->1. Touching rq->lock 
should require the full ref. I'll reword.

> 
>> Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno")
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Michael Mason <michael.w.mason at intel.com>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
> 
> With the commit message restructured a bit, and assuming this one actually
> works:
> 
> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> But I'm really not confident :-(
> -Daniel
> 
>> ---
>>   drivers/gpu/drm/i915/i915_request.c | 11 ++---------
>>   1 file changed, 2 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index ce446716d092..79da5eca60af 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -829,8 +829,6 @@ static void __i915_request_ctor(void *arg)
>>   	i915_sw_fence_init(&rq->submit, submit_notify);
>>   	i915_sw_fence_init(&rq->semaphore, semaphore_notify);
>>   
>> -	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock, 0, 0);
>> -
>>   	rq->capture_list = NULL;
>>   
>>   	init_llist_head(&rq->execute_cb);
>> @@ -905,17 +903,12 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>>   	rq->ring = ce->ring;
>>   	rq->execution_mask = ce->engine->mask;
>>   
>> -	kref_init(&rq->fence.refcount);
>> -	rq->fence.flags = 0;
>> -	rq->fence.error = 0;
>> -	INIT_LIST_HEAD(&rq->fence.cb_list);
>> -
>>   	ret = intel_timeline_get_seqno(tl, rq, &seqno);
>>   	if (ret)
>>   		goto err_free;
>>   
>> -	rq->fence.context = tl->fence_context;
>> -	rq->fence.seqno = seqno;
>> +	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
>> +		       tl->fence_context, seqno);
>>   
>>   	RCU_INIT_POINTER(rq->timeline, tl);
>>   	rq->hwsp_seqno = tl->hwsp_seqno;
>> -- 
>> 2.26.3
>>
> 


More information about the Intel-gfx mailing list