[Intel-gfx] [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling

Chris Wilson chris at chris-wilson.co.uk
Thu Apr 25 10:42:03 UTC 2019


Quoting Tvrtko Ursulin (2019-04-25 11:35:01)
> 
> On 25/04/2019 10:19, Chris Wilson wrote:
> > Currently there is an underlying assumption that i915_request_unsubmit()
> > is synchronous wrt the GPU -- that is the request is no longer in flight
> > as we remove it. In the near future that may change, and this may upset
> > our signaling as we can process an interrupt for that request while it
> > is no longer in flight.
> > 
> > CPU0                                  CPU1
> > intel_engine_breadcrumbs_irq
> > (queue request completion)
> >                                       i915_request_cancel_signaling
> > ...                                   ...
> >                                       i915_request_enable_signaling
> > dma_fence_signal
> > 
> > Hence in the time it took us to drop the lock to signal the request, a
> > preemption event may have occurred and re-queued the request. In the
> > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
> > so reused the rq->signal_link that was in use on CPU0, leading to bad
> > pointer chasing in intel_engine_breadcrumbs_irq.
> > 
> > A related issue was that if someone started listening for a signal on a
> > completed but no longer in-flight request, we missed the opportunity to
> > immediately signal that request.
> > 
> > Furthermore, as intel_contexts may be immediately released during
> > request retirement, in order to be entirely sure that
> > intel_engine_breadcrumbs_irq may no longer dereference the intel_context
> > (ce->signals and ce->signal_link), we must wait for irq spinlock.
> > 
> > In order to prevent the race, we use a bit in the fence.flags to signal
> > the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
> > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
> > quickly signals to any outside observer that the fence is indeed signaled.
> > 
> > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/dma-buf/dma-fence.c                 |  1 +
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 +++++++++++++--------
> >   drivers/gpu/drm/i915/i915_request.c         |  1 +
> >   3 files changed, 39 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 3aa8733f832a..9bf06042619a 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -29,6 +29,7 @@
> >   
> >   EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> >   EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> > +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
> >   
> >   static DEFINE_SPINLOCK(dma_fence_stub_lock);
> >   static struct dma_fence dma_fence_stub;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index 3cbffd400b1b..4283224249d4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -23,6 +23,7 @@
> >    */
> >   
> >   #include <linux/kthread.h>
> > +#include <trace/events/dma_fence.h>
> >   #include <uapi/linux/sched/types.h>
> >   
> >   #include "i915_drv.h"
> > @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq)
> >   void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> >   {
> >       struct intel_breadcrumbs *b = &engine->breadcrumbs;
> > +     const ktime_t timestamp = ktime_get();
> >       struct intel_context *ce, *cn;
> >       struct list_head *pos, *next;
> >       LIST_HEAD(signal);
> > @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> >   
> >                       GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
> >                                            &rq->fence.flags));
> > +                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> > +
> > +                     if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > +                                          &rq->fence.flags))
> > +                             continue;
> >   
> >                       /*
> >                        * Queue for execution after dropping the signaling
> > @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> >                        * more signalers to the same context or engine.
> >                        */
> >                       i915_request_get(rq);
> > -
> > -                     /*
> > -                      * We may race with direct invocation of
> > -                      * dma_fence_signal(), e.g. i915_request_retire(),
> > -                      * so we need to acquire our reference to the request
> > -                      * before we cancel the breadcrumb.
> > -                      */
> > -                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> >                       list_add_tail(&rq->signal_link, &signal);
> >               }
> >   
> > @@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> >       list_for_each_safe(pos, next, &signal) {
> >               struct i915_request *rq =
> >                       list_entry(pos, typeof(*rq), signal_link);
> > +             struct dma_fence_cb *cur, *tmp;
> > +
> > +             trace_dma_fence_signaled(&rq->fence);
> > +
> > +             rq->fence.timestamp = timestamp;
> > +             set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.flags);
> > +
> > +             spin_lock(&rq->lock);
> > +             list_for_each_entry_safe(cur, tmp, &rq->fence.cb_list, node) {
> > +                     INIT_LIST_HEAD(&cur->node);
> > +                     cur->func(&rq->fence, cur);
> > +             }
> > +             INIT_LIST_HEAD(&rq->fence.cb_list);
> > +             spin_unlock(&rq->lock);
> >   
> > -             dma_fence_signal(&rq->fence);
> 
> I posted some comments on this patch already. In essence it was a 
> suggestion to not open-code-and-optimize dma_fence_signal, but split it 
> into two low-level helpers and export from the parent location. Like 
> __dma_fence_maybe/start_signal and __dma_fence_finish_signal.
> 
> bool __dma_fence_start_signal(...)
> {
>         return test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>                                 &fence->flags);
> }
> EXPORT_SYMBOL(...)
> 
> void __dma_fence_finish_signal(...)
> {
>         trace_dma_fence_signaled...
>         timestamp...
>         callbacks...
> }
> EXPORT_SYMBOL(...)
> 
> This way we don't add coupling to low level implementation details in i915.

You mean midlayer implementation details :-p

The alternative I mentioned was that we use a redundant bit. But that
duplicates the purpose of the existing one.

The patch needs to be stable-friendly for the fixes, so I opted for
taking control.
-Chris
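
For readers following the thread, here is a minimal userspace sketch of the
two-step split discussed above: one helper atomically claims the signaled
bit, and a second runs the callbacks only for the caller that won the claim.
It is an illustration under assumptions, not the dma-fence implementation.
Every name in it (struct fence, fence_start_signal, fence_finish_signal,
FENCE_SIGNALED_BIT) is hypothetical, C11 atomics stand in for the kernel's
test_and_set_bit(), and the locking around the callback list is omitted.
Note that, unlike the pseudocode quoted above, fence_start_signal() here
returns true when the caller is the one that set the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of this is kernel API. */
#define FENCE_SIGNALED_BIT (1u << 0)

struct fence;
typedef void (*fence_cb_fn)(struct fence *f, void *data);

struct fence_cb {
	struct fence_cb *next;
	fence_cb_fn func;
	void *data;
};

struct fence {
	atomic_uint flags;
	struct fence_cb *cb_list; /* real code would guard this with a lock */
};

/*
 * Atomically set the signaled bit. Returns true only for the single
 * caller that flipped it from clear to set; everyone else sees false
 * and must not touch the callback list.
 */
static bool fence_start_signal(struct fence *f)
{
	unsigned int old;

	assert(f != NULL);
	old = atomic_fetch_or(&f->flags, FENCE_SIGNALED_BIT);
	return !(old & FENCE_SIGNALED_BIT);
}

/*
 * Run and detach every callback. Only the caller that won
 * fence_start_signal() should get here.
 */
static void fence_finish_signal(struct fence *f)
{
	struct fence_cb *cur = f->cb_list;

	f->cb_list = NULL;
	while (cur) {
		struct fence_cb *next = cur->next;

		cur->next = NULL; /* detach before invoking, as in the patch */
		cur->func(f, cur->data);
		cur = next;
	}
}

/* Example callback: counts how many times it was invoked. */
static void count_cb(struct fence *f, void *data)
{
	(void)f;
	++*(int *)data;
}
```

A caller would pair the two as "if (fence_start_signal(f))
fence_finish_signal(f);", so a second path racing to signal the same fence
fails the claim and skips the callbacks.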

