[Intel-gfx] [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Apr 26 10:40:12 UTC 2019
On 25/04/2019 11:42, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-04-25 11:35:01)
>>
>> On 25/04/2019 10:19, Chris Wilson wrote:
>>> Currently there is an underlying assumption that i915_request_unsubmit()
>>> is synchronous wrt the GPU -- that is the request is no longer in flight
>>> as we remove it. In the near future that may change, and this may upset
>>> our signaling as we can process an interrupt for that request while it
>>> is no longer in flight.
>>>
>>> CPU0 CPU1
>>> intel_engine_breadcrumbs_irq
>>> (queue request completion)
>>> i915_request_cancel_signaling
>>> ... ...
>>> i915_request_enable_signaling
>>> dma_fence_signal
>>>
>>> Hence in the time it took us to drop the lock to signal the request, a
>>> preemption event may have occurred and re-queued the request. In the
>>> process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
>>> so reused the rq->signal_link that was in use on CPU0, leading to bad
>>> pointer chasing in intel_engine_breadcrumbs_irq.
>>>
>>> A related issue was that if someone started listening for a signal on a
>>> completed but no longer in-flight request, we missed the opportunity to
>>> immediately signal that request.
>>>
>>> Furthermore, as intel_contexts may be immediately released during
>>> request retirement, in order to be entirely sure that
>>> intel_engine_breadcrumbs_irq may no longer dereference the intel_context
>>> (ce->signals and ce->signal_link), we must wait for irq spinlock.
>>>
>>> In order to prevent the race, we use a bit in the fence.flags to signal
>>> the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
>>> For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
>>> quickly signals to any outside observer that the fence is indeed signaled.
>>>
>>> Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/dma-buf/dma-fence.c | 1 +
>>> drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 +++++++++++++--------
>>> drivers/gpu/drm/i915/i915_request.c | 1 +
>>> 3 files changed, 39 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 3aa8733f832a..9bf06042619a 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -29,6 +29,7 @@
>>>
>>> EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
>>> EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
>>> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>>>
>>> static DEFINE_SPINLOCK(dma_fence_stub_lock);
>>> static struct dma_fence dma_fence_stub;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>>> index 3cbffd400b1b..4283224249d4 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>>> @@ -23,6 +23,7 @@
>>> */
>>>
>>> #include <linux/kthread.h>
>>> +#include <trace/events/dma_fence.h>
>>> #include <uapi/linux/sched/types.h>
>>>
>>> #include "i915_drv.h"
>>> @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq)
>>> void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>> {
>>> struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> + const ktime_t timestamp = ktime_get();
>>> struct intel_context *ce, *cn;
>>> struct list_head *pos, *next;
>>> LIST_HEAD(signal);
>>> @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>>
>>> GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
>>> &rq->fence.flags));
>>> + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>>> +
>>> + if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> + &rq->fence.flags))
>>> + continue;
>>>
>>> /*
>>> * Queue for execution after dropping the signaling
>>> @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>> * more signalers to the same context or engine.
>>> */
>>> i915_request_get(rq);
>>> -
>>> - /*
>>> - * We may race with direct invocation of
>>> - * dma_fence_signal(), e.g. i915_request_retire(),
>>> - * so we need to acquire our reference to the request
>>> - * before we cancel the breadcrumb.
>>> - */
>>> - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>>> list_add_tail(&rq->signal_link, &signal);
>>> }
>>>
>>> @@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>> list_for_each_safe(pos, next, &signal) {
>>> struct i915_request *rq =
>>> list_entry(pos, typeof(*rq), signal_link);
>>> + struct dma_fence_cb *cur, *tmp;
>>> +
>>> + trace_dma_fence_signaled(&rq->fence);
>>> +
>>> + rq->fence.timestamp = timestamp;
>>> + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.flags);
>>> +
>>> + spin_lock(&rq->lock);
>>> + list_for_each_entry_safe(cur, tmp, &rq->fence.cb_list, node) {
>>> + INIT_LIST_HEAD(&cur->node);
>>> + cur->func(&rq->fence, cur);
>>> + }
>>> + INIT_LIST_HEAD(&rq->fence.cb_list);
>>> + spin_unlock(&rq->lock);
>>>
>>> - dma_fence_signal(&rq->fence);
>>
>> I posted some comments on this patch already. In essence it was a
>> suggestion to not open-code-and-optimize dma_fence_signal, but split it
>> into two low-level helpers and export from the parent location. Like
>> __dma_fence_maybe/start_signal and __dma_fence_finish_signal.
>>
>> bool __dma_fence_start_signal(...)
>> {
>> return test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>> fence->flags);
>> }
>> EXPORT_SYMBOL(...)
>>
>> void __dma_fence_finish_signal(...)
>> {
>> trace_dma_fence_signaled...
>> timestamp...
>> callbacks...
>> }
>> EXPORT_SYMBOL(...)
>>
>> This way we don't add coupling to low level implementation details in i915.
>
> You mean midlayer implementation details :-p
>
> The alternative I mentioned was that we use a redundant bit. But that
> has duplicity of purpose.
>
> The patch needs to be stable-friendly for the fixes, so I opted for
> taking control.
I might buy it if you put a big fat comment this is open-coded
dma_fence_singal and follow up with a patch to move code back into the
midlayer following the stable marked patch. ;)
Regards,
Tvrtko
More information about the Intel-gfx
mailing list