[Intel-gfx] [RFC 7/9] drm/i915: Interrupt driven fences
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Jul 27 04:33:02 PDT 2015
Hi,
On 07/17/2015 03:31 PM, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer requiring the
> mutex lock.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 8 ++
> drivers/gpu/drm/i915/i915_gem.c | 132 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_irq.c | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
> 6 files changed, 136 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 61c3db2..d7f1aa5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
> struct drm_i915_gem_request {
> /** Underlying object for implementing the signal/wait stuff. */
> struct fence fence;
> + struct list_head signal_list;
> + struct list_head unsignal_list;
In addition to what Daniel said (one list_head looks enough), it is
customary to call it _link.
> struct list_head delay_free_list;
> + bool cancelled;
> + bool irq_enabled;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
> struct drm_i915_gem_request **req_out);
> void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>
> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
> +void i915_gem_request_notify(struct intel_engine_cs *ring);
> +
> int i915_create_fence_timeline(struct drm_device *dev,
> struct intel_context *ctx,
> struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 482835a..7c589a9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> if (list_empty(&req->list))
> return 0;
>
> + /*
> + * Enable interrupt completion of the request.
> + */
> + i915_gem_request_enable_interrupt(req);
> +
> if (i915_gem_request_completed(req))
> return 0;
>
> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> list_del_init(&request->list);
> i915_gem_request_remove_from_client(request);
>
> + /* In case the request is still in the signal pending list */
> + if (!list_empty(&request->signal_list))
> + request->cancelled = true;
> +
> i915_gem_request_unreference(request);
> }
>
> @@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
> */
> request->postfix = intel_ring_get_tail(ringbuf);
>
> + /*
> + * Add the fence to the pending list before emitting the commands to
> + * generate a seqno notification interrupt.
> + */
> + i915_gem_request_submit(request);
> +
> if (i915.enable_execlists)
> ret = ring->emit_request(request);
> else {
> @@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
> i915_gem_context_unreference(ctx);
> }
>
> + if (req->irq_enabled)
> + req->ring->irq_put(req->ring);
> +
We get here with interrupts still enabled only if userspace is
abandoning a wait on an unsignaled fence. Did I get that right?
> kmem_cache_free(req->i915->requests, req);
> }
>
> @@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> return req->ring->name;
> }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request has been submitted to the hardware so add the fence to the
> + * list of signalable fences.
> + *
> + * NB: This does not enable interrupts yet. That only occurs on demand when
> + * the request is actually waited on. However, adding it to the list early
> + * ensures that there is no race condition where the interrupt could pop
> + * out prematurely and thus be completely lost. The race is merely that the
> + * interrupt must be manually checked for after being enabled.
> + */
> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
> {
> - /* Interrupt driven fences are not implemented yet.*/
> - WARN(true, "This should not be called!");
> - return true;
> + fence_enable_sw_signaling(&req->fence);
> }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
> +{
> + if (req->irq_enabled)
> + return;
> +
> + WARN_ON(!req->ring->irq_get(req->ring));
> + req->irq_enabled = true;
req->irq_enabled manipulations look racy. Here and in request free it is
protected by struct_mutex, but that is not held in
i915_gem_request_notify. My initial feeling is you should use
ring->fence_lock everywhere you query or manipulate req->irq_enabled.
> +
> + /*
> + * Because the interrupt is only enabled on demand, there is a race
> + * where the interrupt can fire before anyone is looking for it. So
> + * do an explicit check for missed interrupts.
> + */
> + i915_gem_request_notify(req->ring);
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> {
> struct drm_i915_gem_request *req = container_of(req_fence,
> typeof(*req), fence);
> +
> + i915_gem_request_reference(req);
> + WARN_ON(!list_empty(&req->signal_list));
It looks very unsafe to proceed normally after this WARN_ON. It should
probably return false here to preserve data structure sanity.
> + list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
> +
> + /*
> + * Note that signalling is always enabled for every request before
> + * that request is submitted to the hardware. Therefore there is
> + * no race condition whereby the signal could pop out before the
> + * request has been added to the list. Hence no need to check
> + * for completion, undo the list add and return false.
> + *
> + * NB: Interrupts are only enabled on demand. Thus there is still a
> + * race where the request could complete before the interrupt has
> + * been enabled. Thus care must be taken at that point.
> + */
> +
> + return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *ring)
> +{
> + struct drm_i915_gem_request *req, *req_next;
> + unsigned long flags;
> u32 seqno;
> + LIST_HEAD(free_list);
>
> - BUG_ON(req == NULL);
> + if (list_empty(&ring->fence_signal_list))
> + return;
> +
> + seqno = ring->get_seqno(ring, false);
> +
> + spin_lock_irqsave(&ring->fence_lock, flags);
> + list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
> + if (!req->cancelled) {
> + if (!i915_seqno_passed(seqno, req->seqno))
> + continue;
>
> - seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> + fence_signal_locked(&req->fence);
> + }
> +
> + list_del_init(&req->signal_list);
I haven't managed to figure out why this is apparently removing requests
which have not been signalled from the signal_list.
Shouldn't they be moved to free_list only if i915_seqno_passed?
> + if (req->irq_enabled) {
> + req->ring->irq_put(req->ring);
> + req->irq_enabled = false;
> + }
>
> - return i915_seqno_passed(seqno, req->seqno);
> + /* Can't unreference here because that might grab fence_lock */
> + list_add_tail(&req->unsignal_list, &free_list);
> + }
> + spin_unlock_irqrestore(&ring->fence_lock, flags);
> +
> + /* It should now be safe to actually free the requests */
> + while (!list_empty(&free_list)) {
> + req = list_first_entry(&free_list,
> + struct drm_i915_gem_request, unsignal_list);
> + list_del(&req->unsignal_list);
> +
> + i915_gem_request_unreference(req);
> + }
> }
>
> static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
> @@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
> .get_driver_name = i915_gem_request_get_driver_name,
> .get_timeline_name = i915_gem_request_get_timeline_name,
> .enable_signaling = i915_gem_request_enable_signaling,
> - .signaled = i915_gem_request_is_completed,
> .wait = fence_default_wait,
> .release = i915_gem_request_release,
> .fence_value_str = i915_fence_value_str,
> @@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
> goto err;
> }
>
> + INIT_LIST_HEAD(&req->signal_list);
> fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
> ctx->engine[ring->id].fence_timeline.fence_context,
> i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
> @@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>
> i915_gem_request_retire(request);
> }
> +
> + /*
> + * Make sure any requests that were on the signal pending list get
> + * cleaned up.
> + */
> + i915_gem_request_notify(ring);
> + i915_gem_retire_requests_ring(ring);
Would i915_gem_retire_requests_ring alone be enough, given that it calls
i915_gem_request_notify itself as the first thing (below)?
> }
>
> void i915_gem_restore_fences(struct drm_device *dev)
> @@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> {
> WARN_ON(i915_verify_lists(ring->dev));
>
> + /*
> + * If no-one has waited on a request recently then interrupts will
> + * not have been enabled and thus no requests will ever be marked as
> + * completed. So do an interrupt check now.
> + */
> + i915_gem_request_notify(ring);
> +
> /* Retire requests first as we use it above for the early return.
> * If we retire requests last, we may use a later seqno and so clear
> * the requests lists without clearing the active list, leading to
> @@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
> {
> INIT_LIST_HEAD(&ring->active_list);
> INIT_LIST_HEAD(&ring->request_list);
> + INIT_LIST_HEAD(&ring->fence_signal_list);
> INIT_LIST_HEAD(&ring->delayed_free_list);
> }
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d87f173..e446509 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
>
> trace_i915_gem_request_notify(ring);
>
> + i915_gem_request_notify(ring);
> +
How many requests are typically on signal_list for common workloads?
This could be a significant performance change since on every user
interrupt it would walk the whole list while potentially removing only
one request at a time.
These are just review comments on this particular patch, without yet
thinking about the bigger design questions Daniel has raised.
Regards,
Tvrtko