[Intel-gfx] [PATCH v9 4/6] drm/i915: Interrupt driven fences

John Harrison John.C.Harrison at Intel.com
Mon Jun 13 15:51:13 UTC 2016


On 07/06/2016 13:02, Maarten Lankhorst wrote:
> Op 02-06-16 om 15:25 schreef Tvrtko Ursulin:
>> On 01/06/16 18:07, John.C.Harrison at Intel.com wrote:
>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>
>>> The intended usage model for struct fence is that the signalled status
>>> should be set on demand rather than polled. That is, there should not
>>> be a need for a 'signaled' function to be called every time the status
>>> is queried. Instead, 'something' should be done to enable a signal
>>> callback from the hardware which will update the state directly. In
>>> the case of requests, this is the seqno update interrupt. The idea is
>>> that this callback will only be enabled on demand when something
>>> actually tries to wait on the fence.
>>>
>>> This change removes the polling test and replaces it with the callback
>>> scheme. Each fence is added to a 'please poke me' list at the start of
>>> i915_add_request(). The interrupt handler then scans through the 'poke
>>> me' list when a new seqno pops out and signals any matching
>>> fence/request. The fence is then removed from the list so the entire
>>> request stack does not need to be scanned every time. Note that the
>>> fence is added to the list before the commands to generate the seqno
>>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>>> race free if the interrupt is already enabled.
>>>
>>> Note that the interrupt is only enabled on demand (i.e. when
>>> __wait_request() is called). Thus there is still a potential race when
>>> enabling the interrupt as the request may already have completed.
>>> However, this is simply solved by calling the interrupt processing
>>> code immediately after enabling the interrupt and thereby checking for
>>> already completed requests.
>>>
>>> Lastly, the ring clean up code has the possibility to cancel
>>> outstanding requests (e.g. because TDR has reset the ring). These
>>> requests will never get signalled and so must be removed from the
>>> signal list manually. This is done by setting a 'cancelled' flag and
>>> then calling the regular notify/retire code path rather than
>>> attempting to duplicate the list manipulation and clean up code in
>>> multiple places. This also avoids any race condition where the
>>> cancellation request might occur after/during the completion interrupt
>>> actually arriving.
>>>
>>> v2: Updated to take advantage of the request unreference no longer
>>> requiring the mutex lock.
>>>
>>> v3: Move the signal list processing around to prevent unsubmitted
>>> requests being added to the list. This was occurring on Android
>>> because the native sync implementation calls the
>>> fence->enable_signaling API immediately on fence creation.
>>>
>>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>>> 'link' instead of 'list'. Added support for returning an error code on
>>> a cancelled fence. Update list processing to be more efficient/safer
>>> with respect to spinlocks.
>>>
>>> v5: Made i915_gem_request_submit a static as it is only ever called
>>> from one place.
>>>
>>> Fixed up the low latency wait optimisation. The time delay between the
>>> seqno value being written to memory and the driver's ISR running can be
>>> significant, at least for the wait request micro-benchmark. This can
>>> be greatly improved by explicitly checking for seqno updates in the
>>> pre-wait busy poll loop. Also added some documentation comments to the
>>> busy poll code.
>>>
>>> Fixed up support for the faking of lost interrupts
>>> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
>>> tells the driver to lose interrupts deliberately and then check that
>>> everything still works as expected (albeit much slower).
>>>
>>> Updates from review comments: use non IRQ-save spinlocking, early exit
>>> on WARN and improved comments (Tvrtko Ursulin).
>>>
>>> v6: Updated to newer nightly and resolved conflicts around the
>>> wait_request busy spin optimisation. Also fixed a race condition
>>> between this early exit path and the regular completion path.
>>>
>>> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
>>> interface change on get_seqno(). Also added a list_empty() check
>>> before acquiring spinlocks and doing list processing.
>>>
>>> v8: Updated to newer nightly - changes to request clean up code mean
>>> none of the deferred free mess is needed any more.
>>>
>>> v9: Moved the request completion processing out of the interrupt
>>> handler and into a worker thread (Chris Wilson).
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_dma.c         |   9 +-
>>>    drivers/gpu/drm/i915/i915_drv.h         |  11 ++
>>>    drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
>>>    drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>>    drivers/gpu/drm/i915/intel_lrc.c        |   5 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
>>>    7 files changed, 260 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>>> index 07edaed..f8f60bb 100644
>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>>        if (dev_priv->wq == NULL)
>>>            goto out_err;
>>>
>>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>>> +    if (dev_priv->req_wq == NULL)
>>> +        goto out_free_wq;
>>> +
>> Single (per-device) ordered workqueue will serialize interrupt processing across all engines to one thread. Together with the fact request worker does not seem to need the sleeping context, I am thinking that a tasklet per engine would be much better (see engine->irq_tasklet for an example).
Other conversations have stated that tasklets are not the best option. I 
did think about having a workqueue per engine but that seemed 
excessive. Plus any subsequent work triggered by the fence completion 
will almost certainly require grabbing the driver mutex lock (because 
everything requires the mutex lock), so serialising the engines doesn't 
sound like much of an issue.
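
For what it's worth, a per-engine variant would be easy enough to add 
later if it ever proves necessary. A rough sketch (hypothetical, not 
part of this patch; a per-engine 'req_wq' field is an assumption):

    static int i915_engine_req_wq_init(struct intel_engine_cs *engine)
    {
        /* No ordering requirement within an engine's own signal list;
         * WQ_HIGHPRI keeps completion-to-signal latency down.
         */
        engine->req_wq = alloc_workqueue("i915-rq/%s", WQ_HIGHPRI, 0,
                                         engine->name);
        if (engine->req_wq == NULL)
            return -ENOMEM;

        return 0;
    }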

>>
>>>        dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>>>        if (dev_priv->hotplug.dp_wq == NULL)
>>> -        goto out_free_wq;
>>> +        goto out_free_req_wq;
>>>
>>>        dev_priv->gpu_error.hangcheck_wq =
>>>            alloc_ordered_workqueue("i915-hangcheck", 0);
>>> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>>
>>>    out_free_dp_wq:
>>>        destroy_workqueue(dev_priv->hotplug.dp_wq);
>>> +out_free_req_wq:
>>> +    destroy_workqueue(dev_priv->req_wq);
>>>    out_free_wq:
>>>        destroy_workqueue(dev_priv->wq);
>>>    out_err:
>>> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>>>    {
>>>        destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>>>        destroy_workqueue(dev_priv->hotplug.dp_wq);
>>> +    destroy_workqueue(dev_priv->req_wq);
>>>        destroy_workqueue(dev_priv->wq);
>>>    }
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 69c3412..5a7f256 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>>>         */
>>>        struct workqueue_struct *wq;
>>>
>>> +    /* Work queue for request completion processing */
>>> +    struct workqueue_struct *req_wq;
>>> +
>>>        /* Display functions */
>>>        struct drm_i915_display_funcs display;
>>>
>>> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>>>         */
>>>        struct fence fence;
>>>        struct rcu_head rcu_head;
>>> +    struct list_head signal_link;
>>> +    bool cancelled;
>>> +    bool irq_enabled;
>>> +    bool signal_requested;
>>>
>>>        /** On Which ring this request was generated */
>>>        struct drm_i915_private *i915;
>>> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>>>    struct drm_i915_gem_request * __must_check
>>>    i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>                   struct i915_gem_context *ctx);
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>>> +                       bool fence_locked);
>>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>>> +void i915_gem_request_worker(struct work_struct *work);
>>>
>>>    static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>>>    {
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 97e3138..83cf9b0 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -39,6 +39,8 @@
>>>    #include <linux/pci.h>
>>>    #include <linux/dma-buf.h>
>>>
>>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
>>> +
>>>    static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>>>    static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>>>    static void
>>> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>    {
>>>        struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>>>        struct drm_i915_private *dev_priv = req->i915;
>>> -    const bool irq_test_in_progress =
>>> -        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>>>        int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>> +    uint32_t seqno;
>>>        DEFINE_WAIT(wait);
>>>        unsigned long timeout_expire;
>>>        s64 before = 0; /* Only to silence a compiler warning. */
>>> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>
>>>        WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>>
>>> -    if (list_empty(&req->list))
>>> -        return 0;
>>> -
>>>        if (i915_gem_request_completed(req))
>>>            return 0;
>>>
>>> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>        trace_i915_gem_request_wait_begin(req);
>>>
>>>        /* Optimistic spin for the next jiffie before touching IRQs */
>>> -    ret = __i915_spin_request(req, state);
>>> -    if (ret == 0)
>>> -        goto out;
>>> -
>>> -    if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
>>> -        ret = -ENODEV;
>>> -        goto out;
>>> +    if (req->seqno) {
>> This needs a comment I think because it is so unusual and new that req->seqno == 0 is a special path. To explain why and how it can happen here.
Hmm, I think that is left over from an earlier re-org of the patches. 
Invalid seqnos only come in with the scheduler as it requires being able 
to dynamically change the seqno, e.g. due to pre-empting a request. 
Although that would disappear again with the move away from global 
seqnos to the per-context/engine seqno of the fence. I'll drop the test 
and leave the spin unconditional.
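
So, roughly, the start of the wait would just go back to the pre-patch 
form (sketch):

    /* Optimistic spin for the next jiffie before touching IRQs */
    ret = __i915_spin_request(req, state);
    if (ret == 0)
        goto out;

    /* Enable interrupt completion of the request. */
    fence_enable_sw_signaling(&req->fence);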

>>
>>> +        ret = __i915_spin_request(req, state);
>>> +        if (ret == 0)
>>> +            goto out;
>>>        }
>>>
>>> +    /*
>>> +     * Enable interrupt completion of the request.
>>> +     */
>>> +    fence_enable_sw_signaling(&req->fence);
>>> +
>>>        for (;;) {
>>>            struct timer_list timer;
>>>
>>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>                break;
>>>            }
>>>
>>> +        if (req->seqno) {
>>> +            /*
>>> +             * There is quite a lot of latency in the user interrupt
>>> +             * path. So do an explicit seqno check and potentially
>>> +             * remove all that delay.
>>> +             */
>>> +            if (req->engine->irq_seqno_barrier)
>>> +                req->engine->irq_seqno_barrier(req->engine);
>>> +            seqno = engine->get_seqno(engine);
>>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>>> +                ret = 0;
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>>            if (signal_pending_state(state, current)) {
>>>                ret = -ERESTARTSYS;
>>>                break;
>>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>                destroy_timer_on_stack(&timer);
>>>            }
>>>        }
>>> -    if (!irq_test_in_progress)
>>> -        engine->irq_put(engine);
>>>
>>>        finish_wait(&engine->irq_queue, &wait);
>> Hm I don't understand why our custom waiting remains? Shouldn't fence_wait just be called after the optimistic spin, more or less?
That would solve the 'thundering herd' problem if we could. In theory, 
the entire wait function should just be a call to 
'fence_wait(&req->fence)'. Unfortunately, the wait function goes to 
sleep holding the mutex lock and requires a bail-out option on the wait, 
which is not currently part of the fence API. I have a work-in-progress 
patch that almost solves the issues but it isn't quite there yet (and I 
haven't had much chance to work on it for a while).
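
For reference, if the fence API did grow a suitable bail-out mechanism, 
the core of the wait could in principle shrink to something like this 
sketch (illustrative only, and it glosses over the mutex problem 
described above):

    static int wait_request_via_fence(struct drm_i915_gem_request *req,
                                      bool interruptible,
                                      signed long timeout_jiffies)
    {
        signed long ret;

        ret = fence_wait_timeout(&req->fence, interruptible,
                                 timeout_jiffies);
        if (ret < 0)
            return ret;        /* e.g. -ERESTARTSYS */

        return ret ? 0 : -ETIME;
    }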

>>
>>>    out:
>>>        trace_i915_gem_request_wait_end(req);
>>>
>>> +    if ((ret == 0) && (req->seqno)) {
>>> +        if (req->engine->irq_seqno_barrier)
>>> +            req->engine->irq_seqno_barrier(req->engine);
>>> +        seqno = engine->get_seqno(engine);
>>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>>> +            !i915_gem_request_completed(req)) {
>>> +            /*
>>> +             * Make sure the request is marked as completed before
>>> +             * returning. NB: Need to acquire the spinlock around
>>> +             * the whole call to avoid a race condition with the
>>> +             * interrupt handler is running concurrently and could
>>> +             * cause this invocation to early exit even though the
>>> +             * request has not actually been fully processed yet.
>>> +             */
>>> +            spin_lock_irq(&req->engine->fence_lock);
>>> +            i915_gem_request_notify(req->engine, true);
>>> +            spin_unlock_irq(&req->engine->fence_lock);
>>> +        }
>>> +    }
>>> +
>>>        if (timeout) {
>>>            s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>>
>>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>    {
>>>        trace_i915_gem_request_retire(request);
>>>
>>> +    if (request->irq_enabled) {
>>> +        request->engine->irq_put(request->engine);
>>> +        request->irq_enabled = false;
>> What protects request->irq_enabled? Here versus enable_signalling bit? It can be called from the external fence users which would take the fence_lock, but here it does not.
The flag can only be set when enabling interrupt driven completion 
(which can only happen once and only if the fence is not already 
signalled). The flag can only be cleared when the fence is signalled or 
when the request is retired. And retire without signal can only happen 
if the request is being cancelled in some way (e.g. GPU reset) and thus 
will not ever be signalled. So if we get here then none of the other 
paths are possible anymore.

>>
>>> +    }
>>> +
>>>        /* We know the GPU must have read the request to have
>>>         * sent us the seqno + interrupt, so use the position
>>>         * of tail of the request to update the last known position
>>> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>        list_del_init(&request->list);
>>>        i915_gem_request_remove_from_client(request);
>>>
>>> +    /*
>>> +     * In case the request is still in the signal pending list,
>>> +     * e.g. due to being cancelled by TDR, preemption, etc.
>>> +     */
>>> +    if (!list_empty(&request->signal_link)) {
>> No locking required here?
> Considering the locked function is used, I'm assuming this function holds the fence_lock.
>
> If not, something's seriously wrong.
No. If a request is being retired that still has the ability to be 
signaled then something is seriously wrong. So nothing can be modifying 
request->signal_link at this point. The code below, however, is wrong 
and should not be assuming the fence_lock is held because it won't be. 
Not sure how that crept in!
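
The fix is presumably just to use the self-locking variant there, along 
these lines (sketch):

    /* Retire path: fence_lock is NOT held here, so let fence_signal()
     * take fence->lock internally rather than using the _locked variant.
     */
    if (!list_empty(&request->signal_link)) {
        request->cancelled = true;
        request->fence.status = -EIO;
        fence_signal(&request->fence);
    }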

>>> +        /*
>>> +         * The request must be marked as cancelled and the underlying
>>> +         * fence as failed. NB: There is no explicit fence fail API,
>>> +         * there is only a manual poke and signal.
>>> +         */
>>> +        request->cancelled = true;
>>> +        /* How to propagate to any associated sync_fence??? */
> ^This way works, so comment can be removed.
>
> There's deliberately no way to cancel a fence, it's the same path but with status member set.
>
> If you have a fence for another driver, there's really no good way to handle failure. So you have to treat
> it as if it succeeded.
Yeah, it didn't use to be like that. There was a halfway house before 
with Android sync points built on top of struct fence and both had a 
signalled state (with the Android one also carrying error information). 
The comment is left over from then. I'll remove it.

>>> +        request->fence.status = -EIO;
>>> +        fence_signal_locked(&request->fence);
>> And here?
See comment above.

>>
>>> +    }
>>> +
>>>        if (request->previous_context) {
>>>            if (i915.enable_execlists)
>>>                intel_lr_context_unpin(request->previous_context,
>>> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>>>         */
>>>        request->postfix = intel_ring_get_tail(ringbuf);
>>>
>>> +    /*
>>> +     * Add the fence to the pending list before emitting the commands to
>>> +     * generate a seqno notification interrupt.
>>> +     */
>>> +    i915_gem_request_submit(request);
>>> +
>>>        if (i915.enable_execlists)
>>>            ret = engine->emit_request(request);
>>>        else {
>>> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence *req_fence)
>>>        struct drm_i915_gem_request *req;
>>>
>>>        req = container_of(req_fence, typeof(*req), fence);
>>> +
>>> +    WARN_ON(req->irq_enabled);
>> How useful is this? If it went wrong engine irq reference counting would be bad. Okay no one would notice, but we could then stick some other WARNs here, like !list_empty(req->list) and who knows what, which we don't have, so I am just wondering if this one brings any value.
It could be removed. It was specifically added as part of the debug/test 
cycle for writing this patch. Now that I'm pretty sure there are no 
logic errors, I guess it can be dropped.

>>
>>> +
>>>        call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>>    }
>>>
>>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +/*
>>> + * The request is about to be submitted to the hardware so add the fence to
>>> + * the list of signalable fences.
>>> + *
>>> + * NB: This does not necessarily enable interrupts yet. That only occurs on
>>> + * demand when the request is actually waited on. However, adding it to the
>>> + * list early ensures that there is no race condition where the interrupt
>>> + * could pop out prematurely and thus be completely lost. The race is merely
>>> + * that the interrupt must be manually checked for after being enabled.
>>> + */
>>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>>    {
>>> -    /* Interrupt driven fences are not implemented yet.*/
>>> -    WARN(true, "This should not be called!");
>>> -    return true;
>>> +    /*
>>> +     * Always enable signal processing for the request's fence object
>>> +     * before that request is submitted to the hardware. Thus there is no
>>> +     * race condition whereby the interrupt could pop out before the
>>> +     * request has been added to the signal list. Hence no need to check
>>> +     * for completion, undo the list add and return false.
>>> +     */
>>> +    i915_gem_request_reference(req);
>>> +    spin_lock_irq(&req->engine->fence_lock);
>>> +    WARN_ON(!list_empty(&req->signal_link));
>>> +    list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
>>> +    spin_unlock_irq(&req->engine->fence_lock);
>>> +
>>> +    /*
>>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>>> +     * race where the request could complete before the interrupt has
>>> +     * been enabled. Thus care must be taken at that point.
>>> +     */
>>> +
>>> +    /* Have interrupts already been requested? */
>>> +    if (req->signal_requested)
>>> +        i915_gem_request_enable_interrupt(req, false);
>> I am thinking that the fence lock could be held here until the end of the function, and that way i915_gem_request_enable_interrupt would not need the fence_locked parameter any more.
>>
>> It would probably also be safer with regards to accessing req->signal_requested. I am not sure that enable signalling and this path otherwise can't race and miss signal_requested getting set?
Yeah, makes sense to move the enable_interrupt inside the spin lock. 
Will do.
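
I.e. something like this for the tail of i915_gem_request_submit() 
(sketch; assumes i915_gem_request_enable_interrupt() then loses its 
fence_locked parameter):

    i915_gem_request_reference(req);

    spin_lock_irq(&req->engine->fence_lock);
    WARN_ON(!list_empty(&req->signal_link));
    list_add_tail(&req->signal_link, &req->engine->fence_signal_list);

    /* Checked under the lock, so a concurrent enable_signaling() cannot
     * set signal_requested between this test and the unlock.
     */
    if (req->signal_requested)
        i915_gem_request_enable_interrupt(req);

    spin_unlock_irq(&req->engine->fence_lock);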

>>> +}
>>> +
>>> +/*
>>> + * The request is being actively waited on, so enable interrupt based
>>> + * completion signalling.
>>> + */
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>>> +                       bool fence_locked)
>>> +{
>>> +    struct drm_i915_private *dev_priv = req->engine->i915;
>>> +    const bool irq_test_in_progress =
>>> +        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>>> +                        intel_engine_flag(req->engine);
>>> +
>>> +    if (req->irq_enabled)
>>> +        return;
>>> +
>>> +    if (irq_test_in_progress)
>>> +        return;
>>> +
>>> +    if (!WARN_ON(!req->engine->irq_get(req->engine)))
>>> +        req->irq_enabled = true;
>> The double negation confused me a bit. It is probably not ideal since WARN_ONs go to the out of line section so in a way it is deliberately penalising the fast and expected path. I think it would be better to put a WARN on the else path.
Will do.
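
I.e. (sketch):

    if (req->engine->irq_get(req->engine))
        req->irq_enabled = true;
    else
        WARN(1, "Failed to get an irq reference for request signalling\n");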

>>> +
>>> +    /*
>>> +     * Because the interrupt is only enabled on demand, there is a race
>>> +     * where the interrupt can fire before anyone is looking for it. So
>>> +     * do an explicit check for missed interrupts.
>>> +     */
>>> +    i915_gem_request_notify(req->engine, fence_locked);
>>>    }
>>>
>>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>>    {
>>>        struct drm_i915_gem_request *req = container_of(req_fence,
>>>                             typeof(*req), fence);
>>> +
>>> +    /*
>>> +     * No need to actually enable interrupt based processing until the
>>> +     * request has been submitted to the hardware. At which point
>>> +     * 'i915_gem_request_submit()' is called. So only really enable
>>> +     * signalling in there. Just set a flag to say that interrupts are
>>> +     * wanted when the request is eventually submitted. On the other hand
>>> +     * if the request has already been submitted then interrupts do need
>>> +     * to be enabled now.
>>> +     */
>>> +
>>> +    req->signal_requested = true;
>>> +
>>> +    if (!list_empty(&req->signal_link))
>> In what scenarios is the list_empty check needed? Someone can somehow enable signalling on a fence not yet submitted?
Yes. It is guaranteed to happen on Android. And when the scheduler 
arrives (which this series is officially prep work for) and submission 
is deferred, it becomes a lot easier to hit in a non-Android system as 
well. I don't think there is currently a code path that will hit this 
case, but since one is definitely coming I would rather include the 
facility from the start.

>>> +        i915_gem_request_enable_interrupt(req, true);
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +/**
>>> + * i915_gem_request_worker - request work handler callback.
>>> + * @work: Work structure
>>> + * Called in response to a seqno interrupt to process the completed requests.
>>> + */
>>> +void i915_gem_request_worker(struct work_struct *work)
>>> +{
>>> +    struct intel_engine_cs *engine;
>>> +
>>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>>> +    i915_gem_request_notify(engine, false);
>>> +}
>>> +
>>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>>> +{
>>> +    struct drm_i915_gem_request *req, *req_next;
>>> +    unsigned long flags;
>>>        u32 seqno;
>>>
>>> -    seqno = req->engine->get_seqno(req->engine);
>>> +    if (list_empty(&engine->fence_signal_list))
>> Okay this without the lock still makes me nervous. I'd rather not have to think about why it is safe and can't miss a wakeup.
I don't see how list_empty() can return a false negative. Even if the 
implementation were such that it could see a partially updated state 
across multiple memory accesses, that would just lead to it thinking the 
list is not empty, which is fine. Any update which takes it from empty 
to not-empty is guaranteed to occur before the act of enabling 
interrupts and thus before notify() can be called. So while it could 
potentially do the full processing when an early exit would have been 
fine, it can never early exit when it needs to do something.

>> Also I would be leaning toward having i915_gem_request_notify and i915_gem_request_notify__unlocked. With the enable_interrupts simplification I suggested I think it would look better and be more consistent with the rest of the driver.
>>
>>> +        return;
>>> +
>>> +    if (!fence_locked)
>>> +        spin_lock_irqsave(&engine->fence_lock, flags);
>> Not called from hard irq context so can be just spin_lock_irq.
>>
>> But if you agree to go with the tasklet it would then be spin_lock_bh.
> fence is always spin_lock_irq, if this requires _bh then it can't go into the tasklet.
Updated.
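
For reference, the notify/notify__unlocked split suggested above would 
presumably look something like this sketch (naming is an assumption, 
following the driver's existing __unlocked convention):

    /* Caller does not hold engine->fence_lock; this wrapper takes it. */
    void i915_gem_request_notify__unlocked(struct intel_engine_cs *engine)
    {
        if (list_empty(&engine->fence_signal_list))
            return;

        spin_lock_irq(&engine->fence_lock);
        i915_gem_request_notify(engine); /* assumes fence_lock is held */
        spin_unlock_irq(&engine->fence_lock);
    }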

>>> -    return i915_seqno_passed(seqno, req->seqno);
>>> +    if (engine->irq_seqno_barrier)
>>> +        engine->irq_seqno_barrier(engine);
>>> +    seqno = engine->get_seqno(engine);
>>> +
>>> +    list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
>>> +        if (!req->cancelled) {
>>> +            if (!i915_seqno_passed(seqno, req->seqno))
>>> +                break;
>> Merge to one if statement?
Will do.
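
I.e. (sketch):

    /* Stop at the first live request whose seqno has not popped out yet. */
    if (!req->cancelled && !i915_seqno_passed(seqno, req->seqno))
        break;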

>>
>>> +        }
>>> +
>>> +        /*
>>> +         * Start by removing the fence from the signal list otherwise
>>> +         * the retire code can run concurrently and get confused.
>>> +         */
>>> +        list_del_init(&req->signal_link);
>>> +
>>> +        if (!req->cancelled)
>>> +            fence_signal_locked(&req->fence);
>> I forgot how signalling errors to userspace works? Does that still work for cancelled fences in this series?
Yes, as Maarten mentioned, there is no explicit error signal. You just 
set the status to an error code and signal the fence as normal. The 
signal here is only for genuinely completed fences. The cancelled case 
is handled further up.
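
So from a waiter's point of view it ends up looking roughly like this 
(sketch):

    long ret;

    ret = fence_wait(&req->fence, true);
    if (ret == 0 && req->fence.status < 0)
        ret = req->fence.status;    /* e.g. -EIO for a cancelled request */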

>>> +
>>> +        if (req->irq_enabled) {
>>> +            req->engine->irq_put(req->engine);
>>> +            req->irq_enabled = false;
>>> +        }
>>> +
>>> +        i915_gem_request_unreference(req);
>>> +    }
>>> +
>>> +    if (!fence_locked)
>>> +        spin_unlock_irqrestore(&engine->fence_lock, flags);
>>>    }
>>>
>>>    static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>>> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
>>>
>>>    static const struct fence_ops i915_gem_request_fops = {
>>>        .enable_signaling    = i915_gem_request_enable_signaling,
>>> -    .signaled        = i915_gem_request_is_completed,
>>>        .wait            = fence_default_wait,
>>>        .release        = i915_gem_request_free,
>>>        .get_driver_name    = i915_gem_request_get_driver_name,
>>> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>        req->ctx  = ctx;
>>>        i915_gem_context_reference(req->ctx);
>>>
>>> +    INIT_LIST_HEAD(&req->signal_link);
>>>        fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>>               ctx->engine[engine->id].fence_timeline.fence_context,
>>>               i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>>> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
>>>            i915_gem_request_retire(request);
>>>        }
>>>
>>> +    /*
>>> +     * Tidy up anything left over. This includes a call to
>>> +     * i915_gem_request_notify() which will make sure that any requests
>>> +     * that were on the signal pending list get also cleaned up.
>>> +     */
>>> +    i915_gem_retire_requests_ring(engine);
>> Hmm.. but this function has just walked the same lists that this will, and done the same processing. Why call this from here? It looks bad to me; the two are different special cases of a similar thing, so I can't see that calling this from here makes sense.
Hmm, not sure. Possibly left over from an earlier version that was 
slightly different. Will tidy this up.

>>> +
>>>        /* Having flushed all requests from all queues, we know that all
>>>         * ringbuffers must now be empty. However, since we do not reclaim
>>>         * all space when retiring the request (to prevent HEADs colliding
>>> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>>>    {
>>>        WARN_ON(i915_verify_lists(engine->dev));
>>>
>>> +    /*
>>> +     * If no-one has waited on a request recently then interrupts will
>>> +     * not have been enabled and thus no requests will ever be marked as
>>> +     * completed. So do an interrupt check now.
>>> +     */
>>> +    i915_gem_request_notify(engine, false);
>> Would it work to signal the fence from the existing loop a bit above in this function which already walks the request list in search for completed ones? Or maybe even in i915_gem_request_retire?
>>
>> I am thinking about doing less list walking and better integration with the core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure then. It just looks a bit bolted on like this.
>>
>> I don't see it being a noticeable cost so perhaps it can stay like this for now.
>>
>>> +
>>>        /* Retire requests first as we use it above for the early return.
>>>         * If we retire requests last, we may use a later seqno and so clear
>>>         * the requests lists without clearing the active list, leading to
>>> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>>>    {
>>>        INIT_LIST_HEAD(&engine->active_list);
>>>        INIT_LIST_HEAD(&engine->request_list);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>    }
>>>
>>>    void
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index f780421..a87a3c5 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>>>        trace_i915_gem_request_notify(engine);
>>>        engine->user_interrupts++;
>>>
>>> +    queue_work(engine->i915->req_wq, &engine->request_work);
>>> +
>>>        wake_up_all(&engine->irq_queue);
>> Yes, that is the weird part - why does the engine->irq_queue have to remain with this patch?
See earlier comments about not being able to use the wait-on-fence 
interface instead due to complications with GPU reset and such. 
Certainly the aim is to get rid of the wake_up_all() eventually, but 
that requires a lot more work over and above this patch.

>>
>>>    }
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index f126bcb..134759d 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>>>
>>>        dev_priv = engine->i915;
>>>
>>> +    cancel_work_sync(&engine->request_work);
>>> +
>>>        if (engine->buffer) {
>>>            intel_logical_ring_stop(engine);
>>>            WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>>> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>
>>>        INIT_LIST_HEAD(&engine->active_list);
>>>        INIT_LIST_HEAD(&engine->request_list);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>        INIT_LIST_HEAD(&engine->buffers);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        spin_lock_init(&engine->execlist_lock);
>>> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>        tasklet_init(&engine->irq_tasklet,
>>>                 intel_lrc_irq_handler, (unsigned long)engine);
>>>
>>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>>> +
>>>        logical_ring_init_platform_invariants(engine);
>>>        logical_ring_default_vfuncs(engine);
>>>        logical_ring_default_irqs(engine, info->irq_shift);
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index fbd3f12..1641096 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>        INIT_LIST_HEAD(&engine->request_list);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        INIT_LIST_HEAD(&engine->buffers);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>        spin_lock_init(&engine->fence_lock);
>>>        i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>>        memset(engine->semaphore.sync_seqno, 0,
>>> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>
>>>        init_waitqueue_head(&engine->irq_queue);
>>>
>>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>>> +
>>>        ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>>>        if (IS_ERR(ringbuf)) {
>>>            ret = PTR_ERR(ringbuf);
>>> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>>>
>>>        dev_priv = engine->i915;
>>>
>>> +    cancel_work_sync(&engine->request_work);
>>> +
>>>        if (engine->buffer) {
>>>            intel_stop_engine(engine);
>>>            WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> index 3f39daf..51779b4 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>>>        u32 (*get_cmd_length_mask)(u32 cmd_header);
>>>
>>>        spinlock_t fence_lock;
>>> +    struct list_head fence_signal_list;
>>> +
>>> +    struct work_struct request_work;
>>>    };
>>>
>>>    static inline bool


