[Intel-gfx] [PATCH v9 4/6] drm/i915: Interrupt driven fences
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Jun 14 11:35:30 UTC 2016
On 13/06/16 16:51, John Harrison wrote:
[snip]
>>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>>>> index 07edaed..f8f60bb 100644
>>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>>> if (dev_priv->wq == NULL)
>>>> goto out_err;
>>>>
>>>> + dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>>>> + if (dev_priv->req_wq == NULL)
>>>> + goto out_free_wq;
>>>> +
>>> A single (per-device) ordered workqueue will serialize interrupt
>>> processing across all engines into one thread. Together with the fact
>>> that the request worker does not seem to need a sleeping context, I am
>>> thinking that a tasklet per engine would be much better (see
>>> engine->irq_tasklet for an example).
> Other conversations have stated that tasklets are not the best option. I
> did think about having a work queue per engine but that seemed
> excessive. Plus, any subsequent work triggered by the fence completion
> will almost certainly require grabbing the driver mutex lock (because
> everything requires the mutex lock), so serialisation of the engines
> doesn't sound like much of an issue.
In this patch, AFAICS, i915_gem_request_worker calls
i915_gem_request_notify on the engine, which only takes
engine->fence_lock. So if you had a per-engine wq, all engines could do
that processing in parallel.
It is only once fence_signal_locked wakes up the waiters that they
might go and hammer on struct_mutex, but for any work they want to do
between waking up and submitting new work, serializing via a single wq
will be bad for latency.
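For illustration, a per-engine variant could look something like the
sketch below (engine->req_wq, the wq name and the unwind label are my
invention, not from your patch):

	/* Sketch: one ordered wq per engine, so notify work for
	 * different engines runs in parallel while work for any
	 * single engine stays serialized. */
	for_each_engine(engine, dev_priv) {
		engine->req_wq = alloc_ordered_workqueue("i915-rq/%s", 0,
							 engine->name);
		if (engine->req_wq == NULL)
			goto out_free_req_wqs;
	}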
>>>
>>>> + ret = __i915_spin_request(req, state);
>>>> + if (ret == 0)
>>>> + goto out;
>>>> }
>>>>
>>>> + /*
>>>> + * Enable interrupt completion of the request.
>>>> + */
>>>> + fence_enable_sw_signaling(&req->fence);
>>>> +
>>>> for (;;) {
>>>> struct timer_list timer;
>>>>
>>>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>> break;
>>>> }
>>>>
>>>> + if (req->seqno) {
>>>> + /*
>>>> + * There is quite a lot of latency in the user interrupt
>>>> + * path. So do an explicit seqno check and potentially
>>>> + * remove all that delay.
>>>> + */
>>>> + if (req->engine->irq_seqno_barrier)
>>>> + req->engine->irq_seqno_barrier(req->engine);
>>>> + seqno = engine->get_seqno(engine);
>>>> + if (i915_seqno_passed(seqno, req->seqno)) {
>>>> + ret = 0;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> if (signal_pending_state(state, current)) {
>>>> ret = -ERESTARTSYS;
>>>> break;
>>>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>> destroy_timer_on_stack(&timer);
>>>> }
>>>> }
>>>> - if (!irq_test_in_progress)
>>>> - engine->irq_put(engine);
>>>>
>>>> finish_wait(&engine->irq_queue, &wait);
>>> Hm, I don't understand why our custom waiting remains. Shouldn't
>>> fence_wait just be called after the optimistic spin, more or less?
> That would solve the 'thundering herd' problem if we could do it. In
> theory, the entire wait function should just be a call to
> 'fence_wait(&req->fence)'. Unfortunately, the wait function goes to
> sleep holding the mutex lock and requires a bail-out option on the
> wait, which is not currently part of the fence API. I have a
> work-in-progress patch that almost solves the issues but it isn't quite
> there yet (and I haven't had much chance to work on it for a while).
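For reference, I guess the shape being aimed at is roughly the sketch
below, once struct_mutex is no longer held across the sleep
(fence_wait_timeout() is existing fence API; the helper name is made
up):

	/* Conceptually the whole wait collapses into the fence API
	 * call. Returns 0 on success, -ETIME on timeout, or a
	 * negative error code such as -ERESTARTSYS. */
	static int wait_request_via_fence(struct drm_i915_gem_request *req,
					  bool interruptible,
					  signed long timeout_jiffies)
	{
		signed long ret;

		ret = fence_wait_timeout(&req->fence, interruptible,
					 timeout_jiffies);
		if (ret < 0)
			return ret;

		return ret ? 0 : -ETIME;
	}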
[Btw, does it also mean i915 cannot handle incoming 3rd-party fences?]
So the userspace waiters can wait either on a dma-buf or on the i915 wq,
depending on which ioctl they have called. Then, on interrupt, i915
waiters are woken directly, while the fence API ones go via a worker.
The i915 waiters also do a second wake-up of the fence API waiters by...
>>>
>>>> out:
>>>> trace_i915_gem_request_wait_end(req);
>>>>
>>>> + if ((ret == 0) && (req->seqno)) {
>>>> + if (req->engine->irq_seqno_barrier)
>>>> + req->engine->irq_seqno_barrier(req->engine);
>>>> + seqno = engine->get_seqno(engine);
>>>> + if (i915_seqno_passed(seqno, req->seqno) &&
>>>> + !i915_gem_request_completed(req)) {
>>>> + /*
>>>> + * Make sure the request is marked as completed before
>>>> + * returning. NB: Need to acquire the spinlock around
>>>> + * the whole call to avoid a race condition where the
>>>> + * interrupt handler, running concurrently, could
>>>> + * cause this invocation to exit early even though the
>>>> + * request has not actually been fully processed yet.
>>>> + */
>>>> + spin_lock_irq(&req->engine->fence_lock);
>>>> + i915_gem_request_notify(req->engine, true);
... this call here.
If there are both classes of waiters, one of those calls will be a waste
of cycles (spin lock, irq toggle, list iteration, coherent seqno read),
correct?
>>>> + spin_unlock_irq(&req->engine->fence_lock);
>>>> + }
>>>> + }
>>>> +
>>>> if (timeout) {
>>>> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>>>
>>>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>> {
>>>> trace_i915_gem_request_retire(request);
>>>>
>>>> + if (request->irq_enabled) {
>>>> + request->engine->irq_put(request->engine);
>>>> + request->irq_enabled = false;
>>> What protects request->irq_enabled, here versus the enable_signalling
>>> path? It can be called from the external fence users, which would take
>>> the fence_lock, but here it does not.
> The flag can only be set when enabling interrupt-driven completion
> (which can only happen once, and only if the fence is not already
> signalled). The flag can only be cleared when the fence is signalled or
> when the request is retired. And retire without signal can only happen
> if the request is being cancelled in some way (e.g. GPU reset) and thus
> will never be signalled. So if we get here then none of the other
> paths are possible anymore.
Couldn't retire without signal happen via execbuf paths calling
i915_gem_retire_requests_ring?
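If it can, something like the sketch below in retire would close that
window (illustrative only; whether the lock is really needed here is
exactly my question):

	/* Clear irq_enabled under fence_lock so it cannot race with
	 * enable_signaling or the notify path. */
	spin_lock_irq(&request->engine->fence_lock);
	if (request->irq_enabled) {
		request->engine->irq_put(request->engine);
		request->irq_enabled = false;
	}
	spin_unlock_irq(&request->engine->fence_lock);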
>>>> + i915_gem_request_enable_interrupt(req, true);
>>>> +
>>>> + return true;
>>>> +}
>>>> +
>>>> +/**
>>>> + * i915_gem_request_worker - request work handler callback.
>>>> + * @work: Work structure
>>>> + * Called in response to a seqno interrupt to process the completed requests.
>>>> + */
>>>> +void i915_gem_request_worker(struct work_struct *work)
>>>> +{
>>>> + struct intel_engine_cs *engine;
>>>> +
>>>> + engine = container_of(work, struct intel_engine_cs, request_work);
>>>> + i915_gem_request_notify(engine, false);
>>>> +}
>>>> +
>>>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>>>> +{
>>>> + struct drm_i915_gem_request *req, *req_next;
>>>> + unsigned long flags;
>>>> u32 seqno;
>>>>
>>>> - seqno = req->engine->get_seqno(req->engine);
>>>> + if (list_empty(&engine->fence_signal_list))
>>> Okay, this without the lock still makes me nervous. I'd rather not
>>> have to think about why it is safe and whether it can miss a wakeup.
> I don't see how list_empty() can return a false negative. Even if the
> implementation were such that it could see a partially updated state
> across multiple memory accesses, that would just lead to it thinking
> not-empty, which is fine. Any update which takes it from empty to
> not-empty is guaranteed to occur before the act of enabling interrupts,
> and thus before notify() can be called. So while it could potentially do
> the full processing when an early exit was fine, it can never exit early
> when it needs to do something.
Something like that sounds like a good comment to put above then! :)
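For example, something along these lines above the early-out (wording
is mine, based on your explanation):

	/* Lockless peek: any empty->not-empty transition happens
	 * strictly before interrupts are enabled, so the worst case
	 * is one wasted pass; a required wakeup can never be missed. */
	if (list_empty(&engine->fence_signal_list))
		return;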
Regards,
Tvrtko