[Intel-gfx] [RFC 15/44] drm/i915: Added deferred work handler for scheduler

John Harrison John.C.Harrison at Intel.com
Thu Jul 24 17:42:55 CEST 2014


On 23/07/2014 19:50, Daniel Vetter wrote:
> On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> <John.C.Harrison at intel.com> wrote:
>>>>    diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 0977653..fbafa68 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>>          struct delayed_work idle_work;
>>>>          /**
>>>> +        * New scheme is to get an interrupt after every work packet
>>>> +        * in order to allow the low latency scheduling of pending
>>>> +        * packets. The idea behind adding new packets to a pending
>>>> +        * queue rather than directly into the hardware ring buffer
>>>> +        * is to allow high priority packets to overtake low priority
>>>> +        * ones.
>>>> +        */
>>>> +       struct work_struct scheduler_work;
>>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>>> latches the next context right away from the irq handler. Why can't we do
>>> something similar for the scheduler? Fishing the next item out of a
>>> priority queue shouldn't be expensive ...
>>> -Daniel
>>
>> The problem is that taking batch buffers from the scheduler's queue and
>> submitting them to the hardware requires a lot of processing that is not
>> IRQ-compatible. It isn't just a simple register write: half of the code in
>> 'i915_gem_do_execbuffer()' must be executed. It could possibly be made
>> IRQ-friendly, but that would place a lot of restrictions on a lot of code
>> that currently doesn't expect to be restricted. Instead, the submission
>> is done via a work handler that acquires the driver mutex lock.
>>
>> In order to cover the extra latency, the scheduler operates in a
>> multi-buffered mode and aims to keep eight batch buffers in flight at all
>> times. That number was obtained empirically, by running lots of benchmarks
>> on Android with many different settings and finding the point at which a
>> deeper buffer stopped making a difference.
> So I've tried to stitch together that part of the scheduler from the
> patch series. Afaics you do the actual scheduling under the protection
> of irqsave spinlocks (well you also hold the dev->struct_mutex). That
> means you disable local interrupts. Up to the actual submit point I
> spotted two such critical sections encompassing pretty much all the
> code.
>
> If we'd run the same code from the interrupt handler then only our own
> interrupt handler is blocked, all other interrupt processing can
> continue. So that's actually a lot nicer than what you have. In any
> case you can't do expensive operations under an irqsave spinlock
> anyway.
>
> So either I've missed something big here, or this justification doesn't hold up.
> -Daniel

The irqsave spinlock is only held while manipulating the internal 
scheduler data structures. It is released immediately prior to calling 
i915_gem_do_execbuffer_final(), so the actual submission path runs with 
the driver mutex held but no spinlocks. I certainly got 'scheduling 
while atomic' BUG checks the one time I accidentally left the spinlock held.
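
To make the split concrete, here is a rough sketch of the shape being 
described. This is not the actual patch code: the structure layout, queue 
and helper names below are purely illustrative; only scheduler_work and 
i915_gem_do_execbuffer_final() correspond to names in the series. The 
interrupt side only kicks the work item, the work handler pops entries 
under the spinlock, and submission happens in process context under the 
driver mutex.

    #include <linux/kernel.h>
    #include <linux/list.h>
    #include <linux/mutex.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    /* Illustrative stand-ins; the real structures live in the series. */
    struct i915_scheduler_queue_entry {
            struct list_head link;
            int priority;
            /* ... everything needed for the second half of execbuffer ... */
    };

    struct i915_scheduler {
            spinlock_t lock;                /* protects 'queued' only */
            struct list_head queued;        /* pending batches, highest priority first */
            struct mutex *dev_mutex;        /* dev->struct_mutex */
            struct work_struct work;        /* the scheduler_work equivalent */
    };

    /* Placeholder prototype for the non-IRQ-safe submission path. */
    int i915_gem_do_execbuffer_final(struct i915_scheduler_queue_entry *qe);

    static struct i915_scheduler_queue_entry *
    i915_scheduler_pop(struct i915_scheduler *sched)
    {
            struct i915_scheduler_queue_entry *qe = NULL;
            unsigned long flags;

            /* The irqsave spinlock is held only while the internal queue
             * is manipulated, never across submission. */
            spin_lock_irqsave(&sched->lock, flags);
            if (!list_empty(&sched->queued)) {
                    qe = list_first_entry(&sched->queued,
                                          struct i915_scheduler_queue_entry,
                                          link);
                    list_del(&qe->link);
            }
            spin_unlock_irqrestore(&sched->lock, flags);

            return qe;
    }

    /* Deferred handler: runs in process context, so it may sleep and take
     * the driver mutex for the heavyweight part of submission. */
    static void i915_scheduler_work_handler(struct work_struct *work)
    {
            struct i915_scheduler *sched =
                    container_of(work, struct i915_scheduler, work);
            struct i915_scheduler_queue_entry *qe;

            while ((qe = i915_scheduler_pop(sched)) != NULL) {
                    mutex_lock(sched->dev_mutex);
                    i915_gem_do_execbuffer_final(qe);
                    mutex_unlock(sched->dev_mutex);
            }
    }

    /* Called from the ring's 'batch complete' interrupt: nothing expensive
     * happens in IRQ context, the work item is simply kicked. */
    static void i915_scheduler_irq_notify(struct i915_scheduler *sched)
    {
            schedule_work(&sched->work);
    }

At init time the handler would be attached in the usual way with 
INIT_WORK(&sched->work, i915_scheduler_work_handler), so the only thing 
the interrupt path ever does is queue the work item.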



