[Intel-gfx] [PATCH v4 18/38] drm/i915: Added scheduler support to __wait_request() calls
John Harrison
John.C.Harrison at Intel.com
Tue Jan 12 03:28:58 PST 2016
On 11/01/2016 23:14, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 06:42:47PM +0000, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> The scheduler can cause batch buffers, and hence requests, to be
>> submitted to the ring out of order and asynchronously to their
>> submission to the driver. Thus at the point of waiting for the
>> completion of a given request, it is not even guaranteed that the
>> request has actually been sent to the hardware yet. Even if it has
>> been sent, it is possible that it could be pre-empted and thus
>> 'unsent'.
>>
>> This means that it is necessary to be able to submit requests to the
>> hardware during the wait call itself. Unfortunately, while some
>> callers of __wait_request() release the mutex lock first, others do
>> not (and apparently can not). Hence it is possible to deadlock as
>> the wait stalls for submission while the asynchronous submission is
>> stalled on the mutex lock.
> That is a non sequitur. Do you mean to say that unless we take action
> inside GEM, the request will never be submitted to hardware by the
> scheduler?
Potentially. The scheduler holds on to batches inside its internal queue
for later submission. It can also preempt batches that have already been
sent to the hardware. Thus the wait call might be waiting on a batch
which may or may not have been submitted yet, and even if it is
currently executing, it might get kicked out and need re-submitting.
That submission requires calling the back end submission part of the
driver - legacy ring buffer, execlist or GuC. Those back ends all
currently require grabbing the mutex lock.
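
To illustrate the hazard (a sketch only; i915_scheduler_submit() and
the field names here are stand-ins for illustration, not the actual
patch code):

/* Waiter: some callers arrive here already holding dev->struct_mutex */
int __i915_wait_request(struct drm_i915_gem_request *req)
{
        /*
         * If the scheduler has not yet sent req to the hardware, simply
         * sleeping here can deadlock: the asynchronous submission path
         * below needs the very mutex this caller may be holding.
         */
        return wait_event_interruptible(req->wait_queue,
                                        i915_gem_request_completed(req));
}

/* Scheduler: submits queued batches asynchronously, e.g. from a worker */
static void i915_scheduler_submit(struct drm_i915_gem_request *req)
{
        mutex_lock(&req->i915->dev->struct_mutex); /* blocks behind waiter */
        req->engine->emit_request(req);            /* so req never completes */
        mutex_unlock(&req->i915->dev->struct_mutex);
}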
>
>> This change hooks the scheduler in to the __wait_request() code to
>> ensure correct behaviour. That is, flush the target batch buffer
>> through to the hardware and do not deadlock waiting for something that
>> cannot currently be submitted.
> The dependencies are known during request construction, how could we
> generate a cyclic graph?
It is not so much a cyclic graph as a bouncing one. As noted above, even
a batch buffer which is currently executing could get preempted and need
to be resubmitted while someone is waiting on it.
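
Hence the wait loop has to be prepared to (re)flush the request through
the scheduler each time around; roughly like this (again only a sketch,
with i915_scheduler_flush_request() and the is_locked flag as
hypothetical names):

for (;;) {
        /*
         * Push the target request (and its dependencies) to the
         * hardware if the scheduler is still holding it, or if it was
         * preempted while we slept. The flag tells the flush whether
         * the caller already holds struct_mutex, so the back end does
         * not try to take it a second time.
         */
        ret = i915_scheduler_flush_request(req, is_locked);
        if (ret < 0)
                return ret;

        ret = wait_event_interruptible_timeout(req->wait_queue,
                                        i915_gem_request_completed(req),
                                        timeout);
        if (ret < 0 || i915_gem_request_completed(req))
                break;
}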
> The scheduler itself does not need the
> struct_mutex (other than the buggy code),
The scheduler internally does not need the mutex lock at all - it has
its own private spinlock for critical data. However, the back end
submission paths do currently require the mutex lock.
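
I.e. something along these lines (illustrative only; the struct layout
is invented):

struct i915_scheduler {
        spinlock_t       lock;      /* guards the request lists below */
        struct list_head queued;    /* not yet sent to the hardware */
        struct list_head inflight;  /* sent, but may still be preempted */
};

static void i915_scheduler_queue_request(struct i915_scheduler *scheduler,
                                         struct drm_i915_gem_request *req)
{
        spin_lock_irq(&scheduler->lock);
        list_add_tail(&req->sched_link, &scheduler->queued);
        spin_unlock_irq(&scheduler->lock);
        /*
         * The actual submission to the ring buffer / execlist / GuC
         * back end happens later and, today, needs dev->struct_mutex.
         */
}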
> so GEM holding the
> struct_mutex will not prevent the scheduler from eventually submitting
> the request we are waiting for. So as far as I can see, you are papering
> over your own bugs.
> -Chris
>