[Intel-gfx] [PATCH i-g-t] i915/gem_exec_balancer: Race breadcrumb signaling against timeslicing

Fri Jul 17 10:19:16 UTC 2020

Quoting Tvrtko Ursulin (2020-07-17 09:34:07)
> 
> On 16/07/2020 21:44, Chris Wilson wrote:
> I am not sure if the batch duration is not too short in practice, the 
> add loop will really rapidly end all, just needs 64 iterations on 
> average to end all 32 I think. So 64 WC writes from the CPU compared to 
> CSB processing and breadcrumb signaling latencies might be too short. 
> Maybe some small random udelays in the loop would be more realistic. 
> Maybe as a 2nd flavour of the test just in case.. more coverage the better.

GPU			kernel			IGT
semaphore wait
  -> raise interrupt
			handle interrupt
			  -> kick tasklet
			begin preempt-to-busy   semaphore signal
semaphore completes
request completes
			submit new ELSP[]
			  -> stale unwound request

The duration of the batch/semaphore itself doesn't really factor into it;
what matters is that the batch has to complete after we begin the process
of scheduling it out for an expired timeslice. It's such a small window,
and I don't see a good way of hitting it reliably from userspace.
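
To put the size of that window in perspective, here is a toy model of
the timeline above. It is only a sketch and not i915 code: the struct,
the function names and every number fed into it are hypothetical, and
it assumes the stale request only shows up when the completion lands
strictly between the start of preempt-to-busy and the new ELSP[] write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one timeslice expiry; all times in nanoseconds. */
struct timeline {
	uint64_t preempt_begin;	/* kernel begins preempt-to-busy */
	uint64_t elsp_submit;	/* kernel submits the new ELSP[] */
	uint64_t request_done;	/* GPU completes the request */
};

/*
 * Assumption: the stale unwound request is only possible if the request
 * completes after preemption has begun but before the new ELSP[] lands.
 */
static bool hits_window(const struct timeline *t)
{
	return t->request_done > t->preempt_begin &&
	       t->request_done < t->elsp_submit;
}

/*
 * Sweep completion times across [0, period) in increments of step and
 * count how many of them fall inside the preempt-to-busy window.
 */
static unsigned int count_hits(uint64_t period, uint64_t step,
			       uint64_t preempt_begin, uint64_t window)
{
	unsigned int hits = 0;

	assert(step > 0);

	for (uint64_t done = 0; done < period; done += step) {
		struct timeline t = {
			.preempt_begin = preempt_begin,
			.elsp_submit = preempt_begin + window,
			.request_done = done,
		};

		hits += hits_window(&t);
	}

	return hits;
}
```

With a made-up period of 1000 units and a window of 10, only the handful
of completion times inside the window count as hits. Changing the batch
duration just shifts where the completion lands in the sweep; in this
model, only widening the window raises the hit count.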

With some printk, I was able to confirm that we were timeslicing virtual
requests and moving them between engines with active breadcrumbs. But I
never once saw any of the bugs with the stale requests, using this test.

Somehow we want to lengthen the preempt-to-busy window and make the
request completion coincide with it. So far all I have is yucky (too
single-purpose; we would be better off writing unit tests for each of
the steps involved).
-Chris