[Intel-gfx] [PATCH] drm/i915: Wait for completion of pending flips when starved of fences

Mon Jan 20 17:17:16 CET 2014

On Mon, Jan 20, 2014 at 10:44:27AM +0000, Chris Wilson wrote:
> On Mon, Jan 20, 2014 at 11:37:42AM +0100, Daniel Vetter wrote:
> > On Mon, Jan 20, 2014 at 09:49:24AM +0000, Chris Wilson wrote:
> > > On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> > > > Also there's a certain chance we'll starve
> > > > the unpin work, similar to the issues around flushing the unpin work
> > > > in our pageflip implementation.
> > > 
> > > If you mean that we will never run the unpin workqueue, that's what the
> > > implementation will fix, eventually, after a busy-spin in userspace since
> > > set_need_resched() was removed. I can teach userspace to yield() after
> > > an EAGAIN which seems a reasonable compromise (userspace gets a bonus
> > > for being cooperative rather than penalized for using up its timeslice.)
> > 
> > yield won't help, we need to block on the work-queue draining like we do
> > in the pageflip code with flush_workqueue. At least we've had bug reports
> > in the past where someone found it intriguing to run his entire userspace
> > with rt prio, which ended up starving the sched_normal workqueue and so
> > livelocked the entire system.
> 
> But isn't that exactly the behaviour the RT user programmed?

Well, userspace asked for pageflips and execbuffers and the kernel
delivered a deadlock. So not quite imo ;-)

I know it's a ridiculous corner-case, but in general I think the kernel
should ensure that forward progress of userspace requests happens and that
offloading things to workqueues is just an implementation detail. Also,
explicit waits and locks have better debug instrumentation compared to
hand-rolled busy-loops.
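
To illustrate the difference between blocking on the deferred work and
retrying in a loop, here is a minimal userspace sketch using pthreads. It
is only an analogy, not the i915 code: struct pending_work, worker(),
wait_for_drain() and run_example() are hypothetical names, and the
condition variable stands in for what flush_workqueue would do in the
kernel. The waiter sleeps until the worker has released everything,
instead of getting an EAGAIN and spinning.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for deferred unpin work; not the i915 code. */
struct pending_work {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	int pending;     /* queued work items that have not run yet */
	int free_fences; /* resources released by completed work items */
};

/* Plays the role of the workqueue: completes every queued item. */
static void *worker(void *arg)
{
	struct pending_work *pw = arg;

	pthread_mutex_lock(&pw->lock);
	while (pw->pending > 0) {
		pw->pending--;
		pw->free_fences++;
	}
	/* Wake anyone blocked in wait_for_drain(). */
	pthread_cond_broadcast(&pw->drained);
	pthread_mutex_unlock(&pw->lock);
	return NULL;
}

/* Sleep until all queued work has run; no busy-loop, no EAGAIN retry. */
static void wait_for_drain(struct pending_work *pw)
{
	pthread_mutex_lock(&pw->lock);
	while (pw->pending > 0)
		pthread_cond_wait(&pw->drained, &pw->lock);
	pthread_mutex_unlock(&pw->lock);
}

/* Queue n items, block until they are done, return the released count. */
static int run_example(int n)
{
	struct pending_work pw = { .pending = n, .free_fences = 0 };
	pthread_t tid;
	int ret;

	pthread_mutex_init(&pw.lock, NULL);
	pthread_cond_init(&pw.drained, NULL);

	ret = pthread_create(&tid, NULL, worker, &pw);
	assert(ret == 0);

	wait_for_drain(&pw);
	pthread_join(tid, NULL);

	ret = pw.free_fences;
	pthread_cond_destroy(&pw.drained);
	pthread_mutex_destroy(&pw.lock);
	return ret;
}
```

Because the waiter sleeps on an explicit primitive, a stuck worker shows up
as a blocked task rather than as a process burning its timeslice.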
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch