[Intel-gfx] [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
Peter Zijlstra
peterz at infradead.org
Thu Sep 12 17:14:20 CEST 2013
On Thu, Sep 12, 2013 at 05:11:25PM +0200, Maarten Lankhorst wrote:
> Op 12-09-13 17:06, Peter Zijlstra schreef:
> > Hi Dave,
> >
> > So I'm poking around the preemption code and stumbled upon:
> >
> > drivers/gpu/drm/i915/i915_gem.c: set_need_resched();
> > drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched();
> > drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched();
> > drivers/gpu/drm/udl/udl_gem.c: set_need_resched();
> >
> > All these sites basically do:
> >
> > while (!trylock())
> > yield();
> >
> > which is a horrible and broken locking pattern.
> >
> > Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
> > task that preempted the lock holder at FIFOn.
> >
> > Secondly the implementation is worse than usual by abusing
> > VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
> > doesn't retry, but you're using it as a get out of fault path. And
> > you're using set_need_resched() which is not something a driver should
> > _ever_ touch.
> >
> > Now I'm going to take away set_need_resched() -- and while you can
> > 'reimplement' it using set_thread_flag() you're not going to do that
> > because it will be broken due to changes to the preempt code.
> >
> > So please as to fix ASAP and don't allow anybody to trick you into
> > merging silly things like that again ;-)
> >
> Agreed, but this is a case of locking inversion. How do you propose to get around that?
me? No idea, I've never looked at the actual locking in DRM. Someone
who's familiar with that code would have to tackle that.
I just spotted the fail because I was going to remove set_need_resched()
and had a WTF moment when I tripped over this stuff.
More information about the Intel-gfx
mailing list