[Intel-gfx] [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

Thu Sep 12 17:11:25 CEST 2013

Op 12-09-13 17:06, Peter Zijlstra schreef:
> Hi Dave,
>
> So I'm poking around the preemption code and stumbled upon:
>
> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>
> All these sites basically do:
>
>   while (!trylock())
>   	yield();
>
> which is a horrible and broken locking pattern. 
>
> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
> task that preempted the lock holder at FIFOn.
>
> Secondly the implementation is worse than usual by abusing
> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
> doesn't retry, but you're using it as a get out of fault path. And
> you're using set_need_resched() which is not something a driver should
> _ever_ touch.
>
> Now I'm going to take away set_need_resched() -- and while you can
> 'reimplement' it using set_thread_flag() you're not going to do that
> because it will be broken due to changes to the preempt code.
>
> So please as to fix ASAP and don't allow anybody to trick you into
> merging silly things like that again ;-)
>
Agreed, but this is a case of locking inversion. How do you propose to get around that?

~Maarten