Deadlock on PTEs update for HMM

Philip Yang philip.yang at amd.com
Fri Nov 29 14:26:20 UTC 2019


Yes, this can work in the same way as dqm_lock. This is the trivial part; 
Felix and Christian are discussing a solution to the lock problem.

Regards,
Philip

On 2019-11-28 7:35 p.m., Zeng, Oak wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> Is kmalloc with GFP_NOWAIT an option here?
> 
> Regards,
> 
> Oak
> 
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of* 
> Sierra Guiza, Alejandro (Alex)
> *Sent:* Wednesday, November 27, 2019 9:55 AM
> *To:* Koenig, Christian <Christian.Koenig at amd.com>; Kuehling, Felix 
> <Felix.Kuehling at amd.com>
> *Cc:* amd-gfx at lists.freedesktop.org
> *Subject:* Deadlock on PTEs update for HMM
> 
> Hi Christian,
> 
> As you know, we’re working on the HMM enablement. I’m working on the dGPU 
> page table entry invalidation for the userptr mapping case. Currently, 
> the MMU notifier handler stops all user mode queues, then schedules a 
> delayed worker to re-validate the userptr mappings and restart the queues.
> 
> As part of the HMM functionality, we need to invalidate the page table 
> entries instead of stopping the queues. At the same time we need to move 
> the revalidation of the userptr mappings into the page fault handler.
> 
> We’re seeing a deadlock warning after we try to invalidate the PTEs 
> inside the MMU notifier handler. More specifically, it appears when we 
> try to update the BOs to invalidate PTEs using amdgpu_vm_bo_update. This 
> path calls kmalloc in amdgpu_job_alloc, which seems to be causing the problem.
> 
> Based on comments from Felix Kuehling <Felix.Kuehling at amd.com>, 
> kmalloc without any special flags can cause memory reclaim. Doing that 
> inside an MMU notifier is problematic, because an MMU notifier may be 
> called inside a memory-reclaim operation itself. That would result in 
> recursion. Also, reclaim shouldn't be done while holding a lock that can 
> be taken in an MMU notifier for the same reason. If you cause a reclaim 
> while holding that lock, then an MMU notifier called by the reclaim 
> can deadlock trying to take the same lock.
> 
> Please let us know if you have any advice on how to enable this the right way.
> 
> Thanks in advance,
> 
> Alejandro
> 
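
The recursion described in the quoted message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not driver
code: every name in it (vm_lock_held, model_mmu_notifier, model_alloc,
model_update_ptes) is hypothetical, the lock is a plain flag rather than a
kernel mutex, and reclaim is assumed to run on every allocation that permits
it, which is the worst case. Passing may_reclaim = false stands in for a
non-blocking allocation such as the GFP_NOWAIT option asked about above, which
as far as I know does not enter direct reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver lock that the MMU notifier also takes. */
static bool vm_lock_held;

/* Number of notifier calls that found the lock already held by the caller. */
static int blocked_notifiers;

void model_lock(void)
{
    /* A real mutex would block forever here instead of asserting. */
    assert(!vm_lock_held);
    vm_lock_held = true;
}

void model_unlock(void)
{
    vm_lock_held = false;
}

/* Models an MMU notifier that needs the lock to invalidate PTEs. */
void model_mmu_notifier(void)
{
    if (vm_lock_held) {
        /* The calling context already owns the lock: this is the deadlock. */
        blocked_notifiers++;
        return;
    }
    model_lock();
    /* ... PTE invalidation would happen here ... */
    model_unlock();
}

/*
 * Models an allocation. When may_reclaim is set, reclaim is assumed to run
 * (worst case) and to call the MMU notifier from the allocating context.
 */
void *model_alloc(size_t size, bool may_reclaim)
{
    if (may_reclaim)
        model_mmu_notifier();
    return malloc(size);
}

/*
 * Models a PTE update that allocates a job while holding the lock.
 * Returns how many notifier calls would have deadlocked.
 */
int model_update_ptes(bool may_reclaim)
{
    void *job;

    blocked_notifiers = 0;
    model_lock();
    job = model_alloc(64, may_reclaim);
    model_unlock();
    free(job);
    return blocked_notifiers;
}
```

In this model, model_update_ptes(true) reports one blocked notifier call, while
model_update_ptes(false) reports none. A non-reclaiming allocation can fail
more easily, so a real caller would still need a fallback for a NULL return.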

