[PATCH v2 hmm 00/11] Various revisions from a locking/code review
Yang, Philip
Philip.Yang at amd.com
Wed Jun 12 21:49:12 UTC 2019
Rebase to https://github.com/jgunthorpe/linux.git hmm branch, need some
changes because of interface hmm_range_register change. Then run a quick
amdgpu_test. Test is finished, result is ok. But there is below kernel
BUG message, seems hmm_free_rcu calls down_write.....
[ 1171.919921] BUG: sleeping function called from invalid context at
/home/yangp/git/compute_staging/kernel/kernel/locking/rwsem.c:65
[ 1171.919933] in_atomic(): 1, irqs_disabled(): 0, pid: 53, name:
kworker/1:1
[ 1171.919938] 2 locks held by kworker/1:1/53:
[ 1171.919940] #0: 000000001c7c19d4 ((wq_completion)rcu_gp){+.+.}, at:
process_one_work+0x20e/0x630
[ 1171.919951] #1: 00000000923f2cfa
((work_completion)(&sdp->work)){+.+.}, at: process_one_work+0x20e/0x630
[ 1171.919959] CPU: 1 PID: 53 Comm: kworker/1:1 Tainted: G W
5.2.0-rc1-kfd-yangp #196
[ 1171.919961] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1,
BIOS 9001 03/07/2016
[ 1171.919965] Workqueue: rcu_gp srcu_invoke_callbacks
[ 1171.919968] Call Trace:
[ 1171.919974] dump_stack+0x67/0x9b
[ 1171.919980] ___might_sleep+0x149/0x230
[ 1171.919985] down_write+0x1c/0x70
[ 1171.919989] hmm_free_rcu+0x24/0x80
[ 1171.919993] srcu_invoke_callbacks+0xc9/0x150
[ 1171.920000] process_one_work+0x28e/0x630
[ 1171.920008] worker_thread+0x39/0x3f0
[ 1171.920014] ? process_one_work+0x630/0x630
[ 1171.920017] kthread+0x11c/0x140
[ 1171.920021] ? kthread_park+0x90/0x90
[ 1171.920026] ret_from_fork+0x24/0x30
Philip
On 2019-06-12 1:54 p.m., Kuehling, Felix wrote:
> [+Philip]
>
> Hi Jason,
>
> I'm out of the office this week.
>
> Hi Philip, can you give this a go? Not sure how much you've been
> following this patch series review. Message or call me on Skype to
> discuss any questions.
>
> Thanks,
> Felix
>
> On 2019-06-11 12:48, Jason Gunthorpe wrote:
>> On Thu, Jun 06, 2019 at 03:44:27PM -0300, Jason Gunthorpe wrote:
>>> From: Jason Gunthorpe <jgg at mellanox.com>
>>>
>>> For hmm.git:
>>>
>>> This patch series arised out of discussions with Jerome when looking at the
>>> ODP changes, particularly informed by use after free races we have already
>>> found and fixed in the ODP code (thanks to syzkaller) working with mmu
>>> notifiers, and the discussion with Ralph on how to resolve the lifetime model.
>>>
>>> Overall this brings in a simplified locking scheme and easy to explain
>>> lifetime model:
>>>
>>> If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm
>>> is allocated memory.
>>>
>>> If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc)
>>> then the mmget must be obtained via mmget_not_zero().
>>>
>>> Locking of mm->hmm is shifted to use the mmap_sem consistently for all
>>> read/write and unlocked accesses are removed.
>>>
>>> The use unlocked reads on 'hmm->dead' are also eliminated in favour of using
>>> standard mmget() locking to prevent the mm from being released. Many of the
>>> debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison -
>>> which is much clearer as to the lifetime intent.
>>>
>>> The trailing patches are just some random cleanups I noticed when reviewing
>>> this code.
>>>
>>> This v2 incorporates alot of the good off list changes & feedback Jerome had,
>>> and all the on-list comments too. However, now that we have the shared git I
>>> have kept the one line change to nouveau_svm.c rather than the compat
>>> funtions.
>>>
>>> I believe we can resolve this merge in the DRM tree now and keep the core
>>> mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong.
>>>
>>> It is on top of hmm.git, and I have a git tree of this series to ease testing
>>> here:
>>>
>>> https://github.com/jgunthorpe/linux/tree/hmm
>>>
>>> There are still some open locking issues, as I think this remains unaddressed:
>>>
>>> https://lore.kernel.org/linux-mm/20190527195829.GB18019@mellanox.com/
>>>
>>> I'm looking for some more acks, reviews and tests so this can move ahead to
>>> hmm.git.
>> AMD Folks, this is looking pretty good now, can you please give at
>> least a Tested-by for the new driver code using this that I see in
>> linux-next?
>>
>> Thanks,
>> Jason
More information about the dri-devel
mailing list