[Nouveau] [PATCH] drm/nouveau/dmem: missing mutex_lock in error path
Ralph Campbell
rcampbell at nvidia.com
Fri Jun 14 17:39:38 UTC 2019
On 6/13/19 5:49 PM, John Hubbard wrote:
> On 6/13/19 5:11 PM, Ralph Campbell wrote:
>> In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
>> calling nouveau_dmem_chunk_alloc().
>> Reacquire the lock before continuing to the next page.
>>
>> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
>> ---
>>
>> I found this while testing Jason Gunthorpe's hmm tree but this is
>> independent of those changes. I guess it could go through
>> David Airlie's tree for nouveau or Jason's tree.
>>
>
> Hi Ralph,
>
> btw, was this the fix for the crash you were seeing? It might be nice to
> mention in the commit description, if you are seeing real symptoms.
>
>
>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index 27aa4e72abe9..00f7236af1b9 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -379,9 +379,10 @@ nouveau_dmem_pages_alloc(struct nouveau_drm *drm,
>> ret = nouveau_dmem_chunk_alloc(drm);
>> if (ret) {
>> if (c)
>> - break;
>
> Actually, the pre-existing code is a little concerning. Your change preserves
> the behavior, but it seems questionable to be doing a "return 0" (whether
> via the above break, or your change) when it's in this partially allocated
> state. It's reporting success when it only allocates part of what was requested,
> and it doesn't fill in the pages array either.
>
>
>
>> + return 0;
>> return ret;
>> }
>> + mutex_lock(&drm->dmem->mutex);
>> continue;
>> }
>>
>>
>
> The above comment is about pre-existing potential problems, but your patch itself
> looks correct, so:
>
> Reviewed-by: John Hubbard <jhubbard at nvidia.com>
>
>
> thanks,
>
The crash was the NULL pointer bug in Christoph's patch #10.
I sent a separate reply for that.
Below is the console output I got, then I made the changes just based on
code inspection. Do you think I should include it in the change log?
As for the "return 0", If you follow the call chain,
nouveau_dmem_pages_alloc() is only ever called for one page so this
currently "works" but I agree it is a bit of a time bomb. There are a
number of other bugs that I can see that need fixing but I think those
should be separate patches.
[ 1294.871933] =====================================
[ 1294.876656] WARNING: bad unlock balance detected!
[ 1294.881375] 5.2.0-rc3+ #5 Not tainted
[ 1294.885048] -------------------------------------
[ 1294.889773] test-malloc-vra/6299 is trying to release lock
(&drm->dmem->mutex) at:
[ 1294.897482] [<ffffffffa01a220f>]
nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.905782] but there are no more locks to release!
[ 1294.910690]
[ 1294.910690] other info that might help us debug this:
[ 1294.917249] 1 lock held by test-malloc-vra/6299:
[ 1294.921881] #0: 0000000016e10454 (&mm->mmap_sem#2){++++}, at:
nouveau_svmm_bind+0x142/0x210 [nouveau]
[ 1294.931313]
[ 1294.931313] stack backtrace:
[ 1294.935702] CPU: 4 PID: 6299 Comm: test-malloc-vra Not tainted
5.2.0-rc3+ #5
[ 1294.942786] Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1401 05/21/2018
[ 1294.949590] Call Trace:
[ 1294.952059] dump_stack+0x7c/0xc0
[ 1294.955469] ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.962213] print_unlock_imbalance_bug.cold.52+0xca/0xcf
[ 1294.967641] lock_release+0x306/0x380
[ 1294.971383] ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.978089] ? lock_downgrade+0x2d0/0x2d0
[ 1294.982121] ? find_held_lock+0xac/0xd0
[ 1294.985979] __mutex_unlock_slowpath+0x8f/0x3f0
[ 1294.990540] ? wait_for_completion+0x230/0x230
[ 1294.995002] ? rwlock_bug.part.2+0x60/0x60
[ 1294.999197] nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1295.005751] ? page_mapping+0x98/0x110
[ 1295.009511] migrate_vma+0xa74/0x1090
[ 1295.013186] ? move_to_new_page+0x480/0x480
[ 1295.017400] ? __kmalloc+0x153/0x300
[ 1295.021052] ? nouveau_dmem_migrate_vma+0xd8/0x1e0 [nouveau]
[ 1295.026796] nouveau_dmem_migrate_vma+0x157/0x1e0 [nouveau]
[ 1295.032466] ? nouveau_dmem_init+0x490/0x490 [nouveau]
[ 1295.037612] ? vmacache_find+0xc2/0x110
[ 1295.041537] nouveau_svmm_bind+0x1b4/0x210 [nouveau]
[ 1295.046583] ? nouveau_svm_fault+0x13e0/0x13e0 [nouveau]
[ 1295.051912] drm_ioctl_kernel+0x14d/0x1a0
[ 1295.055930] ? drm_setversion+0x330/0x330
[ 1295.059971] drm_ioctl+0x308/0x530
[ 1295.063384] ? drm_version+0x150/0x150
[ 1295.067153] ? find_held_lock+0xac/0xd0
[ 1295.070996] ? __pm_runtime_resume+0x3f/0xa0
[ 1295.075285] ? mark_held_locks+0x29/0xa0
[ 1295.079230] ? _raw_spin_unlock_irqrestore+0x3c/0x50
[ 1295.084232] ? lockdep_hardirqs_on+0x17d/0x250
[ 1295.088768] nouveau_drm_ioctl+0x9a/0x100 [nouveau]
[ 1295.093661] do_vfs_ioctl+0x137/0x9a0
[ 1295.097341] ? ioctl_preallocate+0x140/0x140
[ 1295.101623] ? match_held_lock+0x1b/0x230
[ 1295.105646] ? match_held_lock+0x1b/0x230
[ 1295.109660] ? find_held_lock+0xac/0xd0
[ 1295.113512] ? __do_page_fault+0x324/0x630
[ 1295.117617] ? lock_downgrade+0x2d0/0x2d0
[ 1295.121648] ? mark_held_locks+0x79/0xa0
[ 1295.125583] ? handle_mm_fault+0x352/0x430
[ 1295.129687] ksys_ioctl+0x60/0x90
[ 1295.133020] ? mark_held_locks+0x29/0xa0
[ 1295.136964] __x64_sys_ioctl+0x3d/0x50
[ 1295.140726] do_syscall_64+0x68/0x250
[ 1295.144400] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1295.149465] RIP: 0033:0x7f1a3495809b
[ 1295.153053] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
[ 1295.171850] RSP: 002b:00007ffef7ed1358 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 1295.179451] RAX: ffffffffffffffda RBX: 00007ffef7ed1628 RCX:
00007f1a3495809b
[ 1295.186601] RDX: 00007ffef7ed13b0 RSI: 0000000040406449 RDI:
0000000000000004
[ 1295.193759] RBP: 00007ffef7ed13b0 R08: 0000000000000000 R09:
000000000157e770
[ 1295.200917] R10: 000000000151c010 R11: 0000000000000246 R12:
0000000040406449
[ 1295.208083] R13: 0000000000000004 R14: 0000000000000000 R15:
0000000000000000
More information about the Nouveau
mailing list