[PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU

Felix Kuehling felix.kuehling at amd.com
Fri Aug 11 20:06:58 UTC 2023


On 2023-08-11 15:11, James Zhu wrote:
> update_list could be big in list_for_each_entry(prange, &update_list, update_list),
> and mmap_read_lock(mm) is held the whole time; adding schedule() can remove
> the RCU stall on CPU in this case.
>
> RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]

You're just showing the backtrace here, but not what the problem is. Can 
you include more context, e.g. the message that says something about a 
stall?


> Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
> RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
> RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
> RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
> RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
> R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
> R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
> __mmu_notifier_invalidate_range_start+0x132/0x1d0
> ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
> migrate_vma_setup+0x6c7/0x8f0
> ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
> svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
> svm_range_set_attr+0xe34/0x11a0 [amdgpu]
> kfd_ioctl+0x271/0x4e0 [amdgpu]
> ? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
> __x64_sys_ioctl+0x92/0xd0
>
> Signed-off-by: James Zhu <James.Zhu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 113fd11aa96e..9f2d48ade7fa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>   		r = svm_range_trigger_migration(mm, prange, &migrated);
>   		if (r)
>   			goto out_unlock_range;
> +		schedule();

I'm not sure that unconditionally calling schedule() in every loop 
iteration is a good solution. It could degrade performance when there 
are many small ranges. I think a better option is to call 
cond_resched(). That would reschedule only "if necessary", though I 
haven't quite figured out the criteria for rescheduling being necessary.
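
To illustrate the difference I have in mind, here is a rough userspace 
model in plain C. It is not kernel code. It assumes that a conditional 
reschedule only gives up the CPU when a reschedule has already been 
flagged; that assumption, and the names resched_pending, model_schedule, 
model_cond_resched and walk_ranges, are made up for this sketch and are 
not taken from the patch or from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, NOT kernel APIs. */
static bool resched_pending;   /* models a "reschedule requested" flag */
static unsigned long yields;   /* counts how often the loop gave up the CPU */

/* Models an unconditional schedule(): always yields. */
static void model_schedule(void)
{
	yields++;
	resched_pending = false;
}

/* Models a conditional reschedule: yields only if one was requested. */
static bool model_cond_resched(void)
{
	if (!resched_pending)
		return false;
	model_schedule();
	return true;
}

/*
 * Walk n ranges. A reschedule request is raised every `tick` iterations
 * (an assumed stand-in for something like a timer tick). Returns the
 * number of times the loop yielded.
 */
unsigned long walk_ranges(size_t n, size_t tick, bool unconditional)
{
	yields = 0;
	resched_pending = false;

	for (size_t i = 1; i <= n; i++) {
		/* ... per-range work would happen here ... */
		if (i % tick == 0)
			resched_pending = true;

		if (unconditional)
			model_schedule();
		else
			model_cond_resched();
	}
	return yields;
}
```

In this model, walk_ranges(1000, 100, true) yields 1000 times, once per 
range, while walk_ranges(1000, 100, false) yields 10 times. With many 
small ranges, that per-iteration cost is what I'm worried about.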

Regards,
   Felix


>   
>   		if (migrated && (!p->xnack_enabled ||
>   		    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&

