[PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU
Chen, Xiaogang
xiaogang.chen at amd.com
Fri Aug 11 21:12:38 UTC 2023
I know the original Jira ticket. The system hit an RCU CPU stall, then the
kernel panicked and the machine stopped responding, including over ssh. This
patch lets the prange list update task yield the CPU after each range update.
That can prevent the task from holding the mm lock too long. But the mm lock
is an rw_semaphore, not an RCU mechanism.
Can you explain how that prevents the RCU CPU stall in this case?
Regards
Xiaogang
On 8/11/2023 2:11 PM, James Zhu wrote:
> update_list can be large in list_for_each_entry(prange, &update_list, update_list),
> and mmap_read_lock(mm) is held the whole time. Adding schedule() can remove the
> RCU stall on the CPU in this case.
>
> RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
> Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
> RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
> RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
> RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
> RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
> R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
> R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
> __mmu_notifier_invalidate_range_start+0x132/0x1d0
> ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
> migrate_vma_setup+0x6c7/0x8f0
> ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
> svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
> svm_range_set_attr+0xe34/0x11a0 [amdgpu]
> kfd_ioctl+0x271/0x4e0 [amdgpu]
> ? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
> __x64_sys_ioctl+0x92/0xd0
>
> Signed-off-by: James Zhu <James.Zhu at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 113fd11aa96e..9f2d48ade7fa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>  		r = svm_range_trigger_migration(mm, prange, &migrated);
>  		if (r)
>  			goto out_unlock_range;
> +		schedule();
>
>  		if (migrated && (!p->xnack_enabled ||
>  		    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
> --
> 2.34.1
>