[PATCH] drm/amdkfd: Fixed kfd_process cleanup on module exit.

Mon Mar 6 23:45:57 UTC 2023

Am 2023-03-06 um 16:58 schrieb David Belanger:
> Handle case when module is unloaded (kfd_exit) before a process space
> (mm_struct) is released.
>
> Signed-off-by: David Belanger <david.belanger at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  4 ++
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 57 ++++++++++++++++++++++++
>   2 files changed, 61 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 09b966dc3768..8ef4bd9e4f7d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -26,6 +26,9 @@
>   #include "kfd_priv.h"
>   #include "amdgpu_amdkfd.h"
>   
> +void kfd_cleanup_processes(void);
> +
> +
>   static int kfd_init(void)
>   {
>   	int err;
> @@ -77,6 +80,7 @@ static int kfd_init(void)
>   
>   static void kfd_exit(void)
>   {
> +	kfd_cleanup_processes();
>   	kfd_debugfs_fini();
>   	kfd_process_destroy_wq();
>   	kfd_procfs_shutdown();
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ebabe92f7edb..b5b28a32639d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1181,6 +1181,17 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>   		return;
>   
>   	mutex_lock(&kfd_processes_mutex);
> +	/*
> +	 * Do early return if p is not in the table.
> +	 *
> +	 * This could potentially happen if this function is called concurrently
> +	 * by mmu_notifier and by kfd_cleanup_pocesses.
> +	 *
> +	 */
> +	if (!hash_hashed(&p->kfd_processes)) {
> +		mutex_unlock(&kfd_processes_mutex);
> +		return;
> +	}
>   	hash_del_rcu(&p->kfd_processes);
>   	mutex_unlock(&kfd_processes_mutex);
>   	synchronize_srcu(&kfd_processes_srcu);
> @@ -1200,6 +1211,52 @@ static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
>   	.free_notifier = kfd_process_free_notifier,
>   };
>   
> +
> +void kfd_cleanup_processes(void)
> +{
> +	struct kfd_process *p;
> +	unsigned int temp;
> +
> +	/*
> +	 * Iterate over remaining processes in table, calling notifier release
> +	 * to free up notifier and process resources.
> +	 *
> +	 * This code handles the case when driver is unloaded before all mm_struct
> +	 * are released.
> +	 */
> +	int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +		if (p) {
> +			/*
> +			 * Obtain a reference on p to avoid a late mmu_notifier release
> +			 * call triggering freeing the process.
> +			 */
> +
> +			kref_get(&p->ref);
> +
> +			srcu_read_unlock(&kfd_processes_srcu, idx);

I don't think it's valid to drop the lock in the middle of the loop. You 
need to hold the lock throughout the loop to protect the consistency of 
the data structure. I guess you're doing this because you got a deadlock 
from synchronize_srcu in kfd_process_notifier_release.

> +
> +			kfd_process_notifier_release(&p->mmu_notifier, p->mm);

This calls hash_del_rcu to remove the process from the hash table. To 
make this safe, you need to hold the kfd_processes_mutex.

Since this is outside the RCU read lock, the entry in the hlist can be 
freed, which can cause problems when the hash_for_each_rcu loop tries to 
find the next entry in the hlist.

> +
> +			kfd_unref_process(p);

This schedules a worker that can free the process at any time, which 
also frees the hlist_node p->kfd_processes that is still needed by 
hash_for_each_rcu to find the next entry. If you're unlucky, the worker 
will be scheduled before the next loop iteration, and you can get a 
kernel oops.

I suggest a safer strategy: Make a loop using hash_for_each_safe to move 
all the processes into a new hlist. You can do that while holding the 
kfd_processes_mutex, so you can safely remove all entries from the hash 
table and move them into your own private hlist.

Then you can safely release all the processes from your private hlist 
without having to hold either the srcu read lock or the mutex.

Regards,
   Felix

> +
> +			idx = srcu_read_lock(&kfd_processes_srcu);
> +		}
> +	}
> +	srcu_read_unlock(&kfd_processes_srcu, idx);
> +
> +	/*
> +	 * Must be called after all mmu_notifier_put are done and before
> +	 * kfd_process_wq is released.
> +	 *
> +	 * Ensures that all outstanding free_notifier gets called, triggering the release
> +	 * of the process.
> +	 */
> +	mmu_notifier_synchronize();
> +}
> +
> +
>   static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
>   {
>   	unsigned long  offset;