[PATCH] drm/amdkfd: Fixed kfd_process cleanup on module exit.

Wed Mar 8 16:38:07 UTC 2023

On 2023-03-08 11:20, Christian König wrote:
> Am 08.03.23 um 17:17 schrieb Felix Kuehling:
>> On 2023-03-08 04:07, Christian König wrote:
>>> Am 07.03.23 um 16:28 schrieb Belanger, David:
>>>> [AMD Official Use Only - General]
>>>>
>>>>
>>>> The test case is a python program that will load the driver, do 
>>>> some operations, then unload the driver.
>>>
>>> What do you mean with unloading the driver? Removing the module? Or 
>>> destroying the device?
>>>
>>>> When the driver exists, there is still the python process space 
>>>> around holding on the address space.
>>>> When the python process space exits, the mmu_notifier gets called 
>>>> but the driver has already been unloaded.
>>>>
>>>> The goal of the fix is to address case where there could be 
>>>> outstanding address space / worker threads for process
>>>> cleanup that needs to be cleared/completed at exit time.
>>>
>>> Yeah and when the module is unloaded this is a completely futile 
>>> effort.
>>>
>>> The general upstream approach is to take references on the struct 
>>> device and module and prevent unloading as long as those references 
>>> exists.
>>
>> That's not how it always works. In case of RCU callbacks, the 
>> documented strategy is to use rcu_barrier in the module exit function 
>> to ensure the grace period and all callbacks have completed 
>> (https://www.kernel.org/doc/html/latest/RCU/rcubarrier.html). 
>> mmu_notifier_synchronize is meant to do something similar for pending 
>> mmu_notifier_put work 
>> (https://elixir.bootlin.com/linux/v6.2.2/source/mm/mmu_notifier.c#L1116).
>>
>> But this implies that we need to call mmu_notifier_put for all the 
>> MMU notifiers registered by the module first. I think closing 
>> /dev/kfd drops the module reference count, but the MMU notifiers we 
>> register for process cleanup persist until the address space is 
>> destroyed. We need to trigger that cleanup for any processes that 
>> still exist in that state when the module is unloaded. Or we need to 
>> find a way to increment the module refcount for every process that 
>> registers a KFD cleanup MMU notifier.
>
> The later is what I've meant. Cleaning up when the module unloads is 
> certainly possible as well, but harder to get right.

I think we can get the cleanup right. I suggested a strategy to David in 
my code review.

>
> And I don't really see an use case that we should do the cleanup way.

I'm not sure it's a question of use cases. I see it more as a risk 
trade-off. If we manually add module refcounts for our cleanup notifiers 
(try_module_get(THIS_MODULE)/module_put(THIS_MODULE)), there is a risk 
of leaks that could prevent module unloading, or underflows that could 
allow the module to be unloaded too early.

I guess this particular test (app trying to unload the module after 
using KFD) would just fail if we add module refcounts. But I agree that 
this is not a valid usecase.

Regards,
   Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>>
>>> The device might be non-functional any more (because for example of 
>>> hot plug), but the driver should never be unloaded before the python 
>>> program exits.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> David B.
>>>>
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Sent: Tuesday, March 7, 2023 2:05 AM
>>>>> To: Belanger, David <David.Belanger at amd.com>; amd-
>>>>> gfx at lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdkfd: Fixed kfd_process cleanup on module
>>>>> exit.
>>>>>
>>>>> Am 06.03.23 um 22:58 schrieb David Belanger:
>>>>>> Handle case when module is unloaded (kfd_exit) before a process 
>>>>>> space
>>>>>> (mm_struct) is released.
>>>>> Well that should never ever happen in the first place. It sounds 
>>>>> like we are
>>>>> missing grabbing module references.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: David Belanger <david.belanger at amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  4 ++
>>>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 57
>>>>> ++++++++++++++++++++++++
>>>>>>    2 files changed, 61 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>>> index 09b966dc3768..8ef4bd9e4f7d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>>> @@ -26,6 +26,9 @@
>>>>>>    #include "kfd_priv.h"
>>>>>>    #include "amdgpu_amdkfd.h"
>>>>>>
>>>>>> +void kfd_cleanup_processes(void);
>>>>>> +
>>>>>> +
>>>>>>    static int kfd_init(void)
>>>>>>    {
>>>>>>        int err;
>>>>>> @@ -77,6 +80,7 @@ static int kfd_init(void)
>>>>>>
>>>>>>    static void kfd_exit(void)
>>>>>>    {
>>>>>> +    kfd_cleanup_processes();
>>>>>>        kfd_debugfs_fini();
>>>>>>        kfd_process_destroy_wq();
>>>>>>        kfd_procfs_shutdown();
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index ebabe92f7edb..b5b28a32639d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -1181,6 +1181,17 @@ static void 
>>>>>> kfd_process_notifier_release(struct
>>>>> mmu_notifier *mn,
>>>>>>            return;
>>>>>>
>>>>>>        mutex_lock(&kfd_processes_mutex);
>>>>>> +    /*
>>>>>> +     * Do early return if p is not in the table.
>>>>>> +     *
>>>>>> +     * This could potentially happen if this function is called 
>>>>>> concurrently
>>>>>> +     * by mmu_notifier and by kfd_cleanup_pocesses.
>>>>>> +     *
>>>>>> +     */
>>>>>> +    if (!hash_hashed(&p->kfd_processes)) {
>>>>>> +        mutex_unlock(&kfd_processes_mutex);
>>>>>> +        return;
>>>>>> +    }
>>>>>>        hash_del_rcu(&p->kfd_processes);
>>>>>>        mutex_unlock(&kfd_processes_mutex);
>>>>>>        synchronize_srcu(&kfd_processes_srcu);
>>>>>> @@ -1200,6 +1211,52 @@ static const struct mmu_notifier_ops
>>>>> kfd_process_mmu_notifier_ops = {
>>>>>>        .free_notifier = kfd_process_free_notifier,
>>>>>>    };
>>>>>>
>>>>>> +
>>>>>> +void kfd_cleanup_processes(void)
>>>>>> +{
>>>>>> +    struct kfd_process *p;
>>>>>> +    unsigned int temp;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Iterate over remaining processes in table, calling 
>>>>>> notifier release
>>>>>> +     * to free up notifier and process resources.
>>>>>> +     *
>>>>>> +     * This code handles the case when driver is unloaded before 
>>>>>> all
>>>>> mm_struct
>>>>>> +     * are released.
>>>>>> +     */
>>>>>> +    int idx = srcu_read_lock(&kfd_processes_srcu);
>>>>>> +
>>>>>> +    hash_for_each_rcu(kfd_processes_table, temp, p, 
>>>>>> kfd_processes) {
>>>>>> +        if (p) {
>>>>>> +            /*
>>>>>> +             * Obtain a reference on p to avoid a late
>>>>> mmu_notifier release
>>>>>> +             * call triggering freeing the process.
>>>>>> +             */
>>>>>> +
>>>>>> +            kref_get(&p->ref);
>>>>>> +
>>>>>> +            srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>> +
>>>>>> + kfd_process_notifier_release(&p->mmu_notifier, p-
>>>>>> mm);
>>>>>> +
>>>>>> +            kfd_unref_process(p);
>>>>>> +
>>>>>> +            idx = srcu_read_lock(&kfd_processes_srcu);
>>>>>> +        }
>>>>>> +    }
>>>>>> +    srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Must be called after all mmu_notifier_put are done and 
>>>>>> before
>>>>>> +     * kfd_process_wq is released.
>>>>>> +     *
>>>>>> +     * Ensures that all outstanding free_notifier gets called, 
>>>>>> triggering the
>>>>> release
>>>>>> +     * of the process.
>>>>>> +     */
>>>>>> +    mmu_notifier_synchronize();
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>    static int kfd_process_init_cwsr_apu(struct kfd_process *p, 
>>>>>> struct file
>>>>> *filep)
>>>>>>    {
>>>>>>        unsigned long  offset;
>>>
>