[PATCH v2] drm/amdkfd: Correct svm prange overlapping handling at svm_range_set_attr ioctl
Chen, Xiaogang
xiaogang.chen at amd.com
Thu Jul 18 05:25:26 UTC 2024
On 7/17/2024 6:02 PM, Felix Kuehling wrote:
>
> On 2024-06-26 11:06, Xiaogang.Chen wrote:
>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>
>> When a user adds a new vm range that overlaps existing svm pranges,
>> the current kfd code creates a cloned prange, splits it, then replaces
>> the original prange with the clone. That destroys the original
>> prange's locks, and the cloned prange's locks do not inherit the
>> original prange's lock contexts. This may cause issues if code still
>> needs to use those locks. In general we should keep using the original
>> prange, update the internal data that changed during the split, then
>> free the cloned prange.
>
> While splitting/updating ranges, the svms->lock needs to be held. You
> cannot have concurrent threads accessing ranges while we're updating
> the range list. If that is a possibility, you have a race condition
> anyway. You also can't split, shrink or otherwise modify a range while
> someone else is accessing that range. So keeping the same locking
> context is a non-issue.
>
We do hold svms->lock when calling svm_range_add. The patch is not meant
to fix a race condition. It keeps the original svm range's context when
we need to split/update it, whereas the current implementation
"duplicates" a new one and then destroys the original svm range.
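To illustrate the lock-context point being discussed, here is a minimal
userspace sketch (not the kernel code: the names `range`, `range_clone`
and `range_split_head_in_place` are hypothetical, and `pthread_mutex_t`
stands in for the kernel's `struct mutex`). Clone-and-replace hands out a
freshly initialized lock, while an in-place split leaves the original
lock object untouched:

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal model of a prange: only the fields needed for the example. */
struct range {
	unsigned long start, last;
	pthread_mutex_t lock;	/* stands in for prange->lock / migrate_mutex */
};

/* Clone-and-replace strategy: the copy gets a freshly initialized lock,
 * so any context tied to the original lock is gone once the original
 * range is freed. */
static struct range *range_clone(const struct range *old)
{
	struct range *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	new->start = old->start;
	new->last = old->last;
	pthread_mutex_init(&new->lock, NULL);	/* brand-new lock object */
	return new;
}

/* In-place split strategy: only the bounds change; &r->lock remains the
 * same object for the whole lifetime of the range. */
static void range_split_head_in_place(struct range *r, unsigned long new_start)
{
	r->start = new_start;
}
```

The sketch only shows why the two strategies differ for lock identity; it
says nothing about which one is safer under svms->lock, which is the
actual point of contention in this thread.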
>>
>> This patch changes the vm range overlap handling so that it does not
>> remove existing pranges; instead it updates them in place for the
>> split and keeps their locks alive.
>
> It sounds like you're trying to fix a problem here. Is this an actual
> or a hypothetical problem?
>
It has not been an actual problem so far. The patch uses an approach
that does not destroy the original svm range when splitting/updating it,
so the context of its locks (prange->lock, prange->migrate_mutex) is
kept. The so-called "cloned svm range" creates new locks that are
unrelated to the original locks, which I think is not reasonable.
>
>>
>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 112 ++++++++++++++++++++-------
>> 1 file changed, 82 insertions(+), 30 deletions(-)
>
> Just looking at the number of added and removed lines, this doesn't
> look like a simplification. I really question the justification for
> this change. If it makes the code more complicated or less robust,
> without a good reason, then it's not a good change.
>
As above, it does not make the code simpler or more complicated. It
splits/updates the svm range directly on the prange data instead of
destroying the original prange and generating a new one.
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 407636a68814..a66b8c96ee14 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1967,7 +1967,8 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>> return r;
>> }
>> -static struct svm_range *svm_range_clone(struct svm_range *old)
>> +/* create a prange that has same range/size/addr etc info as old */
>> +static struct svm_range *svm_range_duplicate(struct svm_range *old)
>
> This seems like an unnecessary name change. "clone" and "duplicate"
> mean the same thing. But "clone" is shorter.
>
"clone" should mean identical to the existing one. Here we use some
fields from the existing svm_range to build the new one, and the new one
is not fully identical to the existing one; for example, prange->lock is
not the same between the old and new svm ranges.
>> {
>> struct svm_range *new;
>> @@ -1999,6 +2000,25 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>> return new;
>> }
>> +/* copy range/size/addr info from src to dst prange */
>> +static void svm_range_copy(struct svm_range *dst, struct svm_range *src)
>> +{
>> + dst->npages = src->npages;
>> + dst->start = src->start;
>> + dst->last = src->last;
>> +
>> + dst->vram_pages = src->vram_pages;
>> + dst->offset = src->offset;
>> +
>> + for (int i = 0; i < MAX_GPU_INSTANCE; i++) {
>> + if (!src->dma_addr[i])
>> + continue;
>> +
>> + memcpy(dst->dma_addr[i], src->dma_addr[i],
>> + src->npages * sizeof(*src->dma_addr[i]));
>
> This does not reallocate/resize the dma_addr arrays. Reallocating
> these arrays can't be done here, because this function is not allowed
> to fail. That's one reason to use the clone instead of modifying the
> existing range.
OK, maybe I can swap dma_addr between the dst and src svm ranges here;
then the original dma_addr arrays get freed when the src svm range is
freed.
The purpose of this patch is to not destroy an existing svm range when
splitting/updating it, since svm_range_add has no means to remove any
existing svm range, or even part of one; svm_range_add only splits or
updates existing svm ranges. Maybe we should decide first whether this
idea is valid?
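The swap idea could look roughly like the following userspace sketch. It
uses a stripped-down model of svm_range (MAX_GPU_INSTANCE is reduced for
the example and `svm_range_swap_dma_addr` is a hypothetical name, not a
function in the driver). Swapping the per-GPU array pointers instead of
memcpy'ing avoids any allocation, so it cannot fail, and dst's old
arrays are naturally released when src is freed:

```c
#define MAX_GPU_INSTANCE 4	/* the kernel value is larger; reduced here */

/* Hypothetical minimal prange model: just the fields the swap touches. */
struct svm_range {
	unsigned long npages;
	unsigned long *dma_addr[MAX_GPU_INSTANCE];
};

/* Exchange the per-GPU dma_addr arrays between dst and src. Unlike a
 * memcpy into dst's existing (possibly smaller) arrays, this needs no
 * reallocation and cannot fail; freeing src later frees dst's old
 * arrays. */
static void svm_range_swap_dma_addr(struct svm_range *dst,
				    struct svm_range *src)
{
	for (int i = 0; i < MAX_GPU_INSTANCE; i++) {
		unsigned long *tmp = dst->dma_addr[i];

		dst->dma_addr[i] = src->dma_addr[i];
		src->dma_addr[i] = tmp;
	}
}
```

This is only a sketch of the "swap instead of copy" suggestion made
above; whether it fits the driver's ownership rules for dma_addr is the
open question.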
Regards
Xiaogang
>
> Regards,
> Felix
>
>
>> + }
>> +}
>> +
>> void svm_range_set_max_pages(struct amdgpu_device *adev)
>> {
>> uint64_t max_pages;
>> @@ -2057,20 +2077,19 @@ svm_range_split_new(struct svm_range_list *svms, uint64_t start, uint64_t last,
>> * @attrs: array of attributes
>> * @update_list: output, the ranges need validate and update GPU mapping
>> * @insert_list: output, the ranges need insert to svms
>> - * @remove_list: output, the ranges are replaced and need remove from svms
>> * @remap_list: output, remap unaligned svm ranges
>> *
>> * Check if the virtual address range has overlap with any existing ranges,
>> * split partly overlapping ranges and add new ranges in the gaps. All changes
>> * should be applied to the range_list and interval tree transactionally. If
>> * any range split or allocation fails, the entire update fails. Therefore any
>> - * existing overlapping svm_ranges are cloned and the original svm_ranges left
>> + * existing overlapping svm_ranges are duplicated and the original svm_ranges left
>> * unchanged.
>> *
>> - * If the transaction succeeds, the caller can update and insert clones and
>> - * new ranges, then free the originals.
>> + * If the transaction succeeds, the caller can update and insert split ranges and
>> + * new ranges.
>> *
>> - * Otherwise the caller can free the clones and new ranges, while the old
>> + * Otherwise the caller can free the duplicated and new ranges, while the old
>> * svm_ranges remain unchanged.
>> *
>> * Context: Process context, caller must hold svms->lock
>> @@ -2082,7 +2101,7 @@ static int
>> svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs,
>> struct list_head *update_list, struct list_head *insert_list,
>> - struct list_head *remove_list, struct list_head *remap_list)
>> + struct list_head *remap_list)
>> {
>> unsigned long last = start + size - 1UL;
>> struct svm_range_list *svms = &p->svms;
>> @@ -2090,13 +2109,14 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> struct svm_range *prange;
>> struct svm_range *tmp;
>> struct list_head new_list;
>> + struct list_head modify_list;
>> int r = 0;
>> pr_debug("svms 0x%p [0x%llx 0x%lx]\n", &p->svms, start, last);
>> INIT_LIST_HEAD(update_list);
>> INIT_LIST_HEAD(insert_list);
>> - INIT_LIST_HEAD(remove_list);
>> + INIT_LIST_HEAD(&modify_list);
>> INIT_LIST_HEAD(&new_list);
>> INIT_LIST_HEAD(remap_list);
>> @@ -2117,35 +2137,41 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> /* nothing to do */
>> } else if (node->start < start || node->last > last) {
>> /* node intersects the update range and its attributes
>> - * will change. Clone and split it, apply updates only
>> + * will change. Duplicate and split it, apply updates only
>> * to the overlapping part
>> */
>> - struct svm_range *old = prange;
>> + /* prange_dup is a temporary prange that holds size and addr info
>> + * updates of prange after the split
>> + */
>> + struct svm_range *prange_dup;
>> - prange = svm_range_clone(old);
>> - if (!prange) {
>> + prange_dup = svm_range_duplicate(prange);
>> + if (!prange_dup) {
>> r = -ENOMEM;
>> goto out;
>> }
>> - list_add(&old->update_list, remove_list);
>> - list_add(&prange->list, insert_list);
>> - list_add(&prange->update_list, update_list);
>> -
>> + /* split prange_dup */
>> if (node->start < start) {
>> pr_debug("change old range start\n");
>> - r = svm_range_split_head(prange, start,
>> + r = svm_range_split_head(prange_dup, start,
>> insert_list, remap_list);
>> if (r)
>> goto out;
>> }
>> if (node->last > last) {
>> pr_debug("change old range last\n");
>> - r = svm_range_split_tail(prange, last,
>> + r = svm_range_split_tail(prange_dup, last,
>> insert_list, remap_list);
>> if (r)
>> goto out;
>> }
>> +
>> + /* split success, insert_list has new head/tail pranges */
>> + /* move prange from svm list to modify list */
>> + list_move_tail(&prange->list, &modify_list);
>> + /* put prange_dup at prange->update_list */
>> + list_add(&prange_dup->list, &prange->update_list);
>> } else {
>> /* The node is contained within start..last,
>> * just update it
>> @@ -2178,8 +2204,38 @@ svm_range_add(struct kfd_process *p, uint64_t
>> start, uint64_t size,
>> svm_range_free(prange, false);
>> list_for_each_entry_safe(prange, tmp, &new_list, list)
>> svm_range_free(prange, true);
>> +
>> + list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>> + struct svm_range *prange_dup;
>> +
>> + /* free prange_dup that is associated with prange on modify_list */
>> + prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>> + if (prange_dup)
>> + svm_range_free(prange_dup, false);
>> +
>> + INIT_LIST_HEAD(&prange->update_list);
>> + /* put prange back to svm list */
>> + list_move_tail(&prange->list, &svms->list);
>> + }
>> } else {
>> list_splice(&new_list, insert_list);
>> +
>> + list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>> + struct svm_range *prange_dup;
>> +
>> + prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>> + if (prange_dup) {
>> + /* update prange from prange_dup */
>> + svm_range_copy(prange, prange_dup);
>> + /* release temporary prange_dup */
>> + svm_range_free(prange_dup, false);
>> + }
>> + INIT_LIST_HEAD(&prange->update_list);
>> +
>> + /* move prange from modify_list to insert_list and update_list */
>> + list_move_tail(&prange->list, insert_list);
>> + list_add(&prange->update_list, update_list);
>> + }
>> }
>> return r;
>> @@ -3533,7 +3589,6 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> struct amdkfd_process_info *process_info = p->kgd_process_info;
>> struct list_head update_list;
>> struct list_head insert_list;
>> - struct list_head remove_list;
>> struct list_head remap_list;
>> struct svm_range_list *svms;
>> struct svm_range *prange;
>> @@ -3566,7 +3621,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> /* Add new range and split existing ranges as needed */
>> r = svm_range_add(p, start, size, nattr, attrs, &update_list,
>> - &insert_list, &remove_list, &remap_list);
>> + &insert_list, &remap_list);
>> if (r) {
>> mutex_unlock(&svms->lock);
>> mmap_write_unlock(mm);
>> @@ -3574,21 +3629,18 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> }
>> /* Apply changes as a transaction */
>> list_for_each_entry_safe(prange, next, &insert_list, list) {
>> - svm_range_add_to_svms(prange);
>> - svm_range_add_notifier_locked(mm, prange);
>> + /* prange can be new or existing range, put it at svms->list */
>> + list_move_tail(&prange->list, &prange->svms->list);
>> + /* update prange at interval trees: svms->objects and
>> + * mm interval notifier tree
>> + */
>> + svm_range_update_notifier_and_interval_tree(mm, prange);
>> +
>> }
>> list_for_each_entry(prange, &update_list, update_list) {
>> svm_range_apply_attrs(p, prange, nattr, attrs,
>> &update_mapping);
>> /* TODO: unmap ranges from GPU that lost access */
>> }
>> - list_for_each_entry_safe(prange, next, &remove_list, update_list) {
>> - pr_debug("unlink old 0x%p prange 0x%p [0x%lx 0x%lx]\n",
>> - prange->svms, prange, prange->start,
>> - prange->last);
>> - svm_range_unlink(prange);
>> - svm_range_remove_notifier(prange);
>> - svm_range_free(prange, false);
>> - }
>> mmap_write_downgrade(mm);
>> /* Trigger migrations and revalidate and map to GPUs as needed. If