[PATCH v2] drm/amdkfd: Correct svm prange overlapping handling at svm_range_set_attr ioctl

Felix Kuehling felix.kuehling at amd.com
Thu Jul 18 20:54:52 UTC 2024



On 2024-07-18 1:25, Chen, Xiaogang wrote:
> 
> On 7/17/2024 6:02 PM, Felix Kuehling wrote:
>>
>> On 2024-06-26 11:06, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>
>>> When a user adds a new vm range that overlaps with existing svm pranges,
>>> the current kfd creates a cloned prange, splits it, then replaces the
>>> original prange with it. That destroys the original prange's locks, and the
>>> cloned prange's locks do not inherit the original prange's lock contexts.
>>> This may cause issues if code still needs to use these locks. In general we
>>> should keep using the original prange, update the internal data that changed
>>> during the split, then free the cloned prange.
>>
>> While splitting/updating ranges, the svms->lock needs to be held. You cannot have concurrent threads accessing ranges while we're updating the range list. If that is a possibility, you have a race condition anyway. You also can't split, shrink or otherwise modify a range while someone else is accessing that range. So keeping the same locking context is a non-issue.
>>
> We do hold svms->lock when calling svm_range_add. The patch is not meant to fix a race condition. It keeps the original svm range context when we need to split/update it. The current implementation "duplicates" a new one, then destroys the original svm range.
My point is, nobody can be using the lock context while we update the range list. Therefore it doesn't matter if we replace it with a new one.

> 
>>>
>>> This patch changes the vm range overlap handling so that it does not remove
>>> existing pranges; instead it updates them for the split and keeps their locks alive.
>>
>> It sounds like you're trying to fix a problem here. Is this an actual or a hypothetical problem?
>>
> It is not a problem in reality so far. The patch uses an approach that does not destroy the original svm range when splitting/updating it, so its lock (prange->lock, prange->migrate_mutex) contexts are kept. The so-called "cloned svm range" creates new locks that are not related to the original locks. I think that is not reasonable.

That's your opinion. In my opinion the current code is perfectly reasonable. There is no technical reason that the prange and its lock context must remain the same. I see no technical justification for a change that makes the code longer and more complex, and introduces the risk of regressions.

Regards,
  Felix

> 
>>
>>>
>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 112 ++++++++++++++++++++-------
>>>   1 file changed, 82 insertions(+), 30 deletions(-)
>>
>> Just looking at the number of added and removed lines, this doesn't look like a simplification. I really question the justification for this change. If it makes the code more complicated or less robust, without a good reason, then it's not a good change.
>>
> As above, it does not make the code simpler or more complicated. It splits/updates the svm range directly on the prange data rather than destroying the original prange and generating a new one.
> 
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 407636a68814..a66b8c96ee14 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1967,7 +1967,8 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>>>       return r;
>>>   }
>>>
>>> -static struct svm_range *svm_range_clone(struct svm_range *old)
>>> +/* create a prange that has same range/size/addr etc info as old */
>>> +static struct svm_range *svm_range_duplicate(struct svm_range *old)
>>
>> This seems like an unnecessary name change. "clone" and "duplicate" mean the same thing. But "clone" is shorter.
>>
> "clone" should mean identical to the existing one. Here we use some items from the existing svm_range to build the new one; the new one is not exactly the same as the existing one, e.g. prange->lock is not the same between the old and new svm range.
> 
>>>   {
>>>       struct svm_range *new;
>>>
>>> @@ -1999,6 +2000,25 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>>>       return new;
>>>   }
>>>
>>> +/* copy range/size/addr info from src to dst prange */
>>> +static void svm_range_copy(struct svm_range *dst, struct svm_range *src)
>>> +{
>>> +    dst->npages = src->npages;
>>> +    dst->start = src->start;
>>> +    dst->last = src->last;
>>> +
>>> +    dst->vram_pages = src->vram_pages;
>>> +    dst->offset = src->offset;
>>> +
>>> +    for (int i = 0; i < MAX_GPU_INSTANCE; i++) {
>>> +        if (!src->dma_addr[i])
>>> +            continue;
>>> +
>>> +         memcpy(dst->dma_addr[i], src->dma_addr[i],
>>> +                src->npages * sizeof(*src->dma_addr[i]));
>>
>> This does not reallocate/resize the dma_addr arrays. Reallocating these arrays can't be done here, because this function is not allowed to fail. That's one reason to use the clone instead of modifying the existing range.
> 
> Ok, maybe I can swap dma_addr between the dst and src svm ranges here, so the original dma_addr arrays get freed when the src svm range is freed.
> 
> The purpose of this patch is to not destroy existing svm ranges when splitting/updating them, since svm_range_add is not meant to remove any existing svm range, even part of it. svm_range_add splits or updates existing svm ranges. Maybe we need to decide first whether this idea is valid?
> 
> Regards
> 
> Xiaogang
> 
> 
>>
>> Regards,
>>   Felix
>>
>>
>>> +    }
>>> +}
>>> +
>>>   void svm_range_set_max_pages(struct amdgpu_device *adev)
>>>   {
>>>       uint64_t max_pages;
>>> @@ -2057,20 +2077,19 @@ svm_range_split_new(struct svm_range_list *svms, uint64_t start, uint64_t last,
>>>    * @attrs: array of attributes
>>>    * @update_list: output, the ranges need validate and update GPU mapping
>>>    * @insert_list: output, the ranges need insert to svms
>>> - * @remove_list: output, the ranges are replaced and need remove from svms
>>>    * @remap_list: output, remap unaligned svm ranges
>>>    *
>>>    * Check if the virtual address range has overlap with any existing ranges,
>>>    * split partly overlapping ranges and add new ranges in the gaps. All changes
>>>    * should be applied to the range_list and interval tree transactionally. If
>>>    * any range split or allocation fails, the entire update fails. Therefore any
>>> - * existing overlapping svm_ranges are cloned and the original svm_ranges left
>>> + * existing overlapping svm_ranges are duplicated and the original svm_ranges left
>>>    * unchanged.
>>>    *
>>> - * If the transaction succeeds, the caller can update and insert clones and
>>> - * new ranges, then free the originals.
>>> + * If the transaction succeeds, the caller can update and insert split ranges and
>>> + * new ranges.
>>>    *
>>> - * Otherwise the caller can free the clones and new ranges, while the old
>>> + * Otherwise the caller can free the duplicated and new ranges, while the old
>>>    * svm_ranges remain unchanged.
>>>    *
>>>    * Context: Process context, caller must hold svms->lock
>>> @@ -2082,7 +2101,7 @@ static int
>>>   svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>>             uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs,
>>>             struct list_head *update_list, struct list_head *insert_list,
>>> -          struct list_head *remove_list, struct list_head *remap_list)
>>> +          struct list_head *remap_list)
>>>   {
>>>       unsigned long last = start + size - 1UL;
>>>       struct svm_range_list *svms = &p->svms;
>>> @@ -2090,13 +2109,14 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>>       struct svm_range *prange;
>>>       struct svm_range *tmp;
>>>       struct list_head new_list;
>>> +    struct list_head modify_list;
>>>       int r = 0;
>>>
>>>       pr_debug("svms 0x%p [0x%llx 0x%lx]\n", &p->svms, start, last);
>>>
>>>       INIT_LIST_HEAD(update_list);
>>>       INIT_LIST_HEAD(insert_list);
>>> -    INIT_LIST_HEAD(remove_list);
>>> +    INIT_LIST_HEAD(&modify_list);
>>>       INIT_LIST_HEAD(&new_list);
>>>       INIT_LIST_HEAD(remap_list);
>>>
>>> @@ -2117,35 +2137,41 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>>               /* nothing to do */
>>>           } else if (node->start < start || node->last > last) {
>>>               /* node intersects the update range and its attributes
>>> -             * will change. Clone and split it, apply updates only
>>> +             * will change. Duplicate and split it, apply updates only
>>>                * to the overlapping part
>>>                */
>>> -            struct svm_range *old = prange;
>>> +            /* prange_dup is a temporary prange that holds size and addr info
>>> +             * updates of prange after the split
>>> +             */
>>> +            struct svm_range *prange_dup;
>>>
>>> -            prange = svm_range_clone(old);
>>> -            if (!prange) {
>>> +            prange_dup = svm_range_duplicate(prange);
>>> +            if (!prange_dup) {
>>>                   r = -ENOMEM;
>>>                   goto out;
>>>               }
>>>
>>> -            list_add(&old->update_list, remove_list);
>>> -            list_add(&prange->list, insert_list);
>>> -            list_add(&prange->update_list, update_list);
>>> -
>>> +            /* split prange_dup */
>>>               if (node->start < start) {
>>>                   pr_debug("change old range start\n");
>>> -                r = svm_range_split_head(prange, start,
>>> +                r = svm_range_split_head(prange_dup, start,
>>>                                insert_list, remap_list);
>>>                   if (r)
>>>                       goto out;
>>>               }
>>>               if (node->last > last) {
>>>                   pr_debug("change old range last\n");
>>> -                r = svm_range_split_tail(prange, last,
>>> +                r = svm_range_split_tail(prange_dup, last,
>>>                                insert_list, remap_list);
>>>                   if (r)
>>>                       goto out;
>>>               }
>>> +
>>> +            /* split success, insert_list has new head/tail pranges */
>>> +            /* move prange from svm list to modify list */
>>> +            list_move_tail(&prange->list, &modify_list);
>>> +            /* put prange_dup at prange->update_list */
>>> +            list_add(&prange_dup->list, &prange->update_list);
>>>           } else {
>>>               /* The node is contained within start..last,
>>>                * just update it
>>> @@ -2178,8 +2204,38 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>>               svm_range_free(prange, false);
>>>           list_for_each_entry_safe(prange, tmp, &new_list, list)
>>>               svm_range_free(prange, true);
>>> +
>>> +        list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>>> +            struct svm_range *prange_dup;
>>> +
>>> +            /* free prange_dup that is associated with prange on modify_list */
>>> +            prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>>> +            if (prange_dup)
>>> +                svm_range_free(prange_dup, false);
>>> +
>>> +            INIT_LIST_HEAD(&prange->update_list);
>>> +            /* put prange back to svm list */
>>> +            list_move_tail(&prange->list, &svms->list);
>>> +        }
>>>       } else {
>>>           list_splice(&new_list, insert_list);
>>> +
>>> +        list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>>> +            struct svm_range *prange_dup;
>>> +
>>> +            prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>>> +            if (prange_dup) {
>>> +                /* update prange from prange_dup */
>>> +                svm_range_copy(prange, prange_dup);
>>> +                /* release temporary prange_dup */
>>> +                svm_range_free(prange_dup, false);
>>> +            }
>>> +            INIT_LIST_HEAD(&prange->update_list);
>>> +
>>> +            /* move prange from modify_list to insert_list and update_list */
>>> +            list_move_tail(&prange->list, insert_list);
>>> +            list_add(&prange->update_list, update_list);
>>> +        }
>>>       }
>>>
>>>       return r;
>>> @@ -3533,7 +3589,6 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>>>       struct amdkfd_process_info *process_info = p->kgd_process_info;
>>>       struct list_head update_list;
>>>       struct list_head insert_list;
>>> -    struct list_head remove_list;
>>>       struct list_head remap_list;
>>>       struct svm_range_list *svms;
>>>       struct svm_range *prange;
>>> @@ -3566,7 +3621,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>>>
>>>       /* Add new range and split existing ranges as needed */
>>>       r = svm_range_add(p, start, size, nattr, attrs, &update_list,
>>> -              &insert_list, &remove_list, &remap_list);
>>> +              &insert_list, &remap_list);
>>>       if (r) {
>>>           mutex_unlock(&svms->lock);
>>>           mmap_write_unlock(mm);
>>> @@ -3574,21 +3629,18 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>>>       }
>>>       /* Apply changes as a transaction */
>>>       list_for_each_entry_safe(prange, next, &insert_list, list) {
>>> -        svm_range_add_to_svms(prange);
>>> -        svm_range_add_notifier_locked(mm, prange);
>>> +        /* prange can be new or existing range, put it at svms->list */
>>> +        list_move_tail(&prange->list, &prange->svms->list);
>>> +        /* update prange at interval trees: svms->objects and
>>> +         * mm interval notifier tree
>>> +         */
>>> +        svm_range_update_notifier_and_interval_tree(mm, prange);
>>> +
>>>       }
>>>       list_for_each_entry(prange, &update_list, update_list) {
>>>           svm_range_apply_attrs(p, prange, nattr, attrs, &update_mapping);
>>>           /* TODO: unmap ranges from GPU that lost access */
>>>       }
>>> -    list_for_each_entry_safe(prange, next, &remove_list, update_list) {
>>> -        pr_debug("unlink old 0x%p prange 0x%p [0x%lx 0x%lx]\n",
>>> -             prange->svms, prange, prange->start,
>>> -             prange->last);
>>> -        svm_range_unlink(prange);
>>> -        svm_range_remove_notifier(prange);
>>> -        svm_range_free(prange, false);
>>> -    }
>>>
>>>       mmap_write_downgrade(mm);
>>>       /* Trigger migrations and revalidate and map to GPUs as needed. If

