[PATCH v2] drm/amdkfd: Correct svm prange overlapping handling at svm_range_set_attr ioctl
Chen, Xiaogang
xiaogang.chen at amd.com
Thu Jul 18 05:25:26 UTC 2024
On 7/17/2024 6:02 PM, Felix Kuehling wrote:
>
> On 2024-06-26 11:06, Xiaogang.Chen wrote:
>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>
>> When a user adds a new vm range that overlaps existing svm pranges,
>> the current kfd code creates a cloned prange, splits it, then replaces
>> the original prange with the clone. That destroys the original
>> prange's locks, and the cloned prange's locks do not inherit the
>> original prange's lock contexts. This may cause issues if code still
>> needs to use those locks. In general we should keep using the original
>> prange, update the internal data that changed during the split, then
>> free the cloned prange.
>
> While splitting/updating ranges, the svms->lock needs to be held. You
> cannot have concurrent threads accessing ranges while we're updating
> the range list. If that is a possibility, you have a race condition
> anyway. You also can't split, shrink or otherwise modify a range while
> someone else is accessing that range. So keeping the same locking
> context is a non-issue.
>
We do hold svms->lock when calling svm_range_add. The patch is not meant
to fix a race condition. It keeps the original svm range's context when
we need to split/update it, whereas the current implementation
"duplicates" a new one and then destroys the original svm range.
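To illustrate the lock-context point being discussed, here is a minimal
userspace sketch (not the kernel code: the names `range`, `range_clone`
and `range_split_head_in_place` are hypothetical, and `pthread_mutex_t`
stands in for the kernel's `struct mutex`). Clone-and-replace hands out a
freshly initialized lock, while an in-place split leaves the original
lock object untouched:

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal model of a prange: only the fields needed for the example. */
struct range {
	unsigned long start, last;
	pthread_mutex_t lock;	/* stands in for prange->lock / migrate_mutex */
};

/* Clone-and-replace strategy: the copy gets a freshly initialized lock,
 * so any context tied to the original lock is gone once the original
 * range is freed. */
static struct range *range_clone(const struct range *old)
{
	struct range *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	new->start = old->start;
	new->last = old->last;
	pthread_mutex_init(&new->lock, NULL);	/* brand-new lock object */
	return new;
}

/* In-place split strategy: only the bounds change; &r->lock remains the
 * same object for the whole lifetime of the range. */
static void range_split_head_in_place(struct range *r, unsigned long new_start)
{
	r->start = new_start;
}
```

The sketch only shows why the two strategies differ for lock identity; it
says nothing about which one is safer under svms->lock, which is the
actual point of contention in this thread.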
>>
>> This patch changes the vm range overlap handling so that it does not
>> remove existing pranges; instead it updates them in place for the
>> split and keeps their locks alive.
>
> It sounds like you're trying to fix a problem here. Is this an actual
> or a hypothetical problem?
>
It has not been an actual problem so far. The patch uses an approach
that does not destroy the original svm range when splitting/updating it,
so the context of its locks (prange->lock, prange->migrate_mutex) is
kept. The so-called "cloned svm range" creates new locks that are
unrelated to the original locks, which I think is not reasonable.
>
>>
>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 112 ++++++++++++++++++++-------
>> 1 file changed, 82 insertions(+), 30 deletions(-)
>
> Just looking at the number of added and removed lines, this doesn't
> look like a simplification. I really question the justification for
> this change. If it makes the code more complicated or less robust,
> without a good reason, then it's not a good change.
>
As above, it does not make the code simpler or more complicated. It
splits/updates the svm range directly on the prange data instead of
destroying the original prange and generating a new one.
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 407636a68814..a66b8c96ee14 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1967,7 +1967,8 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>> return r;
>> }
>> -static struct svm_range *svm_range_clone(struct svm_range *old)
>> +/* create a prange that has same range/size/addr etc info as old */
>> +static struct svm_range *svm_range_duplicate(struct svm_range *old)
>
> This seems like an unnecessary name change. "clone" and "duplicate"
> mean the same thing. But "clone" is shorter.
>
"clone" should mean identical to the existing one. Here we use some
fields from the existing svm_range to build the new one, and the new one
is not fully identical to the existing one; for example, prange->lock is
not the same between the old and new svm ranges.
>> {
>> struct svm_range *new;
>> @@ -1999,6 +2000,25 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>> return new;
>> }
>> +/* copy range/size/addr info from src to dst prange */
>> +static void svm_range_copy(struct svm_range *dst, struct svm_range *src)
>> +{
>> + dst->npages = src->npages;
>> + dst->start = src->start;
>> + dst->last = src->last;
>> +
>> + dst->vram_pages = src->vram_pages;
>> + dst->offset = src->offset;
>> +
>> + for (int i = 0; i < MAX_GPU_INSTANCE; i++) {
>> + if (!src->dma_addr[i])
>> + continue;
>> +
>> + memcpy(dst->dma_addr[i], src->dma_addr[i],
>> + src->npages * sizeof(*src->dma_addr[i]));
>
> This does not reallocate/resize the dma_addr arrays. Reallocating
> these arrays can't be done here, because this function is not allowed
> to fail. That's one reason to use the clone instead of modifying the
> existing range.
OK, maybe I can swap dma_addr between the dst and src svm ranges here;
then the original dma_addr arrays get freed when the src svm range is
freed.
The purpose of this patch is to not destroy an existing svm range when
splitting/updating it, since svm_range_add has no means to remove any
existing svm range, or even part of one; svm_range_add only splits or
updates existing svm ranges. Maybe we should decide first whether this
idea is valid?
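The swap idea could look roughly like the following userspace sketch. It
uses a stripped-down model of svm_range (MAX_GPU_INSTANCE is reduced for
the example and `svm_range_swap_dma_addr` is a hypothetical name, not a
function in the driver). Swapping the per-GPU array pointers instead of
memcpy'ing avoids any allocation, so it cannot fail, and dst's old
arrays are naturally released when src is freed:

```c
#define MAX_GPU_INSTANCE 4	/* the kernel value is larger; reduced here */

/* Hypothetical minimal prange model: just the fields the swap touches. */
struct svm_range {
	unsigned long npages;
	unsigned long *dma_addr[MAX_GPU_INSTANCE];
};

/* Exchange the per-GPU dma_addr arrays between dst and src. Unlike a
 * memcpy into dst's existing (possibly smaller) arrays, this needs no
 * reallocation and cannot fail; freeing src later frees dst's old
 * arrays. */
static void svm_range_swap_dma_addr(struct svm_range *dst,
				    struct svm_range *src)
{
	for (int i = 0; i < MAX_GPU_INSTANCE; i++) {
		unsigned long *tmp = dst->dma_addr[i];

		dst->dma_addr[i] = src->dma_addr[i];
		src->dma_addr[i] = tmp;
	}
}
```

This is only a sketch of the "swap instead of copy" suggestion made
above; whether it fits the driver's ownership rules for dma_addr is the
open question.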
Regards
Xiaogang
>
> Regards,
> Felix
>
>
>> + }
>> +}
>> +
>> void svm_range_set_max_pages(struct amdgpu_device *adev)
>> {
>> uint64_t max_pages;
>> @@ -2057,20 +2077,19 @@ svm_range_split_new(struct svm_range_list *svms, uint64_t start, uint64_t last,
>> * @attrs: array of attributes
>> * @update_list: output, the ranges need validate and update GPU mapping
>> * @insert_list: output, the ranges need insert to svms
>> - * @remove_list: output, the ranges are replaced and need remove from svms
>> * @remap_list: output, remap unaligned svm ranges
>> *
>> * Check if the virtual address range has overlap with any existing ranges,
>> * split partly overlapping ranges and add new ranges in the gaps. All changes
>> * should be applied to the range_list and interval tree transactionally. If
>> * any range split or allocation fails, the entire update fails. Therefore any
>> - * existing overlapping svm_ranges are cloned and the original svm_ranges left
>> + * existing overlapping svm_ranges are duplicated and the original svm_ranges left
>> * unchanged.
>> *
>> - * If the transaction succeeds, the caller can update and insert clones and
>> - * new ranges, then free the originals.
>> + * If the transaction succeeds, the caller can update and insert split ranges and
>> + * new ranges.
>> *
>> - * Otherwise the caller can free the clones and new ranges, while the old
>> + * Otherwise the caller can free the duplicated and new ranges, while the old
>> * svm_ranges remain unchanged.
>> *
>> * Context: Process context, caller must hold svms->lock
>> @@ -2082,7 +2101,7 @@ static int
>> svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs,
>> struct list_head *update_list, struct list_head *insert_list,
>> - struct list_head *remove_list, struct list_head *remap_list)
>> + struct list_head *remap_list)
>> {
>> unsigned long last = start + size - 1UL;
>> struct svm_range_list *svms = &p->svms;
>> @@ -2090,13 +2109,14 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> struct svm_range *prange;
>> struct svm_range *tmp;
>> struct list_head new_list;
>> + struct list_head modify_list;
>> int r = 0;
>> pr_debug("svms 0x%p [0x%llx 0x%lx]\n", &p->svms, start, last);
>> INIT_LIST_HEAD(update_list);
>> INIT_LIST_HEAD(insert_list);
>> - INIT_LIST_HEAD(remove_list);
>> + INIT_LIST_HEAD(&modify_list);
>> INIT_LIST_HEAD(&new_list);
>> INIT_LIST_HEAD(remap_list);
>> @@ -2117,35 +2137,41 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>> /* nothing to do */
>> } else if (node->start < start || node->last > last) {
>> /* node intersects the update range and its attributes
>> - * will change. Clone and split it, apply updates only
>> + * will change. Duplicate and split it, apply updates only
>> * to the overlapping part
>> */
>> - struct svm_range *old = prange;
>> + /* prange_dup is a temporary prange that holds size and addr info
>> + * updates of prange after the split
>> + */
>> + struct svm_range *prange_dup;
>> - prange = svm_range_clone(old);
>> - if (!prange) {
>> + prange_dup = svm_range_duplicate(prange);
>> + if (!prange_dup) {
>> r = -ENOMEM;
>> goto out;
>> }
>> - list_add(&old->update_list, remove_list);
>> - list_add(&prange->list, insert_list);
>> - list_add(&prange->update_list, update_list);
>> -
>> + /* split prange_dup */
>> if (node->start < start) {
>> pr_debug("change old range start\n");
>> - r = svm_range_split_head(prange, start,
>> + r = svm_range_split_head(prange_dup, start,
>> insert_list, remap_list);
>> if (r)
>> goto out;
>> }
>> if (node->last > last) {
>> pr_debug("change old range last\n");
>> - r = svm_range_split_tail(prange, last,
>> + r = svm_range_split_tail(prange_dup, last,
>> insert_list, remap_list);
>> if (r)
>> goto out;
>> }
>> +
>> + /* split success, insert_list has new head/tail pranges */
>> + /* move prange from svm list to modify list */
>> + list_move_tail(&prange->list, &modify_list);
>> + /* put prange_dup at prange->update_list */
>> + list_add(&prange_dup->list, &prange->update_list);
>> } else {
>> /* The node is contained within start..last,
>> * just update it
>> @@ -2178,8 +2204,38 @@ svm_range_add(struct kfd_process *p, uint64_t
>> start, uint64_t size,
>> svm_range_free(prange, false);
>> list_for_each_entry_safe(prange, tmp, &new_list, list)
>> svm_range_free(prange, true);
>> +
>> + list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>> + struct svm_range *prange_dup;
>> +
>> + /* free prange_dup that is associated with prange on modify_list */
>> + prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>> + if (prange_dup)
>> + svm_range_free(prange_dup, false);
>> +
>> + INIT_LIST_HEAD(&prange->update_list);
>> + /* put prange back to svm list */
>> + list_move_tail(&prange->list, &svms->list);
>> + }
>> } else {
>> list_splice(&new_list, insert_list);
>> +
>> + list_for_each_entry_safe(prange, tmp, &modify_list, list) {
>> + struct svm_range *prange_dup;
>> +
>> + prange_dup = list_first_entry(&prange->update_list, struct svm_range, list);
>> + if (prange_dup) {
>> + /* update prange from prange_dup */
>> + svm_range_copy(prange, prange_dup);
>> + /* release temporary prange_dup */
>> + svm_range_free(prange_dup, false);
>> + }
>> + INIT_LIST_HEAD(&prange->update_list);
>> +
>> + /* move prange from modify_list to insert_list and update_list */
>> + list_move_tail(&prange->list, insert_list);
>> + list_add(&prange->update_list, update_list);
>> + }
>> }
>> return r;
>> @@ -3533,7 +3589,6 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> struct amdkfd_process_info *process_info = p->kgd_process_info;
>> struct list_head update_list;
>> struct list_head insert_list;
>> - struct list_head remove_list;
>> struct list_head remap_list;
>> struct svm_range_list *svms;
>> struct svm_range *prange;
>> @@ -3566,7 +3621,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> /* Add new range and split existing ranges as needed */
>> r = svm_range_add(p, start, size, nattr, attrs, &update_list,
>> - &insert_list, &remove_list, &remap_list);
>> + &insert_list, &remap_list);
>> if (r) {
>> mutex_unlock(&svms->lock);
>> mmap_write_unlock(mm);
>> @@ -3574,21 +3629,18 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>> }
>> /* Apply changes as a transaction */
>> list_for_each_entry_safe(prange, next, &insert_list, list) {
>> - svm_range_add_to_svms(prange);
>> - svm_range_add_notifier_locked(mm, prange);
>> + /* prange can be new or existing range, put it at svms->list */
>> + list_move_tail(&prange->list, &prange->svms->list);
>> + /* update prange at interval trees: svms->objects and
>> + * mm interval notifier tree
>> + */
>> + svm_range_update_notifier_and_interval_tree(mm, prange);
>> +
>> }
>> list_for_each_entry(prange, &update_list, update_list) {
>> svm_range_apply_attrs(p, prange, nattr, attrs,
>> &update_mapping);
>> /* TODO: unmap ranges from GPU that lost access */
>> }
>> - list_for_each_entry_safe(prange, next, &remove_list, update_list) {
>> - pr_debug("unlink old 0x%p prange 0x%p [0x%lx 0x%lx]\n",
>> - prange->svms, prange, prange->start,
>> - prange->last);
>> - svm_range_unlink(prange);
>> - svm_range_remove_notifier(prange);
>> - svm_range_free(prange, false);
>> - }
>> mmap_write_downgrade(mm);
>> /* Trigger migrations and revalidate and map to GPUs as needed. If