[PATCH 12/18] drm/xe/eudebug: implement userptr_vma access

Hajda, Andrzej andrzej.hajda at intel.com
Fri Oct 25 13:20:18 UTC 2024


On 24.10.2024 18:06, Matthew Brost wrote:
> On Wed, Oct 23, 2024 at 01:32:53PM +0200, Hajda, Andrzej wrote:
>> On 22.10.2024 00:34, Matthew Brost wrote:
>>> On Mon, Oct 21, 2024 at 11:54:30AM +0200, Hajda, Andrzej wrote:
>>>> On 20.10.2024 20:16, Matthew Brost wrote:
>>>>> On Tue, Oct 01, 2024 at 05:43:00PM +0300, Mika Kuoppala wrote:
>>>>>> From: Andrzej Hajda <andrzej.hajda at intel.com>
>>>>>>
>>>>>> Debugger needs to read/write program's vmas including
>>>>>> userptr_vma. Since hmm_range_fault is used to pin userptr
>>>>>> vmas, it is possible to map those vmas from debugger
>>>>>> context.
>>>>>>
>>>>>> v2: pin pages vs notifier, move to vm.c (Matthew)
>>>>>>
>>>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda at intel.com>
>>>>>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk at intel.com>
>>>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/xe/xe_eudebug.c |  2 +-
>>>>>>  drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
>>>>>>  drivers/gpu/drm/xe/xe_vm.h      |  3 +++
>>>>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
>>>>>> index edad6d533d0b..b09d7414cfe3 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_eudebug.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
>>>>>> @@ -3023,7 +3023,7 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset,
>>>>>>  		return ret;
>>>>>>  	}
>>>>>>  
>>>>>> -	return -EINVAL;
>>>>>> +	return xe_uvma_access(to_userptr_vma(vma), offset, buf, bytes, write);
>>>>>>  }
>>>>>>  
>>>>>>  static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>>>> index a836dfc5a86f..5f891e76993b 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>>>> @@ -3421,3 +3421,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>>>>>>  	}
>>>>>>  	kvfree(snap);
>>>>>>  }
>>>>>> +
>>>>>> +int xe_uvma_access(struct xe_userptr_vma *uvma, u64 offset,
>>>>>> +		   void *buf, u64 len, bool write)
>>>>>> +{
>>>>> Maybe dumb question, but are we overthinking this here?
>>>>>
>>>>> Can we just use kthread_use_mm, copy_to_user, copy_from_user?
>>>>>
>>>>> If not then my previous comments still apply here.
>>>> This function is called from the debugger's process context, and
>>>> kthread_use_mm is allowed only from a kthread. Spawning a kthread
>>>> just for this is an option, but it looks odd and suboptimal; it could
>>>> be a kind of last resort, or not?
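>>>>
>>>> For reference, a minimal sketch of the suggested pattern (mm, addr and
>>>> len are illustrative; the kthread_use_mm() call is exactly what
>>>> restricts this to kthreads):
>>>>
>>>> 	/* mm is the target process' mm, with a reference held */
>>>> 	kthread_use_mm(mm);
>>>> 	if (write)
>>>> 		ret = copy_to_user((void __user *)addr, buf, len) ? -EFAULT : 0;
>>>> 	else
>>>> 		ret = copy_from_user(buf, (void __user *)addr, len) ? -EFAULT : 0;
>>>> 	kthread_unuse_mm(mm);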
>>>>
>>>> Other options:
>>>>
>>>> 1. Keep a reference to the remote task in xe_userptr and use
>>>> access_process_vm(up->task, ...), as sketched below.
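>>>>
>>>> Roughly (sketch only; up->task would be a new field holding a task
>>>> reference, and addr an illustrative name for the userptr base):
>>>>
>>>> 	ret = access_process_vm(up->task, addr + offset, buf, len,
>>>> 				FOLL_FORCE | (write ? FOLL_WRITE : 0));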
>>>>
>>> I think remote refs are generally a bad idea, but admittedly I don't
>>> fully understand what this would look like.
>>>
>>>> 2. Pass the xe_eudebug.target_task reference down from the eudebug
>>>> framework to this helper and use access_process_vm. The current call
>>>> chain is:
>>>>
>>>> __xe_eudebug_vm_access - has access to xe_eudebug.target_task
>>>>  -> __vm_read_write
>>>>   -> xe_eudebug_vm_access
>>>>    -> xe_eudebug_vm_access
>>>>     -> xe_eudebug_vma_access
>>>>      -> xe_vm_userptr_access
>>>>
>>>> So multiple changes are required to achieve this, but maybe it is a
>>>> valid path to go? One potential issue with 1 and 2 is that multiple
>>>> UMD tests were failing when access_process_vm/access_remote_vm were
>>>> used; they were not investigated, as this approach was dropped for
>>>> different reasons.
>>>>
>>>> 3. Continue the approach from this patch, but with a corrected page
>>>> iterator for the up->sg sg list [1]. This was nacked by you(?) [2],
>>>> but I have a problem understanding why. I see a lot of code in the
>>>> kernel mapping sg pages:
>>>>
>>>> linux$ git grep ' kmap.*sg' | wc -l
>>> I looked here; every example I found maps and accesses 1 page at a
>>> time, not mapping 1 page and accessing many.
>>>
>>>> 61
>>>>
>>>> Is it incorrect? Or is our case different?
>>>>
>>> An sglist's segments are dma-addresses (virtual addresses), thus every
>>> 4k in a segment can be a different physical page.
>> An sglist is also a list of pages and their lengths (consecutive pages
>> are glued together), i.e. exactly what we need. And this is done in
>> xe_build_sg: it calls sg_alloc_table_from_pages_segment, which is
>> documented as follows:
>> ...
>> * Allocate and initialize an sg table from a list of pages. Contiguous
>> * ranges of the pages are squashed into a single scatterlist node up to the
>> * maximum size specified in @max_segment. A user may provide an offset at a
>> * start and a size of valid data in a buffer specified by the page array.
>> ...
>> So the sglist contains the same information as hmm_range->hmm_pfns[i],
>> and a little more.
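>>
>> For illustration (sketch; the sgt name and loop are generic), the
>> squashed page/length info can be read straight back from the table:
>>
>> 	struct scatterlist *sg;
>> 	unsigned int i;
>>
>> 	for_each_sgtable_sg(sgt, sg, i)
>> 		/* sg_page()/sg->length describe the (possibly squashed) pages */
>> 		pr_debug("seg %u: first page %p, len %u\n",
>> 			 i, sg_page(sg), sg->length);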
>>
>> So my concern is whether the mapping operation can destroy this info,
>> but looking at the code this does not seem to be the case. For example,
>> the iommu_dma_map_sg docs explicitly promise to "preserve the original
>> offsets and sizes for the caller".
>>
>>
>>> i.e., look at this snippet:
>>>
>>> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
>>> +
>>> +		cur_len = min(cur.size, cur.remaining);
>>> +		if (write)
>>> +			memcpy(ptr, buf, cur_len);
>>> +		else
>>> +			memcpy(buf, ptr, cur_len);
>>> +		kunmap_local(ptr);
>>>
>>> If 'cur.start' > 4k, then you are potentially pointing to an incorrect
>>> page and corrupting memory.
>> With the added possibility to iterate over sgl pages in xe_res_cursor
>> [1], this does not seem to be true.
>> Why? cur.start is limited by the length of the segment
>> (cur.sgl->length); if it happens to be more than 4k, it means
>> sg_page(cur.sgl) points to consecutive pages and cur.start is correct.
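>>
>> A per-page-safe form of the same lookup, for illustration (assuming the
>> page iterator from [1]; accesses then have to be clamped to one page):
>>
>> 	struct page *page = nth_page(sg_page(cur.sgl),
>> 				     cur.start >> PAGE_SHIFT);
>> 	void *ptr = kmap_local_page(page) + (cur.start & ~PAGE_MASK);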
>>
> I suppose if the cursor is changed to walk the pages, not the dma
> addresses, then yeah, I guess this would work. Still, my much preferred
> way would be to just call hmm_range_fault, or optionally to save off the
> pages in xe_vma_userptr_pin_pages, given that at some point we will
> ditch SG tables for userptr in favor of drm gpusvm, which will be page
> based.

Both alternatives seem suboptimal to me, as their result is what we 
already have in the sg table, at the price of an extra call 
(hmm_range_fault) and extra allocations (both). Most importantly, they 
would complicate the code without a clear benefit.

I will try to implement a version with hmm_range_fault (roughly sketched 
below), but I am still confused about why we need to complicate things.
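
My understanding of that variant, as a minimal sketch only (locking,
pfn array allocation and the exact xe field names, e.g.
uvma->userptr.notifier, still to be confirmed):

	unsigned long *pfns;			/* kvmalloc'ed, one per page */
	struct hmm_range range = {
		.notifier = &uvma->userptr.notifier,
		.start = start,			/* page-aligned user VA */
		.end = end,
		.hmm_pfns = pfns,
		.default_flags = HMM_PFN_REQ_FAULT |
				 (write ? HMM_PFN_REQ_WRITE : 0),
	};
	struct mm_struct *mm = uvma->userptr.notifier.mm;
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(range.notifier);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret == -EBUSY)
		goto again;

	/* then access one page at a time via
	 * kmap_local_page(hmm_pfn_to_page(range.hmm_pfns[i])) */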


Regards

Andrzej


>
> Matt
>
>>> Likewise if 'cur_len' > 4k, then you are potentially pointing to an
>>> incorrect page and corrupting memory.
>> Again, in the case of consecutive pages it should be in range.
>>
>> Anyway, if there is an issue with consecutive pages that I am not aware
>> of, we can always build the sg list with segments pointing to single 4K
>> pages, by modifying xe_build_sg to call
>> sg_alloc_table_from_pages_segment with a 4K max segment size, roughly
>> as sketched below.
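>>
>> Something like (sketch; the st/pages/npages names follow xe_build_sg
>> only loosely):
>>
>> 	/* capping max_segment at PAGE_SIZE makes every scatterlist
>> 	 * node cover exactly one page */
>> 	ret = sg_alloc_table_from_pages_segment(st, pages, npages, 0,
>> 						(u64)npages << PAGE_SHIFT,
>> 						PAGE_SIZE, GFP_KERNEL);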
>>
>> [1]: https://lore.kernel.org/intel-xe/20241011-xe_res_cursor_add_page_iterator-v3-1-0f8b8d3ab021@intel.com/
>>
>> Regards
>> Andrzej
>>
>>> This loop would have to be changed to something like below, which
>>> kmaps and accesses 1 page at a time...
>>>
>>> for (xe_res_first_sg(up->sg, offset, len, &cur); cur.remaining;
>>>      xe_res_next(&cur, cur_len)) {
>>> 	int segment_len;
>>> 	int remain;
>>>
>>> 	cur_len = min(cur.size, cur.remaining);
>>> 	remain = cur_len;
>>>
>>> 	for (i = 0; i < cur_len; i += segment_len) {
>>> 		phys_addr_t phys = iommu_iova_to_phys(sg_dma_address(cur.sgl) +
>>> 						      i + cur.start);
>>> 		struct page *page = phys_to_page(phys);
>>> 		void *ptr = kmap_local_page(page);
>>> 		int ptr_offset = offset & ~PAGE_MASK;
>>>
>>> 		segment_len = min(remain, PAGE_SIZE - ptr_offset);
>>>
>>> 		if (write)
>>> 			memcpy(ptr + ptr_offset, buf + i, segment_len);
>>> 		else
>>> 			memcpy(buf + i, ptr + ptr_offset, segment_len);
>>> 		kunmap_local(ptr);
>>>
>>> 		offset += segment_len;
>>> 		remain -= segment_len;
>>> 	}
>>>
>>> 	buf += cur_len;
>>> }
>>>
>>>> 4. As you suggested in [3](?), modify xe_hmm_userptr_populate_range
>>>> to keep hmm_range.hmm_pfns (or something similar) in xe_userptr and
>>>> use it later (instead of up->sg) to iterate over pages.
>>>>
>>> Or just call hmm_range_fault directly here and operate on returned pages
>>> directly.
>>>
>>> BTW, eventually all the userptr stuff is going to change and be based
>>> on GPU SVM [4]. Calling hmm_range_fault directly will always work,
>>> though, and is likely the safest option.
>>>
>>> Matt
>>>
>>> [4] https://patchwork.freedesktop.org/patch/619809/?series=137870&rev=2
>>>
>>>> [1]: https://lore.kernel.org/intel-xe/20241011-xe_res_cursor_add_page_iterator-v3-1-0f8b8d3ab021@intel.com/
>>>> [2]: https://lore.kernel.org/intel-xe/Zw32fauoUmB6Iojk@DUT025-TGLU.fm.intel.com/
>>>> [3]: https://patchwork.freedesktop.org/patch/617481/?series=136572&rev=2#comment_1126527
>>>>
>>>> Regards
>>>> Andrzej
>>>>
>>>>> Matt
>>>>>
>>>>>> +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
>>>>>> +	struct xe_userptr *up = &uvma->userptr;
>>>>>> +	struct xe_res_cursor cur = {};
>>>>>> +	int cur_len, ret = 0;
>>>>>> +
>>>>>> +	while (true) {
>>>>>> +		down_read(&vm->userptr.notifier_lock);
>>>>>> +		if (!xe_vma_userptr_check_repin(uvma))
>>>>>> +			break;
>>>>>> +
>>>>>> +		spin_lock(&vm->userptr.invalidated_lock);
>>>>>> +		list_del_init(&uvma->userptr.invalidate_link);
>>>>>> +		spin_unlock(&vm->userptr.invalidated_lock);
>>>>>> +
>>>>>> +		up_read(&vm->userptr.notifier_lock);
>>>>>> +		ret = xe_vma_userptr_pin_pages(uvma);
>>>>>> +		if (ret)
>>>>>> +			return ret;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (!up->sg) {
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto out_unlock_notifier;
>>>>>> +	}
>>>>>> +
>>>>>> +	for (xe_res_first_sg(up->sg, offset, len, &cur); cur.remaining;
>>>>>> +	     xe_res_next(&cur, cur_len)) {
>>>>>> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
>>>>>> +
>>>>>> +		cur_len = min(cur.size, cur.remaining);
>>>>>> +		if (write)
>>>>>> +			memcpy(ptr, buf, cur_len);
>>>>>> +		else
>>>>>> +			memcpy(buf, ptr, cur_len);
>>>>>> +		kunmap_local(ptr);
>>>>>> +		buf += cur_len;
>>>>>> +	}
>>>>>> +	ret = len;
>>>>>> +
>>>>>> +out_unlock_notifier:
>>>>>> +	up_read(&vm->userptr.notifier_lock);
>>>>>> +	return ret;
>>>>>> +}
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>>>>>> index c864dba35e1d..99b9a9b011de 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_vm.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>>>>>> @@ -281,3 +281,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>>>>>>  void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>>>>>>  void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>>>>>>  void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>>>>>> +
>>>>>> +int xe_uvma_access(struct xe_userptr_vma *uvma, u64 offset,
>>>>>> +		   void *buf, u64 len, bool write);
>>>>>> -- 
>>>>>> 2.34.1
>>>>>>

