[PATCH 12/18] drm/xe/eudebug: implement userptr_vma access
Hajda, Andrzej
andrzej.hajda at intel.com
Wed Oct 23 11:32:53 UTC 2024
W dniu 22.10.2024 o 00:34, Matthew Brost pisze:
> On Mon, Oct 21, 2024 at 11:54:30AM +0200, Hajda, Andrzej wrote:
>> W dniu 20.10.2024 o 20:16, Matthew Brost pisze:
>>> On Tue, Oct 01, 2024 at 05:43:00PM +0300, Mika Kuoppala wrote:
>>>> From: Andrzej Hajda <andrzej.hajda at intel.com>
>>>>
>>>> Debugger needs to read/write program's vmas including
>>>> userptr_vma. Since hmm_range_fault is used to pin userptr
>>>> vmas, it is possible to map those vmas from debugger context.
>>>>
>>>> v2: pin pages vs notifier, move to vm.c (Matthew)
>>>>
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda at intel.com>
>>>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk at intel.com>
>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>>> ---
>>>>  drivers/gpu/drm/xe/xe_eudebug.c |  2 +-
>>>>  drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
>>>>  drivers/gpu/drm/xe/xe_vm.h      |  3 +++
>>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
>>>> index edad6d533d0b..b09d7414cfe3 100644
>>>> --- a/drivers/gpu/drm/xe/xe_eudebug.c
>>>> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
>>>> @@ -3023,7 +3023,7 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset,
>>>>  		return ret;
>>>>  	}
>>>>
>>>> -	return -EINVAL;
>>>> +	return xe_uvma_access(to_userptr_vma(vma), offset, buf, bytes, write);
>>>>  }
>>>>
>>>>  static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index a836dfc5a86f..5f891e76993b 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -3421,3 +3421,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>>>>  	}
>>>>  	kvfree(snap);
>>>>  }
>>>> +
>>>> +int xe_uvma_access(struct xe_userptr_vma *uvma, u64 offset,
>>>> +		   void *buf, u64 len, bool write)
>>>> +{
>>> Maybe a dumb question, but are we overthinking this here?
>>>
>>> Can we just use kthread_use_mm, copy_to_user, copy_from_user?
>>>
>>> If not then my previous comments still apply here.
>>
>> This function is called from the debugger process context and
>> kthread_use_mm is allowed only from a kthread. Spawning a kthread
>> just for this is an option but looks odd and suboptimal; it could be
>> a kind of last resort, or not?
>>
>> Other options:
>> 1. Keep a reference to the remote task in xe_userptr
>>    and use access_process_vm(up->task, ...).
>>
>
> I think remote refs are generally a bad idea but admittedly don't
> fully understand what this would look like.
>
>> 2. Pass the xe_eudebug.target_task reference down from the eudebug
>>    framework to this helper and use access_process_vm. The current
>>    call chain is:
>>      __xe_eudebug_vm_access - has access to xe_eudebug.target_task
>>      -> __vm_read_write
>>      ---> xe_eudebug_vm_access
>>      ----> xe_eudebug_vm_access
>>      -----> xe_eudebug_vma_access
>>      ------> xe_vm_userptr_access
>>    So to achieve this multiple changes are required, but maybe it is
>>    a valid path to go? One potential issue with 1 and 2 is that
>>    multiple UMD tests were failing when access_process_vm/
>>    access_remote_vm were used; they were not investigated, as this
>>    approach was dropped for different reasons.
>>
>> 3. Continue the approach from this patch, but with the corrected page
>>    iterator of the up->sg sg list [1]. This was nacked by you(?) [2],
>>    but I have a problem understanding why. I see a lot of code in the
>>    kernel mapping sg pages:
>>      linux$ git grep ' kmap.*sg' | wc -l
>
> I looked here; every example I found maps and accesses 1 page at a
> time, not mapping 1 page and accessing many.
>
>>      61
>>    Is it incorrect? Or is our case different?
>>
>
> An sglist's segments are dma-addresses (virtual addresses), thus every
> 4k in a segment can be a different physical page.
An sglist is also a list of pages and their lengths (in the case of
consecutive pages, they are glued together), i.e. exactly what we need.
And this is done in xe_build_sg: it calls
sg_alloc_table_from_pages_segment, which is documented as follows:
...
* Allocate and initialize an sg table from a list of pages. Contiguous
* ranges of the pages are squashed into a single scatterlist node up to the
* maximum size specified in @max_segment. A user may provide an offset at a
* start and a size of valid data in a buffer specified by the page array.
...
So the sglist contains the same information as hmm_range->hmm_pfns[i],
and a little more.
My only concern was whether the mapping operation can destroy this info,
but looking at the code this does not seem to be the case. For example,
the iommu_dma_map_sg docs explicitly say it will "preserve the original
offsets and sizes for the caller".
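To illustrate (a minimal sketch, not code from the patch; the
walk_sgt_pages helper and its includes are made up by me), this is how
the pages recorded in such a table can still be visited one at a time:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Hypothetical sketch: walk an sg_table built by
 * sg_alloc_table_from_pages_segment().  Each segment records its first
 * struct page plus a length, so pages that were squashed into one
 * segment can still be recovered and mapped one at a time.
 */
static void walk_sgt_pages(struct sg_table *sgt)
{
	struct scatterlist *sg;
	unsigned int i;

	for_each_sgtable_sg(sgt, sg, i) {
		unsigned int npages = DIV_ROUND_UP(sg->offset + sg->length,
						   PAGE_SIZE);
		unsigned int j;

		for (j = 0; j < npages; j++) {
			void *ptr = kmap_local_page(nth_page(sg_page(sg), j));

			/* access at most PAGE_SIZE bytes here ... */
			kunmap_local(ptr);
		}
	}
}
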
>
> i.e., look at this snippet:
>
> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> +
> +		cur_len = min(cur.size, cur.remaining);
> +		if (write)
> +			memcpy(ptr, buf, cur_len);
> +		else
> +			memcpy(buf, ptr, cur_len);
> +		kunmap_local(ptr);
>
> If 'cur.start' > 4k, then you are potentially pointing to an
> incorrect page and corrupting memory.
With the added possibility to iterate over sgl pages in xe_res_cursor [1],
this does not seem to be true.
Why? cur.start is limited by the length of the segment (cur.sgl->length);
if it happens to be more than 4k, it means sg_page(cur.sgl) points to
consecutive pages and cur.start is correct.
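Purely as a sketch on top of the cursor from [1] (cur_page is a
hypothetical helper, not part of the patch or of xe_res_cursor.h), the
page under the cursor could also be made explicit instead of relying on
pointer arithmetic from the segment's first page:

/* Hypothetical helper: because cur->start stays below cur->sgl->length,
 * the page holding the cursor position can always be derived from the
 * segment's first struct page.
 */
static struct page *cur_page(const struct xe_res_cursor *cur)
{
	return nth_page(sg_page(cur->sgl), cur->start >> PAGE_SHIFT);
}
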
>
> Likewise if 'cur_len' > 4k, then you are potentially pointing to an
> incorrect page and corrupting memory.
Again, in the case of consecutive pages it should be in range.
Anyway, if there is an issue with consecutive pages which I am not aware
of, we can always build the sg list with segments pointing to single 4K
pages, by modifying xe_build_sg to call
sg_alloc_table_from_pages_segment with a 4K max segment size.
[1]:
https://lore.kernel.org/intel-xe/20241011-xe_res_cursor_add_page_iterator-v3-1-0f8b8d3ab021@intel.com/
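
Roughly, that fallback would only mean limiting max_segment when
building the table; a hedged sketch with made-up helper and variable
names, not the real xe_build_sg code:

/* Sketch only: the page array and count would come from
 * hmm_range_fault as today; limiting max_segment to PAGE_SIZE makes
 * every scatterlist node cover exactly one page.
 */
static int build_sg_single_page_segments(struct sg_table *st,
					 struct page **pages,
					 unsigned int npages)
{
	return sg_alloc_table_from_pages_segment(st, pages, npages, 0,
						 (unsigned long)npages << PAGE_SHIFT,
						 PAGE_SIZE, GFP_KERNEL);
}
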
Regards
Andrzej
>
> This loop would have to be changed to something like below which kmaps
> and accesses 1 page at a time...
>
> for (xe_res_first_sg(up->sg, offset, len, &cur); cur.remaining;
>      xe_res_next(&cur, cur_len)) {
> 	int segment_len;
> 	int remain;
>
> 	cur_len = min(cur.size, cur.remaining);
> 	remain = cur_len;
>
> 	for (i = 0; i < cur_len; i += segment_len) {
> 		phys_addr_t phys = iommu_iova_to_phys(sg_dma_address(cur.sgl) +
> 						      i + cur.start);
> 		struct page *page = phys_to_page(phys);
> 		void *ptr = kmap_local_page(page);
> 		int ptr_offset = offset & ~PAGE_MASK;
>
> 		segment_len = min(remain, PAGE_SIZE - ptr_offset);
>
> 		if (write)
> 			memcpy(ptr + ptr_offset, buf + i, segment_len);
> 		else
> 			memcpy(buf + i, ptr + ptr_offset, segment_len);
> 		kunmap_local(ptr);
>
> 		offset += segment_len;
> 		remain -= segment_len;
> 	}
> 	buf += cur_len;
> }
>
>> 4. As you suggested in [3](?), modify xe_hmm_userptr_populate_range
>>    to keep hmm_range.hmm_pfns (or something similar) in xe_userptr
>>    and use it later (instead of up->sg) to iterate over pages.
>>
>
> Or just call hmm_range_fault directly here and operate on returned
> pages directly.
>
> BTW, eventually all the userptr stuff is going to change and be based
> on GPU SVM [4]. Calling hmm_range_fault directly will always work
> though, and is likely the safest option.
>
> Matt
>
> [4] https://patchwork.freedesktop.org/patch/619809/?series=137870&rev=2
>
>> [1]: https://lore.kernel.org/intel-xe/20241011-xe_res_cursor_add_page_iterator-v3-1-0f8b8d3ab021@intel.com/
>> [2]: https://lore.kernel.org/intel-xe/Zw32fauoUmB6Iojk@DUT025-TGLU.fm.intel.com/
>> [3]: https://patchwork.freedesktop.org/patch/617481/?series=136572&rev=2#comment_1126527
>>
>> Regards Andrzej
>>
>>>
>>> Matt
>>>
>>>> +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
>>>> +	struct xe_userptr *up = &uvma->userptr;
>>>> +	struct xe_res_cursor cur = {};
>>>> +	int cur_len, ret = 0;
>>>> +
>>>> +	while (true) {
>>>> +		down_read(&vm->userptr.notifier_lock);
>>>> +		if (!xe_vma_userptr_check_repin(uvma))
>>>> +			break;
>>>> +
>>>> +		spin_lock(&vm->userptr.invalidated_lock);
>>>> +		list_del_init(&uvma->userptr.invalidate_link);
>>>> +		spin_unlock(&vm->userptr.invalidated_lock);
>>>> +
>>>> +		up_read(&vm->userptr.notifier_lock);
>>>> +		ret = xe_vma_userptr_pin_pages(uvma);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>>> +
>>>> +	if (!up->sg) {
>>>> +		ret = -EINVAL;
>>>> +		goto out_unlock_notifier;
>>>> +	}
>>>> +
>>>> +	for (xe_res_first_sg(up->sg, offset, len, &cur); cur.remaining;
>>>> +	     xe_res_next(&cur, cur_len)) {
>>>> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
>>>> +
>>>> +		cur_len = min(cur.size, cur.remaining);
>>>> +		if (write)
>>>> +			memcpy(ptr, buf, cur_len);
>>>> +		else
>>>> +			memcpy(buf, ptr, cur_len);
>>>> +		kunmap_local(ptr);
>>>> +		buf += cur_len;
>>>> +	}
>>>> +	ret = len;
>>>> +
>>>> +out_unlock_notifier:
>>>> +	up_read(&vm->userptr.notifier_lock);
>>>> +	return ret;
>>>> +}
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>>>> index c864dba35e1d..99b9a9b011de 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.h
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>>>> @@ -281,3 +281,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>>>>  void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>>>>  void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>>>>  void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>>>> +
>>>> +int xe_uvma_access(struct xe_userptr_vma *uvma, u64 offset,
>>>> +		   void *buf, u64 len, bool write);
>>>> --
>>>> 2.34.1
>>>>
>>