[PATCH 2/2] drm/xe: Pick correct userptr VMA to repin on REMAP op failure

Thu Feb 1 20:00:14 UTC 2024

Hey,

On 2024-02-01 20:26, Matthew Brost wrote:
> On Thu, Feb 01, 2024 at 08:18:52PM +0100, Maarten Lankhorst wrote:
>>
>>
>> On 2024-02-01 01:48, Matthew Brost wrote:
>>> A REMAP op is composed of 3 VMA's - unmap, prev map, and next map. When
>>> op_execute fails with -EAGAIN we need to update the local VMA pointer to
>>> the current op state and then repin the VMA if it is a userptr.
>>>
>>> Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial.
>>>
>>> Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA")
>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_vm.c | 22 +++++++++++++++++-----
>>>    1 file changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index e55161136490..2ab863fe7d0a 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -2506,13 +2506,25 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
>>>    	}
>>>    	drm_exec_fini(&exec);
>>> -	if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
>>> +	if (err == -EAGAIN) {
>>>    		lockdep_assert_held_write(&vm->lock);
>>> -		err = xe_vma_userptr_pin_pages(vma);
>>> -		if (!err)
>>> -			goto retry_userptr;
>>> -		trace_xe_vma_fail(vma);
>>> +		if (op->base.op == DRM_GPUVA_OP_REMAP) {
>>> +			if (!op->remap.unmap_done)
>>> +				vma = gpuva_to_vma(op->base.remap.unmap->va);
>>> +			else if (op->remap.prev)
>>> +				vma = op->remap.prev;
>>> +			else
>>> +				vma = op->remap.next;
>>> +		}
>> I see this same vma picking in handling of DRM_GPUVA_OP_REMAP.
>>
>> Could the switch in xe_vma_op_execute() be moved to a separate pick_vma
>> function instead, called from this place too?
>>
>> It might make the code slightly more readable.
>>
> 
> I would agree if this code wasn't going get rewritten shortly in [1]. We
> are transiting to 1 job per VM bind IOCTL in [1]. I currently am
> reworking on rebasing that code and found a few bugs in the current
> code. I want to stablize the code quickly so I czn reliably test my
> larger changes.
> 
> Would it help if I added comment here saying this code is temporary?
Oh that gives some context. The infinite loop fix in one patch is caused 
by the first patch in that series.

I'd personally choose to fix it here, then when r ebasing put a revert 
before 21/27 and squash?

Cheers,
~Maarten