[PATCH 2/2] drm/xe: Pick correct userptr VMA to repin on REMAP op failure

Fri Feb 2 08:37:34 UTC 2024


On 2024-02-01 21:00, Maarten Lankhorst wrote:
> Hey,
> 
> On 2024-02-01 20:26, Matthew Brost wrote:
>> On Thu, Feb 01, 2024 at 08:18:52PM +0100, Maarten Lankhorst wrote:
>>>
>>>
>>> On 2024-02-01 01:48, Matthew Brost wrote:
>>>> A REMAP op is composed of 3 VMA's - unmap, prev map, and next map. When
>>>> op_execute fails with -EAGAIN we need to update the local VMA 
>>>> pointer to
>>>> the current op state and then repin the VMA if it is a userptr.
>>>>
>>>> Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial.
>>>>
>>>> Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA")
>>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_vm.c | 22 +++++++++++++++++-----
>>>>    1 file changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index e55161136490..2ab863fe7d0a 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -2506,13 +2506,25 @@ static int __xe_vma_op_execute(struct xe_vm 
>>>> *vm, struct xe_vma *vma,
>>>>        }
>>>>        drm_exec_fini(&exec);
>>>> -    if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
>>>> +    if (err == -EAGAIN) {
>>>>            lockdep_assert_held_write(&vm->lock);
>>>> -        err = xe_vma_userptr_pin_pages(vma);
>>>> -        if (!err)
>>>> -            goto retry_userptr;
>>>> -        trace_xe_vma_fail(vma);
>>>> +        if (op->base.op == DRM_GPUVA_OP_REMAP) {
>>>> +            if (!op->remap.unmap_done)
>>>> +                vma = gpuva_to_vma(op->base.remap.unmap->va);
>>>> +            else if (op->remap.prev)
>>>> +                vma = op->remap.prev;
>>>> +            else
>>>> +                vma = op->remap.next;
>>>> +        }
>>> I see this same vma picking in handling of DRM_GPUVA_OP_REMAP.
>>>
>>> Could the switch in xe_vma_op_execute() be moved to a separate pick_vma
>>> function instead, called from this place too?
>>>
>>> It might make the code slightly more readable.
>>>
>>
>> I would agree if this code wasn't going get rewritten shortly in [1]. We
>> are transiting to 1 job per VM bind IOCTL in [1]. I currently am
>> reworking on rebasing that code and found a few bugs in the current
>> code. I want to stablize the code quickly so I czn reliably test my
>> larger changes.
>>
>> Would it help if I added comment here saying this code is temporary?
> Oh that gives some context. The infinite loop fix in one patch is caused 
> by the first patch in that series.
> 
> I'd personally choose to fix it here, then when r ebasing put a revert 
> before 21/27 and squash?
> 
> Cheers,
> ~Maarten
Reviewed-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>