[PATCH 2/2] drm/xe: Pick correct userptr VMA to repin on REMAP op failure
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Fri Feb 2 08:37:34 UTC 2024
On 2024-02-01 21:00, Maarten Lankhorst wrote:
> Hey,
>
> On 2024-02-01 20:26, Matthew Brost wrote:
>> On Thu, Feb 01, 2024 at 08:18:52PM +0100, Maarten Lankhorst wrote:
>>>
>>>
>>> On 2024-02-01 01:48, Matthew Brost wrote:
>>>> A REMAP op is composed of 3 VMA's - unmap, prev map, and next map. When
>>>> op_execute fails with -EAGAIN we need to update the local VMA
>>>> pointer to
>>>> the current op state and then repin the VMA if it is a userptr.
>>>>
>>>> Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial.
>>>>
>>>> Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA")
>>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_vm.c | 22 +++++++++++++++++-----
>>>> 1 file changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index e55161136490..2ab863fe7d0a 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -2506,13 +2506,25 @@ static int __xe_vma_op_execute(struct xe_vm
>>>> *vm, struct xe_vma *vma,
>>>> }
>>>> drm_exec_fini(&exec);
>>>> - if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
>>>> + if (err == -EAGAIN) {
>>>> lockdep_assert_held_write(&vm->lock);
>>>> - err = xe_vma_userptr_pin_pages(vma);
>>>> - if (!err)
>>>> - goto retry_userptr;
>>>> - trace_xe_vma_fail(vma);
>>>> + if (op->base.op == DRM_GPUVA_OP_REMAP) {
>>>> + if (!op->remap.unmap_done)
>>>> + vma = gpuva_to_vma(op->base.remap.unmap->va);
>>>> + else if (op->remap.prev)
>>>> + vma = op->remap.prev;
>>>> + else
>>>> + vma = op->remap.next;
>>>> + }
>>> I see this same vma picking in handling of DRM_GPUVA_OP_REMAP.
>>>
>>> Could the switch in xe_vma_op_execute() be moved to a separate pick_vma
>>> function instead, called from this place too?
>>>
>>> It might make the code slightly more readable.
>>>
>>
>> I would agree if this code wasn't going get rewritten shortly in [1]. We
>> are transiting to 1 job per VM bind IOCTL in [1]. I currently am
>> reworking on rebasing that code and found a few bugs in the current
>> code. I want to stablize the code quickly so I czn reliably test my
>> larger changes.
>>
>> Would it help if I added comment here saying this code is temporary?
> Oh that gives some context. The infinite loop fix in one patch is caused
> by the first patch in that series.
>
> I'd personally choose to fix it here, then when r ebasing put a revert
> before 21/27 and squash?
>
> Cheers,
> ~Maarten
Reviewed-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
More information about the Intel-xe
mailing list