[RFC] drm/xe: Ensure write lock is held when calling xe_vma_rebind()
Nirmoy Das
nirmoy.das at intel.com
Thu Jun 6 10:09:39 UTC 2024
Hi Matt,
On 6/5/2024 6:44 PM, Matthew Brost wrote:
> On Wed, Jun 05, 2024 at 12:16:07PM +0200, Nirmoy Das wrote:
>> xe_vma_rebind() expects write vm lock is held so start out with a
>> read lock and upgrade if a vma rebind is required.
>>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1894
>> Fixes: bf69918b7199 ("drm/xe: Use xe_vma_ops to implement page fault rebinds")
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Oak Zeng <oak.zeng at intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_gt_pagefault.c | 25 +++++++++----------------
>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> index 040dd142c49c..543cfa9892f6 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
>> @@ -153,24 +153,15 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>> return -EINVAL;
>>
>> retry_userptr:
>> - /*
>> - * TODO: Avoid exclusive lock if VM doesn't have userptrs, or
>> - * start out read-locked?
>> - */
>> - down_write(&vm->lock);
>> - write_locked = true;
>> + /* start out with read-locked and upgrade of vma is needed a rebind */
>> + down_read(&vm->lock);
>> + write_locked = false;
>> vma = lookup_vma(vm, pf->page_addr);
>> if (!vma) {
>> ret = -EINVAL;
>> goto unlock_vm;
>> }
>>
>> - if (!xe_vma_is_userptr(vma) ||
>> - !xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
>> - downgrade_write(&vm->lock);
>> - write_locked = false;
>> - }
>> -
>> trace_xe_vma_pagefault(vma);
>>
>> atomic = access_is_atomic(pf->access_type);
>> @@ -179,9 +170,14 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>> if (vma_is_valid(tile, vma) && !atomic)
>> goto unlock_vm;
>>
>> + /* xe_vma_rebind() expects a write lock so upgrade the vm lock */
>> + up_read(&vm->lock);
>> + down_write(&vm->lock);
> You can't do this, or even if it works it is bad, bad practice to drop
> the lock as someone else could come in and acquire the lock in write
> mode. In this case, a user unbind of the VMA could happen and the VMA
> could be destoryed.
I didn't verify but assumed vma is ref_counted but anyways I knew this
wasn't correct
so tagged it RFC :)
> Now the VMA refs below are accessing corrupt memory.
>
> I have on going refactor [1] here which just takes the vm->lock in write
> mode for page faults. I suggest we just review / use that now.
I will do that.
>
> Long term I think we need to audit the read / write usage of vm->lock as
> I'm almost certain we have broken the intended usage of this lock. Or
> honestly maybe just change this lock to a mutex as I'm unsure if the
> read vs. write buys us anything and complicates the driver.
Yes, that might be simpler.
Thanks,
Nirmoy
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/133628/
>
>> + write_locked = true;
>> /* TODO: Validate fault */
>>
>> - if (xe_vma_is_userptr(vma) && write_locked) {
>> + if (xe_vma_is_userptr(vma) &&
>> + xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
>> struct xe_userptr_vma *uvma = to_userptr_vma(vma);
>>
>> spin_lock(&vm->userptr.invalidated_lock);
>> @@ -191,9 +187,6 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
>> ret = xe_vma_userptr_pin_pages(uvma);
>> if (ret)
>> goto unlock_vm;
>> -
>> - downgrade_write(&vm->lock);
>> - write_locked = false;
>> }
>>
>> /* Lock VM and BOs dma-resv */
>> --
>> 2.42.0
>>
More information about the Intel-xe
mailing list