[PATCH] drm/xe/userptr: fix notifier vs folio deadlock
Matthew Brost
matthew.brost at intel.com
Tue Apr 15 22:42:20 UTC 2025
On Mon, Apr 14, 2025 at 02:25:40PM +0100, Matthew Auld wrote:
> A user is reporting what smells like a notifier vs folio deadlock, where
> migrate_pages_batch() on the core kernel side is holding folio lock(s) and
> then interacting with their mappings. However, those mappings are
> tied to some userptr, which means calling into the notifier callback and
> grabbing the notifier lock. With perfect timing, it looks possible that
> the pages we pulled from the hmm fault can get sniped by
> migrate_pages_batch() at the same time that we are holding the notifier
> lock to mark the pages as accessed/dirty. At that point we also want
> to grab the folio lock(s) to mark them as dirty, but if they are
> contended from the notifier/migrate_pages_batch side then we deadlock,
> since the folio lock won't be dropped until we drop the notifier lock.
>
> Fortunately, it seems the mark_page_accessed/dirty calls are not really
> needed in the first place and should already have been done by the hmm
> fault, so just remove them.
>
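A minimal userspace sketch of the lock-order inversion described above, assuming POSIX threads: one pthread mutex stands in for the folio lock and another for the userptr notifier lock. In the driver the notifier lock is a `struct rw_semaphore` and folio locks are not pthread mutexes, so this is only a model of the ordering, not of the kernel primitives. The helpers `migrate_path()`, `populate_path()` and `simulate()` are hypothetical names, not kernel functions. A barrier forces the "perfect timing" interleaving, and the pre-fix populate path probes the folio lock with `pthread_mutex_trylock()` so the demo reports the circular wait instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model: plain mutexes stand in for the folio
 * lock and the userptr notifier lock. */
struct model {
	pthread_mutex_t folio_lock;
	pthread_mutex_t notifier_lock;
	pthread_barrier_t both_hold_first_lock;
	bool mark_dirty_under_notifier;	/* true = pre-fix behaviour */
	bool circular_wait_seen;
};

/* Models migrate_pages_batch(): folio lock first, then the notifier
 * callback wants the notifier lock. */
static void *migrate_path(void *arg)
{
	struct model *m = arg;

	pthread_mutex_lock(&m->folio_lock);
	pthread_barrier_wait(&m->both_hold_first_lock);
	/* Blocks until the populate path drops the notifier lock. */
	pthread_mutex_lock(&m->notifier_lock);
	pthread_mutex_unlock(&m->notifier_lock);
	pthread_mutex_unlock(&m->folio_lock);
	return NULL;
}

/* Models the userptr populate path: notifier lock first. */
static void *populate_path(void *arg)
{
	struct model *m = arg;

	pthread_mutex_lock(&m->notifier_lock);
	pthread_barrier_wait(&m->both_hold_first_lock);
	if (m->mark_dirty_under_notifier) {
		/* Pre-fix: marking dirty wants the folio lock here. A
		 * blocking lock would hang forever, so probe instead. */
		if (pthread_mutex_trylock(&m->folio_lock) == EBUSY)
			m->circular_wait_seen = true;
		else
			pthread_mutex_unlock(&m->folio_lock);
	}
	/* Post-fix: no folio lock is taken under the notifier lock. */
	pthread_mutex_unlock(&m->notifier_lock);
	return NULL;
}

/* Runs both paths with the interleaving forced by the barrier. Returns
 * true if populate found the folio lock held by a thread that is itself
 * waiting for the notifier lock (a circular wait). */
static bool simulate(bool mark_dirty_under_notifier)
{
	struct model m = {
		.mark_dirty_under_notifier = mark_dirty_under_notifier,
	};
	pthread_t a, b;

	pthread_mutex_init(&m.folio_lock, NULL);
	pthread_mutex_init(&m.notifier_lock, NULL);
	pthread_barrier_init(&m.both_hold_first_lock, NULL, 2);

	pthread_create(&a, NULL, migrate_path, &m);
	pthread_create(&b, NULL, populate_path, &m);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	pthread_barrier_destroy(&m.both_hold_first_lock);
	pthread_mutex_destroy(&m.notifier_lock);
	pthread_mutex_destroy(&m.folio_lock);
	return m.circular_wait_seen;
}
```

Calling `simulate(true)` (marking pages dirty under the notifier lock, as before this patch) should report the circular wait, while `simulate(false)` (the patched behaviour, where nothing takes a folio lock under the notifier lock) lets both threads finish without contention. Build with `-pthread`.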
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4765
> Fixes: 0a98219bcc96 ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock")
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> Cc: <stable at vger.kernel.org> # v6.10+
> ---
> drivers/gpu/drm/xe/xe_hmm.c | 24 ------------------------
> 1 file changed, 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
> index c3cc0fa105e8..57b71956ddf4 100644
> --- a/drivers/gpu/drm/xe/xe_hmm.c
> +++ b/drivers/gpu/drm/xe/xe_hmm.c
> @@ -19,29 +19,6 @@ static u64 xe_npages_in_range(unsigned long start, unsigned long end)
>  	return (end - start) >> PAGE_SHIFT;
>  }
>
> -/**
> - * xe_mark_range_accessed() - mark a range is accessed, so core mm
> - * have such information for memory eviction or write back to
> - * hard disk
> - * @range: the range to mark
> - * @write: if write to this range, we mark pages in this range
> - * as dirty
> - */
> -static void xe_mark_range_accessed(struct hmm_range *range, bool write)
> -{
> -	struct page *page;
> -	u64 i, npages;
> -
> -	npages = xe_npages_in_range(range->start, range->end);
> -	for (i = 0; i < npages; i++) {
> -		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> -		if (write)
> -			set_page_dirty_lock(page);
> -
> -		mark_page_accessed(page);
> -	}
> -}
> -
>  static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st,
>  		       struct hmm_range *range, struct rw_semaphore *notifier_sem)
>  {
> @@ -331,7 +308,6 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
>  	if (ret)
>  		goto out_unlock;
>
> -	xe_mark_range_accessed(&hmm_range, write);
>  	userptr->sg = &userptr->sgt;
>  	xe_hmm_userptr_set_mapped(uvma);
>  	userptr->notifier_seq = hmm_range.notifier_seq;
> --
> 2.49.0
>
More information about the Intel-xe mailing list