[PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation

Zeng, Oak oak.zeng at intel.com
Mon Sep 23 14:24:02 UTC 2024


Hi Danilo,

> -----Original Message-----
> From: Danilo Krummrich <dakr at kernel.org>
> Sent: Monday, September 23, 2024 4:22 AM
> To: Zeng, Oak <oak.zeng at intel.com>
> Cc: Brost, Matthew <matthew.brost at intel.com>; intel-
> xe at lists.freedesktop.org
> Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during
> a map operation
> 
> 
> On 9/19/24 5:09 PM, Zeng, Oak wrote:
> >
> >
> >> -----Original Message-----
> >> From: Brost, Matthew <matthew.brost at intel.com>
> >> Sent: Wednesday, September 18, 2024 2:38 PM
> >> To: Zeng, Oak <oak.zeng at intel.com>
> >> Cc: intel-xe at lists.freedesktop.org; dakr at redhat.com
> >> Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
> >>
> >> On Wed, Sep 18, 2024 at 12:47:40PM -0400, Oak Zeng wrote:
> >>
> >> Please send patches which touch common code to dri-devel.
> >>
> >>> Consider this example. Before a map operation, the gpuva ranges
> >>> in a vm look like below:
> >>>
> >>>   VAs | start              | range              | end                | object             | object offset
> >>> ---------------------------------------------------------------------------------------------------------
> >>>       | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> >>>       | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> >>>
> >>> Now the user wants to map range [0x00007ffff5cd0000 - 0x00007ffff5cf0000).
> >>> With the existing code, the range walk in __drm_gpuvm_sm_map won't find
> >>> any range, so we end up with a single map operation for range
> >>> [0x00007ffff5cd0000 - 0x00007ffff5cf0000). This results in:
> >>>
> >>>   VAs | start              | range              | end                | object             | object offset
> >>> ---------------------------------------------------------------------------------------------------------
> >>>       | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> >>>       | 0x00007ffff5cd0000 | 0x0000000000020000 | 0x00007ffff5cf0000 | 0x0000000000000000 | 0x0000000000000000
> >>>       | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> >>>
> >>> The correct behavior is to merge those 3 ranges. So __drm_gpuvm_sm_map
> >>
> >> Danilo - correct me if I'm wrong, but I believe early in gpuvm you had
> >> similar code to this which could optionally be used. I was under the
> >> impression Xe didn't want this behavior, and it was eventually ripped
> >> out prior to merging.
> >>
> >>> is slightly modified to handle this corner case. The walker is changed
> >>> to find the ranges just before or after the mapping request, and to
> >>> merge adjacent ranges using unmap and map operations. With this change, the
> >>
> >> This would be problematic in Xe for several reasons.
> >>
> >> 1. This would create a window in which previously valid mappings are
> >> unmapped by our bind code implementation, which could result in a
> >> fault. Remap operations can create a similar window, but it is handled
> >> by either only unmapping the required range or by using dma-resv slots
> >> to close this window, ensuring nothing is running on the GPU while
> >> valid mappings are unmapped. A series of UNMAP, UNMAP, and MAP ops
> >> currently doesn't detect the problematic window. If we wanted to do
> >> something like this, we'd probably need a new op like MERGE or
> >> something to help detect this window.
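
For illustration, here is a rough sketch of what such a MERGE op could
look like. This is a sketch only -- the op type and field names below
are made up for discussion and are not existing drm_gpuvm API:

/*
 * Hypothetical sketch, not existing API: a MERGE op carries both the
 * adjacent mappings being folded away and the single mapping that
 * replaces them, so a driver can build page tables for the merged
 * mapping first and only then retire the old ones, avoiding any
 * window in which a previously valid VA is unmapped.
 */
struct drm_gpuva_op_merge {
	/* existing adjacent mappings consumed by the merge */
	struct drm_gpuva **merged_vas;
	unsigned int num_merged_vas;

	/* the merged mapping that replaces them */
	struct drm_gpuva_op_map new_map;
};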
> >>
> >> 2. Consider this case.
> >>
> >> 0x0000000000000000-0x00007ffff5cd0000 VMA[A]
> >> 0x00007ffff5cf0000-0x00000000000c7000 VMA[B]
> >> 0x00007ffff5cd0000-0x0000000000020000 VMA[C]
> >>
> >> What if VMA[A], VMA[B], and VMA[C] are all set up with different
> >> driver-specific implementation properties (e.g. pat_index)? These VMAs
> >> cannot be merged, and GPUVM has no visibility into this. If we wanted
> >> to do this I think we'd need a gpuvm vfunc that calls into the driver
> >> to determine if we can merge VMAs.
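
For illustration, a rough sketch of such a vfunc. The vma_can_merge
name and the Xe helpers below are hypothetical, not existing drm_gpuvm
or Xe API:

/*
 * Hypothetical new callback in struct drm_gpuvm_ops, sketch only.
 * GPUVM would call it before folding two adjacent VAs together,
 * letting the driver veto merges between mappings with incompatible
 * driver-specific state (e.g. pat_index in Xe).
 */
bool (*vma_can_merge)(struct drm_gpuva *a, struct drm_gpuva *b);

/*
 * Sketch of an Xe implementation, assuming a gpuva_to_vma() helper
 * and a pat_index field on the Xe VMA.
 */
static bool xe_vma_can_merge(struct drm_gpuva *a, struct drm_gpuva *b)
{
	return gpuva_to_vma(a)->pat_index == gpuva_to_vma(b)->pat_index;
}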
> >
> > #1 and #2 are both reasonable to me. Agreed: if we want this merge
> > behavior, more work is needed.
> >
> >>
> >> 3. What is the ROI of this? Slightly reducing the VMA count? Perhaps
> >> allowing larger GPU mappings in very specific corner cases? Given 1)
> >> and 2), I'd say just leave GPUVM as is rather than add this complexity
> >> and then make all drivers using GPUVM absorb this behavior change.
> >
> > This patch is an old one from my backlog. I roughly remember I ran into
> > a situation where two duplicated VMAs covering the same virtual address
> > range were kept in gpuvm's RB-tree. One VMA was actually already
> > destroyed. This caused further issues, as the destroyed VMA was found
> > during a GPUVM RB-tree walk. This triggered me to look into the gpuvm
> > merge/split logic, and I ended up with this patch. This patch did fix
> > that issue.
> 
> That would indeed be a big issue. As Matt suggests, is there a
> reproducer?
> 
> Either way, adding merge support can't be the fix for this; we need a
> separate one that's back-portable.
> 

The discussion continued while you were away. See https://patchwork.freedesktop.org/patch/614941/?series=138835&rev=1

Matt and I agreed to implement merge logic in gpuvm, but gpuvm needs to check a driver cookie/callback to decide whether or not to merge.
We reached this conclusion based on some requirements from the system allocator design. See more details in the link above.

Can you take a look and let us know whether you agree?
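
To make the proposal concrete, here is a minimal sketch of how
__drm_gpuvm_sm_map() could consult such a driver callback before
merging. vma_can_merge is the hypothetical hook from the discussion
above, not existing API:

/*
 * Sketch only: merging is attempted only when the driver opts in.
 * Drivers that don't implement the hook keep today's behavior of
 * never merging adjacent mappings.
 */
static bool can_merge(struct drm_gpuvm *gpuvm, struct drm_gpuva *a,
		      struct drm_gpuva *b)
{
	const struct drm_gpuvm_ops *ops = gpuvm->ops;

	if (!ops || !ops->vma_can_merge)
		return false;

	return ops->vma_can_merge(a, b);
}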

> Also, can we move this to DRI-devel please?

Yes will do.

Oak

> 
> - Danilo
> 
> >
> > But I don't remember the details now. I need to go back to it to find
> > more details.
> >
> > From a design perspective, I think merging adjacent contiguous ranges
> > is the cleaner design. Merging for some use cases (I am not sure we do
> > merge for some cases; I am just guessing from the _sm_ in the function
> > name) but not for others creates a design hole, and such behavior can
> > eventually mess things up. Maybe xekmd doesn't have such use cases
> > today, but people may run into situations where they want merge
> > behavior.
> >
> > If we decide to merge only in some cases but not others, we need clear
> > documentation of the behavior.
> >
> > Oak
> >
> >>
> >> Matt
> >>
> >>> end result of the above example is as below:
> >>>
> >>>   VAs | start              | range              | end                | object             | object offset
> >>> ---------------------------------------------------------------------------------------------------------
> >>>       | 0x0000000000000000 | 0x00007ffff5db7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> >>> Even though this fixes a real problem, the code looks a little ugly,
> >>> so I welcome any better fix or suggestion.
> >>>
> >>> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> >>> ---
> >>>   drivers/gpu/drm/drm_gpuvm.c | 62 +++++++++++++++++++++++++------------
> >>>   1 file changed, 43 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> >>> index 4b6fcaea635e..51825c794bdc 100644
> >>> --- a/drivers/gpu/drm/drm_gpuvm.c
> >>> +++ b/drivers/gpu/drm/drm_gpuvm.c
> >>> @@ -2104,28 +2104,30 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> >>>   {
> >>>   	struct drm_gpuva *va, *next;
> >>>   	u64 req_end = req_addr + req_range;
> >>> +	u64 merged_req_addr = req_addr;
> >>> +	u64 merged_req_end = req_end;
> >>>   	int ret;
> >>>
> >>>   	if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
> >>>   		return -EINVAL;
> >>>
> >>> -	drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
> >>> +	drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr - 1, req_end + 1) {
> >>>   		struct drm_gem_object *obj = va->gem.obj;
> >>>   		u64 offset = va->gem.offset;
> >>>   		u64 addr = va->va.addr;
> >>>   		u64 range = va->va.range;
> >>>   		u64 end = addr + range;
> >>> -		bool merge = !!va->gem.obj;
> >>> +		bool merge;
> >>>
> >>>   		if (addr == req_addr) {
> >>> -			merge &= obj == req_obj &&
> >>> +			merge = obj == req_obj &&
> >>>   				 offset == req_offset;
> >>>
> >>>   			if (end == req_end) {
> >>>   				ret = op_unmap_cb(ops, priv, va, merge);
> >>>   				if (ret)
> >>>   					return ret;
> >>> -				break;
> >>> +				continue;
> >>>   			}
> >>>
> >>>   			if (end < req_end) {
> >>> @@ -2162,22 +2164,33 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> >>>   			};
> >>>   			struct drm_gpuva_op_unmap u = { .va = va };
> >>>
> >>> -			merge &= obj == req_obj &&
> >>> -				 offset + ls_range == req_offset;
> >>> +			merge = (obj && obj == req_obj &&
> >>> +				 offset + ls_range == req_offset) ||
> >>> +				 (!obj && !req_obj);
> >>>   			u.keep = merge;
> >>>
> >>>   			if (end == req_end) {
> >>>   				ret = op_remap_cb(ops, priv, &p, NULL, &u);
> >>>   				if (ret)
> >>>   					return ret;
> >>> -				break;
> >>> +				continue;
> >>>   			}
> >>>
> >>>   			if (end < req_end) {
> >>> -				ret = op_remap_cb(ops, priv, &p, NULL, &u);
> >>> -				if (ret)
> >>> -					return ret;
> >>> -				continue;
> >>> +				if (end == req_addr) {
> >>> +					if (merge) {
> >>> +						ret = op_unmap_cb(ops, priv, va, merge);
> >>> +						if (ret)
> >>> +							return ret;
> >>> +						merged_req_addr = addr;
> >>> +						continue;
> >>> +					}
> >>> +				} else {
> >>> +					ret = op_remap_cb(ops, priv, &p, NULL, &u);
> >>> +					if (ret)
> >>> +						return ret;
> >>> +					continue;
> >>> +				}
> >>>   			}
> >>>
> >>>   			if (end > req_end) {
> >>> @@ -2195,15 +2208,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> >>>   				break;
> >>>   			}
> >>>   		} else if (addr > req_addr) {
> >>> -			merge &= obj == req_obj &&
> >>> +			merge = (obj && obj == req_obj &&
> >>>   				 offset == req_offset +
> >>> -					   (addr - req_addr);
> >>> +					   (addr - req_addr)) ||
> >>> +				 (!obj && !req_obj);
> >>>
> >>>   			if (end == req_end) {
> >>>   				ret = op_unmap_cb(ops, priv, va, merge);
> >>>   				if (ret)
> >>>   					return ret;
> >>> -				break;
> >>> +				continue;
> >>>   			}
> >>>
> >>>   			if (end < req_end) {
> >>> @@ -2225,16 +2239,26 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> >>>   					.keep = merge,
> >>>   				};
> >>>
> >>> -				ret = op_remap_cb(ops, priv, NULL, &n, &u);
> >>> -				if (ret)
> >>> -					return ret;
> >>> -				break;
> >>> +				if (addr == req_end) {
> >>> +					if (merge) {
> >>> +						ret = op_unmap_cb(ops, priv, va, merge);
> >>> +						if (ret)
> >>> +							return ret;
> >>> +						merged_req_end = end;
> >>> +						break;
> >>> +					}
> >>> +				} else {
> >>> +					ret = op_remap_cb(ops, priv, NULL, &n, &u);
> >>> +					if (ret)
> >>> +						return ret;
> >>> +					break;
> >>> +				}
> >>>   			}
> >>>   		}
> >>>   	}
> >>>
> >>>   	return op_map_cb(ops, priv,
> >>> -			 req_addr, req_range,
> >>> +			 merged_req_addr, merged_req_end - merged_req_addr,
> >>>   			 req_obj, req_offset);
> >>>   }
> >>>
> >>> --
> >>> 2.26.3
> >>>
> >


