[PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
Matthew Brost
matthew.brost at intel.com
Thu Sep 19 18:29:36 UTC 2024
On Thu, Sep 19, 2024 at 10:32:31AM -0600, Zeng, Oak wrote:
>
>
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost at intel.com>
> > Sent: Thursday, September 19, 2024 11:48 AM
> > To: Zeng, Oak <oak.zeng at intel.com>
> > Cc: intel-xe at lists.freedesktop.org; dakr at redhat.com
> > Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
> >
> > On Thu, Sep 19, 2024 at 09:09:57AM -0600, Zeng, Oak wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > Sent: Wednesday, September 18, 2024 2:38 PM
> > > > To: Zeng, Oak <oak.zeng at intel.com>
> > > > Cc: intel-xe at lists.freedesktop.org; dakr at redhat.com
> > > > Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
> > > >
> > > > On Wed, Sep 18, 2024 at 12:47:40PM -0400, Oak Zeng wrote:
> > > >
> > > > Please send patches which touch common code to dri-devel.
> > > >
> > > > > Consider this example. Before a map operation, the gpuva ranges
> > > > > in a vm look like below:
> > > > >
> > > > > VAs | start | range | end | object | object offset
> > > > > -------------------------------------------------------------------------------------------------------------
> > > > > | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> > > > > | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > > >
> > > > > Now the user wants to map range [0x00007ffff5cd0000 - 0x00007ffff5cf0000).
> > > > > With the existing code, the range walk in __drm_gpuvm_sm_map won't
> > > > > find any range, so we end up with a single map operation for range
> > > > > [0x00007ffff5cd0000 - 0x00007ffff5cf0000). This results in:
> > > > >
> > > > > VAs | start | range | end | object | object offset
> > > > > -------------------------------------------------------------------------------------------------------------
> > > > > | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> > > > > | 0x00007ffff5cd0000 | 0x0000000000020000 | 0x00007ffff5cf0000 | 0x0000000000000000 | 0x0000000000000000
> > > > > | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > > >
> > > > > The correct behavior is to merge those 3 ranges. So
> > > > > __drm_gpuvm_sm_map
> > > >
> > > > Danilo - correct me if I'm wrong, but I believe early on in gpuvm you had
> > > > similar code to this which could optionally be used. I was of the
> > > > thinking Xe didn't want this behavior and eventually this behavior was
> > > > ripped out prior to merging.
> > > >
> > > > > is slightly modified to handle this corner case. The walker is changed
> > > > > to find the range just before or after the mapping request, and merge
> > > > > adjacent ranges using unmap and map operations. With this change, the
> > > >
> > > > This would be problematic in Xe for several reasons.
> > > >
> > > > 1. This would create a window in which previously valid mappings are
> > > > unmapped by our bind code implementation, which could result in a
> > > > fault. Remap operations can create a similar window but it is handled
> > > > by either only unmapping the required range or using dma-resv slots to
> > > > close this window, ensuring nothing is running on the GPU while valid
> > > > mappings are unmapped. A series of UNMAP, UNMAP, and MAP ops currently
> > > > doesn't detect the problematic window. If we wanted to do something
> > > > like this, we'd probably need a new op like MERGE or something to help
> > > > detect this window.
> > > >
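To make the MERGE idea above a bit more concrete, a rough sketch of what such
an op could carry is below. This is purely hypothetical - there is no
DRM_GPUVA_OP_MERGE or struct drm_gpuva_op_merge in drm_gpuvm.h today - but the
point is the driver would be told which existing VAs are being folded into the
new mapping, so it can keep their page tables live until the merged mapping is
committed:

/*
 * Hypothetical sketch only - not existing GPUVM API. A new op type
 * (e.g. DRM_GPUVA_OP_MERGE) would reference the VAs being swallowed
 * instead of emitting plain UNMAPs for them.
 */
struct drm_gpuva_op_merge {
        /* the merged replacement mapping to create */
        struct drm_gpuva_op_map map;
        /* adjacent VA folded in before the request, or NULL */
        struct drm_gpuva *prev;
        /* adjacent VA folded in after the request, or NULL */
        struct drm_gpuva *next;
};
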
> > > > 2. Consider this case.
> > > >
> > > > 0x0000000000000000-0x00007ffff5cd0000 VMA[A]
> > > > 0x00007ffff5cf0000-0x00000000000c7000 VMA[B]
> > > > 0x00007ffff5cd0000-0x0000000000020000 VMA[C]
> > > >
> > > > What if VMA[A], VMA[B], and VMA[C] are all set up with different
> > > > driver-specific implementation properties (e.g. pat_index)? These VMAs
> > > > cannot be merged. GPUVM has no visibility into this. If we wanted to do
> > > > this I think we'd need a gpuvm vfunc that calls into the driver to
> > > > determine if we can merge VMAs.
> > >
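For reference, a minimal sketch of such a vfunc - hypothetical, nothing like
this exists in struct drm_gpuvm_ops today, and struct my_vma and its fields
are made up - could look like:

/* Hypothetical merge-check callback a driver would implement. */
static bool my_vm_can_merge(struct drm_gpuva *a, struct drm_gpuva *b)
{
        struct my_vma *va = container_of(a, struct my_vma, gpuva);
        struct my_vma *vb = container_of(b, struct my_vma, gpuva);

        /* Only fuse VAs whose driver-side properties match. */
        return va->pat_index == vb->pat_index &&
               va->is_system_allocator == vb->is_system_allocator;
}

GPUVM would call this before deciding to fold two adjacent VAs into one.
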
> > > #1, #2 are all reasonable to me. Agree if we want this merge behavior,
> > > more work is needed.
> > >
> > > >
> > > > 3. What is the ROI of this? Slightly reducing the VMA count? Perhaps
> > > > allowing larger GPU pages in very specific corner cases? Given 1) and
> > > > 2) I'd say just leave GPUVM as is rather than add this complexity and
> > > > then make all drivers using GPUVM absorb this behavior change.
> > >
> > > This patch is an old one in my backlog. I roughly remember I ran into a
> > > situation where two duplicated VMAs covering the same virtual address
> > > range were kept in gpuvm's RB-tree. One VMA was actually already
> > > destroyed. This further caused issues as the destroyed VMA was found
> > > during a GPUVM RB-tree walk. This triggered me to look into the gpuvm
> > > merge/split logic and I ended up with this patch. This patch did fix
> > > that issue.
> > >
> >
> > If a destroyed VMA is in the RB tree that would be a big issue and
> > definitely would need to be fixed.
> >
> > Adding a test case to show the issue you describe would be good. Also if
> > we end up doing something with merging, adding a test case for the
> > description in the commit message would also be good.
> >
> > > But I don't remember the details now. I need to go back to it to find
> > > more details.
> > >
> >
> > That would be good.
> >
> > > From a design perspective, I think merging adjacent contiguous ranges is
> > > a cleaner design. Merging for some use cases (I am not sure we do merge
> > > for some cases, just guessing from the function name _sm_) but not
> > > merging for other use cases creates a design hole, and eventually such
> > > behavior can potentially mess things up. Maybe xekmd today doesn't have
> > > such use cases, but people may run into situations where they want a
> > > merge behavior.
> > >
> >
> > I don't think Xe has a current use case, but the situation you describe
> > is very similar to a system allocator case where we would want
> > merging.
> >
> > Simple example below.
> >
> > Initial State:
> > VMA[A] 0x0000-0x0fff - System allocator VMA
> > VMA[B] 0x1000-0x1fff - BO binding VMA
> > VMA[C] 0x2000-0x2fff - System allocator VMA
> >
> > User op:
> > Bind 0x1000-0x1fff to system allocator
> >
> > Ideally we really want this final state:
> > VMA[D] 0x0000-0x2fff - System allocator VMA
> >
> > Without merging like the above, as BO bindings are bound / unbound the
> > system allocator space will get fragmented into lots of VMAs, which is
> > not ideal.
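
Spelled out, the bind in that example would ideally be expressed as something
like the following op sequence (hypothetical, assuming a merge-aware GPUVM;
today the result is three separate system allocator VMAs):

        UNMAP VMA[A] 0x0000-0x0fff  (logical only - PTEs stay live)
        UNMAP VMA[B] 0x1000-0x1fff
        UNMAP VMA[C] 0x2000-0x2fff  (logical only - PTEs stay live)
        MAP   VMA[D] 0x0000-0x2fff  (system allocator)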
>
> Yes I observed the same when I ran the system allocator. From one log I captured, I ended up with the fragmented address space below:
>
> VAs | start | range | end | object | object offset
> -------------------------------------------------------------------------------------------------------------
> | 0x0000000000000000 | 0x000000000044c000 | 0x000000000044c000 | 0x0000000000000000 | 0x0000000000000000
> | 0x000000000044c000 | 0x0000000000001000 | 0x000000000044d000 | 0x0000000000000000 | 0x000000000044c000
> | 0x000000000044d000 | 0x0000000000002000 | 0x000000000044f000 | 0x0000000000000000 | 0x0000000000000000
> | 0x000000000044f000 | 0x0000000000001000 | 0x0000000000450000 | 0x0000000000000000 | 0x000000000044f000
> | 0x0000000000450000 | 0x0000000000019000 | 0x0000000000469000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000469000 | 0x0000000000001000 | 0x000000000046a000 | 0x0000000000000000 | 0x0000000000469000
> | 0x000000000046a000 | 0x000000000000f000 | 0x0000000000479000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000479000 | 0x0000000000001000 | 0x000000000047a000 | 0x0000000000000000 | 0x0000000000479000
> | 0x000000000047a000 | 0x0000000000006000 | 0x0000000000480000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000480000 | 0x0000000000001000 | 0x0000000000481000 | 0x0000000000000000 | 0x0000000000480000
> | 0x0000000000481000 | 0x0000000000011000 | 0x0000000000492000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000492000 | 0x0000000000001000 | 0x0000000000493000 | 0x0000000000000000 | 0x0000000000492000
> | 0x0000000000493000 | 0x0000000000005000 | 0x0000000000498000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000498000 | 0x0000000000001000 | 0x0000000000499000 | 0x0000000000000000 | 0x0000000000498000
> | 0x0000000000499000 | 0x0000000000001000 | 0x000000000049a000 | 0x0000000000000000 | 0x0000000000000000
> | 0x000000000049a000 | 0x0000000000001000 | 0x000000000049b000 | 0x0000000000000000 | 0x000000000049a000
> | 0x000000000049b000 | 0x0000000000002000 | 0x000000000049d000 | 0x0000000000000000 | 0x0000000000000000
> | 0x000000000049d000 | 0x0000000000001000 | 0x000000000049e000 | 0x0000000000000000 | 0x000000000049d000
> | 0x000000000049e000 | 0x0000000000010000 | 0x00000000004ae000 | 0x0000000000000000 | 0x0000000000000000
> | 0x00000000004ae000 | 0x0000000000001000 | 0x00000000004af000 | 0x0000000000000000 | 0x00000000004ae000
> | 0x00000000004af000 | 0x0000000000019000 | 0x00000000004c8000 | 0x0000000000000000 | 0x0000000000000000
> | 0x00000000004c8000 | 0x0000000000001000 | 0x00000000004c9000 | 0x0000000000000000 | 0x00000000004c8000
> | 0x00000000004c9000 | 0x0000000000036000 | 0x00000000004ff000 | 0x0000000000000000 | 0x0000000000000000
> | 0x00000000004ff000 | 0x0000000000001000 | 0x0000000000500000 | 0x0000000000000000 | 0x00000000004ff000
> | 0x0000000000500000 | 0x0000000000008000 | 0x0000000000508000 | 0x0000000000000000 | 0x0000000000000000
> | 0x0000000000508000 | 0x0000000000001000 | 0x0000000000509000 | 0x0000000000000000 | 0x0000000000508000
> | 0x0000000000509000 | 0x0000000000015000 | 0x000000000051e000 | 0x0000000000000000 | 0x0000000000000000
> | 0x000000000051e000 | 0x0000000000001000 | 0x000000000051f000 | 0x0000000000000000 | 0x000000000051e000
> | 0x000000000051f000 | 0x00007ffff58b1000 | 0x00007ffff5dd0000 | 0x0000000000000000 | 0x0000000000000000
> | 0x00007ffff5dd0000 | 0x0000000000020000 | 0x00007ffff5df0000 | 0xff1100013e668400 | 0x0000000000000000
> | 0x00007ffff5df0000 | 0x00ff80000a210000 | 0x0100000000000000 | 0x0000000000000000 | 0x0000000000000000
>
> This list can go on if you run more tests with a single gpuvm. In a real-world use case, this could be pretty bad.
>
Agree this could get pretty bad for a memory-heavy app in a real use case.
>
> >
> > So here 1) from my list is a non-issue as UNMAPs of system allocator VMAs
> > don't interact with the hardware. 2) could still be an issue as VMA[A] and
> > VMA[C] could have different caching or migration policies.
> >
> > > If we decide to only merge for some cases but not for other cases, we
> > > need clear documentation of the behavior.
> > >
> >
> > If merging was added it likely would be an optional, user-controlled
> > thing. I suggested a vfunc or something to test for the merge condition;
> > we could just use a user-defined cookie attached to the VMA that GPUVM
> > could match on for merging (it could also be used to enable merging only
> > if the cookie is non-zero). That actually seems pretty clean.
>
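A rough sketch of the cookie idea, purely for illustration (merge_cookie is a
made-up field, not part of struct drm_gpuva today):

/*
 * Hypothetical: userspace supplies a u64 merge_cookie per mapping.
 * GPUVM only fuses adjacent VAs whose cookies are equal and non-zero.
 */
static bool can_merge(const struct drm_gpuva *va, u64 req_merge_cookie)
{
        return va->merge_cookie &&
               va->merge_cookie == req_merge_cookie;
}
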
> This approach sounds good to me.
>
> This is a low priority thing for me. I have to deal with some higher priority things now.
> I am open to other people continuing this work, in which case I can help review
> and test it a bit.
>
Agree it is a low priority as nothing is really broken; rather, perf could
degrade over time and the memory footprint of the gpuvm will get larger over
time.

Also agree that fixing this in the gpuvm layer is probably the best place so
other drivers can make use of this too.
Matt
> Oak
>
>
> >
> > Matt
> >
> > > Oak
> > >
> > > >
> > > > Matt
> > > >
> > > > > end result of the above example is as below:
> > > > >
> > > > > VAs | start | range | end | object | object offset
> > > > > -------------------------------------------------------------------------------------------------------------
> > > > > | 0x0000000000000000 | 0x00007ffff5db7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > > >
> > > > > Even though this fixes a real problem, the code looks a little ugly.
> > > > > So I welcome any better fix or suggestion.
> > > > >
> > > > > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_gpuvm.c | 62 +++++++++++++++++++++++++------------
> > > > > 1 file changed, 43 insertions(+), 19 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > > index 4b6fcaea635e..51825c794bdc 100644
> > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > @@ -2104,28 +2104,30 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > > >  {
> > > > >          struct drm_gpuva *va, *next;
> > > > >          u64 req_end = req_addr + req_range;
> > > > > +        u64 merged_req_addr = req_addr;
> > > > > +        u64 merged_req_end = req_end;
> > > > >          int ret;
> > > > >
> > > > >          if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
> > > > >                  return -EINVAL;
> > > > >
> > > > > -        drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
> > > > > +        drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr - 1, req_end + 1) {
> > > > >                  struct drm_gem_object *obj = va->gem.obj;
> > > > >                  u64 offset = va->gem.offset;
> > > > >                  u64 addr = va->va.addr;
> > > > >                  u64 range = va->va.range;
> > > > >                  u64 end = addr + range;
> > > > > -                bool merge = !!va->gem.obj;
> > > > > +                bool merge;
> > > > >
> > > > >                  if (addr == req_addr) {
> > > > > -                        merge &= obj == req_obj &&
> > > > > +                        merge = obj == req_obj &&
> > > > >                                  offset == req_offset;
> > > > >
> > > > >                          if (end == req_end) {
> > > > >                                  ret = op_unmap_cb(ops, priv, va, merge);
> > > > >                                  if (ret)
> > > > >                                          return ret;
> > > > > -                                break;
> > > > > +                                continue;
> > > > >                          }
> > > > >
> > > > >                          if (end < req_end) {
> > > > > @@ -2162,22 +2164,33 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > > >                          };
> > > > >                          struct drm_gpuva_op_unmap u = { .va = va };
> > > > >
> > > > > -                        merge &= obj == req_obj &&
> > > > > -                                 offset + ls_range == req_offset;
> > > > > +                        merge = (obj && obj == req_obj &&
> > > > > +                                 offset + ls_range == req_offset) ||
> > > > > +                                (!obj && !req_obj);
> > > > >                          u.keep = merge;
> > > > >
> > > > >                          if (end == req_end) {
> > > > >                                  ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > > >                                  if (ret)
> > > > >                                          return ret;
> > > > > -                                break;
> > > > > +                                continue;
> > > > >                          }
> > > > >
> > > > >                          if (end < req_end) {
> > > > > -                                ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > > > -                                if (ret)
> > > > > -                                        return ret;
> > > > > -                                continue;
> > > > > +                                if (end == req_addr) {
> > > > > +                                        if (merge) {
> > > > > +                                                ret = op_unmap_cb(ops, priv, va, merge);
> > > > > +                                                if (ret)
> > > > > +                                                        return ret;
> > > > > +                                                merged_req_addr = addr;
> > > > > +                                                continue;
> > > > > +                                        }
> > > > > +                                } else {
> > > > > +                                        ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > > > +                                        if (ret)
> > > > > +                                                return ret;
> > > > > +                                        continue;
> > > > > +                                }
> > > > >                          }
> > > > >
> > > > >                          if (end > req_end) {
> > > > > @@ -2195,15 +2208,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > > >                                  break;
> > > > >                          }
> > > > >                  } else if (addr > req_addr) {
> > > > > -                        merge &= obj == req_obj &&
> > > > > +                        merge = (obj && obj == req_obj &&
> > > > >                                  offset == req_offset +
> > > > > -                                           (addr - req_addr);
> > > > > +                                           (addr - req_addr)) ||
> > > > > +                                (!obj && !req_obj);
> > > > >
> > > > >                          if (end == req_end) {
> > > > >                                  ret = op_unmap_cb(ops, priv, va, merge);
> > > > >                                  if (ret)
> > > > >                                          return ret;
> > > > > -                                break;
> > > > > +                                continue;
> > > > >                          }
> > > > >
> > > > >                          if (end < req_end) {
> > > > > @@ -2225,16 +2239,26 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > > >                                          .keep = merge,
> > > > >                                  };
> > > > >
> > > > > -                                ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > > > > -                                if (ret)
> > > > > -                                        return ret;
> > > > > -                                break;
> > > > > +                                if (addr == req_end) {
> > > > > +                                        if (merge) {
> > > > > +                                                ret = op_unmap_cb(ops, priv, va, merge);
> > > > > +                                                if (ret)
> > > > > +                                                        return ret;
> > > > > +                                                merged_req_end = end;
> > > > > +                                                break;
> > > > > +                                        }
> > > > > +                                } else {
> > > > > +                                        ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > > > > +                                        if (ret)
> > > > > +                                                return ret;
> > > > > +                                        break;
> > > > > +                                }
> > > > >                          }
> > > > >                  }
> > > > >          }
> > > > >
> > > > >          return op_map_cb(ops, priv,
> > > > > -                         req_addr, req_range,
> > > > > +                         merged_req_addr, merged_req_end - merged_req_addr,
> > > > >                           req_obj, req_offset);
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.26.3
> > > > >