[PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
Zeng, Oak
oak.zeng at intel.com
Thu Sep 19 16:32:31 UTC 2024
> -----Original Message-----
> From: Brost, Matthew <matthew.brost at intel.com>
> Sent: Thursday, September 19, 2024 11:48 AM
> To: Zeng, Oak <oak.zeng at intel.com>
> Cc: intel-xe at lists.freedesktop.org; dakr at redhat.com
> Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
>
> On Thu, Sep 19, 2024 at 09:09:57AM -0600, Zeng, Oak wrote:
> >
> >
> > > -----Original Message-----
> > > From: Brost, Matthew <matthew.brost at intel.com>
> > > Sent: Wednesday, September 18, 2024 2:38 PM
> > > To: Zeng, Oak <oak.zeng at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org; dakr at redhat.com
> > > Subject: Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation
> > >
> > > On Wed, Sep 18, 2024 at 12:47:40PM -0400, Oak Zeng wrote:
> > >
> > > Please send patches which touch common code to dri-devel.
> > >
> > > > Consider this example. Before a map operation, the gpuva ranges
> > > > in a vm look like below:
> > > >
> > > > VAs | start | range | end | object | object offset
> > > > -------------------------------------------------------------------------------------------------------------
> > > > | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> > > > | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > >
> > > > Now the user wants to map range [0x00007ffff5cd0000 - 0x00007ffff5cf0000).
> > > > With the existing code, the range walk in __drm_gpuvm_sm_map won't
> > > > find any range, so we end up with a single map operation for range
> > > > [0x00007ffff5cd0000 - 0x00007ffff5cf0000). This results in:
> > > >
> > > > VAs | start | range | end | object | object offset
> > > > -------------------------------------------------------------------------------------------------------------
> > > > | 0x0000000000000000 | 0x00007ffff5cd0000 | 0x00007ffff5cd0000 | 0x0000000000000000 | 0x0000000000000000
> > > > | 0x00007ffff5cd0000 | 0x0000000000020000 | 0x00007ffff5cf0000 | 0x0000000000000000 | 0x0000000000000000
> > > > | 0x00007ffff5cf0000 | 0x00000000000c7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > >
> > > > The correct behavior is to merge those 3 ranges. So __drm_gpuvm_sm_map
> > >
> > > Danilo - correct me if I'm wrong, but I believe early in gpuvm you had
> > > similar code to this which could optionally be used. I was of the
> > > thinking that Xe didn't want this behavior, and eventually it was
> > > ripped out prior to merging.
> > >
> > > > is slightly modified to handle this corner case. The walker is changed
> > > > to find the range just before or after the mapping request, and merge
> > > > adjacent ranges using unmap and map operations. With this change, the
> > >
> > > This would be problematic in Xe for several reasons.
> > >
> > > 1. This would create a window in which previously valid mappings are
> > > unmapped by our bind code implementation, which could result in a fault.
> > > Remap operations can create a similar window, but it is handled either by
> > > only unmapping the required range or by using dma-resv slots to close this
> > > window, ensuring nothing is running on the GPU while valid mappings are
> > > unmapped. A series of UNMAP, UNMAP, and MAP ops currently doesn't expose
> > > the problematic window. If we wanted to do something like this, we'd
> > > probably need a new op like MERGE or something to help detect this
> > > window.
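If I follow, the op stream would need an explicit marker for this. A rough userspace sketch of the idea below; the real enum drm_gpuva_op_type has no MERGE today, so all names here are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extension of the GPUVM op type enum. MERGE is invented
 * here to mean "unmap only in order to re-create a bigger mapping",
 * so the driver can tell it apart from a plain unmap. */
enum sketch_op_type {
	SKETCH_OP_MAP,
	SKETCH_OP_REMAP,
	SKETCH_OP_UNMAP,
	SKETCH_OP_MERGE,
};

/* Driver-side check: does this op stream momentarily tear down mappings
 * that userspace still considers valid (and thus needs fencing)? */
static int stream_needs_fencing(const enum sketch_op_type *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i] == SKETCH_OP_MERGE)
			return 1;
	return 0;
}
```

With plain UNMAP/UNMAP/MAP the driver cannot make this distinction, which is exactly the window you describe.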
> > >
> > > 2. Consider this case.
> > >
> > > 0x0000000000000000-0x00007ffff5cd0000 VMA[A]
> > > 0x00007ffff5cf0000-0x00000000000c7000 VMA[B]
> > > 0x00007ffff5cd0000-0x0000000000020000 VMA[C]
> > >
> > > What if VMA[A], VMA[B], and VMA[C] are all set up with different
> > > driver-specific implementation properties (e.g. pat_index)? These VMAs
> > > cannot be merged. GPUVM has no visibility into this. If we wanted to do
> > > this I think we'd need a gpuvm vfunc that calls into the driver to
> > > determine if we can merge VMAs.
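A rough sketch of what such a vfunc could look like. This is purely illustrative; none of these names exist in drm_gpuvm, and the attribute check is reduced to a single opaque pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GPUVM only sees opaque VMAs, so merge eligibility would be delegated
 * to a driver callback. All names here are hypothetical. */
struct sketch_vma {
	uint64_t addr;
	uint64_t range;
	void *driver_priv;	/* stands in for pat_index, cache policy, ... */
};

struct sketch_vm_ops {
	/* Driver decides whether two adjacent VMAs may be merged. */
	bool (*may_merge)(const struct sketch_vma *a,
			  const struct sketch_vma *b);
};

/* Example driver callback: merge only when the driver state matches. */
static bool same_priv(const struct sketch_vma *a, const struct sketch_vma *b)
{
	return a->driver_priv == b->driver_priv;
}

/* Merge b into a if they are contiguous and the driver agrees. */
static bool sketch_try_merge(const struct sketch_vm_ops *ops,
			     struct sketch_vma *a, const struct sketch_vma *b)
{
	if (a->addr + a->range != b->addr)
		return false;	/* not adjacent */
	if (!ops->may_merge || !ops->may_merge(a, b))
		return false;	/* driver vetoes, e.g. pat_index differs */
	a->range += b->range;
	return true;
}
```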
> >
> > #1 and #2 both sound reasonable to me. Agreed that if we want this
> > merge behavior, more work is needed.
> >
> > >
> > > 3. What is the ROI of this? Slightly reducing the VMA count? Perhaps
> > > allowing larger GPU pages in very specific corner cases? Given 1) and 2)
> > > I'd say just leave GPUVM as is rather than add this complexity and then
> > > make all drivers using GPUVM absorb this behavior change.
> >
> > This patch is an old one from my backlog. I roughly remember I ran into
> > a situation where two duplicated VMAs covering the same virtual address
> > range were kept in gpuvm's RB-tree. One VMA was actually already
> > destroyed. This further caused issues, as the destroyed VMA was found
> > during a GPUVM RB-tree walk. This triggered me to look into the gpuvm
> > merge/split logic, and I ended up with this patch. This patch did fix
> > that issue.
> >
>
> If a destroyed VMA is in the RB tree, that would be a big issue and
> definitely would need to be fixed.
>
> Adding a test case to show the issue you describe would be good. Also,
> if we end up doing something with merging, adding a test case for the
> description in the commit message would also be good.
>
> > But I don't remember the details now. I need to go back to it to find
> > more details.
> >
>
> That would be good.
>
> > From a design perspective, I think merging adjacent contiguous ranges
> > is a cleaner design. Merging for some use cases (I am not sure we do
> > merge for some cases, just guessing from the function name _sm_) but
> > not merging for other use cases creates a design hole, and eventually
> > such behavior can potentially mess things up. Maybe xekmd today doesn't
> > have such use cases, but people may run into situations where they want
> > merge behavior.
> >
>
> I don't think Xe has a current use case, but the situation you describe
> is very similar to a system allocator case where we would want merging.
>
> Simple example below.
>
> Initial state:
> VMA[A] 0x0000-0x0fff - System allocator VMA
> VMA[B] 0x1000-0x1fff - BO binding VMA
> VMA[C] 0x2000-0x2fff - System allocator VMA
>
> User op:
> Bind 0x1000-0x1fff to system allocator
>
> Ideally we really want this final state:
> VMA[D] 0x0000-0x2fff - System allocator VMA
>
> Then without merging, as BO bindings are bound / unbound, the system
> allocator space will get fragmented into lots of VMAs, which is not
> ideal.
Yes, I observed the same when I ran the system allocator. From one log I captured, I ended up with the fragmented address space below:
VAs | start | range | end | object | object offset
-------------------------------------------------------------------------------------------------------------
| 0x0000000000000000 | 0x000000000044c000 | 0x000000000044c000 | 0x0000000000000000 | 0x0000000000000000
| 0x000000000044c000 | 0x0000000000001000 | 0x000000000044d000 | 0x0000000000000000 | 0x000000000044c000
| 0x000000000044d000 | 0x0000000000002000 | 0x000000000044f000 | 0x0000000000000000 | 0x0000000000000000
| 0x000000000044f000 | 0x0000000000001000 | 0x0000000000450000 | 0x0000000000000000 | 0x000000000044f000
| 0x0000000000450000 | 0x0000000000019000 | 0x0000000000469000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000469000 | 0x0000000000001000 | 0x000000000046a000 | 0x0000000000000000 | 0x0000000000469000
| 0x000000000046a000 | 0x000000000000f000 | 0x0000000000479000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000479000 | 0x0000000000001000 | 0x000000000047a000 | 0x0000000000000000 | 0x0000000000479000
| 0x000000000047a000 | 0x0000000000006000 | 0x0000000000480000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000480000 | 0x0000000000001000 | 0x0000000000481000 | 0x0000000000000000 | 0x0000000000480000
| 0x0000000000481000 | 0x0000000000011000 | 0x0000000000492000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000492000 | 0x0000000000001000 | 0x0000000000493000 | 0x0000000000000000 | 0x0000000000492000
| 0x0000000000493000 | 0x0000000000005000 | 0x0000000000498000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000498000 | 0x0000000000001000 | 0x0000000000499000 | 0x0000000000000000 | 0x0000000000498000
| 0x0000000000499000 | 0x0000000000001000 | 0x000000000049a000 | 0x0000000000000000 | 0x0000000000000000
| 0x000000000049a000 | 0x0000000000001000 | 0x000000000049b000 | 0x0000000000000000 | 0x000000000049a000
| 0x000000000049b000 | 0x0000000000002000 | 0x000000000049d000 | 0x0000000000000000 | 0x0000000000000000
| 0x000000000049d000 | 0x0000000000001000 | 0x000000000049e000 | 0x0000000000000000 | 0x000000000049d000
| 0x000000000049e000 | 0x0000000000010000 | 0x00000000004ae000 | 0x0000000000000000 | 0x0000000000000000
| 0x00000000004ae000 | 0x0000000000001000 | 0x00000000004af000 | 0x0000000000000000 | 0x00000000004ae000
| 0x00000000004af000 | 0x0000000000019000 | 0x00000000004c8000 | 0x0000000000000000 | 0x0000000000000000
| 0x00000000004c8000 | 0x0000000000001000 | 0x00000000004c9000 | 0x0000000000000000 | 0x00000000004c8000
| 0x00000000004c9000 | 0x0000000000036000 | 0x00000000004ff000 | 0x0000000000000000 | 0x0000000000000000
| 0x00000000004ff000 | 0x0000000000001000 | 0x0000000000500000 | 0x0000000000000000 | 0x00000000004ff000
| 0x0000000000500000 | 0x0000000000008000 | 0x0000000000508000 | 0x0000000000000000 | 0x0000000000000000
| 0x0000000000508000 | 0x0000000000001000 | 0x0000000000509000 | 0x0000000000000000 | 0x0000000000508000
| 0x0000000000509000 | 0x0000000000015000 | 0x000000000051e000 | 0x0000000000000000 | 0x0000000000000000
| 0x000000000051e000 | 0x0000000000001000 | 0x000000000051f000 | 0x0000000000000000 | 0x000000000051e000
| 0x000000000051f000 | 0x00007ffff58b1000 | 0x00007ffff5dd0000 | 0x0000000000000000 | 0x0000000000000000
| 0x00007ffff5dd0000 | 0x0000000000020000 | 0x00007ffff5df0000 | 0xff1100013e668400 | 0x0000000000000000
| 0x00007ffff5df0000 | 0x00ff80000a210000 | 0x0100000000000000 | 0x0000000000000000 | 0x0000000000000000
This list can go on if you run more tests with a single gpuvm. In a real-world use case, this could be pretty bad.
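Just to illustrate what merging would buy here: coalescing adjacent ranges is a simple linear pass over a sorted list. A userspace sketch (not drm_gpuvm code; attribute checks like object or pat_index are omitted for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Coalesce a sorted array of ranges in place; adjacent or overlapping
 * ranges collapse into one. Returns the new count. */
static size_t coalesce(struct range *r, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out && r[i].start <= r[out - 1].end) {
			/* touches or overlaps previous: extend it */
			if (r[i].end > r[out - 1].end)
				r[out - 1].end = r[i].end;
		} else {
			r[out++] = r[i];
		}
	}
	return out;
}
```

Run over the log above, most of the single-page entries would fold back into their neighbors.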
>
> So here 1) from my list is a non-issue, as UNMAP of system allocator
> VMAs doesn't interact with the hardware. 2) could still be an issue, as
> VMA[A] and VMA[C] could have different caching or migration policies.
>
> > If we decide to only merge for some cases but not for others, we need
> > clear documentation of the behavior.
> >
>
> If merging was added, it likely would be an optional user-controlled
> thing. I suggested a vfunc or something to test for the merge condition;
> alternatively we could just use a user-defined cookie attached to each
> VMA that GPUVM could match on for merging (a non-zero cookie could also
> serve as the enable for merging). That actually seems pretty clean.
This approach sounds good to me.
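For instance, the cookie check could boil down to something like this (purely illustrative; `merge_cookie` is not an existing drm_gpuva field):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the cookie idea: a non-zero user-defined cookie both enables
 * merging and identifies which VMAs are compatible with each other. */
struct cookie_vma {
	uint64_t addr;
	uint64_t range;
	uint64_t merge_cookie;	/* 0 means "never merge this VMA" */
};

static bool cookies_allow_merge(const struct cookie_vma *a,
				const struct cookie_vma *b)
{
	return a->merge_cookie && a->merge_cookie == b->merge_cookie;
}
```

GPUVM would stay policy-free: it only compares opaque cookies, and the driver or user decides what a cookie value means.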
This is a low priority thing for me. I have to deal with some higher priority things now.
I am open if other people want to continue this work, in which case I can help review
and test it a bit.
Oak
>
> Matt
>
> > Oak
> >
> > >
> > > Matt
> > >
> > > > end result of the above example is as below:
> > > >
> > > > VAs | start | range | end | object | object offset
> > > > -------------------------------------------------------------------------------------------------------------
> > > > | 0x0000000000000000 | 0x00007ffff5db7000 | 0x00007ffff5db7000 | 0x0000000000000000 | 0x0000000000000000
> > > >
> > > > Even though this fixes a real problem, the code looks a little ugly,
> > > > so I welcome any better fix or suggestion.
> > > >
> > > > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/drm_gpuvm.c | 62 +++++++++++++++++++++++++------------
> > > >  1 file changed, 43 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > index 4b6fcaea635e..51825c794bdc 100644
> > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > @@ -2104,28 +2104,30 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > >  {
> > > >  	struct drm_gpuva *va, *next;
> > > >  	u64 req_end = req_addr + req_range;
> > > > +	u64 merged_req_addr = req_addr;
> > > > +	u64 merged_req_end = req_end;
> > > >  	int ret;
> > > >
> > > >  	if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
> > > >  		return -EINVAL;
> > > >
> > > > -	drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
> > > > +	drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr - 1, req_end + 1) {
> > > >  		struct drm_gem_object *obj = va->gem.obj;
> > > >  		u64 offset = va->gem.offset;
> > > >  		u64 addr = va->va.addr;
> > > >  		u64 range = va->va.range;
> > > >  		u64 end = addr + range;
> > > > -		bool merge = !!va->gem.obj;
> > > > +		bool merge;
> > > >
> > > >  		if (addr == req_addr) {
> > > > -			merge &= obj == req_obj &&
> > > > +			merge = obj == req_obj &&
> > > >  				 offset == req_offset;
> > > >
> > > >  			if (end == req_end) {
> > > >  				ret = op_unmap_cb(ops, priv, va, merge);
> > > >  				if (ret)
> > > >  					return ret;
> > > > -				break;
> > > > +				continue;
> > > >  			}
> > > >
> > > >  			if (end < req_end) {
> > > > @@ -2162,22 +2164,33 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > >  			};
> > > >  			struct drm_gpuva_op_unmap u = { .va = va };
> > > >
> > > > -			merge &= obj == req_obj &&
> > > > -				 offset + ls_range == req_offset;
> > > > +			merge = (obj && obj == req_obj &&
> > > > +				 offset + ls_range == req_offset) ||
> > > > +				(!obj && !req_obj);
> > > >  			u.keep = merge;
> > > >
> > > >  			if (end == req_end) {
> > > >  				ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > >  				if (ret)
> > > >  					return ret;
> > > > -				break;
> > > > +				continue;
> > > >  			}
> > > >
> > > >  			if (end < req_end) {
> > > > -				ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > > -				if (ret)
> > > > -					return ret;
> > > > -				continue;
> > > > +				if (end == req_addr) {
> > > > +					if (merge) {
> > > > +						ret = op_unmap_cb(ops, priv, va, merge);
> > > > +						if (ret)
> > > > +							return ret;
> > > > +						merged_req_addr = addr;
> > > > +						continue;
> > > > +					}
> > > > +				} else {
> > > > +					ret = op_remap_cb(ops, priv, &p, NULL, &u);
> > > > +					if (ret)
> > > > +						return ret;
> > > > +					continue;
> > > > +				}
> > > >  			}
> > > >
> > > >  			if (end > req_end) {
> > > > @@ -2195,15 +2208,16 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > >  				break;
> > > >  			}
> > > >  		} else if (addr > req_addr) {
> > > > -			merge &= obj == req_obj &&
> > > > +			merge = (obj && obj == req_obj &&
> > > >  				 offset == req_offset +
> > > > -					   (addr - req_addr);
> > > > +					   (addr - req_addr)) ||
> > > > +				(!obj && !req_obj);
> > > >
> > > >  			if (end == req_end) {
> > > >  				ret = op_unmap_cb(ops, priv, va, merge);
> > > >  				if (ret)
> > > >  					return ret;
> > > > -				break;
> > > > +				continue;
> > > >  			}
> > > >
> > > >  			if (end < req_end) {
> > > > @@ -2225,16 +2239,26 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > > >  				.keep = merge,
> > > >  			};
> > > >
> > > > -			ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > > > -			if (ret)
> > > > -				return ret;
> > > > -			break;
> > > > +			if (addr == req_end) {
> > > > +				if (merge) {
> > > > +					ret = op_unmap_cb(ops, priv, va, merge);
> > > > +					if (ret)
> > > > +						return ret;
> > > > +					merged_req_end = end;
> > > > +					break;
> > > > +				}
> > > > +			} else {
> > > > +				ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > > > +				if (ret)
> > > > +					return ret;
> > > > +				break;
> > > > +			}
> > > >  			}
> > > >  		}
> > > >  	}
> > > >
> > > >  	return op_map_cb(ops, priv,
> > > > -			 req_addr, req_range,
> > > > +			 merged_req_addr, merged_req_end - merged_req_addr,
> > > >  			 req_obj, req_offset);
> > > >  }
> > > >
> > > > --
> > > > 2.26.3
> > > >