[PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed actions

Hellstrom, Thomas thomas.hellstrom at intel.com
Fri Feb 28 14:47:45 UTC 2025


On Fri, 2025-02-28 at 13:57 +0100, Thomas Hellström wrote:
> On Fri, 2025-02-28 at 12:28 +0000, Matthew Auld wrote:
> > Hi,
> > 
> > On 28/02/2025 11:11, Hellstrom, Thomas wrote:
> > > Hi, Matthew
> > > 
> > > On Fri, 2025-02-28 at 10:21 +0000, Matthew Auld wrote:
> > > > On 28/02/2025 06:52, Aradhya Bhatia wrote:
> > > > > Change the scope of the migrate subsystem to be dev managed
> > > > > instead of drm managed.
> > > > > 
> > > > > The parent pci struct &device, which the xe struct &drm_device
> > > > > is a part of, gets removed when a hot unplug is triggered, which
> > > > > causes the underlying iommu group to get destroyed as well.
> > > > 
> > > > Nice find. But if that's the case then the migrate BO here is
> > > > just one of many where we will see this. Basically all system
> > > > memory BOs can suffer from this AFAICT, including userspace owned
> > > > ones. So I think we need to rather solve this for all of them, and
> > > > I don't think we can really tie the lifetime of all BOs to devm,
> > > > so we likely need a different approach.
> > > > 
> > > > I think we might instead need to tear down all dma mappings when
> > > > removing the device, leaving the BOs intact. It looks like there
> > > > is already a helper for this, so maybe something roughly like
> > > > this:
> > > > 
> > > > @@ -980,6 +980,8 @@ void xe_device_remove(struct xe_device *xe)
> > > > 
> > > >           drm_dev_unplug(&xe->drm);
> > > > 
> > > > +       ttm_device_clear_dma_mappings(&xe->ttm);
> > > > +
> > > >           xe_display_driver_remove(xe);
> > > > 
> > > >           xe_heci_gsc_fini(xe);
> > > > 
> > > > But I don't think userptr will be covered by that, just all
> > > > BOs, so we likely need an extra step to also nuke all userptr dma
> > > > mappings somewhere.
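
Something as blunt as an extra pass in xe_device_remove() might do for
that; a rough sketch extending the snippet above, where
xe_userptr_invalidate_all() is a hypothetical helper that does not
exist yet:

	drm_dev_unplug(&xe->drm);

	/* Drop all dma mappings backed by TTM pages. */
	ttm_device_clear_dma_mappings(&xe->ttm);

	/* Hypothetical: also drop userptr dma mappings, which are not
	 * backed by TTM pages and so are missed by the helper above.
	 */
	xe_userptr_invalidate_all(xe);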
> > > 
> > > I have been discussing this a bit with Aradhya, and the problem
> > > is a bit complex.
> > > 
> > > Really if the PCI device goes away, we need to move *all* bos to
> > > system. That should clear out any dma mappings, since we map dma
> > > when
> > > moving to TT. Also VRAM bos should be moved to system which will
> > > also
> > > handle things like SVM migration.
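
A rough sketch of that, assuming the per-tile VRAM manager can be
reached via tile->mem.vram_mgr and that forcing eviction through the
existing TTM helper is enough (error handling omitted):

	/* Sketch: move everything out of VRAM while the device is
	 * still alive, so dma mappings get torn down on the way.
	 */
	static void xe_evict_all_vram(struct xe_device *xe)
	{
		struct xe_tile *tile;
		u8 id;

		for_each_tile(tile, xe, id) {
			struct ttm_resource_manager *man =
				&tile->mem.vram_mgr->mgr;

			/* Evicts (or purges) every bo in this manager. */
			ttm_resource_manager_evict_all(&xe->ttm, man);
		}
	}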
> > 
> > My thinking is that from the userspace pov, the buffers are about
> > to be lost anyway, so going through the trouble of moving stuff
> > seems like a waste? Once the device is unplugged all ioctls are
> > rejected for that instance and we should have stopped touching the
> > hw, so all that is left is to close stuff down and open up a new
> > instance. VRAM is basically just some state tracking at this point
> > (for this driver instance), and the actual memory footprint is
> > small. If we move to system memory we might be using actual system
> > pages for stuff that should really just be thrown away, and will be
> > once the driver instance goes away AFAIK. And for stuff that has an
> > actual populated ttm_tt, we just discard it with something like
> > ttm_device_clear_dma_mappings() during remove.
> 
> That's true for most current use-cases, but not for exported things.
> With multi-device SVM, for example, if you unbind a device you'd
> currently just want to migrate to system, and that's not fatal to the
> running application (subject to review ofc). And I'm not sure what the
> expected dma-buf semantics are, although if we free pages that a
> remote device has pinned and keeps accessing, that sounds bad.

Having thought a bit more about this: first, I fully agree that
evicting all of VRAM was probably a bad idea. We need to be selective
and evict exported VRAM only. That means SVM (no action required until
multi-device SVM lands) and p2p dma-buf (unless there is a way to
revoke a dma-buf), where we at least need to send a move_notify() and
wait for idle. A possible way might be to hook up to xe_evict_flags()
and respond with a NULL placement, purging the bo for anything but SVM.
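
Roughly like this (untested sketch; assumes drm_dev_is_unplugged() is
a good enough trigger and that an empty placement makes TTM purge the
backing store rather than migrate it):

	static const struct ttm_placement purge_placement = {
		.num_placement = 0,
	};

	static void xe_evict_flags(struct ttm_buffer_object *tbo,
				   struct ttm_placement *placement)
	{
		struct xe_device *xe = ttm_to_xe_device(tbo->bdev);

		/* Device is gone: hand TTM a NULL (empty) placement so
		 * the bo is gutted instead of moved. Real code would
		 * need to special-case SVM-backed bos here.
		 */
		if (drm_dev_is_unplugged(&xe->drm)) {
			*placement = purge_placement;
			return;
		}

		/* ... existing placement selection ... */
	}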

Pinned p2p VRAM dma-buf I think needs to block the remove() callback,
but we don't support that ATM.

System DMA-buf shouldn't be a problem, but we can't drop the pages
AFAICT.

Imported dma-buf should have the attachment unmapped.
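
For the imported case, a minimal sketch of what unmapping the
attachment could look like at remove time, with bo->sg standing in for
the sg_table we got back from dma_buf_map_attachment() (field names
hypothetical):

	/* Sketch: drop the dma mapping of an imported dma-buf. */
	if (bo->sg) {
		dma_buf_unmap_attachment_unlocked(bo->attach, bo->sg,
						  DMA_BIDIRECTIONAL);
		bo->sg = NULL;
	}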

I think that _evict_all() could be made to handle all these situations
with some minor tweaks, but that still leaves the pinned TT bos.

/Thomas


> 
> > 
> > > 
> > > But that doesn't really help with pinned bos. So IMO subsystems
> > > that
> > 
> > I think ttm_device_clear_dma_mappings() also handles pinned BO,
> > AFAICT.
> 
> Yes, you're right. I still think it's reasonable, though, for
> subsystems that allocate pinned bos to be devm managed.
> 
> > 
> > I believe the pinned BOs are tracked via bdev->unevictable, so in
> > theory it will nuke the dma mappings there also and discard the
> > pages.
> > 
> > > allocate pinned bos need to be devm managed, and this is a step
> > > in that direction. But we probably need to deal with some fallout.
> > > For example, when we take down the vram managers using drmm_
> > > they'd want to call evict_all() and the migrate subsystem is
> > > already down.
> > > 
> > > So I think a natural place to deal with such fallout is the
> > > remove() callback, while subsystems with pinned bos use devm_.
> > > 
> > > But admittedly, to me it's not really clear how to *best* handle
> > > this situation. I suspect if we really stress-test device
> > > unbinding on a running app we're going to hit a lot of problems.
> > 
> > Yes, there are lots of open issues here, but our testing is still
> > lacking :)
> > 
> > > 
> > > As for the userptrs, I just posted a series that introduces a
> > > dedicated per-userptr unmap function. We could probably put them
> > > on a device list or something similar that calls that function
> > > (or just general userptr invalidation) for all userptrs.
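
A rough sketch of that device-wide pass, with all names made up for
illustration (xe->userptr.list, the link member and xe_userptr_unmap()
don't exist as such):

	/* Sketch: walk a hypothetical device-wide userptr list at
	 * remove time and unmap each entry using the per-userptr
	 * unmap from that series.
	 */
	static void xe_userptr_invalidate_all(struct xe_device *xe)
	{
		struct xe_userptr *userptr;

		mutex_lock(&xe->userptr.lock);
		list_for_each_entry(userptr, &xe->userptr.list, link)
			xe_userptr_unmap(userptr);
		mutex_unlock(&xe->userptr.lock);
	}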
> > 
> > Sounds good.
> > 
> > > 
> > > /Thomas
> > > 
> > > 
> > > 
> > > 
> > > 
> > > > 
> > > > > 
> > > > > The migrate subsystem, which handles the lifetime of the
> > > > > page-table tree (pt) BO, doesn't get a chance to put the BO
> > > > > back during the hot unplug, as not all the references to the
> > > > > DRM device have been put by then. When the remaining DRM
> > > > > references are finally put later, the migrate subsystem tries
> > > > > to put the pt BO back. Since the underlying iommu group has
> > > > > already been destroyed, a kernel NULL pointer dereference
> > > > > takes place while attempting to put the pt BO back.
> > > > > 
> > > > > Signed-off-by: Aradhya Bhatia <aradhya.bhatia at intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/xe/xe_migrate.c | 6 +++---
> > > > >    1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > index 278bc96cf593..4e23adfa208a 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > @@ -97,7 +97,7 @@ struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
> > > > >    	return tile->migrate->q;
> > > > >    }
> > > > >    
> > > > > -static void xe_migrate_fini(struct drm_device *dev, void *arg)
> > > > > +static void xe_migrate_fini(void *arg)
> > > > >    {
> > > > >    	struct xe_migrate *m = arg;
> > > > >    
> > > > > @@ -401,7 +401,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> > > > >    	struct xe_vm *vm;
> > > > >    	int err;
> > > > >    
> > > > > -	m = drmm_kzalloc(&xe->drm, sizeof(*m), GFP_KERNEL);
> > > > > +	m = devm_kzalloc(xe->drm.dev, sizeof(*m), GFP_KERNEL);
> > > > >    	if (!m)
> > > > >    		return ERR_PTR(-ENOMEM);
> > > > >    
> > > > > @@ -455,7 +455,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> > > > >    	might_lock(&m->job_mutex);
> > > > >    	fs_reclaim_release(GFP_KERNEL);
> > > > >    
> > > > > -	err = drmm_add_action_or_reset(&xe->drm, xe_migrate_fini, m);
> > > > > +	err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini, m);
> > > > >    	if (err)
> > > > >    		return ERR_PTR(err);
> > > > >    
> > > > 
> > > 
> > 
> 
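
For reference, the lifetime difference this patch relies on, as I
understand it (worth double-checking): devm actions run at driver
unbind, from the PCI remove() path while the device and its iommu
group still exist, whereas drmm actions only run once the last
drm_device reference is dropped, which can be long after remove():

	/* Sketch of the two teardown points on hot unplug:
	 *
	 *   pci remove() -> xe_device_remove()
	 *     -> devres release: devm actions run here, in reverse
	 *        order of registration; the iommu group is still
	 *        valid, so unmapping and putting the pt BO is safe.
	 *
	 *   ... userspace closes its fds ...
	 *
	 *   last drm_dev_put()
	 *     -> drm managed release: drmm actions run here; the PCI
	 *        device and its iommu group are already gone, hence
	 *        the NULL pointer dereference in the old code.
	 */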


