[PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed actions
Matthew Auld
matthew.auld at intel.com
Fri Feb 28 18:38:09 UTC 2025
On 28/02/2025 14:47, Hellstrom, Thomas wrote:
> On Fri, 2025-02-28 at 13:57 +0100, Thomas Hellström wrote:
>> On Fri, 2025-02-28 at 12:28 +0000, Matthew Auld wrote:
>>> Hi,
>>>
>>> On 28/02/2025 11:11, Hellstrom, Thomas wrote:
>>>> Hi, Matthew
>>>>
>>>> On Fri, 2025-02-28 at 10:21 +0000, Matthew Auld wrote:
>>>>> On 28/02/2025 06:52, Aradhya Bhatia wrote:
>>>>>> Change the scope of the migrate subsystem to be dev managed
>>>>>> instead of drm managed.
>>>>>>
>>>>>> The parent pci struct &device, that the xe struct &drm_device is
>>>>>> a part of, gets removed when a hot unplug is triggered, which
>>>>>> causes the underlying iommu group to get destroyed as well.
>>>>>
>>>>> Nice find. But if that's the case then the migrate BO here is
>>>>> just one of many where we will see this. Basically all system
>>>>> memory BOs can suffer from this AFAICT, including userspace-owned
>>>>> ones. So I think we need to rather solve this for all, and I
>>>>> don't think we can really tie the lifetime of all BOs to devm, so
>>>>> we likely need a different approach.
>>>>>
>>>>> I think we might instead need to tear down all dma mappings when
>>>>> removing the device, leaving the BOs intact. It looks like there
>>>>> is already a helper for this, so maybe something roughly like
>>>>> this:
>>>>>
>>>>> @@ -980,6 +980,8 @@ void xe_device_remove(struct xe_device *xe)
>>>>>
>>>>> drm_dev_unplug(&xe->drm);
>>>>>
>>>>> + ttm_device_clear_dma_mappings(&xe->ttm);
>>>>> +
>>>>> xe_display_driver_remove(xe);
>>>>>
>>>>> xe_heci_gsc_fini(xe);
>>>>>
>>>>> But I don't think userptr will be covered by that, just all BOs,
>>>>> so we likely need an extra step to also nuke all userptr dma
>>>>> mappings somewhere.
>>>>
>>>> I have been discussing this a bit with Aradhya, and the problem is
>>>> a bit complex.
>>>>
>>>> Really if the PCI device goes away, we need to move *all* bos to
>>>> system. That should clear out any dma mappings, since we map dma
>>>> when moving to TT. Also VRAM bos should be moved to system, which
>>>> will also handle things like SVM migration.
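
For illustration, a rough sketch of that approach, assuming the
suspend-path helper xe_bo_evict_all() could simply be reused at removal
time (untested):

	/* On device removal: move everything to system so the dma
	 * mappings are dropped before the PCI device disappears. */
	int err = xe_bo_evict_all(xe);
	if (err)
		drm_warn(&xe->drm, "BO eviction on remove failed: %d\n", err);
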
>>>
>>> My thinking is that from userspace pov, the buffers are about to be
>>> lost anyway, so going through the trouble of moving stuff seems
>>> like a waste? Once the device is unplugged all ioctls are rejected
>>> for that instance and we should have stopped touching the hw, so
>>> all that is left is to close stuff and open up a new instance. VRAM
>>> is basically just some state tracking at this point (for this
>>> driver instance), and actual memory footprint is small. If we move
>>> to system memory we might be using actual system pages for stuff
>>> that should really just be thrown away, and will be once the driver
>>> instance goes away AFAIK. And for stuff that has an actual
>>> populated ttm_tt, we just discard with something like
>>> ttm_device_clear_dma_mappings() during remove.
>>
>> That's true for most current use-cases, but not for exported
>> things: with multi-device SVM, if you unbind a device you'd
>> currently just want to migrate to system, and it's not fatal to the
>> running application (subject to review ofc). And I'm not sure what
>> the expected dma-buf semantics are, although if we free pages that
>> the remote device has pinned and keeps accessing, that sounds bad.
>
> Having thought a bit more about this: first, I fully agree that
> evicting all of VRAM was probably a bad idea. We need to be
> selective, and evict exported VRAM only. That means SVM (no action
> required until multi-svm lands) and p2p dma-buf (unless there is a
> way to revoke a dma-buf), where we at least need to send a
> move_notify() and wait for idle. A possible way might be to hook up
> to xe_evict_flags() and respond with a NULL placement, purging the bo
> for anything but SVM.
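
For reference, a minimal sketch of that xe_evict_flags() idea (the
unplug check and the SVM carve-out here are assumptions, not existing
code; xe_bo_is_svm() is a hypothetical predicate):

static void xe_evict_flags(struct ttm_buffer_object *tbo,
			   struct ttm_placement *placement)
{
	struct xe_device *xe = ttm_to_xe_device(tbo->bdev);

	/* Hypothetical: on unplug, purge everything except SVM bos. */
	if (drm_dev_is_unplugged(&xe->drm) && !xe_bo_is_svm(tbo)) {
		/* An empty placement makes ttm_bo_evict() gut the bo
		 * instead of moving it anywhere. */
		*placement = (struct ttm_placement){};
		return;
	}

	/* ... fall through to the existing placement selection ... */
}
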
>
> Pinned p2p VRAM dma-buf I think needs to block the remove() callback,
> but we don't support that ATM.
>
> System DMA-buf shouldn't be a problem, but we can't drop the pages
> AFAICT.
>
> Imported dma-buf should have the attachment unmapped.
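
And for the exported p2p case above, the revoke flow might look roughly
like this per exported bo (a sketch only; iteration and error handling
elided):

	struct dma_buf *dmabuf = bo->ttm.base.dma_buf;

	dma_resv_lock(dmabuf->resv, NULL);
	/* Ask dynamic importers to drop their mappings ... */
	dma_buf_move_notify(dmabuf);
	dma_resv_unlock(dmabuf->resv);

	/* ... then wait for any remaining device activity to finish. */
	dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP,
			      false, MAX_SCHEDULE_TIMEOUT);
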
>
> I think that _evict_all() could be made to handle all these situations
> with some minor tweaks, but that still leaves the pinned TT bos.
Right. I guess in the worst case we can rely on our own pin tracking to
nuke anything pinned that also has a populated tt?
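
Something like this, as a sketch (the xe->pinned list names here are
from memory and the locking is hand-waved):

	struct xe_bo *bo;

	mutex_lock(&xe->pinned.lock);
	list_for_each_entry(bo, &xe->pinned.kernel_bo_present, pinned_link) {
		/* Drop pages and dma mappings of pinned system bos; the
		 * bo itself stays alive until the last ref is put. */
		if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm))
			ttm_tt_unpopulate(bo->ttm.bdev, bo->ttm.ttm);
	}
	mutex_unlock(&xe->pinned.lock);
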
>
> /Thomas
>
>
>>
>>>
>>>>
>>>> But that doesn't really help with pinned bos. So IMO subsystems
>>>> that
>>>
>>> I think ttm_device_clear_dma_mappings() also handles pinned BOs,
>>> AFAICT.
>>
>> Yes, you're right. I still think it's reasonable, though to
>>
>>>
>>> I believe the pinned BOs are tracked via bdev->unevictable, so in
>>> theory it will nuke the dma mappings there also and discard the
>>> pages.
>>>
>>>> allocate pinned bos need to be devm managed, and this is a step in
>>>> that direction. But we probably need to deal with some fallout.
>>>> For example, when we take down the vram managers using drmm_,
>>>> they'd want to call evict_all() and the migrate subsystem is
>>>> already down.
>>>>
>>>> So I think a natural place to deal with such fallout is the
>>>> remove() callback, while subsystems with pinned bos use devm_.
>>>>
>>>> But admittedly to me it's not really clear how to *best* handle
>>>> this situation. I suspect if we really stress-test device
>>>> unbinding on a running app we're going to hit a lot of problems.
>>>
>>> Yes, there are lots of open issues here, but our testing is still
>>> lacking :)
>>>
>>>>
>>>> As for the userptrs, I just posted a series that introduces a
>>>> per-userptr dedicated unmap function. We could probably put them
>>>> on a device list or something similar that calls that function (or
>>>> just general userptr invalidation) for all userptrs.
>>>
>>> Sounds good.
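
Roughly what I'd picture there, with hypothetical names for the device
list and for the per-userptr unmap helper from your series:

	struct xe_userptr_vma *uvma;

	/* On device removal: unmap every userptr dma mapping up front,
	 * instead of waiting for the vmas to be destroyed. */
	list_for_each_entry(uvma, &xe->userptr.list, userptr.device_link)
		xe_vma_userptr_dma_unmap(uvma);
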
>>>
>>>>
>>>> /Thomas
>>>>
>>>>>
>>>>>>
>>>>>> The migrate subsystem, which handles the lifetime of the
>>>>>> page-table tree (pt) BO, doesn't get a chance to put the BO back
>>>>>> during the hot unplug, as all the references to DRM haven't been
>>>>>> put back yet. When all the references to DRM are indeed put back
>>>>>> later, the migrate subsystem tries to put back the pt BO. Since
>>>>>> the underlying iommu group has already been destroyed, a kernel
>>>>>> NULL ptr dereference takes place while attempting to put back
>>>>>> the pt BO.
>>>>>>
>>>>>> Signed-off-by: Aradhya Bhatia <aradhya.bhatia at intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/xe/xe_migrate.c | 6 +++---
>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>>>>> index 278bc96cf593..4e23adfa208a 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>>>>>> @@ -97,7 +97,7 @@ struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
>>>>>> return tile->migrate->q;
>>>>>> }
>>>>>>
>>>>>> -static void xe_migrate_fini(struct drm_device *dev, void *arg)
>>>>>> +static void xe_migrate_fini(void *arg)
>>>>>> {
>>>>>> struct xe_migrate *m = arg;
>>>>>>
>>>>>> @@ -401,7 +401,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>>>>>> struct xe_vm *vm;
>>>>>> int err;
>>>>>>
>>>>>> -	m = drmm_kzalloc(&xe->drm, sizeof(*m), GFP_KERNEL);
>>>>>> +	m = devm_kzalloc(xe->drm.dev, sizeof(*m), GFP_KERNEL);
>>>>>> if (!m)
>>>>>> return ERR_PTR(-ENOMEM);
>>>>>>
>>>>>> @@ -455,7 +455,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>>>>>> might_lock(&m->job_mutex);
>>>>>> fs_reclaim_release(GFP_KERNEL);
>>>>>>
>>>>>> -	err = drmm_add_action_or_reset(&xe->drm, xe_migrate_fini, m);
>>>>>> +	err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini, m);
>>>>>> if (err)
>>>>>> return ERR_PTR(err);
>>>>>>