[PATCH v4 07/14] drm/amdgpu: Register IOMMU topology notifier per device.
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Tue Jan 19 21:21:52 UTC 2021
On 1/19/21 8:45 AM, Daniel Vetter wrote:
> On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
>> Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
>>> Handle all DMA IOMMU group related dependencies before the
>>> group is removed.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++
>>> 6 files changed, 65 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 478a7d8..2953420 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -51,6 +51,7 @@
>>> #include <linux/dma-fence.h>
>>> #include <linux/pci.h>
>>> #include <linux/aer.h>
>>> +#include <linux/notifier.h>
>>> #include <drm/ttm/ttm_bo_api.h>
>>> #include <drm/ttm/ttm_bo_driver.h>
>>> @@ -1041,6 +1042,10 @@ struct amdgpu_device {
>>> bool in_pci_err_recovery;
>>> struct pci_saved_state *pci_state;
>>> +
>>> + struct notifier_block nb;
>>> + struct blocking_notifier_head notifier;
>>> + struct list_head device_bo_list;
>>> };
>>> static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 45e23e3..e99f4f1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -70,6 +70,8 @@
>>> #include <drm/task_barrier.h>
>>> #include <linux/pm_runtime.h>
>>> +#include <linux/iommu.h>
>>> +
>>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>>> MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
>>> @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = {
>>> };
>>> +static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
>>> + unsigned long action, void *data)
>>> +{
>>> + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
>>> + struct amdgpu_bo *bo = NULL;
>>> +
>>> + /*
>>> + * Following is a set of IOMMU group dependencies taken care of before
>>> + * device's IOMMU group is removed
>>> + */
>>> + if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
>>> +
>>> + spin_lock(&ttm_bo_glob.lru_lock);
>>> + list_for_each_entry(bo, &adev->device_bo_list, bo) {
>>> + if (bo->tbo.ttm)
>>> + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
>>> + }
>>> + spin_unlock(&ttm_bo_glob.lru_lock);
>> That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
>>
>> You need to use a mutex here or even better make sure you can access the
>> device_bo_list without a lock in this moment.
> I'd also be worried about the notifier mutex getting really badly in the
> way.
>
> Plus I'm worried why we even need this, it sounds a bit like papering over
> the iommu subsystem. Assuming we clean up all our iommu mappings in our
> device hotunplug/unload code, why do we still need to have an additional
> iommu notifier on top, with all kinds of additional headaches? The iommu
> shouldn't clean up before the devices in its group have cleaned up.
>
> I think we need more info here on what the exact problem is first.
> -Daniel
Originally I experienced the crash below on an IOMMU-enabled device; it happens
after device removal from the PCI topology -
while shutting down the user client holding the last reference to the drm device file (X
in my case).
The crash is because by the time I get to this point struct device->iommu_group
pointer is NULL
already since the IOMMU group for the device is unset during PCI removal. So
this contradicts what you said above
that the iommu shouldn't clean up before the devices in its group have cleaned up.
So instead of guessing where the right place is to perform all IOMMU-related
cleanups, it makes sense
to get a notification from the IOMMU subsystem in the form of the
IOMMU_GROUP_NOTIFY_DEL_DEVICE event
and do all the relevant cleanups there.
Andrey
[ 123.810074 < 28.126960>] BUG: kernel NULL pointer dereference, address:
00000000000000c8
[ 123.810080 < 0.000006>] #PF: supervisor read access in kernel mode
[ 123.810082 < 0.000002>] #PF: error_code(0x0000) - not-present page
[ 123.810085 < 0.000003>] PGD 0 P4D 0
[ 123.810089 < 0.000004>] Oops: 0000 [#1] SMP NOPTI
[ 123.810094 < 0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted:
G O 5.9.0-rc2-dev+ #59
[ 123.810096 < 0.000002>] Hardware name: System manufacturer System Product
Name/PRIME X470-PRO, BIOS 4406 02/28/2019
[ 123.810105 < 0.000009>] *RIP: 0010:iommu_get_dma_domain*+0x10/0x20
[ 123.810108 < 0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3
b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5
<48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48
[ 123.810111 < 0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246
[ 123.810114 < 0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX:
0000000000000000
[ 123.810116 < 0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI:
ffff93c259dc60b0
[ 123.810117 < 0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09:
0000000000000000
[ 123.810119 < 0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12:
00000000bf5cb000
[ 123.810121 < 0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15:
ffff93c256c05688
[ 123.810124 < 0.000003>] FS: 00007f5e5e8d3700(0000)
GS:ffff93c25e940000(0000) knlGS:0000000000000000
[ 123.810126 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.810128 < 0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4:
00000000003506e0
[ 123.810130 < 0.000002>] Call Trace:
[ 123.810136 < 0.000006>] __iommu_dma_unmap+0x2e/0x100
[ 123.810141 < 0.000005>] ? kfree+0x389/0x3a0
[ 123.810144 < 0.000003>] iommu_dma_unmap_page+0xe/0x10
[ 123.810149 < 0.000005>] dma_unmap_page_attrs+0x4d/0xf0
[ 123.810159 < 0.000010>] ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm]
[ 123.810165 < 0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm]
[ 123.810252 < 0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu]
[ 123.810258 < 0.000006>] ttm_tt_unpopulate+0x59/0x70 [ttm]
[ 123.810264 < 0.000006>] ttm_tt_destroy+0x6a/0x70 [ttm]
[ 123.810270 < 0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm]
[ 123.810276 < 0.000006>] ttm_bo_put+0x1e7/0x400 [ttm]
[ 123.810358 < 0.000082>] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 123.810440 < 0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu]
[ 123.810459 < 0.000019>] drm_gem_object_free+0x35/0x40 [drm]
[ 123.810476 < 0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm]
[ 123.810494 < 0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm]
[ 123.810511 < 0.000017>] ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm]
[ 123.810516 < 0.000005>] idr_for_each+0x4d/0xd0
[ 123.810534 < 0.000018>] drm_gem_release+0x20/0x30 [drm]
[ 123.810550 < 0.000016>] drm_file_free+0x251/0x2a0 [drm]
[ 123.810567 < 0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm]
[ 123.810583 < 0.000016>] drm_release+0x6a/0xe0 [drm]
[ 123.810588 < 0.000005>] __fput+0xa2/0x250
[ 123.810592 < 0.000004>] ____fput+0xe/0x10
[ 123.810595 < 0.000003>] task_work_run+0x6c/0xa0
[ 123.810600 < 0.000005>] do_exit+0x376/0xb60
[ 123.810604 < 0.000004>] do_group_exit+0x43/0xa0
[ 123.810608 < 0.000004>] get_signal+0x18b/0x8e0
[ 123.810612 < 0.000004>] ? do_futex+0x595/0xc20
[ 123.810617 < 0.000005>] arch_do_signal+0x34/0x880
[ 123.810620 < 0.000003>] ? check_preempt_curr+0x50/0x60
[ 123.810623 < 0.000003>] ? ttwu_do_wakeup+0x1e/0x160
[ 123.810626 < 0.000003>] ? ttwu_do_activate+0x61/0x70
[ 123.810630 < 0.000004>] exit_to_user_mode_prepare+0x124/0x1b0
[ 123.810635 < 0.000005>] syscall_exit_to_user_mode+0x31/0x170
[ 123.810639 < 0.000004>] do_syscall_64+0x43/0x80
Andrey
>
>> Christian.
>>
>>> +
>>> + if (adev->irq.ih.use_bus_addr)
>>> + amdgpu_ih_ring_fini(adev, &adev->irq.ih);
>>> + if (adev->irq.ih1.use_bus_addr)
>>> + amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
>>> + if (adev->irq.ih2.use_bus_addr)
>>> + amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
>>> +
>>> + amdgpu_gart_dummy_page_fini(adev);
>>> + }
>>> +
>>> + return NOTIFY_OK;
>>> +}
>>> +
>>> +
>>> /**
>>> * amdgpu_device_init - initialize the driver
>>> *
>>> @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>>> + INIT_LIST_HEAD(&adev->device_bo_list);
>>> +
>>> adev->gfx.gfx_off_req_count = 1;
>>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
>>> @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>> if (amdgpu_device_cache_pci_state(adev->pdev))
>>> pci_restore_state(pdev);
>>> + BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
>>> + adev->nb.notifier_call = amdgpu_iommu_group_notifier;
>>> +
>>> + if (adev->dev->iommu_group) {
>>> + r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
>>> + if (r)
>>> + goto failed;
>>> + }
>>> +
>>> return 0;
>>> failed:
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> index 0db9330..486ad6d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
>>> *
>>> * Frees the dummy page used by the driver (all asics).
>>> */
>>> -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
>>> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
>>> {
>>> if (!adev->dummy_page_addr)
>>> return;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
>>> index afa2e28..5678d9c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
>>> @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
>>> void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
>>> int amdgpu_gart_init(struct amdgpu_device *adev);
>>> void amdgpu_gart_fini(struct amdgpu_device *adev);
>>> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
>>> int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
>>> int pages);
>>> int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 6cc9919..4a1de69 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
>>> }
>>> amdgpu_bo_unref(&bo->parent);
>>> + spin_lock(&ttm_bo_glob.lru_lock);
>>> + list_del(&bo->bo);
>>> + spin_unlock(&ttm_bo_glob.lru_lock);
>>> +
>>> kfree(bo->metadata);
>>> kfree(bo);
>>> }
>>> @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>>> if (bp->type == ttm_bo_type_device)
>>> bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>>> + INIT_LIST_HEAD(&bo->bo);
>>> +
>>> + spin_lock(&ttm_bo_glob.lru_lock);
>>> + list_add_tail(&bo->bo, &adev->device_bo_list);
>>> + spin_unlock(&ttm_bo_glob.lru_lock);
>>> +
>>> return 0;
>>> fail_unreserve:
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> index 9ac3756..5ae8555 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> @@ -110,6 +110,8 @@ struct amdgpu_bo {
>>> struct list_head shadow_list;
>>> struct kgd_mem *kfd_bo;
>>> +
>>> + struct list_head bo;
>>> };
>>> static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20210119/85608d38/attachment-0001.htm>
More information about the dri-devel
mailing list