[PATCH v4 07/14] drm/amdgpu: Register IOMMU topology notifier per device.

Tue Jan 19 22:01:04 UTC 2021

On Tue, Jan 19, 2021 at 10:22 PM Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
>
>
> On 1/19/21 8:45 AM, Daniel Vetter wrote:
>
> On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
>
> Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
>
> Handle all DMA IOMMU gropup related dependencies before the
> group is removed.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  5 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
>   6 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 478a7d8..2953420 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -51,6 +51,7 @@
>   #include <linux/dma-fence.h>
>   #include <linux/pci.h>
>   #include <linux/aer.h>
> +#include <linux/notifier.h>
>   #include <drm/ttm/ttm_bo_api.h>
>   #include <drm/ttm/ttm_bo_driver.h>
> @@ -1041,6 +1042,10 @@ struct amdgpu_device {
>   bool                            in_pci_err_recovery;
>   struct pci_saved_state          *pci_state;
> +
> + struct notifier_block nb;
> + struct blocking_notifier_head notifier;
> + struct list_head device_bo_list;
>   };
>   static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 45e23e3..e99f4f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -70,6 +70,8 @@
>   #include <drm/task_barrier.h>
>   #include <linux/pm_runtime.h>
> +#include <linux/iommu.h>
> +
>   MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>   MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>   MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = {
>   };
> +static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
> +     unsigned long action, void *data)
> +{
> + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
> + struct amdgpu_bo *bo = NULL;
> +
> + /*
> + * Following is a set of IOMMU group dependencies taken care of before
> + * device's IOMMU group is removed
> + */
> + if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
> +
> + spin_lock(&ttm_bo_glob.lru_lock);
> + list_for_each_entry(bo, &adev->device_bo_list, bo) {
> + if (bo->tbo.ttm)
> + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
> + }
> + spin_unlock(&ttm_bo_glob.lru_lock);
>
> That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
>
> You need to use a mutex here or even better make sure you can access the
> device_bo_list without a lock in this moment.
>
> I'd also be worried about the notifier mutex getting really badly in the
> way.
>
> Plus I'm worried why we even need this, it sounds a bit like papering over
> the iommu subsystem. Assuming we clean up all our iommu mappings in our
> device hotunplug/unload code, why do we still need to have an additional
> iommu notifier on top, with all kinds of additional headaches? The iommu
> shouldn't clean up before the devices in its group have cleaned up.
>
> I think we need more info here on what the exact problem is first.
> -Daniel
>
>
> Originally I experienced the  crash bellow on IOMMU enabled device, it happens post device removal from PCI topology -
> during shutting down of user client holding last reference to drm device file (X in my case).
> The crash is because by the time I get to this point struct device->iommu_group pointer is NULL
> already since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above
> that the iommu shouldn't clean up before the devices in its group have cleaned up.
> So instead of guessing when is the right place to place all IOMMU related cleanups it makes sense
> to get notification from IOMMU subsystem in the form of event IOMMU_GROUP_NOTIFY_DEL_DEVICE
> and use that place to do all the relevant cleanups.

Yeah that goes boom, but you shouldn't need this special iommu cleanup
handler. Making sure that all the dma-api mappings are gone needs to
be done as part of the device hotunplug, you can't delay that to the
last drm_device cleanup.

So I most of the patch here with pulling that out (should be outright
removed from the final release code even) is good, just not yet how
you call that new code. Probably these bits (aside from walking all
buffers and unpopulating the tt) should be done from the early_free
callback you're adding.

Also what I just realized: For normal unload you need to make sure the
hw is actually stopped first, before we unmap buffers. Otherwise
driver unload will likely result in wedged hw, probably not what you
want for debugging.
-Daniel

> Andrey
>
>
> [  123.810074 <   28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8
> [  123.810080 <    0.000006>] #PF: supervisor read access in kernel mode
> [  123.810082 <    0.000002>] #PF: error_code(0x0000) - not-present page
> [  123.810085 <    0.000003>] PGD 0 P4D 0
> [  123.810089 <    0.000004>] Oops: 0000 [#1] SMP NOPTI
> [  123.810094 <    0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G           O      5.9.0-rc2-dev+ #59
> [  123.810096 <    0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019
> [  123.810105 <    0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20
> [  123.810108 <    0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48
> [  123.810111 <    0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246
> [  123.810114 <    0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> [  123.810116 <    0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0
> [  123.810117 <    0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000
> [  123.810119 <    0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000
> [  123.810121 <    0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688
> [  123.810124 <    0.000003>] FS:  00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000
> [  123.810126 <    0.000002>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  123.810128 <    0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0
> [  123.810130 <    0.000002>] Call Trace:
> [  123.810136 <    0.000006>]  __iommu_dma_unmap+0x2e/0x100
> [  123.810141 <    0.000005>]  ? kfree+0x389/0x3a0
> [  123.810144 <    0.000003>]  iommu_dma_unmap_page+0xe/0x10
> [  123.810149 <    0.000005>] dma_unmap_page_attrs+0x4d/0xf0
> [  123.810159 <    0.000010>]  ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm]
> [  123.810165 <    0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm]
> [  123.810252 <    0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu]
> [  123.810258 <    0.000006>]  ttm_tt_unpopulate+0x59/0x70 [ttm]
> [  123.810264 <    0.000006>]  ttm_tt_destroy+0x6a/0x70 [ttm]
> [  123.810270 <    0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm]
> [  123.810276 <    0.000006>]  ttm_bo_put+0x1e7/0x400 [ttm]
> [  123.810358 <    0.000082>]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
> [  123.810440 <    0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu]
> [  123.810459 <    0.000019>]  drm_gem_object_free+0x35/0x40 [drm]
> [  123.810476 <    0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm]
> [  123.810494 <    0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm]
> [  123.810511 <    0.000017>]  ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm]
> [  123.810516 <    0.000005>]  idr_for_each+0x4d/0xd0
> [  123.810534 <    0.000018>]  drm_gem_release+0x20/0x30 [drm]
> [  123.810550 <    0.000016>]  drm_file_free+0x251/0x2a0 [drm]
> [  123.810567 <    0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm]
> [  123.810583 <    0.000016>]  drm_release+0x6a/0xe0 [drm]
> [  123.810588 <    0.000005>]  __fput+0xa2/0x250
> [  123.810592 <    0.000004>]  ____fput+0xe/0x10
> [  123.810595 <    0.000003>]  task_work_run+0x6c/0xa0
> [  123.810600 <    0.000005>]  do_exit+0x376/0xb60
> [  123.810604 <    0.000004>]  do_group_exit+0x43/0xa0
> [  123.810608 <    0.000004>]  get_signal+0x18b/0x8e0
> [  123.810612 <    0.000004>]  ? do_futex+0x595/0xc20
> [  123.810617 <    0.000005>]  arch_do_signal+0x34/0x880
> [  123.810620 <    0.000003>]  ? check_preempt_curr+0x50/0x60
> [  123.810623 <    0.000003>]  ? ttwu_do_wakeup+0x1e/0x160
> [  123.810626 <    0.000003>]  ? ttwu_do_activate+0x61/0x70
> [  123.810630 <    0.000004>] exit_to_user_mode_prepare+0x124/0x1b0
> [  123.810635 <    0.000005>] syscall_exit_to_user_mode+0x31/0x170
> [  123.810639 <    0.000004>]  do_syscall_64+0x43/0x80
>
>
> Andrey
>
>
>
> Christian.
>
> +
> + if (adev->irq.ih.use_bus_addr)
> + amdgpu_ih_ring_fini(adev, &adev->irq.ih);
> + if (adev->irq.ih1.use_bus_addr)
> + amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
> + if (adev->irq.ih2.use_bus_addr)
> + amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
> +
> + amdgpu_gart_dummy_page_fini(adev);
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +
>   /**
>    * amdgpu_device_init - initialize the driver
>    *
> @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> + INIT_LIST_HEAD(&adev->device_bo_list);
> +
>   adev->gfx.gfx_off_req_count = 1;
>   adev->pm.ac_power = power_supply_is_system_supplied() > 0;
> @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   if (amdgpu_device_cache_pci_state(adev->pdev))
>   pci_restore_state(pdev);
> + BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
> + adev->nb.notifier_call = amdgpu_iommu_group_notifier;
> +
> + if (adev->dev->iommu_group) {
> + r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
> + if (r)
> + goto failed;
> + }
> +
>   return 0;
>   failed:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 0db9330..486ad6d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
>    *
>    * Frees the dummy page used by the driver (all asics).
>    */
> -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
>   {
>   if (!adev->dummy_page_addr)
>   return;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> index afa2e28..5678d9c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
>   void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
>   int amdgpu_gart_init(struct amdgpu_device *adev);
>   void amdgpu_gart_fini(struct amdgpu_device *adev);
> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
>   int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
>         int pages);
>   int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 6cc9919..4a1de69 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
>   }
>   amdgpu_bo_unref(&bo->parent);
> + spin_lock(&ttm_bo_glob.lru_lock);
> + list_del(&bo->bo);
> + spin_unlock(&ttm_bo_glob.lru_lock);
> +
>   kfree(bo->metadata);
>   kfree(bo);
>   }
> @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   if (bp->type == ttm_bo_type_device)
>   bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> + INIT_LIST_HEAD(&bo->bo);
> +
> + spin_lock(&ttm_bo_glob.lru_lock);
> + list_add_tail(&bo->bo, &adev->device_bo_list);
> + spin_unlock(&ttm_bo_glob.lru_lock);
> +
>   return 0;
>   fail_unreserve:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 9ac3756..5ae8555 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -110,6 +110,8 @@ struct amdgpu_bo {
>   struct list_head shadow_list;
>   struct kgd_mem                  *kfd_bo;
> +
> + struct list_head bo;
>   };
>   static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch