[PATCH v4 07/14] drm/amdgpu: Register IOMMU topology notifier per device.

Daniel Vetter daniel at ffwll.ch
Wed Jan 20 08:38:31 UTC 2021


On Wed, Jan 20, 2021 at 5:21 AM Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
>
>
> On 1/19/21 5:01 PM, Daniel Vetter wrote:
> > On Tue, Jan 19, 2021 at 10:22 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky at amd.com> wrote:
> >>
> >> On 1/19/21 8:45 AM, Daniel Vetter wrote:
> >>
> >> On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
> >>
> >> Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
> >>
> >> Handle all DMA IOMMU group related dependencies before the
> >> group is removed.
> >>
> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> >> ---
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  5 ++++
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  2 +-
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++
> >>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
> >>    6 files changed, 65 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index 478a7d8..2953420 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -51,6 +51,7 @@
> >>    #include <linux/dma-fence.h>
> >>    #include <linux/pci.h>
> >>    #include <linux/aer.h>
> >> +#include <linux/notifier.h>
> >>    #include <drm/ttm/ttm_bo_api.h>
> >>    #include <drm/ttm/ttm_bo_driver.h>
> >> @@ -1041,6 +1042,10 @@ struct amdgpu_device {
> >>    bool                            in_pci_err_recovery;
> >>    struct pci_saved_state          *pci_state;
> >> +
> >> +	struct notifier_block		nb;
> >> +	struct blocking_notifier_head	notifier;
> >> +	struct list_head		device_bo_list;
> >>    };
> >>    static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 45e23e3..e99f4f1 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -70,6 +70,8 @@
> >>    #include <drm/task_barrier.h>
> >>    #include <linux/pm_runtime.h>
> >> +#include <linux/iommu.h>
> >> +
> >>    MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
> >>    MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> >>    MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> >> @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = {
> >>    };
> >> +static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
> >> +					unsigned long action, void *data)
> >> +{
> >> +	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
> >> +	struct amdgpu_bo *bo = NULL;
> >> +
> >> +	/*
> >> +	 * Following is a set of IOMMU group dependencies taken care of before
> >> +	 * the device's IOMMU group is removed
> >> +	 */
> >> +	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
> >> +
> >> +		spin_lock(&ttm_bo_glob.lru_lock);
> >> +		list_for_each_entry(bo, &adev->device_bo_list, bo) {
> >> +			if (bo->tbo.ttm)
> >> +				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
> >> +		}
> >> +		spin_unlock(&ttm_bo_glob.lru_lock);
> >>
> >> That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
> >>
> >> You need to use a mutex here or even better make sure you can access the
> >> device_bo_list without a lock in this moment.
> >>
> >> I'd also be worried about the notifier mutex getting really badly in the
> >> way.
> >>
> >> Plus I'm worried why we even need this, it sounds a bit like papering over
> >> the iommu subsystem. Assuming we clean up all our iommu mappings in our
> >> device hotunplug/unload code, why do we still need to have an additional
> >> iommu notifier on top, with all kinds of additional headaches? The iommu
> >> shouldn't clean up before the devices in its group have cleaned up.
> >>
> >> I think we need more info here on what the exact problem is first.
> >> -Daniel
> >>
> >>
> >> Originally I experienced the crash below on an IOMMU-enabled device. It happens after device removal
> >> from the PCI topology, during shutdown of the last user client holding a reference to the drm device file (X in my case).
> >> The crash occurs because by the time we get to this point, the struct device->iommu_group pointer is
> >> already NULL, since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above,
> >> that the iommu shouldn't clean up before the devices in its group have cleaned up.
> >> So instead of guessing where to place all the IOMMU related cleanups, it makes sense
> >> to get a notification from the IOMMU subsystem in the form of the IOMMU_GROUP_NOTIFY_DEL_DEVICE event
> >> and do all the relevant cleanups there.
> > Yeah that goes boom, but you shouldn't need this special iommu cleanup
> > handler. Making sure that all the dma-api mappings are gone needs to
> > be done as part of the device hotunplug, you can't delay that to the
> > last drm_device cleanup.
> >
> > So most of the patch here, with pulling that out (it should be outright
> > removed from the final release code even), is good, just not yet how
> > you call that new code. Probably these bits (aside from walking all
> > buffers and unpopulating the tt) should be done from the early_free
> > callback you're adding.
> >
> > Also what I just realized: For normal unload you need to make sure the
> > hw is actually stopped first, before we unmap buffers. Otherwise
> > driver unload will likely result in wedged hw, probably not what you
> > want for debugging.
> > -Daniel
>
> Since device removal from the IOMMU group, and this hook in particular,
> takes place before the call to amdgpu_pci_remove, essentially it means
> that for the IOMMU use case the entire amdgpu_device_fini_hw function
> should be called here to stop the HW, instead of from amdgpu_pci_remove.

The crash you showed was on final drm_close, which should happen after
device removal, so that's clearly buggy. If the iommu subsystem
removes stuff before the driver could clean up already, then I think
that's an iommu bug or dma-api bug. Just plain using dma_map/unmap and
friends really shouldn't need notifier hacks like you're implementing
here. Can you pls show me a backtrace where dma_unmap_sg blows up when
it's put into the pci_driver remove callback?
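
To make the question concrete, what I'd expect to work (and what I'm
asking for a backtrace of) is roughly the following. Untested sketch,
not part of the patch, and the adev->dummy_sgt field is made up purely
for illustration:

```c
/*
 * Untested sketch: tear down dma mappings from the pci_driver
 * .remove() callback, while the device is still in its iommu group,
 * instead of deferring to the last drm_device unref.
 */
static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *ddev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = drm_to_adev(ddev);

	/* Unplug first so no new ioctls race against the teardown. */
	drm_dev_unplug(ddev);

	/*
	 * Unmap here, before the iommu group goes away; by the time the
	 * last drm file is closed, dev->iommu_group may already be NULL.
	 * dummy_sgt is a hypothetical field standing in for whatever
	 * sg tables the driver still holds mapped.
	 */
	dma_unmap_sg(&pdev->dev, adev->dummy_sgt->sgl,
		     adev->dummy_sgt->nents, DMA_BIDIRECTIONAL);

	drm_dev_put(ddev);
}
```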

> Looking at this from another perspective: AFAIK on each new device probe,
> whether due to a PCI bus rescan or a driver reload, we reset the ASIC before doing
> any init operations (assuming we successfully gained MMIO access), so maybe
> your concern is not an issue?

Reset on probe is too late. The problem is that if you just remove the
driver, your device is doing dma at that moment. And you kinda have to
stop that before you free the mappings/memory. Of course when the
device is actually hotunplugged, then dma is guaranteed to have
stopped already. I'm not sure whether disabling the pci device is
enough to make sure no more dma happens, could be that's enough.
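As a rough sketch of the ordering I mean (hand-waved, reusing the
fini_hw split you're proposing; helper names assumed from the driver,
not verified):

```c
/*
 * Hand-waved teardown ordering for driver unload, sketch only:
 * quiesce the hw first, only then unmap and free.
 */
static void amdgpu_unload_teardown_sketch(struct amdgpu_device *adev)
{
	/*
	 * 1) Stop the hardware (rings, IH) so it issues no more DMA.
	 * amdgpu_device_fini_hw() is the hw-teardown half being split
	 * out in this series.
	 */
	amdgpu_device_fini_hw(adev);

	/*
	 * Clearing bus master additionally blocks DMA at the PCI level;
	 * possibly that alone is already enough, as noted above.
	 */
	pci_clear_master(adev->pdev);

	/* 2) Only now free the dma mappings and backing memory. */
	amdgpu_gart_dummy_page_fini(adev);
}
```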
-Daniel

> Andrey
>
>
> >
> >> Andrey
> >>
> >>
> >> [  123.810074 <   28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8
> >> [  123.810080 <    0.000006>] #PF: supervisor read access in kernel mode
> >> [  123.810082 <    0.000002>] #PF: error_code(0x0000) - not-present page
> >> [  123.810085 <    0.000003>] PGD 0 P4D 0
> >> [  123.810089 <    0.000004>] Oops: 0000 [#1] SMP NOPTI
> >> [  123.810094 <    0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G           O      5.9.0-rc2-dev+ #59
> >> [  123.810096 <    0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019
> >> [  123.810105 <    0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20
> >> [  123.810108 <    0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48
> >> [  123.810111 <    0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246
> >> [  123.810114 <    0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> >> [  123.810116 <    0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0
> >> [  123.810117 <    0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000
> >> [  123.810119 <    0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000
> >> [  123.810121 <    0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688
> >> [  123.810124 <    0.000003>] FS:  00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000
> >> [  123.810126 <    0.000002>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  123.810128 <    0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0
> >> [  123.810130 <    0.000002>] Call Trace:
> >> [  123.810136 <    0.000006>]  __iommu_dma_unmap+0x2e/0x100
> >> [  123.810141 <    0.000005>]  ? kfree+0x389/0x3a0
> >> [  123.810144 <    0.000003>]  iommu_dma_unmap_page+0xe/0x10
> >> [  123.810149 <    0.000005>] dma_unmap_page_attrs+0x4d/0xf0
> >> [  123.810159 <    0.000010>]  ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm]
> >> [  123.810165 <    0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm]
> >> [  123.810252 <    0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu]
> >> [  123.810258 <    0.000006>]  ttm_tt_unpopulate+0x59/0x70 [ttm]
> >> [  123.810264 <    0.000006>]  ttm_tt_destroy+0x6a/0x70 [ttm]
> >> [  123.810270 <    0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm]
> >> [  123.810276 <    0.000006>]  ttm_bo_put+0x1e7/0x400 [ttm]
> >> [  123.810358 <    0.000082>]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
> >> [  123.810440 <    0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu]
> >> [  123.810459 <    0.000019>]  drm_gem_object_free+0x35/0x40 [drm]
> >> [  123.810476 <    0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm]
> >> [  123.810494 <    0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm]
> >> [  123.810511 <    0.000017>]  ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm]
> >> [  123.810516 <    0.000005>]  idr_for_each+0x4d/0xd0
> >> [  123.810534 <    0.000018>]  drm_gem_release+0x20/0x30 [drm]
> >> [  123.810550 <    0.000016>]  drm_file_free+0x251/0x2a0 [drm]
> >> [  123.810567 <    0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm]
> >> [  123.810583 <    0.000016>]  drm_release+0x6a/0xe0 [drm]
> >> [  123.810588 <    0.000005>]  __fput+0xa2/0x250
> >> [  123.810592 <    0.000004>]  ____fput+0xe/0x10
> >> [  123.810595 <    0.000003>]  task_work_run+0x6c/0xa0
> >> [  123.810600 <    0.000005>]  do_exit+0x376/0xb60
> >> [  123.810604 <    0.000004>]  do_group_exit+0x43/0xa0
> >> [  123.810608 <    0.000004>]  get_signal+0x18b/0x8e0
> >> [  123.810612 <    0.000004>]  ? do_futex+0x595/0xc20
> >> [  123.810617 <    0.000005>]  arch_do_signal+0x34/0x880
> >> [  123.810620 <    0.000003>]  ? check_preempt_curr+0x50/0x60
> >> [  123.810623 <    0.000003>]  ? ttwu_do_wakeup+0x1e/0x160
> >> [  123.810626 <    0.000003>]  ? ttwu_do_activate+0x61/0x70
> >> [  123.810630 <    0.000004>] exit_to_user_mode_prepare+0x124/0x1b0
> >> [  123.810635 <    0.000005>] syscall_exit_to_user_mode+0x31/0x170
> >> [  123.810639 <    0.000004>]  do_syscall_64+0x43/0x80
> >>
> >>
> >> Andrey
> >>
> >>
> >>
> >> Christian.
> >>
> >> +
> >> +		if (adev->irq.ih.use_bus_addr)
> >> +			amdgpu_ih_ring_fini(adev, &adev->irq.ih);
> >> +		if (adev->irq.ih1.use_bus_addr)
> >> +			amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
> >> +		if (adev->irq.ih2.use_bus_addr)
> >> +			amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
> >> +
> >> +		amdgpu_gart_dummy_page_fini(adev);
> >> +	}
> >> +
> >> +	return NOTIFY_OK;
> >> +}
> >> +
> >> +
> >>    /**
> >>     * amdgpu_device_init - initialize the driver
> >>     *
> >> @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >>    INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> >> +	INIT_LIST_HEAD(&adev->device_bo_list);
> >> +
> >>    adev->gfx.gfx_off_req_count = 1;
> >>    adev->pm.ac_power = power_supply_is_system_supplied() > 0;
> >> @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >>    if (amdgpu_device_cache_pci_state(adev->pdev))
> >>    pci_restore_state(pdev);
> >> +	BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
> >> +	adev->nb.notifier_call = amdgpu_iommu_group_notifier;
> >> +
> >> +	if (adev->dev->iommu_group) {
> >> +		r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
> >> +		if (r)
> >> +			goto failed;
> >> +	}
> >> +
> >>    return 0;
> >>    failed:
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> >> index 0db9330..486ad6d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> >> @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
> >>     *
> >>     * Frees the dummy page used by the driver (all asics).
> >>     */
> >> -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> >> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> >>    {
> >>    if (!adev->dummy_page_addr)
> >>    return;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> >> index afa2e28..5678d9c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> >> @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
> >>    void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
> >>    int amdgpu_gart_init(struct amdgpu_device *adev);
> >>    void amdgpu_gart_fini(struct amdgpu_device *adev);
> >> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
> >>    int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
> >>          int pages);
> >>    int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> index 6cc9919..4a1de69 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
> >>    }
> >>    amdgpu_bo_unref(&bo->parent);
> >> +	spin_lock(&ttm_bo_glob.lru_lock);
> >> +	list_del(&bo->bo);
> >> +	spin_unlock(&ttm_bo_glob.lru_lock);
> >> +
> >>    kfree(bo->metadata);
> >>    kfree(bo);
> >>    }
> >> @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
> >>    if (bp->type == ttm_bo_type_device)
> >>    bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> >> +	INIT_LIST_HEAD(&bo->bo);
> >> +
> >> +	spin_lock(&ttm_bo_glob.lru_lock);
> >> +	list_add_tail(&bo->bo, &adev->device_bo_list);
> >> +	spin_unlock(&ttm_bo_glob.lru_lock);
> >> +
> >>    return 0;
> >>    fail_unreserve:
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> index 9ac3756..5ae8555 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> @@ -110,6 +110,8 @@ struct amdgpu_bo {
> >>    struct list_head shadow_list;
> >>    struct kgd_mem                  *kfd_bo;
> >> +
> >> +	struct list_head bo;
> >>    };
> >>    static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
> >
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

