[PATCH] drm/amdkfd: fix interrupt spin lock
Kuehling, Felix
Felix.Kuehling at amd.com
Fri Nov 2 18:59:52 UTC 2018
On 2018-11-02 9:48 a.m., Christian König wrote:
> Vega10 has multiple interrupt rings,
I don't think I've seen your code that implements multiple interrupt
rings. So it's a bit hard to comment. As I understand it, the only way
this could happen is, if the two interrupt rings are handled by
different CPU cores. Alternatively, you may end up calling
kgd2kfd_interrupt outside of interrupt context (e.g. in a kernel
thread), which would invalidate an implicit assumption in
kgd2kfd_interrupt. Can you clarify what's happening here?
Anyway, this change shouldn't cause any other problems so feel free to
add my Acked-by.
Regards,
Felix
> so this can be called from multiple
> calles at the same time resulting in:
>
> [ 71.779334] ================================
> [ 71.779406] WARNING: inconsistent lock state
> [ 71.779478] 4.19.0-rc1+ #44 Tainted: G W
> [ 71.779565] --------------------------------
> [ 71.779637] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 71.779740] kworker/6:1/120 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 71.779832] 00000000ad761971 (&(&kfd->interrupt_lock)->rlock){?...},
> at: kgd2kfd_interrupt+0x75/0x100 [amdgpu]
> [ 71.780058] {IN-HARDIRQ-W} state was registered at:
> [ 71.780115] _raw_spin_lock+0x2c/0x40
> [ 71.780180] kgd2kfd_interrupt+0x75/0x100 [amdgpu]
> [ 71.780248] amdgpu_irq_callback+0x6c/0x150 [amdgpu]
> [ 71.780315] amdgpu_ih_process+0x88/0x100 [amdgpu]
> [ 71.780380] amdgpu_irq_handler+0x20/0x40 [amdgpu]
> [ 71.780409] __handle_irq_event_percpu+0x49/0x2a0
> [ 71.780436] handle_irq_event_percpu+0x30/0x70
> [ 71.780461] handle_irq_event+0x37/0x60
> [ 71.780484] handle_edge_irq+0x83/0x1b0
> [ 71.780506] handle_irq+0x1f/0x30
> [ 71.780526] do_IRQ+0x53/0x110
> [ 71.780544] ret_from_intr+0x0/0x22
> [ 71.780566] cpuidle_enter_state+0xaa/0x330
> [ 71.780591] do_idle+0x203/0x280
> [ 71.780610] cpu_startup_entry+0x6f/0x80
> [ 71.780634] start_secondary+0x1b0/0x200
> [ 71.780657] secondary_startup_64+0xa4/0xb0
>
> Fix this by always using irq save spin locks.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index d7e0f41f8edc..c004647c8cb4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -682,6 +682,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> {
> uint32_t patched_ihre[KFD_MAX_RING_ENTRY_SIZE];
> bool is_patched = false;
> + unsigned long flags;
>
> if (!kfd->init_complete)
> return;
> @@ -691,7 +692,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> return;
> }
>
> - spin_lock(&kfd->interrupt_lock);
> + spin_lock_irqsave(&kfd->interrupt_lock, flags);
>
> if (kfd->interrupts_active
> && interrupt_is_wanted(kfd, ih_ring_entry,
> @@ -700,7 +701,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> is_patched ? patched_ihre : ih_ring_entry))
> queue_work(kfd->ih_wq, &kfd->interrupt_work);
>
> - spin_unlock(&kfd->interrupt_lock);
> + spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
> }
>
> int kgd2kfd_quiesce_mm(struct mm_struct *mm)
More information about the amd-gfx
mailing list