[PATCH] drm/amdkfd: fix interrupt spin lock

Fri Nov 2 18:59:52 UTC 2018

On 2018-11-02 9:48 a.m., Christian König wrote:
> Vega10 has multiple interrupt rings,

I don't think I've seen your code that implements multiple interrupt
rings. So it's a bit hard to comment. As I understand it, the only way
this could happen is, if the two interrupt rings are handled by
different CPU cores. Alternatively, you may end up calling
kgd2kfd_interrupt outside of interrupt context (e.g. in a kernel
thread), which would invalidate an implicit assumption in
kgd2kfd_interrupt. Can you clarify what's happening here?

Anyway, this change shouldn't cause any other problems so feel free to
add my Acked-by.

Regards,
  Felix

>  so this can be called from multiple
> calles at the same time resulting in:
>
> [   71.779334] ================================
> [   71.779406] WARNING: inconsistent lock state
> [   71.779478] 4.19.0-rc1+ #44 Tainted: G        W
> [   71.779565] --------------------------------
> [   71.779637] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [   71.779740] kworker/6:1/120 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   71.779832] 00000000ad761971 (&(&kfd->interrupt_lock)->rlock){?...},
> at: kgd2kfd_interrupt+0x75/0x100 [amdgpu]
> [   71.780058] {IN-HARDIRQ-W} state was registered at:
> [   71.780115]   _raw_spin_lock+0x2c/0x40
> [   71.780180]   kgd2kfd_interrupt+0x75/0x100 [amdgpu]
> [   71.780248]   amdgpu_irq_callback+0x6c/0x150 [amdgpu]
> [   71.780315]   amdgpu_ih_process+0x88/0x100 [amdgpu]
> [   71.780380]   amdgpu_irq_handler+0x20/0x40 [amdgpu]
> [   71.780409]   __handle_irq_event_percpu+0x49/0x2a0
> [   71.780436]   handle_irq_event_percpu+0x30/0x70
> [   71.780461]   handle_irq_event+0x37/0x60
> [   71.780484]   handle_edge_irq+0x83/0x1b0
> [   71.780506]   handle_irq+0x1f/0x30
> [   71.780526]   do_IRQ+0x53/0x110
> [   71.780544]   ret_from_intr+0x0/0x22
> [   71.780566]   cpuidle_enter_state+0xaa/0x330
> [   71.780591]   do_idle+0x203/0x280
> [   71.780610]   cpu_startup_entry+0x6f/0x80
> [   71.780634]   start_secondary+0x1b0/0x200
> [   71.780657]   secondary_startup_64+0xa4/0xb0
>
> Fix this by always using irq save spin locks.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index d7e0f41f8edc..c004647c8cb4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -682,6 +682,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>  {
>  	uint32_t patched_ihre[KFD_MAX_RING_ENTRY_SIZE];
>  	bool is_patched = false;
> +	unsigned long flags;
>  
>  	if (!kfd->init_complete)
>  		return;
> @@ -691,7 +692,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>  		return;
>  	}
>  
> -	spin_lock(&kfd->interrupt_lock);
> +	spin_lock_irqsave(&kfd->interrupt_lock, flags);
>  
>  	if (kfd->interrupts_active
>  	    && interrupt_is_wanted(kfd, ih_ring_entry,
> @@ -700,7 +701,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>  				     is_patched ? patched_ihre : ih_ring_entry))
>  		queue_work(kfd->ih_wq, &kfd->interrupt_work);
>  
> -	spin_unlock(&kfd->interrupt_lock);
> +	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
>  }
>  
>  int kgd2kfd_quiesce_mm(struct mm_struct *mm)