etnaviv: Possible circular locking on i.MX6QP

Lucas Stach l.stach at pengutronix.de
Thu Jun 27 09:43:33 UTC 2019


Hi Fabio,

Am Mittwoch, den 12.06.2019, 12:48 -0300 schrieb Fabio Estevam:
> Hi,
> 
> On an imx6qp-wandboard I get the warning below about a possible
> circular locking dependency, running 5.1.9 built from
> imx_v6_v7_defconfig.
> 
> Such a warning does not happen on the imx6q or imx6solo variants of
> the Wandboard, though.
> 
> Any ideas?

The issue reported by lockdep is real. You probably only see it on the
QP because it is uncovered by a GPU hang triggered by an MMU
exception. MMUv1 cores, like those on the older i.MX6 variants, are
unable to signal MMU exceptions and instead just read the dummy page.

Some git history digging shows that the bug has been introduced with
3741540e0413 (drm/sched: Rework HW fence processing.), which is part of kernel 5.1. The fix is 5918045c4ed4 (drm/scheduler: rework job destruction), which is not in any released kernel yet and seems to be too big for stable, so I'm not really sure what to do at this point.

Regards,
Lucas

>  Thanks,
> 
> Fabio Estevam
> 
> ** (matchbox-panel:708): WARNING **: Failed to load applet "battery"
> (/usr/lib/matchbox-panel/libbattery.so: cannot open shared object
> file: No such file or directory).
> matchbox-wm: X error warning (0xe00003): BadWindow (invalid Window
> parameter) (opcode: 12)
> etnaviv-gpu 134000.gpu: MMU fault status 0x00000001
> etnaviv-gpu 134000.gpu: MMU 0 fault addr 0x0805ffc0
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.1.9 #58 Not tainted
> ------------------------------------------------------
> kworker/0:1/29 is trying to acquire lock:
> (ptrval) (&(&gpu->fence_spinlock)->rlock){-...}, at:
> dma_fence_remove_callback+0x14/0x50
> 
> but task is already holding lock:
> (ptrval) (&(&sched->job_list_lock)->rlock){-...}, at:
> drm_sched_stop+0x1c/0x124
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&(&sched->job_list_lock)->rlock){-...}:
>        drm_sched_process_job+0x5c/0x1c8
>        dma_fence_signal+0xdc/0x1d4
>        irq_handler+0xd0/0x1e0
>        __handle_irq_event_percpu+0x48/0x360
>        handle_irq_event_percpu+0x28/0x7c
>        handle_irq_event+0x38/0x5c
>        handle_fasteoi_irq+0xc0/0x17c
>        generic_handle_irq+0x20/0x34
>        __handle_domain_irq+0x64/0xe0
>        gic_handle_irq+0x4c/0xa8
>        __irq_svc+0x70/0x98
>        cpuidle_enter_state+0x168/0x5a4
>        cpuidle_enter_state+0x168/0x5a4
>        do_idle+0x220/0x2c0
>        cpu_startup_entry+0x18/0x20
>        start_kernel+0x3e4/0x498
> 
> -> #0 (&(&gpu->fence_spinlock)->rlock){-...}:
>        _raw_spin_lock_irqsave+0x38/0x4c
>        dma_fence_remove_callback+0x14/0x50
>        drm_sched_stop+0x98/0x124
>        etnaviv_sched_timedout_job+0x7c/0xb4
>        drm_sched_job_timedout+0x34/0x5c
>        process_one_work+0x2ac/0x704
>        worker_thread+0x2c/0x574
>        kthread+0x134/0x148
>        ret_from_fork+0x14/0x20
>          (null)
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&sched->job_list_lock)->rlock);
>                                lock(&(&gpu->fence_spinlock)->rlock);
>                                lock(&(&sched->job_list_lock)->rlock);
>   lock(&(&gpu->fence_spinlock)->rlock);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/0:1/29:
>  #0: (ptrval) ((wq_completion)events){+.+.}, at:
> process_one_work+0x1f4/0x704
>  #1: (ptrval) ((work_completion)(&(&sched->work_tdr)->work)){+.+.},
> at: process_one_work+0x1f4/0x704
>  #2: (ptrval) (&(&sched->job_list_lock)->rlock){-...}, at:
> drm_sched_stop+0x1c/0x124
> 
> stack backtrace:
> CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 5.1.9 #58
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: events drm_sched_job_timedout
> [<c0112748>] (unwind_backtrace) from [<c010cfbc>]
> (show_stack+0x10/0x14)
> [<c010cfbc>] (show_stack) from [<c0bd31ec>] (dump_stack+0xd8/0x110)
> [<c0bd31ec>] (dump_stack) from [<c017a22c>]
> (print_circular_bug.constprop.19+0x1bc/0x2f0)
> [<c017a22c>] (print_circular_bug.constprop.19) from [<c017d408>]
> (__lock_acquire+0x1778/0x1f38)
> [<c017d408>] (__lock_acquire) from [<c017e3a4>]
> (lock_acquire+0xcc/0x1e8)
> [<c017e3a4>] (lock_acquire) from [<c0bf4134>]
> (_raw_spin_lock_irqsave+0x38/0x4c)
> [<c0bf4134>] (_raw_spin_lock_irqsave) from [<c0692710>]
> (dma_fence_remove_callback+0x14/0x50)
> [<c0692710>] (dma_fence_remove_callback) from [<c05d25b4>]
> (drm_sched_stop+0x98/0x124)
> [<c05d25b4>] (drm_sched_stop) from [<c064a3e8>]
> (etnaviv_sched_timedout_job+0x7c/0xb4)
> [<c064a3e8>] (etnaviv_sched_timedout_job) from [<c05d2964>]
> (drm_sched_job_timedout+0x34/0x5c)
> [<c05d2964>] (drm_sched_job_timedout) from [<c01468ec>]
> (process_one_work+0x2ac/0x704)
> [<c01468ec>] (process_one_work) from [<c0146d70>]
> (worker_thread+0x2c/0x574)
> [<c0146d70>] (worker_thread) from [<c014cd88>] (kthread+0x134/0x148)
> [<c014cd88>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xe81f7fb0 to 0xe81f7ff8)
> 7fa0:                                     00000000 00000000 00000000
> 00000000
> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> etnaviv-gpu 134000.gpu: recover hung GPU!
> _______________________________________________
> etnaviv mailing list
> etnaviv at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/etnaviv

