[Intel-gfx] [PATCH 14/27] drm/i915: Don't mark an execlists context-switch when idle
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Thu Apr 20 08:53:19 UTC 2017
On ke, 2017-04-19 at 10:41 +0100, Chris Wilson wrote:
> If we *know* that the engine is idle, i.e. we have not more contexts in
> lift, we can skip any spurious CSB idle interrupts. These spurious
in flight?
> interrupts seem to arrive long after we assert that the engines are
> completely idle, triggering later assertions:
>
> [ 178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2
> [ 178.896655] ------------[ cut here ]------------
> [ 178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226!
> [ 178.896661] invalid opcode: 0000 [#1] SMP
> [ 178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E)
> [ 178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G E 4.11.0-rc5+ #14
> [ 178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000
> [ 178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915]
> [ 178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246
> [ 178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000
> [ 178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48
> [ 178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d
> [ 178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000
> [ 178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000
> [ 178.896742] FS: 00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
> [ 178.896746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0
> [ 178.896752] Call Trace:
> [ 178.896768] reset_all_global_seqno.part.33+0x4e/0xd0 [i915]
> [ 178.896782] i915_gem_request_alloc+0x304/0x330 [i915]
> [ 178.896795] i915_gem_do_execbuffer+0x8a1/0x17d0 [i915]
> [ 178.896799] ? remove_wait_queue+0x48/0x50
> [ 178.896812] ? i915_wait_request+0x300/0x590 [i915]
> [ 178.896816] ? wake_up_q+0x70/0x70
> [ 178.896819] ? refcount_dec_and_test+0x11/0x20
> [ 178.896823] ? reservation_object_add_excl_fence+0xa5/0x100
> [ 178.896835] i915_gem_execbuffer2+0xab/0x1f0 [i915]
> [ 178.896844] drm_ioctl+0x1e6/0x460 [drm]
> [ 178.896858] ? i915_gem_execbuffer+0x260/0x260 [i915]
> [ 178.896862] ? dput+0xcf/0x250
> [ 178.896866] ? full_proxy_release+0x66/0x80
> [ 178.896869] ? mntput+0x1f/0x30
> [ 178.896872] do_vfs_ioctl+0x8f/0x5b0
> [ 178.896875] ? ____fput+0x9/0x10
> [ 178.896878] ? task_work_run+0x80/0xa0
> [ 178.896881] SyS_ioctl+0x3c/0x70
> [ 178.896885] entry_SYSCALL_64_fastpath+0x17/0x98
> [ 178.896888] RIP: 0033:0x7f2ccb455ca7
> [ 178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7
> [ 178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003
> [ 178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000
> [ 178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003
> [ 178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00
>
> On the other hand, by ignoring the interrupt do we risk running out of
> space in CSB ring? Testing for a few hours suggests not, i.e. that we
> only seem to get the odd delayed CSB idle notification.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Slap your Tested-by too.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
Even with that, I dislike the port_count macro.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
More information about the Intel-gfx
mailing list