GVT Scheduler

Julian Stecklina julian.stecklina at cyberus-technology.de
Wed Oct 28 15:46:21 UTC 2020


Hi!

On Wed, 2020-10-28 at 10:40 +0200, Julian Stecklina wrote:
> >   According to our assumption, there might be extra execlist schedule-out
> > status notification. Is it possible that you can open the tracepoint in
> > execlist_context_schedule_in and execlist_context_schedule_out in
> > intel_lrc.c?
> 
> 
> We'll try turning trace_i915_request_in / trace_i915_request_out into printks
> and see whether this helps in debugging. Alternatively, is there a way to get
> trace events out of a crashed kernel?
> 
> Btw, would it make sense to count the schedule_in and schedule_out events for
> each request and dump a stacktrace when we see an unpaired schedule_out?

So we tried this out with a tiny patch that checks for matched schedule in/out
events:

https://github.com/blitz/linux/commit/441663fab60df4a4692d5cc031dcfdeffe243008
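
For readers who don't want to open the link, the idea can be illustrated with a
small userspace sketch. This is not the code from the commit above: the struct
and function names (sketch_request, sketch_schedule_in, sketch_schedule_out)
are made up for illustration, and the real patch hooks into intel_lrc.c and
would presumably use a kernel warning macro rather than fprintf. It only shows
the invariant: a schedule-out must never outnumber the schedule-ins seen for
the same request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-request state; not the i915 request struct. */
struct sketch_request {
	int sched_in;  /* schedule-in events seen so far */
	int sched_out; /* paired schedule-out events seen so far */
};

/* Record a schedule-in event for this request. */
static void sketch_schedule_in(struct sketch_request *rq)
{
	assert(rq != NULL);
	rq->sched_in++;
}

/*
 * Record a schedule-out event. Returns true if it pairs with an earlier
 * schedule-in. An unpaired schedule-out is reported and not counted, so
 * the counters stay balanced after the warning.
 */
static bool sketch_schedule_out(struct sketch_request *rq)
{
	assert(rq != NULL);
	if (rq->sched_out >= rq->sched_in) {
		fprintf(stderr, "mismatched schedule in/out operations\n");
		return false;
	}
	rq->sched_out++;
	return true;
}
```

With one schedule-in followed by two schedule-outs, the first out is accepted
and the second one triggers the message.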

It would be good if you could check whether this is a useful invariant to warn
on. :)

On one system, we see this triggering right after boot with no VMs running at
all (see below). I haven't seen this with our production VM workload yet, but
that usually takes hours to manifest. So we might have something there tomorrow.

[   10.370703] ------------[ cut here ]------------
[   10.370734] mismatched schedule in/out operations
[   10.370807] WARNING: CPU: 1 PID: 0 at drivers/gpu/drm/i915/gt/intel_lrc.c:612 process_csb+0x762/0x7a0 [i915]
[   10.370842]  fb_sys_fops e1000e igb i2c_i801 drm dca ahci i2c_algo_bit libahci wmi video pinctrl_cannonlake pinctrl_intel
[   10.370849] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.61 #1
[   10.370849] Hardware name: Gigabyte Technology Co., Ltd. Q370M D3H GSM PLUS/Q370M D3H GSM PLUS, BIOS F14 06/05/2019
[   10.370902] RIP: 0010:process_csb+0x762/0x7a0 [i915]
[   10.370904] Code: 88 aa 15 00 00 0f 85 0f fd ff ff 48 c7 c7 10 e3 70 c0 4c 89 55 b0 48 89 4d b8 48 89 55 c0 c6 05 68 aa 15 00 01 e8 99 b7 2a eb <0f> 0b 4c 8b 55 b0 48 8b 4d b8 48 8b 55 c0 e9 dd fc ff ff 4c 89 55
[   10.370905] RSP: 0018:ffffb1204014ce60 EFLAGS: 00010286
[   10.370906] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   10.370907] RDX: 0000000000000025 RSI: ffffffffad387405 RDI: 0000000000000246
[   10.370907] RBP: ffffb1204014cec0 R08: ffffffffad3873e0 R09: 0000000000000025
[   10.370907] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006
[   10.370908] R13: ffff8ed12dcfe040 R14: 0000000000000001 R15: ffff8ed12f6fe000
[   10.370909] FS:  0000000000000000(0000) GS:ffff8ed130440000(0000) knlGS:0000000000000000
[   10.370909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.370910] CR2: 000055da74158008 CR3: 000000017b40a004 CR4: 00000000003606e0
[   10.370910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.370910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.370911] Call Trace:
[   10.370912]  <IRQ>
[   10.370928]  execlists_submission_tasklet+0x19/0x70 [i915]
[   10.370948]  tasklet_action_common.isra.0+0x60/0x110
[   10.370949]  tasklet_hi_action+0x1f/0x30
[   10.370952]  __do_softirq+0xe1/0x2d6
[   10.370955]  ? update_ts_time_stats+0x58/0x80
[   10.370956]  irq_exit+0xae/0xb0
[   10.370957]  scheduler_ipi+0xe4/0x130
[   10.370958]  smp_reschedule_interrupt+0x39/0xe0
[   10.370959]  reschedule_interrupt+0xf/0x20
[   10.370960]  </IRQ>
[   10.370964] RIP: 0010:cpuidle_enter_state+0xc5/0x450
[   10.370965] Code: ff e8 0f 78 82 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 62 dc 88 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
[   10.370966] RSP: 0018:ffffb120400efe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
[   10.370966] RAX: ffff8ed13046a880 RBX: ffffffffacf58e80 RCX: 000000000000001f
[   10.370967] RDX: 0000000000000000 RSI: 000000002aaaab99 RDI: 0000000000000000
[   10.370967] RBP: ffffb120400efe78 R08: 000000026a23c65e R09: 000000028d99190d
[   10.370967] R10: ffff8ed130469580 R11: ffff8ed130469560 R12: ffff8ed130475928
[   10.370968] R13: 0000000000000008 R14: 0000000000000008 R15: ffff8ed130475928
[   10.370970]  ? cpuidle_enter_state+0xa1/0x450
[   10.370971]  cpuidle_enter+0x2e/0x40
[   10.370988]  call_cpuidle+0x23/0x40
[   10.370989]  do_idle+0x1dd/0x270
[   10.370990]  cpu_startup_entry+0x20/0x30
[   10.370992]  start_secondary+0x167/0x1c0
[   10.370994]  secondary_startup_64+0xa4/0xb0
[   10.370995] ---[ end trace 85cd1056f39ffa8d ]---

Julian




More information about the intel-gvt-dev mailing list