GVT Scheduler
Zhenyu Wang
zhenyuw at linux.intel.com
Tue Nov 3 03:33:36 UTC 2020
On 2020.10.28 17:46:21 +0200, Julian Stecklina wrote:
> Hi!
>
> On Wed, 2020-10-28 at 10:40 +0200, Julian Stecklina wrote:
> > > According to our assumption, there might be an extra execlist schedule-out
> > > status notification. Could you enable the tracepoints in
> > > execlist_context_schedule_in and execlist_context_schedule_out in
> > > intel_lrc.c?
> >
> >
> > We'll try turning trace_i915_request_in / trace_i915_request_out into printks
> > and see whether this helps in debugging. Alternatively, is there a way to get
> > trace events out of a crashed kernel?
> >
> > Btw, would it make sense to count the schedule_in and schedule_out events for
> > each request and dump a stacktrace when we see an unpaired schedule_out?
>
> So we tried this out with a tiny patch that checks for matched schedule in/out
> events:
>
> https://github.com/blitz/linux/commit/441663fab60df4a4692d5cc031dcfdeffe243008
>
> It would be good if you can check whether this is a useful invariant to warn on.
> :)
>
> On one system, we see this triggering right after boot with no VMs running at
> all (see below). I haven't seen this with our production VM workload yet, but
> that usually takes hours to manifest. So we might have something there tomorrow.
>
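For illustration, the paired in/out invariant discussed above can be sketched in plain userspace C. This is not the code from the linked commit: struct sched_tracker and both tracker_* functions are hypothetical names invented for this sketch, and a real check would live in the i915 execlists code and use the kernel's WARN machinery rather than fprintf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-request tracker; not a real i915 structure. */
struct sched_tracker {
	int in_flight;  /* schedule-in events not yet matched by a schedule-out */
	int mismatches; /* unpaired schedule-out events seen so far */
};

/* Record a schedule-in event for the tracked request. */
void tracker_schedule_in(struct sched_tracker *t)
{
	assert(t != NULL);
	t->in_flight++;
}

/*
 * Record a schedule-out event. Returns true if it pairs with an earlier
 * schedule-in; otherwise counts a mismatch, prints a warning and
 * returns false.
 */
bool tracker_schedule_out(struct sched_tracker *t)
{
	assert(t != NULL);
	if (t->in_flight == 0) {
		t->mismatches++;
		fprintf(stderr, "mismatched schedule in/out operations\n");
		return false;
	}
	t->in_flight--;
	return true;
}
```

A caller would invoke tracker_schedule_in() where the request is scheduled in and tracker_schedule_out() where it is scheduled out; a false return marks the point where a stack trace would be worth dumping.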
Hmm, it looks like an i915 change removed the check for whether the request was
actually preempted when reporting the status... I'm not sure if that's relevant,
but maybe you could try something like:
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d0be98b67138..f1a16d4b6e6a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1439,7 +1439,9 @@ __execlists_schedule_out(struct i915_request *rq,
 	intel_context_update_runtime(ce);
 	intel_engine_context_out(engine);
-	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+	execlists_context_status_change(rq, i915_request_completed(rq) ?
+					INTEL_CONTEXT_SCHEDULE_OUT :
+					INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 	if (engine->fw_domain && !atomic_dec_return(&engine->fw_active))
 		intel_uncore_forcewake_put(engine->uncore, engine->fw_domain);
 	intel_gt_pm_put_async(engine->gt);
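
The decision the change above makes can be shown in isolation with a small standalone sketch. Everything here is a stand-in chosen for illustration: enum ctx_status mirrors the INTEL_CONTEXT_SCHEDULE_* notification names with arbitrary values, struct fake_request replaces struct i915_request, and its completed flag plays the role of i915_request_completed(rq).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the INTEL_CONTEXT_SCHEDULE_* notifications; values are arbitrary. */
enum ctx_status {
	CTX_SCHEDULE_IN,
	CTX_SCHEDULE_OUT,
	CTX_SCHEDULE_PREEMPTED,
};

/* Hypothetical request; 'completed' stands in for i915_request_completed(rq). */
struct fake_request {
	bool completed;
};

/*
 * Pick the status to report when a request leaves the hardware:
 * SCHEDULE_OUT only if it actually finished, PREEMPTED otherwise.
 */
enum ctx_status schedule_out_status(const struct fake_request *rq)
{
	assert(rq != NULL);
	return rq->completed ? CTX_SCHEDULE_OUT : CTX_SCHEDULE_PREEMPTED;
}
```

With this split, a listener sees a request that was taken off the engine before finishing as preempted instead of as a second schedule-out.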
> [ 10.370703] ------------[ cut here ]------------
> [ 10.370734] mismatched schedule in/out operations
> [ 10.370807] WARNING: CPU: 1 PID: 0 at drivers/gpu/drm/i915/gt/intel_lrc.c:612
> process_csb+0x762/0x7a0 [i915]
> [ 10.370842] fb_sys_fops e1000e igb i2c_i801 drm dca ahci i2c_algo_bit
> libahci wmi video pinctrl_cannonlake pinctrl_intel
> [ 10.370849] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.61 #1
> [ 10.370849] Hardware name: Gigabyte Technology Co., Ltd. Q370M D3H GSM
> PLUS/Q370M D3H GSM PLUS, BIOS F14 06/05/2019
> [ 10.370902] RIP: 0010:process_csb+0x762/0x7a0 [i915]
> [ 10.370904] Code: 88 aa 15 00 00 0f 85 0f fd ff ff 48 c7 c7 10 e3 70 c0 4c 89
> 55 b0 48 89 4d b8 48 89 55 c0 c6 05 68 aa 15 00 01 e8 99 b7 2a eb <0f> 0b 4c 8b
> 55 b0 48 8b 4d b8 48 8b 55 c0 e9 dd fc ff ff 4c 89 55
> [ 10.370905] RSP: 0018:ffffb1204014ce60 EFLAGS: 00010286
> [ 10.370906] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 10.370907] RDX: 0000000000000025 RSI: ffffffffad387405 RDI: 0000000000000246
> [ 10.370907] RBP: ffffb1204014cec0 R08: ffffffffad3873e0 R09: 0000000000000025
> [ 10.370907] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006
> [ 10.370908] R13: ffff8ed12dcfe040 R14: 0000000000000001 R15: ffff8ed12f6fe000
> [ 10.370909] FS: 0000000000000000(0000) GS:ffff8ed130440000(0000)
> knlGS:0000000000000000
> [ 10.370909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 10.370910] CR2: 000055da74158008 CR3: 000000017b40a004 CR4: 00000000003606e0
> [ 10.370910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 10.370910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 10.370911] Call Trace:
> [ 10.370912] <IRQ>
> [ 10.370928] execlists_submission_tasklet+0x19/0x70 [i915]
> [ 10.370948] tasklet_action_common.isra.0+0x60/0x110
> [ 10.370949] tasklet_hi_action+0x1f/0x30
> [ 10.370952] __do_softirq+0xe1/0x2d6
> [ 10.370955] ? update_ts_time_stats+0x58/0x80
> [ 10.370956] irq_exit+0xae/0xb0
> [ 10.370957] scheduler_ipi+0xe4/0x130
> [ 10.370958] smp_reschedule_interrupt+0x39/0xe0
> [ 10.370959] reschedule_interrupt+0xf/0x20
> [ 10.370960] </IRQ>
> [ 10.370964] RIP: 0010:cpuidle_enter_state+0xc5/0x450
> [ 10.370965] Code: ff e8 0f 78 82 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6
> c4 02 0f 85 65 03 00 00 31 ff e8 62 dc 88 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f
> 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
> [ 10.370966] RSP: 0018:ffffb120400efe38 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff02
> [ 10.370966] RAX: ffff8ed13046a880 RBX: ffffffffacf58e80 RCX: 000000000000001f
> [ 10.370967] RDX: 0000000000000000 RSI: 000000002aaaab99 RDI: 0000000000000000
> [ 10.370967] RBP: ffffb120400efe78 R08: 000000026a23c65e R09: 000000028d99190d
> [ 10.370967] R10: ffff8ed130469580 R11: ffff8ed130469560 R12: ffff8ed130475928
> [ 10.370968] R13: 0000000000000008 R14: 0000000000000008 R15: ffff8ed130475928
> [ 10.370970] ? cpuidle_enter_state+0xa1/0x450
> [ 10.370971] cpuidle_enter+0x2e/0x40
> [ 10.370988] call_cpuidle+0x23/0x40
> [ 10.370989] do_idle+0x1dd/0x270
> [ 10.370990] cpu_startup_entry+0x20/0x30
> [ 10.370992] start_secondary+0x167/0x1c0
> [ 10.370994] secondary_startup_64+0xa4/0xb0
> [ 10.370995] ---[ end trace 85cd1056f39ffa8d ]---
>
> Julian
--
$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827