GVT Scheduler

Julian Stecklina julian.stecklina at cyberus-technology.de
Wed Sep 30 11:28:43 UTC 2020


Hello,

We've just found this discussion from 2017, which looks directly related:
https://lists.freedesktop.org/archives/intel-gvt-dev/2017-February/000063.html

In particular, the race between complete_current_workload() and
shadow_context_status_change() looks problematic: reading the current code,
I cannot convince myself that it is race-free.
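
To make the suspected interleaving concrete, here is a condensed paraphrase
of the two paths as we read them in v5.4 (names match
drivers/gpu/drm/i915/gvt/scheduler.c, but this is pseudo-flow, not the
exact source):

  /* shadow_context_status_change(), SCHEDULE_OUT case; runs from the
   * execlists tasklet and reads current_workload without sched_lock: */
  workload = scheduler->current_workload[ring_id];         /* (1) */
  atomic_set(&workload->shadow_ctx_active, 0);             /* (2) */
  wake_up(&workload->shadow_ctx_status_wq);                /* (4) */

  /* complete_current_workload(), workload thread, holds sched_lock: */
  wait_event(workload->shadow_ctx_status_wq,
             !atomic_read(&workload->shadow_ctx_active));  /* (3) returns as
                                                              soon as (2) is
                                                              visible */
  scheduler->current_workload[ring_id] = NULL;
  workload->complete(workload);                            /* eventually
                                                              frees the
                                                              workload */

wait_event() checks its condition before touching the wait queue, so if the
workload thread executes (3) and the subsequent free between the tasklet's
(2) and (4), the final wake_up() takes the embedded spinlock of an
already-freed wait queue. That would explain the fault in
queued_spin_lock_slowpath() below.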

In our testing we have seen page faults in shadow_context_status_change(),
in the final wake_up(&workload->shadow_ctx_status_wq) call, that are hard
to explain without a race like this. The backtraces always look like the
one below.

We are currently testing with v5.4.68, but I don't see any relevant changes
in newer versions.
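
In case it helps the discussion, the direction we had in mind for closing
the window looks roughly like the sketch below. gvt->sched_lock is a mutex
and presumably cannot be taken from the tasklet, so the sketch introduces a
hypothetical new per-ring spinlock (ring_lock); untested, just to
illustrate the idea:

  /* In shadow_context_status_change() (tasklet context): */
  spin_lock(&scheduler->ring_lock[ring_id]);
  workload = scheduler->current_workload[ring_id];
  if (workload) {
          atomic_set(&workload->shadow_ctx_active, 0);
          wake_up(&workload->shadow_ctx_status_wq);
  }
  spin_unlock(&scheduler->ring_lock[ring_id]);

  /* In complete_current_workload() (process context): */
  wait_event(workload->shadow_ctx_status_wq,
             !atomic_read(&workload->shadow_ctx_active));
  spin_lock_bh(&scheduler->ring_lock[ring_id]);
  scheduler->current_workload[ring_id] = NULL;  /* no new waker can pick
                                                   up the pointer */
  spin_unlock_bh(&scheduler->ring_lock[ring_id]);
  workload->complete(workload);                 /* any waker that already
                                                   holds the pointer has
                                                   finished wake_up() */

The completion side would need the _bh variants so the tasklet cannot
deadlock against it on the same CPU.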

Any pointers are appreciated.

[ 2594.865440] BUG: unable to handle page fault for address: 00000000000263e0
[ 2594.865448] RIP: 0010:[<ffffffff814b3fcb>]
queued_spin_lock_slowpath+0x17b/0x1c0
[...]
[ 2594.865484] Call Trace:
[ 2594.865487]  <IRQ>
[ 2594.865490] _raw_spin_lock_irqsave (kernel/locking/spinlock.c:159)
[ 2594.865494] __wake_up_common_lock (kernel/sched/wait.c:123)
[ 2594.865499] shadow_context_status_change
(drivers/gpu/drm/i915/gvt/scheduler.c:286)
[ 2594.865501] notifier_call_chain (kernel/notifier.c:104)
[ 2594.865504] atomic_notifier_call_chain (kernel/notifier.c:203)
[ 2594.865507] process_csb (drivers/gpu/drm/i915/gt/intel_lrc.c:610
drivers/gpu/drm/i915/gt/intel_lrc.c:640
drivers/gpu/drm/i915/gt/intel_lrc.c:1590)
[ 2594.865510] execlists_submission_tasklet
(drivers/gpu/drm/i915/gt/intel_lrc.c:1637)
[ 2594.865514] tasklet_action_common (./arch/x86/include/asm/bitops.h:75
./include/asm-generic/bitops-instrumented.h:57 ./include/linux/interrupt.h:624
kernel/softirq.c:523)
[ 2594.865517] __do_softirq (./arch/x86/include/asm/jump_label.h:25
./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142
kernel/softirq.c:293)
[ 2594.865520] irq_exit (kernel/softirq.c:373 kernel/softirq.c:413)
[ 2594.865523] do_IRQ (arch/x86/kernel/irq.c:267)
[ 2594.865526] common_interrupt (arch/x86/entry/entry_64.S:890)

Julian

On Tue, 2020-09-29 at 15:03 +0200, Julian Stecklina wrote:
> Hello everyone!
> 
> I'm currently trying to understand the GVT scheduler (gvt/scheduler.c)
> better, specifically how the shadow_context_status_change() callback is
> synchronized with other code that modifies the current_workload array. I
> would be very grateful if someone has a couple of minutes to shed some
> light here. :)
> 
> Can shadow_context_status_change[1] run concurrently with other code that
> modifies scheduler->current_workload[ring_id]? I see other functions holding
> gvt->sched_lock, but the callback does not.
> 
> If sched_lock is not required in the callback, what currently prevents
> concurrent execution, e.g. with workload_thread()?
> 
> Thanks!
> Julian
> 
> [1] 
> https://elixir.bootlin.com/linux/v5.9-rc7/source/drivers/gpu/drm/i915/gvt/scheduler.c#L268


