GVT Scheduler

Julian Stecklina julian.stecklina at cyberus-technology.de
Wed Oct 28 08:40:15 UTC 2020


Hi,

On Mon, 2020-10-19 at 06:11 +0000, Wang, Zhi A wrote:
> According to the discussion last time, I reviewed all the code paths of
> execlist context schedule-in and schedule-out in your code repo.

Thank you! Sorry for getting back so late, but a lot of us are currently on
vacation. I'm also adding Stefan to the thread, because I'm not available next
week and it would be good if we keep this rolling. :)

Our current workaround [1] unfortunately results in hung tasks, so it would be
really helpful to get to the root cause of this issue instead of putting
band-aids over it.

>  According to our assumption, there might be extra execlist schedule-out
> status notification. Is it possible that you can open the tracepoint in
> execlist_context_schedule_in and execlist_context_schedule_out in intel_lrc.c?

We'll try turning trace_i915_request_in / trace_i915_request_out into printks
and see whether this helps in debugging. Alternatively, is there a way to get
trace events out of a crashed kernel?

Btw, would it make sense to count the schedule_in and schedule_out events for
each request and dump a stack trace when we see an unpaired schedule_out?
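
To make the counting idea concrete, here is a minimal user-space sketch. The
struct and function names are made up for illustration and are not existing
i915 or GVT-g code. In the kernel, the counters would presumably live in the
request and need atomic updates, and the fprintf would be replaced by
something like dump_stack() or WARN_ON_ONCE().

```c
#include <stdio.h>

/* Hypothetical per-request bookkeeping; not an existing i915 structure. */
struct req_sched_stats {
	unsigned int in_count;   /* schedule_in events seen so far */
	unsigned int out_count;  /* schedule_out events seen so far */
};

/* Record one schedule_in event for the request. */
static void note_schedule_in(struct req_sched_stats *s)
{
	s->in_count++;
}

/*
 * Record one schedule_out event for the request.
 * Returns 1 if this schedule_out has no matching schedule_in, else 0.
 */
static int note_schedule_out(struct req_sched_stats *s)
{
	s->out_count++;
	if (s->out_count > s->in_count) {
		/* In the kernel, this is where the stack trace would be dumped. */
		fprintf(stderr, "unpaired schedule_out: in=%u out=%u\n",
			s->in_count, s->out_count);
		return 1;
	}
	return 0;
}
```

This only catches a schedule_out that arrives without a preceding schedule_in
for the same request. It would not catch a schedule_out that is correctly
paired but arrives later than expected.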

> so that we can check if there are unpair schedule_in and schedule_out events.
> Also, better move the trace_i915_request_in to __execlists_schedule_in as
> there is a seqlock like sync try_cmpxchg around.

Will do!

Thanks,
Julian

[1] https://github.com/blitz/linux/commit/50a1cfd0695f7c141d16377c087a3642faee9b99

> -----Original Message-----
> From: intel-gvt-dev <intel-gvt-dev-bounces at lists.freedesktop.org> On Behalf Of
> Julian Stecklina
> Sent: Friday, October 9, 2020 12:26 PM
> To: Wang, Zhi A <zhi.a.wang at intel.com>; Intel GVT Dev <
> intel-gvt-dev at lists.freedesktop.org>
> Cc: Thomas Prescher <thomas.prescher at cyberus-technology.de>
> Subject: Re: GVT Scheduler
> 
> Hi Zhi,
> 
> your explanation is really helpful. Thank you! See my comments below.
> 
> On Thu, 2020-10-08 at 16:41 +0000, Wang, Zhi A wrote:
> > Now let's see the timeline:
> > 
> > - GVT-g submits a workload to i915.
> > - i915 append the breadcrumb to the workload
> > - i915 submits the workload to HW.
> > - GVT-g called i915_wait_request to wait for the GPU execution passed 
> > the breadcrumb. (But the context might not be switched out at this 
> > time)
> > - GVT-g waits for the context to be switched out by the 
> > shadow_context_status_change. (Because GVT-g need to copy the content 
> > in the shadow context back to the guest context. The shadow context 
> > must be idle at this time.)
> > - No one is going to touch the shadow context anymore and GVT-g call 
> > complete_current_workload.
> > 
> > The race between shadow_context_status_change and 
> > complete_current_workload should be addressed in our design. So this 
> > problem might be caused by i915 change, e.g. the timing of call 
> > shadow_context_status_change is changed. But we will double confirm in GVT-g 
> > as well.
> 
> We definitely see shadow_context_status_change being called for a workload
> that has already passed beyond wait_event(workload->shadow_ctx_status_wq,
> ...); in complete_current_workload.
> 
> > The patch you mentioned is for a corner case in GPU reset. But this 
> > shouldn't happen in a normal submission flow unless someone breaks the flow
> > above.
> 
> The problem for us is that we can only reproduce this issue in a hardened
> Linux build after many hours. So it's not exactly the most friendly issue to
> debug. :)
> 
> We are currently using a workaround that serializes the actual completion of
> the workload against handling schedule out in shadow_context_status_change:
> https://github.com/blitz/linux/commit/50a1cfd0695f7c141d16377c087a3642faee9b99
> 
> This is not pretty, but so far this has prevented the issue from popping up
> again.
> 
> Thanks,
> Julian


