GVT Scheduler
Julian Stecklina
julian.stecklina at cyberus-technology.de
Wed Oct 28 08:40:15 UTC 2020
Hi,
On Mon, 2020-10-19 at 06:11 +0000, Wang, Zhi A wrote:
> According to the discussion last time, I reviewed all the code paths of
> execlist context schedule-in and schedule-out in your code repo.
Thank you! Sorry for getting back so late, but a lot of us are currently on
vacation. I'm also adding Stefan to the thread, because I'm not available next
week and it would be good to keep this rolling. :)
Unfortunately, our current workaround [1] results in hung tasks, so it would be
really helpful to get to the root cause of this issue and avoid having to put
band-aids on it.
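For reference, the workaround is roughly of the following shape (simplified
sketch with a made-up lock, not the actual patch; see [1] for the real change):

  /* Serialize the schedule-out notification against workload completion
   * with a per-workload lock. A spinlock, since the notifier may run in
   * atomic (tasklet) context. */
  static int shadow_context_status_change(struct notifier_block *nb,
                                          unsigned long action, void *data)
  {
          /* ... */
          spin_lock(&workload->completion_lock);     /* hypothetical lock */
          /* handle SCHEDULE_OUT, copy shadow context state back */
          spin_unlock(&workload->completion_lock);
          /* ... */
  }

  static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
  {
          /* ... */
          spin_lock(&workload->completion_lock);     /* wait for the notifier */
          /* only now free/recycle the workload */
          spin_unlock(&workload->completion_lock);
          /* ... */
  }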
> According to our assumption, there might be an extra execlist schedule-out
> status notification. Is it possible for you to enable the tracepoints in
> execlist_context_schedule_in and execlist_context_schedule_out in intel_lrc.c?
We'll try turning trace_i915_request_in / trace_i915_request_out into printks
and see whether that helps with debugging. Alternatively, is there a way to get
trace events out of a crashed kernel?
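Roughly like this (sketch; the exact placement depends on the kernel version,
and the message format is just an example):

  /* Next to the existing schedule-in tracepoint in intel_lrc.c, log to
   * dmesg so the events survive a crash: */
  printk(KERN_DEBUG "gvt-dbg: request in:  %llx:%llu\n",
         (u64)rq->fence.context, (u64)rq->fence.seqno);

  /* ... and the same next to the schedule-out tracepoint: */
  printk(KERN_DEBUG "gvt-dbg: request out: %llx:%llu\n",
         (u64)rq->fence.context, (u64)rq->fence.seqno);

(ftrace_dump_on_oops, which dumps the trace buffers to the console on a
panic, might be an option, but it presumably won't help when the machine
hangs hard.)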
Btw, would it make sense to count the schedule_in and schedule_out events for
each request and dump a stacktrace when we see an unpaired schedule_out?
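I'm thinking of something like this hypothetical instrumentation (the counter
field is made up and would need to be added to struct i915_request):

  /* at schedule-in: */
  atomic_inc(&rq->gvt_dbg_sched_count);              /* hypothetical field */

  /* at schedule-out: */
  if (atomic_dec_return(&rq->gvt_dbg_sched_count) < 0) {
          pr_err("gvt-dbg: unpaired schedule-out for %llx:%llu\n",
                 (u64)rq->fence.context, (u64)rq->fence.seqno);
          WARN_ON_ONCE(1);                           /* dumps a stacktrace */
  }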
> so that we can check if there are unpaired schedule_in and schedule_out events.
> Also, it's better to move trace_i915_request_in to __execlists_schedule_in, as
> there is a seqlock-like synchronization (try_cmpxchg) around it.
Will do!
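If I understand the suggestion correctly, something along these lines (sketch;
names and details may differ between trees):

  static struct i915_request *__execlists_schedule_in(struct i915_request *rq)
  {
          /* ... existing setup ... */

          /* Moved here from execlists_schedule_in, so it fires under the
           * try_cmpxchg-protected path on ce->inflight. The port index is
           * not available here, hence the 0 placeholder. */
          trace_i915_request_in(rq, 0);

          /* ... */
          return i915_request_get(rq);
  }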
Thanks,
Julian
[1]
https://github.com/blitz/linux/commit/50a1cfd0695f7c141d16377c087a3642faee9b99
> -----Original Message-----
> From: intel-gvt-dev <intel-gvt-dev-bounces at lists.freedesktop.org> On Behalf Of
> Julian Stecklina
> Sent: Friday, October 9, 2020 12:26 PM
> To: Wang, Zhi A <zhi.a.wang at intel.com>; Intel GVT Dev <
> intel-gvt-dev at lists.freedesktop.org>
> Cc: Thomas Prescher <thomas.prescher at cyberus-technology.de>
> Subject: Re: GVT Scheduler
>
> Hi Zhi,
>
> Your explanation is really helpful. Thank you! See my comments below.
>
> On Thu, 2020-10-08 at 16:41 +0000, Wang, Zhi A wrote:
> > Now let's see the timeline:
> >
> > - GVT-g submits a workload to i915.
> > - i915 appends the breadcrumb to the workload.
> > - i915 submits the workload to HW.
> > - GVT-g calls i915_wait_request to wait until GPU execution has passed
> > the breadcrumb. (But the context might not be switched out at this
> > time.)
> > - GVT-g waits for the context to be switched out, signalled via
> > shadow_context_status_change. (GVT-g needs to copy the contents of the
> > shadow context back to the guest context, so the shadow context must be
> > idle at this time.)
> > - No one is going to touch the shadow context anymore, and GVT-g calls
> > complete_current_workload.
> >
> > The race between shadow_context_status_change and
> > complete_current_workload should be addressed in our design. So this
> > problem might be caused by an i915 change, e.g. a change in the timing
> > of calling shadow_context_status_change. But we will double-check in
> > GVT-g as well.
>
> We definitely see shadow_context_status_change being called for a workload
> that has already passed the wait_event(workload->shadow_ctx_status_wq, ...)
> in complete_current_workload.
>
> > The patch you mentioned is for a corner case in GPU reset. But this
> > shouldn't happen in a normal submission flow unless someone breaks the flow
> > above.
>
> The problem for us is that we can only reproduce this issue in a hardened
> Linux build after many hours. So it's not exactly the most friendly issue to
> debug. :)
>
> We are currently using a workaround that serializes the actual completion of
> the workload against the schedule-out handling in shadow_context_status_change:
> https://github.com/blitz/linux/commit/50a1cfd0695f7c141d16377c087a3642faee9b99
>
> This is not pretty, but so far this has prevented the issue from popping up
> again.
>
> Thanks,
> Julian
>
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev