[Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
Matthew Brost
matthew.brost at intel.com
Fri May 14 16:46:21 UTC 2021
On Fri, May 14, 2021 at 11:36:37AM -0500, Jason Ekstrand wrote:
> On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
> >
> > On 06/05/2021 20:13, Matthew Brost wrote:
> > > Basic GuC submission support. This is the first bullet point in the
> > > upstreaming plan covered in the following RFC [1].
> > >
> > > At a very high level the GuC is a piece of firmware which sits between
> > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > from the i915 and programs the GPU to submit contexts. The i915
> > > communicates with the GuC and the GuC communicates with the GPU.
> > >
> > > GuC submission will be disabled by default on all current upstream
> > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > enable submission and HuC loading via the GuC. GuC submission should
> > > work on all gen11+ platforms assuming the GuC firmware is present.
> >
> > Some thoughts mostly relating to future platforms where GuC will be the
> > only option, and to some extent platforms where it will be possible to
> > turn it on for one reason or another.
> >
> > Debuggability - in the context of having an upstream way/tool for
> > capturing and viewing GuC logs usable for attaching to bug reports.
> >
> > Currently i915 logs, can provide traces via tracepoints and trace
> > printk, and GPU error capture state, which provides often sufficient
> > trail of evidence to debug issues.
> >
> > We need to make sure GuC does is not a black box in this respect. By
> > this I mean it does not hide a large portion of the execution flows from
> > upstream observability.
>
> I agree here. If GuC suddenly makes submission issues massively
> harder to debug then that's a regression vs. execlists. I don't know
> what the solution there is but I think the concern is valid.
>
Replied to Tvrtko with detailed answers. The TL;DR is agree with basically
everything he said and we have plans address it all and everything must be
addressed before the GuC can be turned on by default.
Matt
> > This could mean a tool in IGT to access/capture GuC logs and update bug
> > filing instructions.
> >
> > Leading from here is probably the need for the GuC firmware team to
> > cross the internal-upstream boundary and deal with such bug reports on
> > upstream trackers. Upstream GuC is unlikely to work if we don't have
> > such plan and commitment.
>
> I mostly agree here as well. I'm not sure it'll actually happen but
> I'd like anyone who writes code which impacts Linux to be active in
> upstream bug trackers.
>
> > Also leading from here is the need for GPU error capture to be on par
> > from day one which is I believe still not there in the firmware.
>
> This one has me genuinely concerned. I've heard rumors that we don't
> have competent error captures with GuC yet. From the Mesa PoV, this
> is a non-starter. We can't be asked to develop graphics drivers with
> no error capture.
>
> The good news is that, based on my understanding, it shouldn't be
> terrible to support. We just need the GuC to grab all the registers
> for us and shove them in a buffer somewhere before it resets the GPU
> and all that data is lost. I would hope the Windows people have
> already done that and we just need to hook it up. If not, there may
> be some GuC engineering required here.
>
> > Another, although unrelated, missing feature on my wish list is firmware
> > support for wiring up accurate engine busyness stats to i915 PMU. I
> > believe this is also being worked on but I don't know when is the
> > expected delivery.
> >
> > If we are tracking a TODO list of items somewhere I think these ones
> > should be definitely considered.
>
> Yup, let's get it all in the ToDo and not flip GuC on by default in
> the wild until it's all checked off.
>
> --Jason
More information about the Intel-gfx
mailing list