[Intel-gfx] [PATCH v5 06/10] drm/i915/guc: Update GuC's log-buffer-state access for error capture.

Tue Feb 8 22:18:14 UTC 2022

On Tue, Feb 08, 2022 at 11:38:13AM -0800, Teres Alexis, Alan Previn wrote:
> Hi Matt, thank you for taking the time to review the codes.
> Answering your question inline below.
> 
> 
> On Fri, 2022-02-04 at 10:19 -0800, Matthew Brost wrote:
> > On Wed, Jan 26, 2022 at 02:48:18AM -0800, Alan Previn wrote:
> > > GuC log buffer regions for debug-log-events, crash-dumps and
> > > error-state-capture are all a single bo allocation that includes
> > > the guc_log_buffer_state structures.
> > >
> > > Since the error-capture region is accessed with high priority at non-
> > > deterministic times (as part of gpu coredump) while the debug-log-event
> > > region is populated and accessed with different priorities, timings and
> > > consumers, let's split out separate locks for buffer-state accesses
> > > of each region.
> > >
> > > Also, ensure a global mapping is made up front for the entire bo
> > > throughout GuC operation so that dynamic mapping and unmapping isn't
> > > required for error capture log access if relay-logging isn't running.
> > >
> > > Additionally, while here, make some readibility improvements:
> > > 1. change previous function names with "capture_logs" to
> > >    "copy_debug_logs" to help make the distinction clearer.
> > > 2. Update the guc log region mapping comments to order them
> > >    according to the enum definition as per the GuC interface.
> > >
> > > Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
> > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  47 ++++++
> > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 +
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 135 +++++++++++-------
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  16 ++-
> > >  5 files changed, 141 insertions(+), 60 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index 4e819853ec2e..be1ad7fa2bf8 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -34,6 +34,8 @@ struct intel_guc {
> > >     struct intel_uc_fw fw;
> > >     /** @log: sub-structure containing GuC log related data and objects */
> > >     struct intel_guc_log log;
> > > +   /** @log_state: states and locks for each subregion of GuC's log buffer */
> > > +   struct intel_guc_log_stats log_state[GUC_MAX_LOG_BUFFER];
> >
> > Why move this? This still probably should be sub-structure of
> > intel_guc_log. Most of the access is from functions that pass in
> > intel_guc_log, then retrieve the GuC object, only to access this new
> > intel_guc_log_stats object. That layering seems wrong, if the argument
> > to a function is intel_guc_log it should really try to access members
> > within that object or below. Obv some exceptions exist but here it seems
> > clear to me this is in the wrong place.
> >
> So the reasoning i had was because because intel_guc_log module only managed
> guc-relay-logging and guc-log-dumping for log-events but allocates the buffer
> for 3 separate subregion/usages (i.e. log-events, crash-dump and error-capture).
> That said, I did not want intel_guc_capture and relay-logging (or log-dumping)
> to have to contend with the same lock because these two subregions are completely
> independant of each other in terms of when they are being accessed and why.
> 

All that is fine, I just think the 'struct intel_guc_log_stats' should
be sub-substure of struct intel_guc_log.

> However, after the redesign of rev5 (this rev), I now believe the intel_guc_capture
> module does not require a lock because its only ever accessing its log
> subregion in response to the ctb handler functions that run out of the same queue.
> As we know intel_guc_capture reacts to G2H-error-capture-notification and G2H-context-reset
> (that trickles into i915_gpu_coredump). At the point of i915_error_state dump,
> intel_guc_capture module does not access the buffer - it merely dumps the already-parsed
> and engine-dump-node (that was detached from error-capture's internal linked-list
> and attached to the gpu_coredump structure in the second G2H above).
> 
> And of course, intel_guc_log only ever accesses the log-events subregion
> and never the error-capture regions.
> 
> That said, i could revert the lock structure to the original and not have
> intel_guc_capture using locks. What do you think?
> 

Again my comment has nothing to do with locking, it is where the
structure lives.

Matt

> ...alan
> 
> > Another nit, I'd personally break this out into multiple patches.
> >
> > e.g. 1 to rename relay log functions, 1 introducing intel_guc_log_stats
> > + lock, and 1 adding intel_guc_capture_output_min_size_est. Maybe I'm
> > missing another patch or two in there.
> >
> > Not a blocker but it does make reviews easier.
> >
> Will do.
> 
> > Matt
> >
> > >     /** @ct: the command transport communication channel */
> > >     struct intel_guc_ct ct;
> > >     /** @slpc: sub-structure containing SLPC related data and objects */
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > index 70d2ee841289..e7f99d051636 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > @@ -651,6 +651,53 @@ int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u3
> > >     return PAGE_ALIGN(alloc_size);
> > >  }
> > >
> > > 2.25.1
> > >
>