[Intel-gfx] [PATCH] drm/i915/mtl: Print SSEU information of all GTs for debugfs

Matt Roper matthew.d.roper at intel.com
Fri Nov 3 17:24:12 UTC 2023


On Fri, Nov 03, 2023 at 11:17:18AM +0000, Tvrtko Ursulin wrote:
> 
> On 03/11/2023 05:29, Gareth Yu wrote:
> > Print another SSEU information addition to the first one.
> > 
> > Cc : Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay at intel.com>
> > Cc : Matt Roper <matthew.d.roper at intel.com>
> > Cc : Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Signed-off-by: Gareth Yu <gareth.yu at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_debugfs.c   | 13 ++++++++++---
> >   drivers/gpu/drm/i915/i915_gpu_error.c |  6 +++++-
> >   2 files changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index e9b79c2c37d8..b5914a2c0597 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -63,13 +63,16 @@ static int i915_capabilities(struct seq_file *m, void *data)
> >   {
> >   	struct drm_i915_private *i915 = node_to_i915(m->private);
> >   	struct drm_printer p = drm_seq_file_printer(m);
> > +	struct intel_gt *gt;
> > +	unsigned int i;
> >   	seq_printf(m, "pch: %d\n", INTEL_PCH_TYPE(i915));
> >   	intel_device_info_print(INTEL_INFO(i915), RUNTIME_INFO(i915), &p);
> >   	intel_display_device_info_print(DISPLAY_INFO(i915), DISPLAY_RUNTIME_INFO(i915), &p);
> >   	i915_print_iommu_status(i915, &p);
> > -	intel_gt_info_print(&to_gt(i915)->info, &p);
> > +	for_each_gt(gt, i915, i)
> > +		intel_gt_info_print(&gt->info, &p);
> >   	intel_driver_caps_print(&i915->caps, &p);
> >   	kernel_param_lock(THIS_MODULE);
> > @@ -783,9 +786,13 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_drop_caches_fops,
> >   static int i915_sseu_status(struct seq_file *m, void *unused)
> >   {
> >   	struct drm_i915_private *i915 = node_to_i915(m->private);
> > -	struct intel_gt *gt = to_gt(i915);
> > +	struct intel_gt *gt;
> > +	unsigned int i;
> > +
> > +	for_each_gt(gt, i915, i)
> > +		intel_sseu_status(m, gt);
> 
> Don't we have the per GT debugfs directories and files already!?

Yeah, we shouldn't be updating this.  Commit a00eda7d8996 ("drm/i915:
Move sseu debugfs under gt/") notes:

        "The sseu_status debugfs has also been kept at the top level as
        we do have tests that use it; it will be removed once we teach
        the tests to look into the new path."

If there are still IGT tests that haven't been updated, dumping both GTs
here will probably break them since they aren't expecting it.  If they
have all been updated, then we should just move forward with deleting
this device-level SSEU instead.
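
To illustrate the concern, here is a minimal userspace sketch (not IGT
code) of the kind of check that could go wrong if the device-level file
suddenly carried one block per GT.  The helper names are hypothetical,
and the "SSEU Device Info" header is only assumed to appear once per GT
dump:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: count the lines in buf that begin with key. */
static int count_key(const char *buf, const char *key)
{
	size_t klen;
	const char *line = buf;
	int n = 0;

	assert(buf && key);
	klen = strlen(key);

	while (line && *line) {
		if (strncmp(line, key, klen) == 0)
			n++;
		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return n;
}

/*
 * Hypothetical check: a consumer written against the old single-GT
 * output may assume the header shows up exactly once.
 */
static bool looks_like_single_gt_dump(const char *buf)
{
	return count_key(buf, "SSEU Device Info") == 1;
}
```

With a single dump the check passes; with two dumps concatenated into the
same file it no longer does, which is the kind of breakage a consumer
that has not been updated might hit.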

> 
> > -	return intel_sseu_status(m, gt);
> > +	return 0;
> >   }
> >   static int i915_forcewake_open(struct inode *inode, struct file *file)
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index b4e31e59c799..2adc317af944 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -722,9 +722,13 @@ static void err_print_gt_info(struct drm_i915_error_state_buf *m,
> >   			      struct intel_gt_coredump *gt)
> >   {
> >   	struct drm_printer p = i915_error_printer(m);
> > +	struct drm_i915_private *i915 = gt->_gt->i915;
> > +	struct intel_gt *gt_n;
> > +	unsigned int n;
> >   	intel_gt_info_print(&gt->info, &p);
> > -	intel_sseu_print_topology(gt->_gt->i915, &gt->info.sseu, &p);
> > +	for_each_gt(gt_n, i915, n)
> > +		intel_sseu_print_topology(gt_n->i915, &gt_n->info.sseu, &p);
> 
> Do we need a consistent story across all error capture? Aka why is sseu
> special.
> 
> Also the intel_gt_info_print() above calls intel_sseu_dump so we end up with
> root tile SSEU printed twice?

I'm guessing this call was supposed to be deleted by 0b6613c6b91e
("drm/i915/sseu: Move sseu_info under gt_info").  We should probably go
ahead and do that now to remove the redundancy.

err_print_gt_info() should be printing the GT information (including
SSEU) for whichever GT had the error.  I don't see a reason why we'd
want to dump extra SSEU information for a different GT that wasn't
involved in the error.

Actually, SSEU is the _least_ useful thing to print for extra GTs
because once xehpsdv/pvc are gone from i915, the only platforms that
have multiple GTs are MTL/ARL and the SSEU information will always be
empty on the media GT (there's no DSS or EUs there).
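
A rough userspace model of the two points above, with simplified
stand-in types rather than the real i915 structures (all mock_* names
and field layouts here are assumptions for illustration only): the error
printer reports the one GT it was handed, and a media GT carries an
all-zero SSEU, so looping over it adds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the SSEU counts; not the driver's layout. */
struct mock_sseu {
	unsigned int subslice_total;
	unsigned int eu_total;
};

/* Simplified stand-in for a captured GT. */
struct mock_gt {
	const char *name;
	struct mock_sseu sseu;
};

/* True when there are no subslices or EUs at all, as on a media GT. */
static bool mock_sseu_is_empty(const struct mock_sseu *sseu)
{
	assert(sseu);
	return sseu->subslice_total == 0 && sseu->eu_total == 0;
}

/*
 * Format the info for the single GT that took the error.  Other GTs on
 * the device are deliberately not visited.  Returns the snprintf()
 * result (the length that would be written, excluding the NUL).
 */
static int mock_err_print_gt_info(char *buf, size_t len,
				  const struct mock_gt *gt)
{
	assert(buf && gt);
	return snprintf(buf, len, "%s: subslice total: %u, EU total: %u\n",
			gt->name, gt->sseu.subslice_total,
			gt->sseu.eu_total);
}
```

Calling mock_err_print_gt_info() once with the failing GT mirrors the
intent described above; a caller could use mock_sseu_is_empty() to see
that a media GT entry has nothing worth printing.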


Matt

> 
> There possibly was a Jira years ago to do stuff about multi-tile error
> capture but maybe it got lost.
> 
> Adding Andi if he has comments.
> 
> Regards,
> 
> Tvrtko
> 
> >   }
> >   static void err_print_gt_display(struct drm_i915_error_state_buf *m,

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

