[Intel-xe] [PATCH 1/2] drm/xe: Print devcoredump to drm_info

Rodrigo Vivi rodrigo.vivi at intel.com
Thu Oct 5 16:13:26 UTC 2023


On Thu, Oct 05, 2023 at 11:38:10AM -0400, Summers, Stuart wrote:
> On Wed, 2023-10-04 at 09:37 -0400, Rodrigo Vivi wrote:
> > On Tue, Sep 26, 2023 at 09:20:55PM +0000, Stuart Summers wrote:
> > > Currently the devcoredump is available in the file system
> > > for a user. However 1) CI isn't currently set up to dump
> > > this data and 2) if we try to dump this during the driver
> > > load when a failure is observed, the driver will be torn
> > > down before we have a chance to actually read this data.
> > > 
> > > Add a quick routine to print this out with drm_info as well
> > > if CONFIG_DRM_XE_DEBUG is enabled.
> > 
> > I feel this is kind of a duplication of simple_error_capture()
> > that I thought we could kill... at least that has a more specific
> 
> What was the reason to kill it?

Just ignore that thought. I assumed it was no longer useful once we
got the devcoredump, but you just showed a good case for keeping it.

> 
> > config: CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE
> > 
> > I mean, maybe the right thing to do is to kill that and have
> > this one here, but with a dedicated config?!...
> 
> I feel like the ability to dump extra information when an error happens
> during driver load is critical for debug. We could come up with a way
> to dump this during teardown to some other file that is persistent -
> even in /tmp or something?

What I mean now is to keep just one version: either the existing
simple error capture or the one you are proposing, but not both.
And gate it with a dedicated config like CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE
instead of reusing the generic debug config as you did here.
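
To make the shape concrete, here is a rough user-space sketch of that
pattern. It is not kernel code and is untested against the driver. The
names XE_COREDUMP_PRINT_TO_LOG, struct sink, dump_read() and dump_print()
are made up for illustration: the macro stands in for a dedicated Kconfig
option (which the driver would test with IS_ENABLED()), and the two emit
helpers stand in for the coredump and info printers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a dedicated Kconfig option. */
#ifndef XE_COREDUMP_PRINT_TO_LOG
#define XE_COREDUMP_PRINT_TO_LOG 1
#endif

/* Minimal analog of the print iterator: a buffer and the space left in it. */
struct sink {
	char *data;
	size_t remain;
};

typedef void (*emit_fn)(struct sink *s, const char *text);

/* Copy text into the caller's buffer, truncating when space runs out. */
static void emit_to_buffer(struct sink *s, const char *text)
{
	size_t len = strlen(text);

	if (len > s->remain)
		len = s->remain;
	memcpy(s->data, text, len);
	s->data += len;
	s->remain -= len;
}

/* Send text to the log (stderr here) and leave the sink untouched. */
static void emit_to_log(struct sink *s, const char *text)
{
	(void)s;
	fputs(text, stderr);
}

/* One dump routine, two outputs: a NULL buffer selects the log path. */
static size_t dump_read(char *buffer, size_t count)
{
	struct sink s = { .data = buffer, .remain = count };
	emit_fn emit = buffer ? emit_to_buffer : emit_to_log;

	/* The log path has no buffer, so it must be called with count 0. */
	assert(buffer != NULL || count == 0);

	emit(&s, "**** Xe Device Coredump ****\n");

	/* Bytes written to the buffer; always 0 on the log path. */
	return count - s.remain;
}

/* Log the dump only when the dedicated option is on; returns 1 if it did. */
static int dump_print(void)
{
#if XE_COREDUMP_PRINT_TO_LOG
	dump_read(NULL, 0);
	return 1;
#else
	return 0;
#endif
}
```

The point is only that the gate lives in one place under its own option,
so the generic debug config does not change what ends up in dmesg.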

> 
> Thanks,
> Stuart
> 
> > 
> > > 
> > > Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_devcoredump.c | 17 ++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_devcoredump.h |  3 +++
> > >  2 files changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > index 68abc0b195be..aa41d8e9b602 100644
> > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > @@ -78,7 +78,11 @@ static ssize_t xe_devcoredump_read(char *buffer,
> > > loff_t offset,
> > >         iter.remain = count;
> > >  
> > >         ss = &coredump->snapshot;
> > > -       p = drm_coredump_printer(&iter);
> > > +
> > > +       if (iter.data)
> > > +               p = drm_coredump_printer(&iter);
> > > +       else
> > > +               p = drm_info_printer(coredump_to_xe(coredump)-
> > > >drm.dev);
> > >  
> > >         drm_printf(&p, "**** Xe Device Coredump ****\n");
> > >         drm_printf(&p, "kernel: " UTS_RELEASE "\n");
> > > @@ -102,6 +106,15 @@ static ssize_t xe_devcoredump_read(char
> > > *buffer, loff_t offset,
> > >         return count - iter.remain;
> > >  }
> > >  
> > > +/* Print the coredump locally also for debug purposes */
> > > +void
> > > +xe_devcoredump_print(struct xe_devcoredump *coredump)
> > > +{
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > +       xe_devcoredump_read(NULL, 0, 0, coredump, 0);
> > > +#endif
> > > +}
> > > +
> > >  static void xe_devcoredump_free(void *data)
> > >  {
> > >         struct xe_devcoredump *coredump = data;
> > > @@ -192,5 +205,7 @@ void xe_devcoredump(struct xe_exec_queue *q)
> > >  
> > >         dev_coredumpm(xe->drm.dev, THIS_MODULE, coredump, 0,
> > > GFP_KERNEL,
> > >                       xe_devcoredump_read, xe_devcoredump_free);
> > > +
> > > +       xe_devcoredump_print(coredump);
> > >  }
> > >  #endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
> > > index 6ac218a5c194..ba0c2a7b71b4 100644
> > > --- a/drivers/gpu/drm/xe/xe_devcoredump.h
> > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.h
> > > @@ -8,6 +8,9 @@
> > >  
> > >  struct xe_device;
> > >  struct xe_exec_queue;
> > > +struct xe_devcoredump;
> > > +
> > > +void xe_devcoredump_print(struct xe_devcoredump *coredump);
> > >  
> > >  #ifdef CONFIG_DEV_COREDUMP
> > >  void xe_devcoredump(struct xe_exec_queue *q);
> > > -- 
> > > 2.34.1
> > > 
> 

