[PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture

Souza, Jose jose.souza at intel.com
Tue Mar 26 15:52:45 UTC 2024


On Tue, 2024-03-26 at 11:40 -0400, Dong, Zhanjun wrote:
> See my comments inline below.
> 
> Regards,
> Zhanjun
> On 2024-03-22 5:36 p.m., Dong, Zhanjun wrote:
> > See my comments below.
> > 
> > Regards,
> > Zhanjun
> > 
> > On 2024-03-22 5:28 p.m., Souza, Jose wrote:
> > > On Fri, 2024-03-22 at 11:37 -0700, José Roberto de Souza wrote:
> > > > On Fri, 2024-03-22 at 09:57 -0700, José Roberto de Souza wrote:
> > > > > On Tue, 2024-03-19 at 07:36 -0700, Zhanjun Dong wrote:
> > > > > > Port GuC based register capture for error capture from i915 to Xe.
> > > > > > 
> > > > > > There are 3 parts inside:
> > > > > > . Prepare for capture registers
> > > > > >      There is a bo create at guc ads init time, that is very early
> > > > > >      and engi  ne map is not ready, make it hard to calculate the
> > > > > >      capture buffer size, new function created for worst case size
> > > > > >      caluation. Other than that, this part basically follows the i915
> > > > > >      design.
> > > > > > . Process capture notification message
> > > > > >      Basically follows i915 design
> > > > > > . Sysfs command process.
> > > > > >      Xe switched to devcoredump, adopted command line process with
> > > > > >      captured node list.
> > > > > 
> > > > > Just some notes after trying this to debug a issue with this series:
> > > > > 
> > > > > - CONFIG_DRM_XE_CAPTURE_ERROR is not a good name for a config to 
> > > > > enable GuC based capture.
> > > > >     - in my opinion this should not even exist, always capture with 
> > > > > GuC if xe_device_uc_enabled() is true.
> 
> 3 reasons:
> 1. This GuC based register error capture is for debug purpose, does it 
> still important for production stage? I want to leave it an option for 
> users who don't need it.

definitely needed in production

> 2. With this opiton disabled, extra memory could be saved. It might not 
> be a big deal for PC, while might be valuable for customers who want to 
> save few 10KBs to 100KB RAM usage.

not relevant amount, having a accurate register dump is more important as allow us to debug issues reproduced by costumers.

> 3. The feature with an opiton is what we have in i915, where the feature 
> is ported from. So make it the same style to avoid users migrated from 
> i915 complains about it.
> 
> > > > > 
> > > > > - coredump don't have some registers like SAMPLER_INSTDONE, 
> > > > > ROW_INSTDONE, XEHPG_INSTDONE_GEOM_SVG...
> > 
> > Yes, those steering register print out is missing.
> > 
> > > > > Here the output in LNL:
> > > > 
> > > > ah this could explain...
> > > > 
> > > > [  152.189386] xe 0000:00:02.0: 
> > > > [drm:xe_hw_engine_snapshot_from_capture [xe]] GT0: GuC error capture 
> > > > is empty, take snapshot from engine.
> > > > [  152.190215] xe 0000:00:02.0: [drm] Xe device coredump has been 
> > > > created
> > > > 
> > > > but both code paths needs to dump the same registers
> > 
> > I found you have another patch:
> > drm/xe/devcoredump: Print errno if VM snapshot was not captured
> > I will take a look.
> > 
> > > 
> > > Hi Zhanjun
> > > 
> > > Mesa team needs the INSTDONE registers to debug issues and I have 
> > > patches implementing the dump of this registers.
> > > Do you mind if I send this patches the list?
> > 
> > Sure, please send it.
> > 
> > > 
> > > I guess this GuC based register capture will need another version and 
> > > someone with experience on GuC to review, so it will take a while...
> > > 
> > > > 
> > > > > 
> > > > > **** HW Engines ****
> > > > > rcs0 (physical), logical instance=0
> > > > >     Forcewake: domain 0x2, ref 1
> > > > >     HWSTAM: 0xffffffff
> > > > >     RING_HWS_PGA: 0x01693000
> > > > >     RING_START: 0x029d5000
> > > > >     RING_HEAD: 0x00001914
> > > > >     RING_TAIL: 0x00001968
> > > > >     RING_CTL: 0x00003001
> > > > >     RING_MI_MODE: 0x00001000
> > > > >     RING_MODE: 0x00000008
> > > > >     RING_IMR: 0x00000000
> > > > >     RING_ESR: 0x00000000
> > > > >     RING_EMR: 0xffffffff
> > > > >     RING_EIR: 0x00000000
> > > > >     IPEHR: 0x7a000004
> > > > >     ACTHD: 0x0000effeffff9208
> > > > >     RING_BBADDR: 0x0000effeffff9209
> > > > >     RING_DMA_FADD: 0x00000000029d6968
> > > > >     RING_EXECLIST_STATUS: 0x000000004000279c
> > > > >     RING_EXECLIST_SQ_CONTENTS: 0x00000000029d9719
> > > > > 
> > > > > - a lot of style issues, see 'CI.checkpatch'
> 
> After review the CI.checkpatch output, all style issues reported is by 
> purpose.
> 
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> > > > > > Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> > > > > > 
> > > > > > Changes from prior revs:
> > > > > >    v6:-  Change hardcoded register snapshot fill to follow mapping 
> > > > > > tables
> > > > > >          When capture is empty, take snapshot from engine
> > > > > >    v5:-  Split dss helper code out as an standalone patch
> > > > > >          Remove old platform registers definition.
> > > > > >          Split register map table to 32 and 64bit each
> > > > > >    v4:-  Move register map table to xe_hw_engine.c
> > > > > >    v3:-  Remove condition compilation in code
> > > > > >    v2:-  Split into multiple chunks
> > > > > > 
> > > > > > Zhanjun Dong (8):
> > > > > >    drm/xe/guc: Add kconfig for GuC based register capture
> > > > > >    drm/xe/guc: Update GuC ADS size for error capture
> > > > > >    drm/xe/guc: Add XE_LP steered register lists
> > > > > >    drm/xe/guc: Add capture size check in GuC log buffer
> > > > > >    drm/xe/guc: Check sizing of guc_capture output
> > > > > >    drm/xe/guc: Extract GuC error capture lists on G2H notification
> > > > > >    drm/xe/guc: Pre-allocate output nodes for extraction
> > > > > >    drm/xe/guc: Plumb GuC-capture into dev coredump
> > > > > > 
> > > > > >   drivers/gpu/drm/xe/Kconfig               |   11 +
> > > > > >   drivers/gpu/drm/xe/Makefile              |    1 +
> > > > > >   drivers/gpu/drm/xe/abi/guc_actions_abi.h |    7 +
> > > > > >   drivers/gpu/drm/xe/regs/xe_gt_regs.h     |    5 +
> > > > > >   drivers/gpu/drm/xe/xe_gt_printk.h        |    3 +
> > > > > >   drivers/gpu/drm/xe/xe_guc.c              |    6 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_ads.c          |  229 +++-
> > > > > >   drivers/gpu/drm/xe/xe_guc_ads_types.h    |    2 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_capture.c      | 1332 
> > > > > > ++++++++++++++++++++++
> > > > > >   drivers/gpu/drm/xe/xe_guc_capture.h      |   21 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_capture_fwif.h |  221 ++++
> > > > > >   drivers/gpu/drm/xe/xe_guc_ct.c           |    2 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_fwif.h         |   68 ++
> > > > > >   drivers/gpu/drm/xe/xe_guc_log.c          |  179 +++
> > > > > >   drivers/gpu/drm/xe/xe_guc_log.h          |   15 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_log_types.h    |   26 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_submit.c       |   21 +-
> > > > > >   drivers/gpu/drm/xe/xe_guc_submit.h       |    3 +
> > > > > >   drivers/gpu/drm/xe/xe_guc_types.h        |    2 +
> > > > > >   drivers/gpu/drm/xe/xe_hw_engine.c        |  261 ++++-
> > > > > >   drivers/gpu/drm/xe/xe_hw_engine.h        |    4 +
> > > > > >   drivers/gpu/drm/xe/xe_hw_engine_types.h  |  117 +-
> > > > > >   22 files changed, 2421 insertions(+), 115 deletions(-)
> > > > > >   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.c
> > > > > >   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.h
> > > > > >   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_fwif.h
> > > > > > 
> > > > > 
> > > > 
> > > 



More information about the Intel-xe mailing list