[Intel-xe] [PATCH 00/14] Introduce xe_devcoredump.

Matthew Brost matthew.brost at intel.com
Tue May 2 08:11:32 UTC 2023


On Wed, Apr 26, 2023 at 04:56:59PM -0400, Rodrigo Vivi wrote:
> Xe needs to align with other drivers on the way that the error states are
> dumped, avoiding a Xe only error_state solution. The goal is to use devcoredump
> infrastructure to report error states, since it produces a standardized way
> by exposing a virtual and temporary /sys/class/devcoredump device.
> 
> The initial goal is to have the simple_error_state in the devcoredump
> so we start using the infrastructure.
> 
> But this is just a start point to start building a useful and
> organized crash dump, using standard infrastructure. Later this
> will be changed to have output that can be parsed by tools and
> used for error replay.

We are certainly missing the GuC log, and it would also be really nice
to get the ftrace included too. Not sure if the latter is easy; I looked
into this on i915 and couldn't figure it out, but that was a while ago
and admittedly I didn't try all that hard.

Matt

> 
> Later, when we are in-tree, the goal is to collaborate with devcoredump
> infrastructure with overall possible improvements, like multiple file support
> for better organization of the dumps, snapshot support, dmesg extra print,
> and whatever may make sense and help the overall infrastructure.
> 
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> 
> Rodrigo Vivi (14):
>   drm/xe: Fix print of RING_EXECLIST_SQ_CONTENTS_HI
>   drm/xe: Introduce the dev_coredump infrastructure.
>   drm/xe: Do not take any action if our device was removed.
>   drm/xe: Extract non mapped regions out of GuC CTB into its own struct.
>   drm/xe: Convert GuC CT print to snapshot capture and print.
>   drm/xe: Add GuC CT snapshot to xe_devcoredump.
>   drm/xe: Introduce guc_submit_types.h with relevant structs.
>   drm/xe: Convert GuC Engine print to snapshot capture and print.
>   drm/xe: Add GuC Submit Engine snapshot to xe_devcoredump.
>   drm/xe: Convert Xe HW Engine print to snapshot capture and print.
>   drm/xe: Add HW Engine snapshot to xe_devcoredump.
>   drm/xe: Limit CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE to itself.
>   drm/xe: Convert VM print to snapshot capture and print.
>   drm/xe: Add VM snapshot to xe_devcoredump.
> 
>  drivers/gpu/drm/xe/Kconfig                |   1 +
>  drivers/gpu/drm/xe/Makefile               |   1 +
>  drivers/gpu/drm/xe/regs/xe_engine_regs.h  |   3 +-
>  drivers/gpu/drm/xe/xe_devcoredump.c       | 227 ++++++++++++++++++
>  drivers/gpu/drm/xe/xe_devcoredump.h       |  22 ++
>  drivers/gpu/drm/xe/xe_devcoredump_types.h |  60 +++++
>  drivers/gpu/drm/xe/xe_device_types.h      |   4 +
>  drivers/gpu/drm/xe/xe_execlist.c          |   4 +-
>  drivers/gpu/drm/xe/xe_gt_debugfs.c        |   2 +-
>  drivers/gpu/drm/xe/xe_guc_ct.c            | 275 +++++++++++++++-------
>  drivers/gpu/drm/xe/xe_guc_ct.h            |   7 +-
>  drivers/gpu/drm/xe/xe_guc_ct_types.h      |  46 +++-
>  drivers/gpu/drm/xe/xe_guc_fwif.h          |  29 ---
>  drivers/gpu/drm/xe/xe_guc_submit.c        | 258 ++++++++++++++------
>  drivers/gpu/drm/xe/xe_guc_submit.h        |  10 +-
>  drivers/gpu/drm/xe/xe_guc_submit_types.h  | 155 ++++++++++++
>  drivers/gpu/drm/xe/xe_hw_engine.c         | 210 ++++++++++++-----
>  drivers/gpu/drm/xe/xe_hw_engine.h         |   8 +-
>  drivers/gpu/drm/xe/xe_hw_engine_types.h   |  78 ++++++
>  drivers/gpu/drm/xe/xe_pci.c               |   2 +
>  drivers/gpu/drm/xe/xe_vm.c                | 140 +++++++++--
>  drivers/gpu/drm/xe/xe_vm.h                |   6 +-
>  drivers/gpu/drm/xe/xe_vm_types.h          |  18 ++
>  23 files changed, 1288 insertions(+), 278 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_devcoredump.c
>  create mode 100644 drivers/gpu/drm/xe/xe_devcoredump.h
>  create mode 100644 drivers/gpu/drm/xe/xe_devcoredump_types.h
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_submit_types.h
> 
> --
> 2.39.2

