[Intel-xe] [RFC 00/25] xe-eudebug: GPU debugging interface

Mika Kuoppala mika.kuoppala at linux.intel.com
Mon Nov 6 11:18:20 UTC 2023


Hi,

This patchset allows the l0/oneAPI GDB
(or some other debugger) to attach to an xe-driven
device.

The intent is to provide functionality that is similar to,
but not API compatible with, what is described in:
https://dgpu-docs.intel.com/driver/gpu-debugging.html

The debugger first opens a connection to a device through
a drm ioctl, identifying the debug target by its pid. The
ioctl returns a dedicated file descriptor that is used for
all further debug events and control.

Xe internal resources that are considered essential
to debugger functionality are relayed as events to the
debugger. When the debugger connects, all existing resources
are relayed to it (discovery). From that point onwards,
resources are relayed as they are created and destroyed.
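
As a rough illustration of the discovery-then-live event flow
described above, a debugger could keep a table of live resources
keyed by type and handle. The sketch below is plain C and does not
use the uapi from this series: the event struct, the flag names and
every sketch_* identifier are hypothetical stand-ins for whatever
the final event layout turns out to be.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event layout, NOT the uapi proposed in this series. */
enum sketch_event_type {
	SKETCH_EVENT_CLIENT = 1,
	SKETCH_EVENT_VM,
	SKETCH_EVENT_EXEC_QUEUE,
};

#define SKETCH_EVENT_CREATE	(1u << 0)
#define SKETCH_EVENT_DESTROY	(1u << 1)

struct sketch_event {
	uint32_t type;		/* enum sketch_event_type */
	uint32_t flags;		/* SKETCH_EVENT_CREATE or _DESTROY */
	uint64_t seqno;		/* ordering of events, unused here */
	uint64_t handle;	/* resource handle within the target */
};

#define SKETCH_MAX_RESOURCES 64

struct sketch_resource {
	uint32_t type;
	uint64_t handle;
	bool live;
};

/* Must be zero-initialized before first use. */
struct sketch_tracker {
	struct sketch_resource slot[SKETCH_MAX_RESOURCES];
	size_t live_count;
};

/* Return the slot index of a live resource, or -1 if not tracked. */
int sketch_find(const struct sketch_tracker *t, uint32_t type,
		uint64_t handle)
{
	for (size_t i = 0; i < SKETCH_MAX_RESOURCES; i++) {
		const struct sketch_resource *r = &t->slot[i];

		if (r->live && r->type == type && r->handle == handle)
			return (int)i;
	}
	return -1;
}

/*
 * Apply one event to the table. Discovery events and later live
 * events are handled identically: both arrive as creates.
 * Returns 0 on success, -1 on an inconsistent event or a full table.
 */
int sketch_tracker_apply(struct sketch_tracker *t,
			 const struct sketch_event *ev)
{
	int idx = sketch_find(t, ev->type, ev->handle);

	if (ev->flags & SKETCH_EVENT_CREATE) {
		if (idx >= 0)
			return -1;	/* created twice */

		for (size_t i = 0; i < SKETCH_MAX_RESOURCES; i++) {
			struct sketch_resource *r = &t->slot[i];

			if (r->live)
				continue;
			r->type = ev->type;
			r->handle = ev->handle;
			r->live = true;
			t->live_count++;
			return 0;
		}
		return -1;		/* table full */
	}

	if (ev->flags & SKETCH_EVENT_DESTROY) {
		if (idx < 0)
			return -1;	/* destroy of unknown resource */
		t->slot[idx].live = false;
		t->live_count--;
		return 0;
	}

	return -1;			/* neither create nor destroy */
}
```

Because discovery replays existing resources as ordinary create
events, a table like this needs no special case for resources that
existed before the debugger attached.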

The uapi is extended to allow an application/lib to provide
debug metadata. This metadata is relayed as events
to the debugger so it can decode the program state.
Debug metadata is also attached to capture dumps if
no debugger was connected when the gpu/hw exception occurred.
For capturing, the goal is to allow external tools to
construct symbolic backtraces from error capture dumps.

Along with the resource and metadata events, we provide an event for
EU attention. With the assistance of the SIP program provided
with pipeline setup, the debugger can figure out exactly which
eu/thread and instruction encountered the breakpoint or some
other exception.
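
To make the attention decoding step more concrete, here is a
minimal sketch that walks an attention bitmask and maps each set
bit to an EU and thread index. The bit layout (one bit per hardware
thread, threads of one EU packed consecutively, 8 threads per EU)
is an assumption made for this example only, and the sketch_*
names are hypothetical. The real layout depends on the hardware
and on what the attention event reports. Identifying the faulting
instruction is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only, not a hardware fact. */
#define SKETCH_THREADS_PER_EU 8u

struct sketch_thread_id {
	uint32_t eu;
	uint32_t thread;
};

/*
 * Scan 'len' bytes of attention bits and store up to 'out_cap'
 * decoded (eu, thread) pairs in 'out'. Returns the total number
 * of set bits, which may be larger than 'out_cap'.
 */
size_t sketch_decode_attention(const uint8_t *bitmask, size_t len,
			       struct sketch_thread_id *out,
			       size_t out_cap)
{
	size_t found = 0;

	for (size_t byte = 0; byte < len; byte++) {
		for (unsigned int bit = 0; bit < 8; bit++) {
			size_t index = byte * 8 + bit;

			if (!(bitmask[byte] & (1u << bit)))
				continue;

			if (found < out_cap) {
				out[found].eu = (uint32_t)
					(index / SKETCH_THREADS_PER_EU);
				out[found].thread = (uint32_t)
					(index % SKETCH_THREADS_PER_EU);
			}
			found++;
		}
	}

	return found;
}
```

Returning the total count rather than the number of entries stored
lets a caller notice that its output array was too small and retry
with a larger one.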

The latest code can be found at:
https://gitlab.freedesktop.org/miku/kernel/-/tree/eudebug-dev

The associated tests are at:
https://gitlab.freedesktop.org/Dominisg/igt-gpu-tools/-/tree/eudebug-dev

All comments are greatly appreciated,
 - Mika


Christoph Manszewski (1):
  drm/xe/xe_eudebug: Add vm bind events w/o acking

Dominik Grzegorzek (15):
  drm/xe/eudebug: Introduce exec_queue events
  drm/xe/eudebug: hw enablement for eudebug
  drm/xe: define XE_MAX_[DSS|EU]_FUSE_BITS globally
  drm/xe: Introduce for_each_dss_steering loop
  drm/xe: Export xe_hw_engine's mmio accessors
  drm/xe: Add runalone engine property
  drm/xe/eudebug: Introduce per device attention scan worker
  drm/xe: Move and export xe_hw_engine lookup.
  drm/xe/eudebug: Introduce EU control interface
  drm/xe: Debug metadata create/destroy ioctls
  xe/drm: Include debug_metadata in usercoredump
  drm/xe: Set vm debug metadata
  drm/xe/: Include attention in usercapture.
  RFC: xe/drm: attach debug metadata to vma
  drm/xe: Capture metadata attached to vma

Mika Kuoppala (9):
  drm/xe/eudebug: Introduce eudebug support
  drm/xe/eudebug: Introduce discovery for resources
  drm/xe: Extend drm_xe_vm_bind_op
  drm/xe/eudebug: uapi for client to debugger metadata (work in
    progress)
  drm/xe/eudebug: Create/destroy user metadata
  drm/xe/eudebug: Set vm metadata
  drm/xe/eudebug: User coredump support
  drm/xe/eudebug: Add per process coredumps
  drm/xe/eudebug: vm open/pread/pwrite

 drivers/gpu/drm/xe/Makefile                  |    6 +-
 drivers/gpu/drm/xe/regs/xe_engine_regs.h     |    4 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h         |   46 +
 drivers/gpu/drm/xe/xe_debug_metadata.c       |  120 +
 drivers/gpu/drm/xe/xe_debug_metadata.h       |   26 +
 drivers/gpu/drm/xe/xe_debug_metadata_types.h |   28 +
 drivers/gpu/drm/xe/xe_device.c               |   42 +-
 drivers/gpu/drm/xe/xe_device_types.h         |   40 +
 drivers/gpu/drm/xe/xe_eudebug.c              | 2552 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_eudebug.h              |   42 +
 drivers/gpu/drm/xe/xe_eudebug_types.h        |  271 ++
 drivers/gpu/drm/xe/xe_exec_queue.c           |   63 +-
 drivers/gpu/drm/xe/xe_gt_debug.c             |  150 +
 drivers/gpu/drm/xe/xe_gt_debug.h             |   27 +
 drivers/gpu/drm/xe/xe_gt_mcr.c               |   27 +
 drivers/gpu/drm/xe/xe_gt_mcr.h               |   15 +
 drivers/gpu/drm/xe/xe_gt_topology.c          |    3 -
 drivers/gpu/drm/xe/xe_gt_types.h             |    6 +-
 drivers/gpu/drm/xe/xe_hw_engine.c            |   37 +-
 drivers/gpu/drm/xe/xe_hw_engine.h            |   11 +
 drivers/gpu/drm/xe/xe_lrc.c                  |    8 +
 drivers/gpu/drm/xe/xe_lrc.h                  |    3 +
 drivers/gpu/drm/xe/xe_usercoredump.c         |  710 +++++
 drivers/gpu/drm/xe/xe_usercoredump.h         |   26 +
 drivers/gpu/drm/xe/xe_usercoredump_types.h   |  107 +
 drivers/gpu/drm/xe/xe_vm.c                   |  284 +-
 drivers/gpu/drm/xe/xe_vm.h                   |    5 +
 drivers/gpu/drm/xe/xe_vm_types.h             |   39 +
 include/uapi/drm/xe_drm.h                    |  107 +
 include/uapi/drm/xe_drm_tmp.h                |  172 ++
 30 files changed, 4914 insertions(+), 63 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.c
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.h
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.c
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.h
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.c
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.h
 create mode 100644 drivers/gpu/drm/xe/xe_usercoredump.c
 create mode 100644 drivers/gpu/drm/xe/xe_usercoredump.h
 create mode 100644 drivers/gpu/drm/xe/xe_usercoredump_types.h
 create mode 100644 include/uapi/drm/xe_drm_tmp.h

-- 
2.34.1


