[PATCH 00/21] GPU debug support (eudebug)
Mika Kuoppala
mika.kuoppala at linux.intel.com
Fri Jul 26 14:07:57 UTC 2024
Hi,
We (Intel eudebug kernel team) would like to submit this
patchset to enable debug support for Intel GPU devices.
The aim is to allow Level-Zero + GDB (or some other tool)
to attach to the xe driver in order to receive information
about relevant driver resources and hardware events, and to
allow debug-related hardware control. The end goal is full
debug capability on supported Intel hardware, see [4].
The debugger first opens a connection to a device through
a drm ioctl, identifying the debug target process by its
pid. The ioctl returns a dedicated file descriptor that is
used for all further debug events and control.
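
As a rough illustration of the connection step, the sketch below wraps
a connect ioctl in a small helper. The struct layout, the ioctl number
and the name eudebug_connect() are placeholders invented for this
sketch and are not the uapi from this series; the real definitions are
in the xe_drm.h changes of the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>

/* HYPOTHETICAL: stand-in for the connect argument, not the real uapi. */
struct sketch_eudebug_connect {
	uint64_t pid;     /* debug target process */
	uint32_t flags;   /* assumed unused, left as zero */
	uint32_t version; /* assumed interface version field */
};

/* HYPOTHETICAL ioctl number, chosen only so that the sketch compiles. */
#define SKETCH_IOCTL_EUDEBUG_CONNECT \
	_IOWR('d', 0x7f, struct sketch_eudebug_connect)

/*
 * Ask the driver behind drm_fd for a debugger connection to 'target'.
 * Returns the dedicated debug fd on success, or a negative errno.
 */
int eudebug_connect(int drm_fd, pid_t target)
{
	struct sketch_eudebug_connect arg;
	int debug_fd;

	assert(target > 0);

	memset(&arg, 0, sizeof(arg));
	arg.pid = (uint64_t)target;

	debug_fd = ioctl(drm_fd, SKETCH_IOCTL_EUDEBUG_CONNECT, &arg);
	if (debug_fd < 0)
		return -errno;

	return debug_fd;
}
```

The returned descriptor is kept separate from the original xe fd, so
events and control for the debugger never share a descriptor with the
regular driver ioctls.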
Xe internal resources that are considered essential
to debugger functionality are relayed as events to the
debugger. When the debugger connects, all existing resources
are relayed to it (discovery); from that point onwards,
events are sent as resources are created or destroyed.
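
The discovery behaviour can be modelled in a few lines of userspace C.
This is only a toy model under assumed names (struct model,
debugger_connect(), the EV_* flags and the seqno field are all made up
here), not the driver code: resources created before the connection
are replayed as create events on connect, and later changes are
relayed as they happen.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 16
#define MAX_EV  64

enum { EV_CREATE = 1, EV_DESTROY = 2 };   /* hypothetical event flags */

struct event {
	unsigned long seqno;   /* assumed monotonically increasing */
	int flag;
	int handle;
};

struct model {
	int live[MAX_RES];     /* 1 if the resource handle exists */
	int connected;         /* 1 once a debugger is attached */
	unsigned long seqno;
	struct event queue[MAX_EV];
	size_t count;
};

static void push_event(struct model *m, int flag, int handle)
{
	assert(m->count < MAX_EV);
	m->queue[m->count].seqno = ++m->seqno;
	m->queue[m->count].flag = flag;
	m->queue[m->count].handle = handle;
	m->count++;
}

void resource_create(struct model *m, int handle)
{
	assert(handle >= 0 && handle < MAX_RES);
	m->live[handle] = 1;
	if (m->connected)
		push_event(m, EV_CREATE, handle);
}

void resource_destroy(struct model *m, int handle)
{
	assert(handle >= 0 && handle < MAX_RES);
	m->live[handle] = 0;
	if (m->connected)
		push_event(m, EV_DESTROY, handle);
}

/* Discovery: replay everything that already exists, then go live. */
void debugger_connect(struct model *m)
{
	m->connected = 1;
	for (int h = 0; h < MAX_RES; h++)
		if (m->live[h])
			push_event(m, EV_CREATE, h);
}
```

In this model a debugger that attaches late sees the same create
events it would have seen had it been attached from the start, which
is the point of the discovery phase.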
The uapi is extended to allow an application/lib to provide
debug metadata. This metadata is relayed as events to the
debugger so that it can decode the program state.
Along with the resource and metadata events, an event for
hardware exceptions, called EU attention, is provided.
The debugger, with the assistance of an exception handling
program called the System Routine (short: SIP) provided
with the pipeline setup, can determine which specific
EU/thread and instruction hit a breakpoint or raised
another exception.
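
To illustrate how a debugger might turn an attention event into
EU/thread pairs, here is a minimal sketch. It assumes the event
carries a flat bitmask with one bit per hardware thread and a fixed
number of threads per EU; that layout, struct eu_hit and
decode_attention() are assumptions of this example, and the real
layout depends on the hardware topology.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct eu_hit {
	unsigned int eu;       /* EU index within the assumed flat layout */
	unsigned int thread;   /* thread index within that EU */
};

/*
 * Walk the attention bitmask and record every thread whose bit is set.
 * Returns the number of entries written to 'out' (at most max_out).
 */
size_t decode_attention(const uint8_t *bits, size_t nbytes,
			unsigned int threads_per_eu,
			struct eu_hit *out, size_t max_out)
{
	size_t n = 0;

	assert(threads_per_eu > 0);

	for (size_t i = 0; i < nbytes * 8 && n < max_out; i++) {
		if (!(bits[i / 8] & (1u << (i % 8))))
			continue;
		out[n].eu = (unsigned int)(i / threads_per_eu);
		out[n].thread = (unsigned int)(i % threads_per_eu);
		n++;
	}

	return n;
}
```

Which instruction raised the exception is not visible in the bitmask
itself; per the description above, that part comes from the state
saved by the SIP.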
An EU control ioctl interface is also introduced, through
which the debugger can manipulate individual threads of the
currently active workload. This interface enables the debugger
to interrupt all threads on demand, check their current state
and resume them individually.
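
The three control operations (interrupt all, query stopped, resume)
can be pictured as operations on a per-workload thread bitmask. The
sketch below is a userspace toy with made-up names (struct eu_threads,
eu_interrupt_all(), eu_stopped(), eu_resume()) and assumes at most 64
threads; it shows the intended semantics, not the ioctl itself.

```c
#include <assert.h>
#include <stdint.h>

struct eu_threads {
	uint64_t active;    /* threads belonging to the running workload */
	uint64_t stopped;   /* subset of 'active' currently halted */
};

/* Interrupt all: every active thread ends up stopped. */
void eu_interrupt_all(struct eu_threads *s)
{
	assert(s);
	s->stopped = s->active;
}

/* Stopped query: report which threads are currently halted. */
uint64_t eu_stopped(const struct eu_threads *s)
{
	assert(s);
	return s->stopped;
}

/*
 * Resume: restart only the threads named in 'mask'.
 * Refuses (returns -1) if any of them is not currently stopped.
 */
int eu_resume(struct eu_threads *s, uint64_t mask)
{
	assert(s);
	if (mask & ~s->stopped)
		return -1;
	s->stopped &= ~mask;
	return 0;
}
```

Rejecting a resume of a thread that is not stopped keeps the
debugger's view and the modelled hardware state from drifting apart;
whether the real interface reports this as an error is not stated
here.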
The intent is to provide functionality similar to, but not
API compatible with, the out-of-tree i915 debugger support:
https://dgpu-docs.intel.com/driver/gpu-debugging.html
For xe the aim is to have all support merged upstream,
starting with this series, with Lunarlake being the first
targeted hardware.
I have split the events into xe_drm_eudebug.h instead of
pushing everything into xe_drm.h, in order to help
distinguish what is controlled by which descriptor.
If it's through the original xe fd, it is in xe_drm.h and
if it's through the opened debugger connection fd, it
is in xe_drm_eudebug.h.
Latest code can be found in:
[1] https://gitlab.freedesktop.org/miku/kernel/-/tree/eudebug-dev
With the associated IGT tests:
[2] https://gitlab.freedesktop.org/cmanszew/igt-gpu-tools/-/tree/eudebug-dev
The user for this uapi:
[3] https://github.com/intel/compute-runtime
Event loop and thread control interaction can be found at:
https://github.com/intel/compute-runtime/tree/master/level_zero/tools/source/debug/linux/xe
And the wrappers in:
https://github.com/intel/compute-runtime/tree/master/shared/source/os_interface/linux/xe
https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/xe/ioctl_helper_xe_debugger.cpp
Note that the XE support is disabled by default and you will need
NEO_ENABLE_XE_EU_DEBUG_SUPPORT enabled in order to test.
GDB support:
[4] https://sourceware.org/pipermail/gdb-patches/2024-July/210264.html
Thank you in advance for any comments and insight.
Andrzej Hajda (1):
  drm/xe/eudebug: implement userptr_vma access

Christoph Manszewski (3):
  drm/xe/eudebug: Add vm bind and vm bind ops
  drm/xe/eudebug: Dynamically toggle debugger functionality
  drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test

Dominik Grzegorzek (10):
  drm/xe: Export xe_hw_engine's mmio accessors
  drm/xe: Move and export xe_hw_engine lookup.
  drm/xe/eudebug: Introduce exec_queue events
  drm/xe/eudebug: hw enablement for eudebug
  drm/xe: Add EUDEBUG_ENABLE exec queue property
  drm/xe/eudebug: Introduce per device attention scan worker
  drm/xe/eudebug: Introduce EU control interface
  drm/xe: Debug metadata create/destroy ioctls
  drm/xe: Attach debug metadata to vma
  drm/xe/eudebug: Add debug metadata support for xe_eudebug

Jonathan Cavitt (1):
  drm/xe/eudebug: Use ptrace_may_access for xe_eudebug_attach

Mika Kuoppala (6):
  drm/xe/eudebug: Introduce eudebug support
  kernel: export ptrace_may_access
  drm/xe/eudebug: Introduce discovery for resources
  drm/xe/eudebug: Add UFENCE events with acks
  drm/xe/eudebug: vm open/pread/pwrite
  drm/xe/eudebug: Implement vm_bind_op discovery

drivers/gpu/drm/xe/Makefile | 5 +-
drivers/gpu/drm/xe/regs/xe_engine_regs.h | 8 +
drivers/gpu/drm/xe/regs/xe_gt_regs.h | 43 +
drivers/gpu/drm/xe/tests/xe_eudebug.c | 170 +
drivers/gpu/drm/xe/tests/xe_live_test_mod.c | 2 +
drivers/gpu/drm/xe/xe_debug_metadata.c | 125 +
drivers/gpu/drm/xe/xe_debug_metadata.h | 25 +
drivers/gpu/drm/xe/xe_debug_metadata_types.h | 28 +
drivers/gpu/drm/xe/xe_device.c | 47 +-
drivers/gpu/drm/xe/xe_device_types.h | 45 +
drivers/gpu/drm/xe/xe_eudebug.c | 3841 ++++++++++++++++++
drivers/gpu/drm/xe/xe_eudebug.h | 51 +
drivers/gpu/drm/xe/xe_eudebug_types.h | 326 ++
drivers/gpu/drm/xe/xe_exec.c | 2 +-
drivers/gpu/drm/xe/xe_exec_queue.c | 80 +-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 7 +
drivers/gpu/drm/xe/xe_gt_debug.c | 152 +
drivers/gpu/drm/xe/xe_gt_debug.h | 27 +
drivers/gpu/drm/xe/xe_hw_engine.c | 39 +-
drivers/gpu/drm/xe/xe_hw_engine.h | 11 +
drivers/gpu/drm/xe/xe_lrc.c | 16 +-
drivers/gpu/drm/xe/xe_lrc.h | 4 +-
drivers/gpu/drm/xe/xe_reg_sr.c | 21 +-
drivers/gpu/drm/xe/xe_reg_sr.h | 4 +-
drivers/gpu/drm/xe/xe_rtp.c | 2 +-
drivers/gpu/drm/xe/xe_rtp_types.h | 1 +
drivers/gpu/drm/xe/xe_sync.c | 49 +-
drivers/gpu/drm/xe/xe_sync.h | 8 +-
drivers/gpu/drm/xe/xe_sync_types.h | 26 +-
drivers/gpu/drm/xe/xe_vm.c | 227 +-
drivers/gpu/drm/xe/xe_vm_types.h | 26 +
include/uapi/drm/xe_drm.h | 96 +-
include/uapi/drm/xe_drm_eudebug.h | 226 ++
kernel/ptrace.c | 1 +
34 files changed, 5655 insertions(+), 86 deletions(-)
create mode 100644 drivers/gpu/drm/xe/tests/xe_eudebug.c
create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.c
create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.h
create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata_types.h
create mode 100644 drivers/gpu/drm/xe/xe_eudebug.c
create mode 100644 drivers/gpu/drm/xe/xe_eudebug.h
create mode 100644 drivers/gpu/drm/xe/xe_eudebug_types.h
create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.c
create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.h
create mode 100644 include/uapi/drm/xe_drm_eudebug.h
--
2.34.1