[PATCH 00/21] GPU debug support (eudebug)

Matthew Brost matthew.brost at intel.com
Sat Jul 27 05:23:26 UTC 2024


On Fri, Jul 26, 2024 at 05:07:57PM +0300, Mika Kuoppala wrote:
> Hi,
> 
> We (Intel eudebug kernel team) would like to submit this
> patchset to enable debug support for Intel GPU devices.
> 
> The aim is to allow Level-Zero + GDB (or some other tool)
> to attach to the xe driver in order to receive information
> about relevant driver resources and hardware events, and to
> allow debug related hardware control. The end goal is full debug
> capability of supported Intel hardware, see [4].
> 
> The debugger first opens a connection to a device through a
> drm ioctl, passing the pid of the debug target process. This
> returns a dedicated file descriptor that is used for all
> further debug events and control.
> 
> Xe internal resources that are considered essential
> to debugger functionality are relayed as events to the
> debugger. On debugger connection, all existing resources
> are relayed to debugger (discovery) and from that
> point onwards, as they are created/destroyed.
> 
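As an illustration of the discovery flow described above, here is a minimal sketch in plain C. It is not code from the series: the names `ev`, `dbg_view`, `view_apply` and `view_discover` are hypothetical, and the model assumes each resource is identified by a simple integer handle. The point it shows is that resources existing before attach can be replayed as ordinary create events, so the debugger needs only one code path for both discovery and live updates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event model for illustration; not the xe uapi. */
enum ev_kind { EV_CREATE, EV_DESTROY };

struct ev {
	enum ev_kind kind;
	unsigned int handle;	/* id of the resource the event refers to */
};

#define MAX_RES 8

/* The debugger's picture of which resources currently exist. */
struct dbg_view {
	unsigned int handles[MAX_RES];
	size_t count;
};

/* Returns 1 if the handle is tracked, 0 otherwise. */
static int view_has(const struct dbg_view *v, unsigned int handle)
{
	size_t i;

	assert(v != NULL);
	for (i = 0; i < v->count; i++)
		if (v->handles[i] == handle)
			return 1;
	return 0;
}

/* Apply one event; the same path serves discovery and live events. */
static void view_apply(struct dbg_view *v, struct ev e)
{
	size_t i;

	assert(v != NULL);
	if (e.kind == EV_CREATE) {
		if (!view_has(v, e.handle) && v->count < MAX_RES)
			v->handles[v->count++] = e.handle;
		return;
	}
	for (i = 0; i < v->count; i++) {
		if (v->handles[i] == e.handle) {
			/* Order is irrelevant, so fill the hole with the last entry. */
			v->handles[i] = v->handles[--v->count];
			return;
		}
	}
}

/* Discovery: replay every pre-existing resource as a create event. */
static void view_discover(struct dbg_view *v,
			  const unsigned int *existing, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		view_apply(v, (struct ev){ EV_CREATE, existing[i] });
}
```

A caller would run `view_discover()` once at connect time and then feed each later create/destroy event through `view_apply()`.
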
> uapi is extended to allow an application/lib to provide
> debug metadata information. These are relayed as events
> to the debugger so it can decode the program state.
> 
> Along with the resource and metadata events, an event for
> hardware exceptions, called EU attention, is provided.
> The debugger, with the assistance of an exception handling
> program called System Routine (short: SIP) provided
> with the pipeline setup, can determine which specific
> EU/thread and instruction encountered the breakpoint
> or other exceptions.
> 
> An EU control ioctl interface is also introduced, through which
> the debugger can manipulate individual threads of the currently
> active workload. This interface enables the debugger to
> interrupt all threads on demand, check their current state
> and resume them individually.
> 
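The three operations named above (interrupt all threads, check their state, resume them individually) can be pictured with a small model in plain C. This is only a sketch under stated assumptions: it is not the ioctl interface from the series, the names `eu_state`, `eu_interrupt_all`, `eu_is_stopped` and `eu_resume` are hypothetical, and representing thread state as one bit per thread with a fixed count of 8 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS 8u	/* hypothetical thread count for the model */

/* Bit i set means thread i is halted and can be inspected. */
struct eu_state {
	uint32_t stopped;
};

/* Interrupt on demand: halt every thread of the active workload. */
static void eu_interrupt_all(struct eu_state *s)
{
	assert(s != 0);
	s->stopped = (1u << NUM_THREADS) - 1u;
}

/* Check the current state of a single thread. */
static int eu_is_stopped(const struct eu_state *s, unsigned int thread)
{
	if (thread >= NUM_THREADS)
		return 0;
	return (s->stopped >> thread) & 1u;
}

/*
 * Resume one thread and leave the others halted.
 * Returns 0 on success, -1 if the thread is out of range or not halted.
 */
static int eu_resume(struct eu_state *s, unsigned int thread)
{
	if (!eu_is_stopped(s, thread))
		return -1;
	s->stopped &= ~(1u << thread);
	return 0;
}
```

In this model a debugger would call `eu_interrupt_all()`, inspect threads with `eu_is_stopped()`, and then release them one at a time with `eu_resume()`, which matches the per-thread granularity described in the paragraph above.
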
> The intent is to provide functionality similar to, but not
> API compatible with, the out-of-tree i915 debugger support:
> https://dgpu-docs.intel.com/driver/gpu-debugging.html
> 
> For xe the aim is to have all support merged upstream,
> starting with this series, with Lunarlake being the first
> targeted hardware.
> 
> I have split the events into xe_drm_eudebug.h instead of
> pushing everything into xe_drm.h, in order to help
> distinguish what is controlled by which descriptor.
> If it's through the original xe fd, it is in xe_drm.h and
> if it's through the opened debugger connection fd, it
> is in xe_drm_eudebug.h.
> 

Looking through the series, I do have a question wrt GPU faults and
eudebug. I don't see any interaction there. Without knowing how eudebug
works, it seems like setting a breakpoint on a GPU access to a virtual
address is something a debugger would want. On a faulting device, this
is something we should be able to support. This really comes into play
once we have SVM, as the UMD won't be issuing binds either. Curious about
your thoughts here.

If this is something that is required, in particular with SVM, it is
something the SVM and eudebug teams need to collaborate on early to make
sure both designs work with each other.

Matt

> Latest code can be found in:
> [1] https://gitlab.freedesktop.org/miku/kernel/-/tree/eudebug-dev
> 
> With the associated IGT tests:
> [2] https://gitlab.freedesktop.org/cmanszew/igt-gpu-tools/-/tree/eudebug-dev
> 
> The user for this uapi:
> [3] https://github.com/intel/compute-runtime
> Event loop and thread control interaction can be found at:
> https://github.com/intel/compute-runtime/tree/master/level_zero/tools/source/debug/linux/xe
> And the wrappers in:
> https://github.com/intel/compute-runtime/tree/master/shared/source/os_interface/linux/xe
> https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/xe/ioctl_helper_xe_debugger.cpp
> Note that the XE support is disabled by default and you will need
> NEO_ENABLE_XE_EU_DEBUG_SUPPORT enabled in order to test.
> 
> GDB support:
> [4]: https://sourceware.org/pipermail/gdb-patches/2024-July/210264.html
> 
> Thank you in advance for any comments and insight.
> 
> 
> Andrzej Hajda (1):
>   drm/xe/eudebug: implement userptr_vma access
> 
> Christoph Manszewski (3):
>   drm/xe/eudebug: Add vm bind and vm bind ops
>   drm/xe/eudebug: Dynamically toggle debugger functionality
>   drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test
> 
> Dominik Grzegorzek (10):
>   drm/xe: Export xe_hw_engine's mmio accessors
>   drm/xe: Move and export xe_hw_engine lookup.
>   drm/xe/eudebug: Introduce exec_queue events
>   drm/xe/eudebug: hw enablement for eudebug
>   drm/xe: Add EUDEBUG_ENABLE exec queue property
>   drm/xe/eudebug: Introduce per device attention scan worker
>   drm/xe/eudebug: Introduce EU control interface
>   drm/xe: Debug metadata create/destroy ioctls
>   drm/xe: Attach debug metadata to vma
>   drm/xe/eudebug: Add debug metadata support for xe_eudebug
> 
> Jonathan Cavitt (1):
>   drm/xe/eudebug: Use ptrace_may_access for xe_eudebug_attach
> 
> Mika Kuoppala (6):
>   drm/xe/eudebug: Introduce eudebug support
>   kernel: export ptrace_may_access
>   drm/xe/eudebug: Introduce discovery for resources
>   drm/xe/eudebug: Add UFENCE events with acks
>   drm/xe/eudebug: vm open/pread/pwrite
>   drm/xe/eudebug: Implement vm_bind_op discovery
> 
>  drivers/gpu/drm/xe/Makefile                  |    5 +-
>  drivers/gpu/drm/xe/regs/xe_engine_regs.h     |    8 +
>  drivers/gpu/drm/xe/regs/xe_gt_regs.h         |   43 +
>  drivers/gpu/drm/xe/tests/xe_eudebug.c        |  170 +
>  drivers/gpu/drm/xe/tests/xe_live_test_mod.c  |    2 +
>  drivers/gpu/drm/xe/xe_debug_metadata.c       |  125 +
>  drivers/gpu/drm/xe/xe_debug_metadata.h       |   25 +
>  drivers/gpu/drm/xe/xe_debug_metadata_types.h |   28 +
>  drivers/gpu/drm/xe/xe_device.c               |   47 +-
>  drivers/gpu/drm/xe/xe_device_types.h         |   45 +
>  drivers/gpu/drm/xe/xe_eudebug.c              | 3841 ++++++++++++++++++
>  drivers/gpu/drm/xe/xe_eudebug.h              |   51 +
>  drivers/gpu/drm/xe/xe_eudebug_types.h        |  326 ++
>  drivers/gpu/drm/xe/xe_exec.c                 |    2 +-
>  drivers/gpu/drm/xe/xe_exec_queue.c           |   80 +-
>  drivers/gpu/drm/xe/xe_exec_queue_types.h     |    7 +
>  drivers/gpu/drm/xe/xe_gt_debug.c             |  152 +
>  drivers/gpu/drm/xe/xe_gt_debug.h             |   27 +
>  drivers/gpu/drm/xe/xe_hw_engine.c            |   39 +-
>  drivers/gpu/drm/xe/xe_hw_engine.h            |   11 +
>  drivers/gpu/drm/xe/xe_lrc.c                  |   16 +-
>  drivers/gpu/drm/xe/xe_lrc.h                  |    4 +-
>  drivers/gpu/drm/xe/xe_reg_sr.c               |   21 +-
>  drivers/gpu/drm/xe/xe_reg_sr.h               |    4 +-
>  drivers/gpu/drm/xe/xe_rtp.c                  |    2 +-
>  drivers/gpu/drm/xe/xe_rtp_types.h            |    1 +
>  drivers/gpu/drm/xe/xe_sync.c                 |   49 +-
>  drivers/gpu/drm/xe/xe_sync.h                 |    8 +-
>  drivers/gpu/drm/xe/xe_sync_types.h           |   26 +-
>  drivers/gpu/drm/xe/xe_vm.c                   |  227 +-
>  drivers/gpu/drm/xe/xe_vm_types.h             |   26 +
>  include/uapi/drm/xe_drm.h                    |   96 +-
>  include/uapi/drm/xe_drm_eudebug.h            |  226 ++
>  kernel/ptrace.c                              |    1 +
>  34 files changed, 5655 insertions(+), 86 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/tests/xe_eudebug.c
>  create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.c
>  create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.h
>  create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata_types.h
>  create mode 100644 drivers/gpu/drm/xe/xe_eudebug.c
>  create mode 100644 drivers/gpu/drm/xe/xe_eudebug.h
>  create mode 100644 drivers/gpu/drm/xe/xe_eudebug_types.h
>  create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.c
>  create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.h
>  create mode 100644 include/uapi/drm/xe_drm_eudebug.h
> 
> -- 
> 2.34.1
> 

