[PATCH v10 0/8] Add support for EU stall sampling

Harish Chegondi harish.chegondi at intel.com
Tue Feb 18 19:53:50 UTC 2025


The following patch series adds support for EU stall sampling,
a hardware feature first introduced in PVC and supported
in Xe2 and later architecture GPUs. This feature enables
capturing EU stall data, which includes the IP address of the
stalled instruction and various stall reason counts.

Support for this feature is being added into Mesa:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142

New IGT tests for EU stall sampling are being added:
https://patchwork.freedesktop.org/series/143030/

This patch series has undergone basic testing with the new IGT tests.

Issues that need investigation:
1. Blocking reads with small user buffers may block even when EU
stall data is available in the kernel buffer. This happens because a
previous read set pollin to false even though the kernel buffer still
held data that could be read, but the user buffer was too small to
read all of it.
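
To illustrate issue 1, here is a minimal user-space model of the
behaviour. It is only a sketch and not the driver code: the struct, its
fields and both functions are hypothetical names made up for this
example, and the second function shows just one possible direction, not
a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stream; not the driver's actual structure. */
struct stall_stream_model {
	size_t kernel_bytes;	/* unread bytes held in the kernel buffer */
	bool pollin;		/* when false, a blocking read() would sleep */
};

/* Consume at most user_len bytes and return how many were consumed. */
static size_t model_consume(struct stall_stream_model *s, size_t user_len)
{
	size_t n = s->kernel_bytes < user_len ? s->kernel_bytes : user_len;

	s->kernel_bytes -= n;
	return n;
}

/* Behaviour described in the open issue: pollin is cleared on every read. */
static size_t model_read_clear_always(struct stall_stream_model *s,
				      size_t user_len)
{
	size_t n;

	assert(s != NULL);
	n = model_consume(s, user_len);
	s->pollin = false;	/* cleared even if kernel_bytes is still > 0 */
	return n;
}

/* Assumed alternative: keep pollin set while unread data remains. */
static size_t model_read_clear_when_empty(struct stall_stream_model *s,
					  size_t user_len)
{
	size_t n;

	assert(s != NULL);
	n = model_consume(s, user_len);
	s->pollin = s->kernel_bytes > 0;
	return n;
}
```

With 256 bytes buffered and a 64-byte user buffer, the first variant
leaves 192 bytes in the buffer with pollin cleared, so the next blocking
read would sleep. The second variant keeps pollin set until the buffer
is drained. The real driver also has to account for the wait threshold
(DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS), which this model ignores.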

Thank You.

v10: a. Fixed error rewinding code
    b. Used cancel_delayed_work_sync() instead of flush_delayed_work()
    c. Replaced per xecore lock with a lock for all the xecore buffers
    d. Removed function descriptions for static functions.
    e. Used extension number while parsing chain of extensions.
    f. Moved code around as per review feedback
v9: a. Split the big patch in v8 into two patches
    b. Moved all drop data handling code into one patch
    c. Several other code improvements as mentioned in the patches
v8: a. Used div_u64() instead of / to fix 32-bit build issue.
    b. Changed copyright year in new files to 2025.
    c. Renamed struct drm_xe_eu_stall_data_pvc to struct xe_eu_stall_data_pvc
    d. Renamed struct drm_xe_eu_stall_data_xe2 to struct xe_eu_stall_data_xe2

v7: a. Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT
       to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with
       OA. Renamed the corresponding internal variables.
    b. Fixed some commit messages based on review feedback.
    c. Changed sampling_rates from a pointer to flexible array.

v6: a. Changed the uAPI input to accept sampling rate in GPU cycles
       instead of sampling rate multiplier.
    b. Fixed buffer wraparound overwrite bug (Matt Olson).
    c. Include EU stall sampling rates information and per XeCore buffer size in the query information.

v5: Addressed review feedback from v4 including
    a. Removed DRM_XE_EU_STALL_PROP_POLL_PERIOD from the uAPI (Ashutosh)
    b. Separated the patches for Xe_HPC and Xe2 (Matt R)
    c. Moved read() returning -EIO into a separate patch
    d. Removed spinlocks around set_bit() and clear_bit() (Matt R)
    e. Renamed several variables, structures and enums (Ashutosh and
       Matt R)
    f. Addressed other review feedback.
v4: Addressed review feedback from v3 including
    a. Split the patch into multiple patches (Matt R)
    b. Added a new device query to get EU stall info (Ashutosh)
    c. Renamed all Dss to xecore (Matt R)
    d. Removed buffer size and disable at open input properties. (Matt R)
    e. Removed the "_SHIFT" macros (Matt R)
    f. Allocate the EU stall buffer only on system memory.
    g. Changed the workarounds to OOB (Matt R)
    h. Other review feedback.
v3: a. Removed data header and changed read() to return -EIO when data is dropped by the HW.
    b. Added a new DRM_XE_OBSERVATION_IOCTL_INFO to query EU stall data record info
    c. Added struct drm_xe_eu_stall_data_pvc and struct drm_xe_eu_stall_data_xe2
       to xe_drm.h. These declarations would help user space to parse the
       EU stall data
    d. Addressed other review comments from v2
v2: Rename xe perf layer as xe observation layer (Ashutosh)

Test-with: cover.1739901972.git.harish.chegondi at intel.com

Reviewed-by: Ben Olson <matthew.olson at intel.com>
Acked-by: Felix Degrood <felix.j.degrood at intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi at intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>

Harish Chegondi (8):
  drm/xe/topology: Add a function to find the index of the last enabled
    DSS in a mask
  drm/xe/uapi: Introduce API for EU stall sampling
  drm/xe/eustall: Add support to init, enable and disable EU stall
    sampling
  drm/xe/eustall: Add support to read() and poll() EU stall data
  drm/xe/eustall: Add support to handle dropped EU stall data
  drm/xe/eustall: Add EU stall sampling support for Xe2
  drm/xe/uapi: Add a device query to get EU stall sampling information
  drm/xe/eustall: Add workaround 22016596838 which applies to PVC.

 drivers/gpu/drm/xe/Makefile                |   1 +
 drivers/gpu/drm/xe/regs/xe_eu_stall_regs.h |  29 +
 drivers/gpu/drm/xe/xe_eu_stall.c           | 946 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_eu_stall.h           |  20 +
 drivers/gpu/drm/xe/xe_gt.c                 |   6 +
 drivers/gpu/drm/xe/xe_gt_topology.h        |  13 +
 drivers/gpu/drm/xe/xe_gt_types.h           |   3 +
 drivers/gpu/drm/xe/xe_observation.c        |  14 +
 drivers/gpu/drm/xe/xe_query.c              |  38 +
 drivers/gpu/drm/xe/xe_trace.h              |  33 +
 drivers/gpu/drm/xe/xe_wa_oob.rules         |   1 +
 include/uapi/drm/xe_drm.h                  |  74 ++
 12 files changed, 1178 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_eu_stall_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_eu_stall.c
 create mode 100644 drivers/gpu/drm/xe/xe_eu_stall.h

-- 
2.48.1


