[PATCH v8 0/7] Add support for EU stall sampling
Harish Chegondi
harish.chegondi at intel.com
Sat Jan 18 05:19:00 UTC 2025
On Thu, Jan 16, 2025 at 03:50:38PM -0600, Olson, Matthew wrote:
> On Wed, Jan 15, 2025 at 12:02:06PM -0800, Harish Chegondi wrote:
> > This patch series adds support for EU stall sampling, a new hardware
> > feature first added in PVC and supported in XE2 and later architecture
> > GPUs. The feature enables capturing EU stall data, which includes the
> > IP address of the stalled instruction and various stall reason counts.
> >
> > Support for this feature is being added into Mesa:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142
> >
> > New IGT tests for EU stall sampling are being added:
> > https://patchwork.freedesktop.org/series/143030/
> >
> > This patch series has undergone basic testing with the new IGT tests.
>
> Our profiler, iaprof, also consumes EU stalls using this patch series and
> generates AI flamegraphs using them. I've been testing mostly with v7 of this
> patch series since it came out, and have had no issues with it. The stalls are
> reasonable (both the reasons and the GPU address that they point to), and we've
> been able to poll them well enough to run our profiler in the background.
>
> I suspect the following was already discussed for one of the earlier versions of
> this series, but is it possible to have even lower sampling rates than what are
> currently provided? We're already selecting the slowest sampling rate (the last
That's the slowest sampling rate supported by the hardware.
> in the array), but CPU usage is too high for our liking, and we're still getting
During EU stall sampling, a timer thread in the driver polls for new EU
stall data approximately once every 10 milliseconds. I wonder whether this
polling could also be contributing to the CPU usage.
> tens of millions of samples per minute.
>
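For a rough sense of scale on the polling question above, here is a
back-of-the-envelope sketch in C. It assumes the poll period is exactly
10 ms (it is only approximate), and the helper names and the 30 million
samples per minute used in the example are hypothetical values chosen for
illustration, not numbers measured with this series.

```c
#include <stdint.h>

/* Assumed poll period: the driver polls roughly every 10 ms. */
#define POLL_PERIOD_MS 10u

/* Timer wakeups per second implied by the assumed poll period. */
static inline uint32_t polls_per_second(void)
{
	return 1000u / POLL_PERIOD_MS;
}

/* Average number of samples pending at each poll for a per-minute rate. */
static inline uint64_t samples_per_poll(uint64_t samples_per_minute)
{
	uint64_t polls_per_minute = 60ull * polls_per_second();

	return samples_per_minute / polls_per_minute;
}
```

Under these assumptions the timer fires 100 times per second, and 30 million
samples per minute works out to about 5000 samples per poll. That suggests
the per-sample handling, rather than the timer wakeups alone, may account
for much of the CPU time, but this is only an estimate.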
> >
> > Thank You.
>
> No, thank *you!*
>
> Reviewed-by: Ben Olson <matthew.olson at intel.com>
>
> >
> > v8: a. Used div_u64() instead of / to fix 32-bit build issue.
> > b. Changed copyright year in new files to 2025.
> > c. Renamed struct drm_xe_eu_stall_data_pvc to struct xe_eu_stall_data_pvc
> > d. Renamed struct drm_xe_eu_stall_data_xe2 to struct xe_eu_stall_data_xe2
> >
> > v7: a. Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT
> > to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with
> > OA. Renamed the corresponding internal variables.
> > b. Fixed some commit messages based on review feedback.
> > c. Changed sampling_rates from a pointer to flexible array.
> >
> > v6: a. Changed the uAPI input to accept sampling rate in GPU cycles
> > instead of sampling rate multiplier.
> > b. Fixed buffer wraparound overwrite bug (Matt Olson).
> > c. Include EU stall sampling rates information and per XeCore buffer size in the query information.
> >
> > v5: Addressed review feedback from v4 including
> > a. Removed DRM_XE_EU_STALL_PROP_POLL_PERIOD from the uAPI (Ashutosh)
> > b. Separated the patches for Xe_HPC and Xe2 (Matt R)
> > c. Moved read() returning -EIO into a separate patch
> > d. Removed spinlocks around set_bit() and clear_bit() (Matt R)
> > e. Renamed several variables, structures and enums (Ashutosh and
> > Matt R)
> > f. Addressed other review feedback.
> > v4: Addressed review feedback from v3 including
> > a. Split the patch into multiple patches (Matt R)
> > b. Added a new device query to get EU stall info (Ashutosh)
> > c. Renamed all Dss to xecore (Matt R)
> > d. Removed buffer size and disable at open input properties. (Matt R)
> > e. Removed the "_SHIFT" macros (Matt R)
> > f. Allocate the EU stall buffer only on system memory.
> > g. Changed the work arounds to OOB (Matt R)
> > h. Other review feedback.
> > v3: a. Removed data header and changed read() to return -EIO when data is dropped by the HW.
> > b. Added a new DRM_XE_OBSERVATION_IOCTL_INFO to query EU stall data record info
> > c. Added struct drm_xe_eu_stall_data_pvc and struct drm_xe_eu_stall_data_xe2
> > to xe_drm.h. These declarations would help user space to parse the
> > EU stall data
> > d. Addressed other review comments from v2
> > v2: Rename xe perf layer as xe observation layer (Ashutosh)
> >
> > Cc: Felix Degrood <felix.j.degrood at intel.com>
> > Signed-off-by: Harish Chegondi <harish.chegondi at intel.com>
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> >
> > Harish Chegondi (7):
> > drm/xe/topology: Add a function to find the index of the last enabled
> > DSS in a mask
> > drm/xe/uapi: Introduce API for EU stall sampling
> > drm/xe/eustall: Implement EU stall sampling APIs for Xe_HPC
> > drm/xe/eustall: Return -EIO error from read() if HW drops data
> > drm/xe/eustall: Add EU stall sampling support for Xe2
> > drm/xe/uapi: Add a device query to get EU stall sampling information
> > drm/xe/eustall: Add workaround 22016596838 which applies to PVC.
> >
> > drivers/gpu/drm/xe/Makefile | 1 +
> > drivers/gpu/drm/xe/regs/xe_eu_stall_regs.h | 29 +
> > drivers/gpu/drm/xe/xe_eu_stall.c | 1103 ++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_eu_stall.h | 61 ++
> > drivers/gpu/drm/xe/xe_gt.c | 6 +
> > drivers/gpu/drm/xe/xe_gt_topology.h | 13 +
> > drivers/gpu/drm/xe/xe_gt_types.h | 3 +
> > drivers/gpu/drm/xe/xe_observation.c | 14 +
> > drivers/gpu/drm/xe/xe_query.c | 38 +
> > drivers/gpu/drm/xe/xe_trace.h | 33 +
> > drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
> > include/uapi/drm/xe_drm.h | 74 ++
> > 12 files changed, 1376 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/regs/xe_eu_stall_regs.h
> > create mode 100644 drivers/gpu/drm/xe/xe_eu_stall.c
> > create mode 100644 drivers/gpu/drm/xe/xe_eu_stall.h
> >
> > --
> > 2.47.1
> >
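On the read() returning -EIO behaviour mentioned in the v3 and v5 notes
above, a user-space consumer could treat EIO as a "some data was dropped"
signal rather than as a fatal error. The following is a minimal sketch
under that assumption. The struct and function names are hypothetical and
are not part of this series or of any library, and the sketch also assumes
that the stream stays readable after an EIO, which is not stated in this
thread.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one drain pass. */
struct stall_read_stats {
	size_t bytes;       /* bytes of stall data returned by read() */
	unsigned int drops; /* times read() failed with EIO */
};

/*
 * Read until the stream has nothing more to offer right now.
 * Returns 0 on EOF or EAGAIN, -1 on any other error, including two
 * consecutive EIO failures (to avoid spinning on a broken stream).
 */
static int drain_stall_stream(int fd, void *buf, size_t len,
			      struct stall_read_stats *st)
{
	int consecutive_eio = 0;

	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n > 0) {
			/* A real consumer would parse the records in buf here. */
			st->bytes += (size_t)n;
			consecutive_eio = 0;
			continue;
		}
		if (n == 0)
			return 0;
		if (errno == EINTR)
			continue;
		if (errno == EIO) {
			/* Assumed meaning: data was dropped; count it and retry. */
			st->drops++;
			if (++consecutive_eio >= 2)
				return -1;
			continue;
		}
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;
		return -1;
	}
}
```

A profiler could report st.drops alongside its results so that gaps in the
stall data are visible instead of silently ignored.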