[Intel-xe] [PATCH V4 0/2] drm/xe: Count and report low level driver errors

Ofir Bitton obitton at habana.ai
Sun Oct 1 09:00:21 UTC 2023


On 28/09/2023 2:16, Niranjana Vishwanathapura wrote:
> On Wed, Sep 27, 2023 at 07:59:17PM +0530, Tejas Upadhyay wrote:
>> This series adds low level driver error counters. It is divided
>> into the patches below:
>> 1. Add APIs to count different categories of errors under Tile and GT
>> 2. Add counter increments at all existing error tap points
>>
>> The focus is not to add new error checks but to maintain counters for
>> existing errors that can have a performance impact.
>>
>> TODO: Later on, when the netlink interface is ready, we will export
>> these counters through the netlink interface.
>>
> 
> I am not sure if this is a good idea.
> Aren't we making it kind of uapi? User can also get this information
> from dmesg. Besides we can add trace events if required.
> Is there any prior discussion of this design which you can point
> me to?
> 
> Niranjana
> 
>> Tejas Upadhyay (2):
>>  drm/xe: Introduce low level driver error counting APIs
>>  drm/xe: Update counter for low level driver errors
>>
>> drivers/gpu/drm/xe/xe_device_types.h        |  9 +++++
>> drivers/gpu/drm/xe/xe_gt.c                  | 18 +++++++++
>> drivers/gpu/drm/xe/xe_gt.h                  |  3 ++
>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 15 +++++--
>> drivers/gpu/drm/xe/xe_gt_types.h            | 10 +++++
>> drivers/gpu/drm/xe/xe_guc.c                 | 15 ++++---
>> drivers/gpu/drm/xe/xe_guc_ct.c              | 43 ++++++++++++--------
>> drivers/gpu/drm/xe/xe_guc_pc.c              | 16 +++++---
>> drivers/gpu/drm/xe/xe_guc_submit.c          | 44 +++++++++++++++------
>> drivers/gpu/drm/xe/xe_irq.c                 |  6 ++-
>> drivers/gpu/drm/xe/xe_reg_sr.c              | 19 ++++++---
>> drivers/gpu/drm/xe/xe_tile.c                | 18 +++++++++
>> drivers/gpu/drm/xe/xe_tile.h                |  2 +
>> 13 files changed, 165 insertions(+), 53 deletions(-)
>>
>> -- 
>> 2.25.1
>>

Instead of defining a new uapi, we can expose those counters through a
debugfs ioctl; you can see the discussion here:
https://patchwork.freedesktop.org/patch/556278/?series=123403&rev=1

Another option is to create a new debugfs node for dumping the counters,
as long as there is no actual necessity for the UMD to read those counters.
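
To make the second option concrete, here is a rough user-space sketch of
per-GT error counters dumped as text, the way a debugfs node might print
them. All names (gt_err_type, gt_err_count, gt_err_dump) are hypothetical
and not taken from the series. In the driver the counters would likely be
atomic and the dump would go through a seq_file show callback instead of
snprintf; this only illustrates the shape of the output.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error categories; illustrative only, not from the patch. */
enum gt_err_type {
	GT_ERR_GUC_COMMUNICATION,
	GT_ERR_ENGINE_OTHER,
	GT_ERR_TLB_INVALIDATION,
	GT_ERR_COUNT
};

/* One counter per category (assumption: a real driver would use atomics). */
struct gt_err_counters {
	unsigned long count[GT_ERR_COUNT];
};

static const char *const gt_err_names[GT_ERR_COUNT] = {
	"guc_communication",
	"engine_other",
	"tlb_invalidation",
};

/* Called at an existing error tap point; ignores out-of-range types. */
static void gt_err_count(struct gt_err_counters *c, enum gt_err_type t)
{
	if ((unsigned int)t < (unsigned int)GT_ERR_COUNT)
		c->count[t]++;
}

/*
 * Write "name: value" lines into buf, as a debugfs read would show them.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int gt_err_dump(const struct gt_err_counters *c, char *buf, size_t len)
{
	size_t off = 0;

	for (int i = 0; i < GT_ERR_COUNT; i++) {
		int n = snprintf(buf + off, len - off, "%s: %lu\n",
				 gt_err_names[i], c->count[i]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Since the node only emits text for a person or a debug script to read, the
category list can change between kernel versions without any uapi
commitment.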

--
Ofir

