[Intel-xe] [PATCH V4 0/2] drm/xe: Count and report low level driver errors
Ofir Bitton
obitton at habana.ai
Sun Oct 1 09:00:21 UTC 2023
On 28/09/2023 2:16, Niranjana Vishwanathapura wrote:
> On Wed, Sep 27, 2023 at 07:59:17PM +0530, Tejas Upadhyay wrote:
>> This series adds a low level driver error counter. Divided
>> into the patches below:
>> 1. Add APIs to count different categories of errors under Tile and GT
>> 2. Add counter increment at all existing error tap points
>>
>> The focus is not to add new error checks but to maintain counters for
>> existing errors which can impact performance.
>>
>> TODO: Later on when netlink interface is ready, we will export these
>> counters through netlink interface.
>>
>
> I am not sure if this is a good idea.
> Aren't we making it kind of uapi? User can also get this information
> from dmesg. Besides we can add trace events if required.
> Is there any prior discussion of this design which you can point
> me to?
>
> Niranjana
>
>> Tejas Upadhyay (2):
>> drm/xe: Introduce low level driver error counting APIs
>> drm/xe: Update counter for low level driver errors
>>
>> drivers/gpu/drm/xe/xe_device_types.h | 9 +++++
>> drivers/gpu/drm/xe/xe_gt.c | 18 +++++++++
>> drivers/gpu/drm/xe/xe_gt.h | 3 ++
>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 15 +++++--
>> drivers/gpu/drm/xe/xe_gt_types.h | 10 +++++
>> drivers/gpu/drm/xe/xe_guc.c | 15 ++++---
>> drivers/gpu/drm/xe/xe_guc_ct.c | 43 ++++++++++++--------
>> drivers/gpu/drm/xe/xe_guc_pc.c | 16 +++++---
>> drivers/gpu/drm/xe/xe_guc_submit.c | 44 +++++++++++++++------
>> drivers/gpu/drm/xe/xe_irq.c | 6 ++-
>> drivers/gpu/drm/xe/xe_reg_sr.c | 19 ++++++---
>> drivers/gpu/drm/xe/xe_tile.c | 18 +++++++++
>> drivers/gpu/drm/xe/xe_tile.h | 2 +
>> 13 files changed, 165 insertions(+), 53 deletions(-)
>>
>> --
>> 2.25.1
>>
Instead of defining a new uapi, we can expose those counters through a
debugfs ioctl; you can see the discussion here:
https://patchwork.freedesktop.org/patch/556278/?series=123403&rev=1
Another option is to create a new debugfs node for dumping the counters,
as long as there is no actual necessity for the UMD to read those counters.
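
To make the second option concrete, here is a rough userspace sketch of
per-category counters plus a text dump of the kind a debugfs node could
return. It is not taken from the series: the category names, struct
gt_err_stats, gt_count_error() and gt_dump_errors() are all hypothetical,
and a real kernel implementation would format into a seq_file from a
debugfs show callback instead of using snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error categories; the actual series defines its own set. */
enum err_category {
	ERR_GUC_COMM,
	ERR_TLB_INVAL,
	ERR_ENGINE,
	ERR_CATEGORY_COUNT
};

static const char *const err_names[ERR_CATEGORY_COUNT] = {
	"guc_comm", "tlb_inval", "engine",
};

/* One counter per category, e.g. embedded in a per-GT structure. */
struct gt_err_stats {
	unsigned long count[ERR_CATEGORY_COUNT];
};

/* Called at an existing error path; only bumps the matching counter. */
static void gt_count_error(struct gt_err_stats *s, enum err_category c)
{
	assert((unsigned int)c < ERR_CATEGORY_COUNT);
	s->count[c]++;
}

/*
 * Format all counters as "name: value" lines into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int gt_dump_errors(const struct gt_err_stats *s, char *buf, size_t len)
{
	size_t off = 0;

	for (int i = 0; i < ERR_CATEGORY_COUNT; i++) {
		int n = snprintf(buf + off, len - off, "%s: %lu\n",
				 err_names[i], s->count[i]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Since the dump is plain text read on demand, nothing here would need to
become stable uapi; a tool or developer just reads the node when debugging.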
--
Ofir