[PATCH 4/4] drm/xe/xe_drm_client: Add per drm client reset stats

Simona Vetter simona.vetter at ffwll.ch
Wed Feb 19 13:45:55 UTC 2025


On Tue, Feb 18, 2025 at 06:45:30PM +0000, Tvrtko Ursulin wrote:
> 
> On 14/02/2025 20:37, Jonathan Cavitt wrote:
> > Add a counter to xe_drm_client that tracks the number of times the
> > engine has been reset since the drm client was created.
> > 
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_drm_client.c | 2 ++
> >   drivers/gpu/drm/xe/xe_drm_client.h | 2 ++
> >   drivers/gpu/drm/xe/xe_guc_submit.c | 4 +++-
> >   3 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> > index f15560d0b6ff..ecd2ce99fd19 100644
> > --- a/drivers/gpu/drm/xe/xe_drm_client.c
> > +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> > @@ -492,6 +492,8 @@ static void show_blames(struct drm_printer *p, struct drm_file *file)
> >   	client = xef->client;
> > +	drm_printf(p, "drm-client-reset-count:%u\n",
> > +		   atomic_read(&client->reset_count));
> 
> When the drm- prefix is used, keys have to be agreed on in drm-usage-stats.rst.
> Therefore I suggest exploring across different drivers and seeing if anyone
> else would be interested, for example the people who worked on the common
> DRM wedged event.

+1 on standardizing wedge/reset tracking across drivers. Ideally we'd
integrate this into one common scheme so that it's reported consistently
across all drivers.
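
Just to illustrate what userspace consumption would look like, here is a
rough sketch that scans the fdinfo of an open DRM fd for the key proposed
in this patch (the key name is only the one from this series, nothing is
agreed or documented yet, and /dev/dri/card0 is just an example node):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char path[64], line[256];
	FILE *f;
	int fd;

	/* any open DRM file gets per-client fdinfo keys */
	fd = open("/dev/dri/card0", O_RDONLY);
	if (fd < 0)
		return 1;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return 1;

	/* print the proposed (not yet standardized) reset counter key */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "drm-client-reset-count:", 23))
			fputs(line, stdout);

	fclose(f);
	close(fd);
	return 0;
}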
-Sima

> 
> Or, in cases where new stats are not universally useful, drivers can prefix
> them with xe-. We had this discussion recently with some panthor-internal
> memory stats.
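
For illustration, the driver-prefixed variant would only change the key
string in the hunk above; a quick sketch (the xe- key name below is just a
placeholder, it is not agreed or documented anywhere):

	/* placeholder xe-specific key name, not documented anywhere yet */
	drm_printf(p, "xe-reset-count:%u\n",
		   atomic_read(&client->reset_count));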
> 
> Regards,
> 
> Tvrtko
> 
> >   	drm_printf(p, "\n");
> >   	drm_printf(p, "- Exec queue ban list -\n");
> >   	spin_lock(&client->blame_lock);
> > diff --git a/drivers/gpu/drm/xe/xe_drm_client.h b/drivers/gpu/drm/xe/xe_drm_client.h
> > index d21fd0b90742..c35de675ccfa 100644
> > --- a/drivers/gpu/drm/xe/xe_drm_client.h
> > +++ b/drivers/gpu/drm/xe/xe_drm_client.h
> > @@ -53,6 +53,8 @@ struct xe_drm_client {
> >   	 * Protected by @blame_lock;
> >   	 */
> >   	struct list_head blame_list;
> > +	/** @reset_count: number of times this drm client has seen an engine reset */
> > +	atomic_t reset_count;
> >   #endif
> >   };
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index d9da5c89429e..8810abc8f04a 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1988,7 +1988,9 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >   		return -EPROTO;
> >   	hwe = q->hwe;
> > -
> > +#ifdef CONFIG_PROC_FS
> > +	atomic_inc(&q->xef->client->reset_count);
> > +#endif
> >   	xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d",
> >   		   xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id);
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

