[PATCH] drm/xe/debugfs: Make sysfs gt force reset synchronous

Poosa, Karthik karthik.poosa at intel.com
Wed Dec 27 12:59:39 UTC 2023


Hi Anshuman,

1. Can't force_reset be synchronous?

2. Regarding adding a wait in the get-freq APIs:

the xe_guc_pc_get_xxx_freq APIs already check the 'pc->freq_ready' flag and
return -EAGAIN, instead of waiting, while a reset is in progress. That -EAGAIN
is what causes the test failures.
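A minimal userspace model of the two behaviours being discussed, assuming pthreads and a mutex-protected flag standing in for pc->freq_ready. struct pc_model, pc_get_freq_nowait() and pc_get_freq_wait() are hypothetical names for illustration, not xe driver code. The no-wait getter returns -EAGAIN whenever a read races with a reset, as described in point 2. The waiting getter blocks for a bounded time until the flag is set, which is roughly what the guc_pc_freq_ready() suggestion quoted below would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the GuC PC frequency state; not the xe struct. */
struct pc_model {
	pthread_mutex_t lock;
	pthread_cond_t ready_cv;
	bool freq_ready;        /* stands in for pc->freq_ready */
	unsigned int freq_mhz;
};

static void pc_model_init(struct pc_model *pc)
{
	pthread_mutex_init(&pc->lock, NULL);
	pthread_cond_init(&pc->ready_cv, NULL);
	pc->freq_ready = false;  /* as if a reset were in progress */
	pc->freq_mhz = 0;
}

/* Called by the "reset" side once PC is running again. */
static void pc_model_set_ready(struct pc_model *pc, unsigned int freq_mhz)
{
	pthread_mutex_lock(&pc->lock);
	pc->freq_mhz = freq_mhz;
	pc->freq_ready = true;
	pthread_cond_broadcast(&pc->ready_cv);  /* wake any waiting readers */
	pthread_mutex_unlock(&pc->lock);
}

/* Behaviour described above: fail fast with -EAGAIN while not ready. */
static int pc_get_freq_nowait(struct pc_model *pc, unsigned int *freq)
{
	int ret = 0;

	pthread_mutex_lock(&pc->lock);
	if (pc->freq_ready)
		*freq = pc->freq_mhz;
	else
		ret = -EAGAIN;
	pthread_mutex_unlock(&pc->lock);
	return ret;
}

/* Alternative: wait up to timeout_ms for the flag before reading. */
static int pc_get_freq_wait(struct pc_model *pc, unsigned int *freq,
			    long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&pc->lock);
	/* Loop guards against spurious wakeups; exits on ready or ETIMEDOUT. */
	while (!pc->freq_ready && rc == 0)
		rc = pthread_cond_timedwait(&pc->ready_cv, &pc->lock, &deadline);
	if (pc->freq_ready) {
		*freq = pc->freq_mhz;
		rc = 0;
	} else {
		rc = -ETIMEDOUT;
	}
	pthread_mutex_unlock(&pc->lock);
	return rc;
}
```

The trade-off: the no-wait version never blocks a sysfs reader but leaves retrying to userspace, while the waiting version hides the reset window at the cost of a read that can stall for up to the timeout.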

On 27-12-2023 16:06, Gupta, Anshuman wrote:
>
>> -----Original Message-----
>> From: Poosa, Karthik <karthik.poosa at intel.com>
>> Sent: Wednesday, December 27, 2023 2:02 PM
>> To: intel-xe at lists.freedesktop.org
>> Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; Nilawar, Badal
>> <badal.nilawar at intel.com>; Brost, Matthew <matthew.brost at intel.com>;
>> Vivi, Rodrigo <rodrigo.vivi at intel.com>; Poosa, Karthik
>> <karthik.poosa at intel.com>
>> Subject: [PATCH] drm/xe/debugfs: Make sysfs gt force reset synchronous
>>
>> Wait for the GT reset to complete before returning from the force_reset sysfs call.
>> Without this, the igt test freq_reset_multiple fails sporadically when xe_guc_pc
>> has not yet been started.
>>
>> v2:
>> - Changed wait for completion to interruptible (Anshuman).
>> - Moved timeout to xe_gt.h (Anshuman).
>> - Created a debugfs for updating timeout (Rodrigo).
>>
>> Testcase: igt@xe_guc_pc@freq_reset_multiple
>> Signed-off-by: Karthik Poosa <karthik.poosa at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt.c         |  2 ++
>>   drivers/gpu/drm/xe/xe_gt_debugfs.c | 12 ++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_types.h   |  6 ++++++
>>   3 files changed, 20 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> index 3af2adec1295..47abb9336c58 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.c
>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>> @@ -65,6 +65,7 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>>
>>   	gt->tile = tile;
>>   	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
>> +	init_completion(&gt->reset_done);
>>
>>   	return gt;
>>   }
>> @@ -633,6 +634,7 @@ static int gt_reset(struct xe_gt *gt)
>>   	xe_device_mem_access_put(gt_to_xe(gt));
>>   	XE_WARN_ON(err);
>>
>> +	complete(&gt->reset_done);
>>   	xe_gt_info(gt, "reset done\n");
>>
>>   	return 0;
>> diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
>> index c4b67cf09f8f..fbda886c8a95 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
>> @@ -58,8 +58,16 @@ static int hw_engines(struct seq_file *m, void *data)
>>   static int force_reset(struct seq_file *m, void *data)
>>   {
>>   	struct xe_gt *gt = node_to_gt(m->private);
>> +	struct xe_device *xe = gt_to_xe(gt);
>> +	long ret;
>>
>>   	xe_gt_reset_async(gt);
>> +	ret = wait_for_completion_interruptible_timeout(&gt->reset_done,
>> +						msecs_to_jiffies(gt->reset_timeout_ms));
> This would defeat the purpose of xe_gt_reset_async(), as it would make force_reset
> synchronous. I think we need wait_for_completion_interruptible_timeout() in the
> xe_gt_freq sysfs before reading the GuC PC frequency, something like below:
> guc_pc_freq_ready()
> {
> 	wait_for_completion_interruptible_timeout()
> }
>
> Thanks,
> Anshuman Gupta.
>> +	if (ret <= 0) {
>> +		drm_err(&xe->drm, "gt reset timed out/interrupted, ret %ld\n", ret);
>> +		return -ETIMEDOUT;
>> +	}
>>
>>   	return 0;
>>   }
>> @@ -225,6 +233,10 @@ void xe_gt_debugfs_register(struct xe_gt *gt)
>>   		return;
>>   	}
>>
>> +	/* set a default timeout */
>> +	gt->reset_timeout_ms = 1000;
>> +	debugfs_create_u32("gt_reset_timeout_ms", 0600, root,
>> +					&gt->reset_timeout_ms);
>>   	/*
>>   	 * Allocate local copy as we need to pass in the GT to the debugfs
>>   	 * entry and drm_debugfs_create_files just references the drm_info_list
>> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
>> index f74684660475..824cefde20d2 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
>> @@ -358,6 +358,12 @@ struct xe_gt {
>>   		/** @oob: bitmap with active OOB workaroudns */
>>   		unsigned long *oob;
>>   	} wa_active;
>> +
>> +	/** @reset_done: completion for GT reset */
>> +	struct completion reset_done;
>> +
>> +	/** @reset_timeout_ms: GT reset timeout in ms */
>> +	u32 reset_timeout_ms;
>>   };
>>
>>   #endif
>> --
>> 2.25.1
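One thing to check in the patch above: a struct completion counts complete() calls, and gt->reset_done is initialised only once, in xe_gt_alloc(). Any complete() that no waiter consumes (for example, a reset that finishes after a force_reset wait has already timed out) leaves the count raised, so a later force_reset wait can return immediately, before the reset it just queued has run. The usual kernel idiom is to call reinit_completion() before queuing the work being waited for. The sketch below reproduces this in userspace, assuming pthreads; struct ucompletion and the ucompletion_*() helpers are hypothetical stand-ins for the kernel API, not its implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for struct completion: keeps a "done" counter. */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	unsigned int done;
};

/* Like init_completion(). */
static void ucompletion_init(struct ucompletion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->done = 0;
}

/* Like reinit_completion(): forget earlier, unconsumed complete() calls. */
static void ucompletion_reinit(struct ucompletion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 0;
	pthread_mutex_unlock(&c->lock);
}

/* Like complete(): release one waiter, now or in the future. */
static void ucompletion_complete(struct ucompletion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cv);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Like wait_for_completion_timeout(): true if a completion was consumed,
 * false on timeout. Signals are not modelled.
 */
static bool ucompletion_wait_ms(struct ucompletion *c, long timeout_ms)
{
	struct timespec deadline;
	bool ok;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (c->done == 0 && rc == 0)
		rc = pthread_cond_timedwait(&c->cv, &c->lock, &deadline);
	ok = c->done > 0;
	if (ok)
		c->done--;  /* consume one completion, as the kernel does */
	pthread_mutex_unlock(&c->lock);
	return ok;
}
```

In force_reset() this would correspond to calling reinit_completion(&gt->reset_done) before xe_gt_reset_async(gt), so that only a reset finishing after the request can satisfy the wait.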

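wait_for_completion_interruptible_timeout() returns -ERESTARTSYS if a signal interrupts the wait, 0 on timeout, and the remaining jiffies (at least 1) on success. The `if (ret <= 0)` branch in the patch reports both failure cases as -ETIMEDOUT; passing -ERESTARTSYS through unchanged is the usual kernel convention, so the interrupted call can be restarted or fail with -EINTR. A small sketch of that mapping, where reset_wait_to_errno() is a hypothetical helper name and ERESTARTSYS is redefined only because userspace headers lack it:

```c
#include <errno.h>

/* Kernel-internal value from include/linux/errno.h; absent in userspace. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper: map a wait_for_completion_interruptible_timeout()
 * result to an errno-style return for a debugfs show() callback.
 *   > 0 : completed (jiffies left)     -> 0
 *   = 0 : timed out                    -> -ETIMEDOUT
 *   < 0 : interrupted (-ERESTARTSYS)   -> passed through unchanged
 */
static int reset_wait_to_errno(long ret)
{
	if (ret > 0)
		return 0;
	if (ret == 0)
		return -ETIMEDOUT;
	return (int)ret;
}
```

In force_reset() the error branch could then reduce to `return reset_wait_to_errno(ret);`, with the drm_err() kept for the timeout case if desired.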
