<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div>On Wed, 2023-05-24 at 12:43 -0400, Rodrigo Vivi wrote:</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>On Wed, May 24, 2023 at 09:57:00PM +0530, Balasubramani Vivekanandan wrote:<br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>On 24.05.2023 12:36, Himal Prasad Ghimiray wrote:<br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>In case of gt reset failure, KMD notifies userspace about failure<br>
</div>
<div>via uevent. To validate this notification we need to ensure gt<br>
</div>
<div>reset fails and there is no mechanism to cause failure from hardware.<br>
</div>
<div>Hence added a debugfs which will cause fake reset failure.<br>
</div>
<div><br>
</div>
<div>v1(Rodrigo)<br>
</div>
<div>- Change the variable to fake_reset_failure_in_progress.<br>
</div>
<div>- Drop usage of READ_ONCE and WRITE_ONCE.<br>
</div>
<div>- Follow consistency for variable assignment. Either use<br>
</div>
<div> functions for all the assignments or don't use for any.<br>
</div>
<div><br>
</div>
<div>Cc: Rodrigo Vivi <<a href="mailto:rodrigo.vivi@intel.com">rodrigo.vivi@intel.com</a>><br>
</div>
<div>Signed-off-by: Himal Prasad Ghimiray <<a href="mailto:himal.prasad.ghimiray@intel.com">himal.prasad.ghimiray@intel.com</a>><br>
</div>
<div>---<br>
</div>
<div> drivers/gpu/drm/xe/xe_gt.c | 15 ++++++++++++++-<br>
</div>
<div> drivers/gpu/drm/xe/xe_gt_debugfs.c | 11 +++++++++++<br>
</div>
<div> drivers/gpu/drm/xe/xe_gt_types.h | 1 +<br>
</div>
<div> 3 files changed, 26 insertions(+), 1 deletion(-)<br>
</div>
<div><br>
</div>
<div>diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c<br>
</div>
<div>index 80d42c7c7cfa..b3ac8b3ab455 100644<br>
</div>
<div>--- a/drivers/gpu/drm/xe/xe_gt.c<br>
</div>
<div>+++ b/drivers/gpu/drm/xe/xe_gt.c<br>
</div>
<div>@@ -506,6 +506,9 @@ int xe_gt_init(struct xe_gt *gt)<br>
</div>
<div> int err;<br>
</div>
<div> int i;<br>
</div>
<div> <br>
</div>
<div>+ /*Fake reset failure should be disabled by default*/<br>
</div>
<div>+ gt->reset.fake_reset_failure_in_progress = false;<br>
</div>
<div>+<br>
</div>
<div> INIT_WORK(>->reset.worker, gt_reset_worker);<br>
</div>
<div> <br>
</div>
<div> for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i) {<br>
</div>
<div>@@ -601,8 +604,18 @@ static int gt_reset(struct xe_gt *gt)<br>
</div>
<div> xe_gt_info(gt, "reset started\n");<br>
</div>
<div> <br>
</div>
<div> xe_gt_sanitize(gt);<br>
</div>
<div>-<br>
</div>
<div> xe_device_mem_access_get(gt_to_xe(gt));<br>
</div>
<div>+<br>
</div>
<div>+ err = gt->reset.fake_reset_failure_in_progress;<br>
</div>
<div>+ if (err) {<br>
</div>
</blockquote>
<div><br>
</div>
<div>This doesn't look the right way. It should be something like<br>
</div>
<div><br>
</div>
<div>```<br>
</div>
<div>if (gt->reset.fake_reset_failure_in_progress) {<br>
</div>
<div> <br>
</div>
<div> err = -ECANCELED; or whatever appropriate<br>
</div>
<div> ...<br>
</div>
<div>```<br>
</div>
<div><br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>+ xe_gt_info(gt, "Fake GT reset failure is in progress\n");<br>
</div>
<div>+<br>
</div>
<div>+ /* Disable fake reset failure for next call */<br>
</div>
<div>+ gt->reset.fake_reset_failure_in_progress = false;<br>
</div>
<div>+<br>
</div>
<div>+ goto err_msg;<br>
</div>
<div>+ }<br>
</div>
<div>+<br>
</div>
<div> err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);<br>
</div>
<div> if (err)<br>
</div>
<div> goto err_msg;<br>
</div>
<div>diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c<br>
</div>
<div>index 8bf441e850a0..13d2a974bc00 100644<br>
</div>
<div>--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c<br>
</div>
<div>+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c<br>
</div>
<div>@@ -127,6 +127,16 @@ static int register_save_restore(struct seq_file *m, void *data)<br>
</div>
<div> return 0;<br>
</div>
<div> }<br>
</div>
<div> <br>
</div>
<div>+static int fake_reset_failure(struct seq_file *m, void *data)<br>
</div>
<div>+{<br>
</div>
<div>+ struct xe_gt *gt = node_to_gt(m->private);<br>
</div>
<div>+<br>
</div>
<div>+ gt->reset.fake_reset_failure_in_progress = true;<br>
</div>
<div>+ xe_gt_reset_async(gt);<br>
</div>
<div>+<br>
</div>
<div>+ return 0;<br>
</div>
<div>+}<br>
</div>
<div>+<br>
</div>
<div> static const struct drm_info_list debugfs_list[] = {<br>
</div>
<div> {"hw_engines", hw_engines, 0},<br>
</div>
<div> {"force_reset", force_reset, 0},<br>
</div>
<div>@@ -135,6 +145,7 @@ static const struct drm_info_list debugfs_list[] = {<br>
</div>
<div> {"steering", steering, 0},<br>
</div>
<div> {"ggtt", ggtt, 0},<br>
</div>
<div> {"register-save-restore", register_save_restore, 0},<br>
</div>
<div>+ {"fake_reset_failure", fake_reset_failure, 0},<br>
</div>
<div> };<br>
</div>
<div> <br>
</div>
<div> void xe_gt_debugfs_register(struct xe_gt *gt)<br>
</div>
<div>diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h<br>
</div>
<div>index 7c47d67aa8be..746d65c4aacc 100644<br>
</div>
<div>--- a/drivers/gpu/drm/xe/xe_gt_types.h<br>
</div>
<div>+++ b/drivers/gpu/drm/xe/xe_gt_types.h<br>
</div>
<div>@@ -168,6 +168,7 @@ struct xe_gt {<br>
</div>
<div> <br>
</div>
<div> /** @reset: state for GT resets */<br>
</div>
<div> struct {<br>
</div>
<div>+ bool fake_reset_failure_in_progress;<br>
</div>
<div> /**<br>
</div>
<div> * @worker: work so GT resets can done async allowing to reset<br>
</div>
<div> * code to safely flush all code paths<br>
</div>
</blockquote>
<div><br>
</div>
<div>@Rodrigo<br>
</div>
<div>This feature is purely to support test the driver and not something which<br>
</div>
<div>helps debugging. I am concerned whether such code should be part of<br>
</div>
<div>production build. I am thinking if this feature or any such patches in<br>
</div>
<div>future could be made part of a kernel config, which can be enabled in CI<br>
</div>
<div>when the driver is subjected to testing.<br>
</div>
</blockquote>
<div><br>
</div>
<div>I like this idea of having things like this behind a kernel config,<br>
</div>
<div>but we need to be careful to not simply break IGT or force IGT to always<br>
</div>
<div>depend on this flag.<br>
</div>
<div><br>
</div>
<div>Cc: Mauro Carvalho Chehab <<a href="mailto:mchehab@kernel.org">mchehab@kernel.org</a>><br>
</div>
<div>Cc: Francois Dugast <<a href="mailto:francois.dugast@intel.com">francois.dugast@intel.com</a>><br>
</div>
</blockquote>
<div><br>
</div>
<div>(actually doing the cc now)</div>
<div><br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div><br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div><br>
</div>
<div>Regards,<br>
</div>
<div>Bala<br>
</div>
<div><br>
</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>-- <br>
</div>
<div>2.25.1<br>
</div>
<div><br>
</div>
</blockquote>
</blockquote>
</blockquote>
<div><br>
</div>
<div><span></span></div>
</body>
</html>