<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 3/8/2022 10:00 PM, Sharma, Shashank
wrote:<br>
</div>
<blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">Hello
Andrey
<br>
<br>
On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote:
<br>
<blockquote type="cite">
<br>
On 2022-03-07 11:26, Shashank Sharma wrote:
<br>
<blockquote type="cite">From: Shashank Sharma
<a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a>
<br>
<br>
This patch adds a work function, which will get scheduled
<br>
in event of a GPU reset, and will send a uevent to user with
<br>
some reset context information, like a PID and some flags.
<br>
</blockquote>
<br>
<br>
Where is the actual scheduling of the work function? Shouldn't
<br>
there be a patch for that too?
<br>
<br>
</blockquote>
<br>
Yes, Amar is working on that patch, on top of these patches. It
should be out soon. I thought it was a good idea to get quick
feedback on the basic patches before we build something on top of
them.
<br>
<br>
</blockquote>
<p>schedule_work() will be called in the function
amdgpu_do_asic_reset(), <br>
</p>
<p>after the vram_lost information has been obtained:<br>
</p>
<p>vram_lost = amdgpu_device_check_vram_lost(tmp_adev);</p>
<p>At that point, update amdgpu_reset_event_ctx from the following and call schedule_work():</p>
<ul>
<li>vram_lost</li>
<li>reset_context->job->vm->task_info.process_name</li>
<li>reset_context->job->vm->task_info.pid</li>
</ul>
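<p>To make the ordering concrete, here is a rough userspace mock-up of that
step. It is only a sketch, not the actual follow-up patch. struct mock_adev,
mock_init(), mock_schedule_work(), queue_reset_event(), last_event and
MOCK_RESET_FLAG_VRAM_LOST are made-up names for illustration, and the flag
layout is an assumption, since the patch does not define one. process_name is
left out because amdgpu_reset_event_ctx currently has no field for it.</p>

```c
/* Userspace mock-up of the flow described above. The types and helpers
 * here are stand-ins for illustration, not the real amdgpu definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the patch does not define the flag layout. */
#define MOCK_RESET_FLAG_VRAM_LOST (1u << 0)

/* Minimal stand-in for the kernel's struct work_struct. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Same fields as the struct added by the patch. */
struct amdgpu_reset_event_ctx {
	uint64_t pid;
	uint32_t flags;
};

/* Cut-down stand-in for struct amdgpu_device. */
struct mock_adev {
	struct work_struct gpu_reset_event_work;
	struct amdgpu_reset_event_ctx reset_event_ctx;
};

/* Records what the handler would have passed to drm_sysfs_reset_event(). */
static struct amdgpu_reset_event_ctx last_event;

/* Plays the role of amdgpu_device_reset_event_func(). */
static void mock_reset_event_func(struct work_struct *work)
{
	/* Open-coded container_of(): recover the device from its member. */
	struct mock_adev *adev = (struct mock_adev *)
		((char *)work - offsetof(struct mock_adev, gpu_reset_event_work));

	last_event = adev->reset_event_ctx;
}

/* Stand-in for the INIT_WORK() call in amdgpu_device_init(). */
static void mock_init(struct mock_adev *adev)
{
	adev->gpu_reset_event_work.func = mock_reset_event_func;
}

/* Stand-in for schedule_work(); here the handler runs synchronously. */
static bool mock_schedule_work(struct work_struct *work)
{
	work->func(work);
	return true;
}

/* The step proposed for amdgpu_do_asic_reset(): fill the context
 * first, then queue the work. */
static void queue_reset_event(struct mock_adev *adev, bool vram_lost,
			      uint64_t pid)
{
	assert(adev != NULL);
	adev->reset_event_ctx.pid = pid;
	adev->reset_event_ctx.flags = vram_lost ? MOCK_RESET_FLAG_VRAM_LOST : 0;
	mock_schedule_work(&adev->gpu_reset_event_work);
}
```

<p>In the kernel, schedule_work() only queues the work and the handler runs
later on a workqueue thread, so the context has to be filled in before the
call, as in queue_reset_event() above.</p>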
Regards,<br>
S.Amarnath<br>
<blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">- Shashank
<br>
<br>
<blockquote type="cite">Andrey
<br>
<br>
<br>
<blockquote type="cite">
<br>
The userspace can do some recovery and post-processing work
<br>
based on this event.
<br>
<br>
V2:
<br>
- Changed the name of the work to gpu_reset_event_work
<br>
(Christian)
<br>
- Added a structure to accommodate some additional information
<br>
(like a PID and some flags)
<br>
<br>
Cc: Alexander Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a>
<br>
Cc: Christian Koenig <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
<br>
Signed-off-by: Shashank Sharma <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19
+++++++++++++++++++
<br>
2 files changed, 26 insertions(+)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
<br>
index d8b854fcbffa..7df219fe363f 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
<br>
@@ -813,6 +813,11 @@ struct amd_powerplay {
<br>
#define AMDGPU_RESET_MAGIC_NUM 64
<br>
#define AMDGPU_MAX_DF_PERFMONS 4
<br>
#define AMDGPU_PRODUCT_NAME_LEN 64
<br>
+struct amdgpu_reset_event_ctx {
<br>
+ uint64_t pid;
<br>
+ uint32_t flags;
<br>
+};
<br>
+
<br>
struct amdgpu_device {
<br>
struct device *dev;
<br>
struct pci_dev *pdev;
<br>
@@ -1063,6 +1068,7 @@ struct amdgpu_device {
<br>
int asic_reset_res;
<br>
struct work_struct xgmi_reset_work;
<br>
+ struct work_struct gpu_reset_event_work;
<br>
struct list_head reset_list;
<br>
long gfx_timeout;
<br>
@@ -1097,6 +1103,7 @@ struct amdgpu_device {
<br>
pci_channel_state_t pci_channel_state;
<br>
struct amdgpu_reset_control *reset_cntl;
<br>
+ struct amdgpu_reset_event_ctx reset_event_ctx;
<br>
uint32_t
ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
<br>
bool ram_is_direct_mapped;
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
index ed077de426d9..c43d099da06d 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
@@ -73,6 +73,7 @@
<br>
#include <linux/pm_runtime.h>
<br>
#include <drm/drm_drv.h>
<br>
+#include <drm/drm_sysfs.h>
<br>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
<br>
MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
<br>
@@ -3277,6 +3278,23 @@ bool
amdgpu_device_has_dc_support(struct amdgpu_device *adev)
<br>
return
amdgpu_device_asic_has_dc_support(adev->asic_type);
<br>
}
<br>
+static void amdgpu_device_reset_event_func(struct work_struct
*__work)
<br>
+{
<br>
+ struct amdgpu_device *adev = container_of(__work, struct
amdgpu_device,
<br>
+ gpu_reset_event_work);
<br>
+ struct amdgpu_reset_event_ctx *event_ctx =
&adev->reset_event_ctx;
<br>
+
<br>
+ /*
<br>
+ * A GPU reset has happened, indicate the userspace and
pass the
<br>
+ * following information:
<br>
+ * - pid of the process involved,
<br>
+ * - if the VRAM is valid or not,
<br>
+ * - indicate that userspace may want to collect the
ftrace event
<br>
+ * data from the trace event.
<br>
+ */
<br>
+ drm_sysfs_reset_event(&adev->ddev,
event_ctx->pid, event_ctx->flags);
<br>
+}
<br>
+
<br>
static void amdgpu_device_xgmi_reset_func(struct work_struct
*__work)
<br>
{
<br>
struct amdgpu_device *adev =
<br>
@@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct
amdgpu_device *adev,
<br>
amdgpu_device_delay_enable_gfx_off);
<br>
INIT_WORK(&adev->xgmi_reset_work,
amdgpu_device_xgmi_reset_func);
<br>
+ INIT_WORK(&adev->gpu_reset_event_work,
amdgpu_device_reset_event_func);
<br>
adev->gfx.gfx_off_req_count = 1;
<br>
adev->pm.ac_power = power_supply_is_system_supplied()
> 0;
<br>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>