<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> <div class="moz-cite-prefix">On 3/8/2022 10:00 PM, Sharma, Shashank wrote: </div> <blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">Hello Andrey On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote: <blockquote type="cite"> On 2022-03-07 11:26, Shashank Sharma wrote: <blockquote type="cite">From: Shashank Sharma <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a> This patch adds a work function, which will get scheduled in event of a GPU reset, and will send a uevent to user with some reset context infomration, like a PID and some flags. </blockquote> Where is the actual scheduling of the work function ? Shouldn't there be a patch for that too ? </blockquote> Yes, Amar is working on that patch, on top of these patches. They should be out soon. I thought it was a good idea to get quick feedback on the basic patches before we build something on top of it. </blockquote> schedule_work() will be called in the function amdgpu_do_asic_reset () after getting vram_lost info: vram_lost = amdgpu_device_check_vram_lost(tmp_adev); update amdgpu_reset_event_ctx and call schedule_work() <ul> <li>vram_lost</li> <li>reset_context->job->vm->task_info.process_name</li> <li>reset_context->job->vm->task_info.pid</li> </ul> Regards, S.Amarnath <blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">- Shashank <blockquote type="cite">Andrey <blockquote type="cite"> The userspace can do some recovery and post-processing work based on this event. V2: - Changed the name of the work to gpu_reset_event_work (Christian) - Added a structure to accommodate some additional information (like a PID and some flags) Cc: Alexander Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a> Cc: Christian Koenig <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a> Signed-off-by: Shashank Sharma <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index d8b854fcbffa..7df219fe363f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -813,6 +813,11 @@ struct amd_powerplay { #define AMDGPU_RESET_MAGIC_NUM 64 #define AMDGPU_MAX_DF_PERFMONS 4 #define AMDGPU_PRODUCT_NAME_LEN 64 +struct amdgpu_reset_event_ctx { + uint64_t pid; + uint32_t flags; +}; + struct amdgpu_device { struct device *dev; struct pci_dev *pdev; @@ -1063,6 +1068,7 @@ struct amdgpu_device { int asic_reset_res; struct work_struct xgmi_reset_work; + struct work_struct gpu_reset_event_work; struct list_head reset_list; long gfx_timeout; @@ -1097,6 +1103,7 @@ struct amdgpu_device { pci_channel_state_t pci_channel_state; struct amdgpu_reset_control *reset_cntl; + struct amdgpu_reset_event_ctx reset_event_ctx; uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE]; bool ram_is_direct_mapped; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index ed077de426d9..c43d099da06d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -73,6 +73,7 @@ #include <linux/pm_runtime.h> #include <drm/drm_drv.h> +#include <drm/drm_sysfs.h> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev) return amdgpu_device_asic_has_dc_support(adev->asic_type); } +static void amdgpu_device_reset_event_func(struct work_struct *__work) +{ + struct amdgpu_device *adev = container_of(__work, struct amdgpu_device, + gpu_reset_event_work); + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx; + + /* + * A GPU reset has happened, indicate the userspace and pass the + * following information: + * - pid of the process involved, + * - if the VRAM is valid or not, + * - indicate that userspace may want to collect the ftrace event + * data from the trace event. + */ + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags); +} + static void amdgpu_device_xgmi_reset_func(struct work_struct *__work) { struct amdgpu_device *adev = @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, amdgpu_device_delay_enable_gfx_off); INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func); + INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func); adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0; </blockquote> </blockquote> </blockquote> </body> </html>