<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 3/8/2022 10:00 PM, Sharma, Shashank
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">Hello
      Andrey
      <br>
      <br>
      On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote:
      <br>
      <blockquote type="cite">
        <br>
        On 2022-03-07 11:26, Shashank Sharma wrote:
        <br>
        <blockquote type="cite">From: Shashank Sharma
          <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a>
          <br>
          <br>
          This patch adds a work function, which will get scheduled
          <br>
          in event of a GPU reset, and will send a uevent to user with
          <br>
          some reset context information, like a PID and some flags.
          <br>
        </blockquote>
        <br>
        <br>
        Where is the actual scheduling of the work function? Shouldn't
        <br>
        there be a patch for that too?
        <br>
        <br>
      </blockquote>
      <br>
      Yes, Amar is working on that patch, on top of these patches. It
      should be out soon. I thought it was a good idea to get quick
      feedback on the basic patches before we build something on top of
      them.
      <br>
      <br>
    </blockquote>
    <p>schedule_work() will be called in amdgpu_do_asic_reset(),
      right after the vram_lost information is obtained:<br>
    </p>
    <p>vram_lost = amdgpu_device_check_vram_lost(tmp_adev);</p>
    <p>We then update amdgpu_reset_event_ctx from the following values
      and call schedule_work():</p>
    <ul>
      <li>vram_lost</li>
      <li>reset_context->job->vm->task_info.process_name</li>
      <li>reset_context->job->vm->task_info.pid</li>
    </ul>
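    <p>A minimal userspace sketch of that sequence, for illustration
      only. The struct layout follows the quoted patch; the flag bit
      RESET_EVENT_VRAM_LOST, the helper queue_reset_event() and the
      stubbed schedule_work() are assumptions and not part of any
      posted patch. process_name is left out because
      amdgpu_reset_event_ctx currently has no field for it.</p>
<pre>
```c
/* Userspace mock: these types only mirror names used in this thread. */
typedef unsigned long long u64;
typedef unsigned int u32;

struct work_struct {
    int pending;                      /* 1 while the work is queued */
};

struct amdgpu_reset_event_ctx {       /* layout taken from the patch */
    u64 pid;
    u32 flags;
};

/* Hypothetical flag bit; the patch does not define any flag values. */
#define RESET_EVENT_VRAM_LOST (1u << 0)

struct amdgpu_device {
    struct work_struct gpu_reset_event_work;
    struct amdgpu_reset_event_ctx reset_event_ctx;
};

/*
 * Stand-in for the kernel's schedule_work(): returns 0 if the work was
 * already queued, 1 otherwise. It only marks the work as pending here.
 */
static int schedule_work(struct work_struct *work)
{
    if (work->pending)
        return 0;
    work->pending = 1;
    return 1;
}

/* Hypothetical helper: fill the event context, then queue the work. */
static int queue_reset_event(struct amdgpu_device *adev, u64 pid,
                             int vram_lost)
{
    adev->reset_event_ctx.pid = pid;
    adev->reset_event_ctx.flags = vram_lost ? RESET_EVENT_VRAM_LOST : 0;
    return schedule_work(&adev->gpu_reset_event_work);
}
```
</pre>
    <p>In the driver, the pid would come from
      reset_context->job->vm->task_info.pid and vram_lost from
      amdgpu_device_check_vram_lost(), as listed above.</p>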
    Regards,<br>
    S.Amarnath<br>
    <blockquote type="cite" cite="mid:bc293ab7-db45-2b16-aeb8-291cffef8ba4@amd.com">- Shashank
      <br>
      <br>
      <blockquote type="cite">Andrey
        <br>
        <br>
        <br>
        <blockquote type="cite">
          <br>
          The userspace can do some recovery and post-processing work
          <br>
          based on this event.
          <br>
          <br>
          V2:
          <br>
          - Changed the name of the work to gpu_reset_event_work
          <br>
             (Christian)
          <br>
          - Added a structure to accommodate some additional information
          <br>
             (like a PID and some flags)
          <br>
          <br>
          Cc: Alexander Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a>
          <br>
          Cc: Christian Koenig <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
          <br>
          Signed-off-by: Shashank Sharma <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a>
          <br>
          ---
          <br>
            drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
          <br>
            drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19
          +++++++++++++++++++
          <br>
            2 files changed, 26 insertions(+)
          <br>
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
          b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
          <br>
          index d8b854fcbffa..7df219fe363f 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
          <br>
          @@ -813,6 +813,11 @@ struct amd_powerplay {
          <br>
            #define AMDGPU_RESET_MAGIC_NUM 64
          <br>
            #define AMDGPU_MAX_DF_PERFMONS 4
          <br>
            #define AMDGPU_PRODUCT_NAME_LEN 64
          <br>
          +struct amdgpu_reset_event_ctx {
          <br>
          +    uint64_t pid;
          <br>
          +    uint32_t flags;
          <br>
          +};
          <br>
          +
          <br>
            struct amdgpu_device {
          <br>
                struct device            *dev;
          <br>
                struct pci_dev            *pdev;
          <br>
          @@ -1063,6 +1068,7 @@ struct amdgpu_device {
          <br>
                int asic_reset_res;
          <br>
                struct work_struct        xgmi_reset_work;
          <br>
          +    struct work_struct        gpu_reset_event_work;
          <br>
                struct list_head        reset_list;
          <br>
                long                gfx_timeout;
          <br>
          @@ -1097,6 +1103,7 @@ struct amdgpu_device {
          <br>
                pci_channel_state_t        pci_channel_state;
          <br>
                struct amdgpu_reset_control     *reset_cntl;
          <br>
          +    struct amdgpu_reset_event_ctx   reset_event_ctx;
          <br>
                uint32_t                       
          ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
          <br>
                bool                ram_is_direct_mapped;
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
          <br>
          index ed077de426d9..c43d099da06d 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
          <br>
          @@ -73,6 +73,7 @@
          <br>
            #include <linux/pm_runtime.h>
          <br>
            #include <drm/drm_drv.h>
          <br>
          +#include <drm/drm_sysfs.h>
          <br>
            MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
          <br>
            MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
          <br>
          @@ -3277,6 +3278,23 @@ bool
          amdgpu_device_has_dc_support(struct amdgpu_device *adev)
          <br>
                return
          amdgpu_device_asic_has_dc_support(adev->asic_type);
          <br>
            }
          <br>
          +static void amdgpu_device_reset_event_func(struct work_struct
          *__work)
          <br>
          +{
          <br>
          +    struct amdgpu_device *adev = container_of(__work, struct
          amdgpu_device,
          <br>
          +                          gpu_reset_event_work);
          <br>
          +    struct amdgpu_reset_event_ctx *event_ctx =
          &adev->reset_event_ctx;
          <br>
          +
          <br>
          +    /*
          <br>
          +     * A GPU reset has happened; notify userspace and
          pass the
          <br>
          +     * following information:
          <br>
          +     *    - pid of the process involved,
          <br>
          +     *    - if the VRAM is valid or not,
          <br>
          +     *    - indicate that userspace may want to collect the
          ftrace event
          <br>
          +     * data from the trace event.
          <br>
          +     */
          <br>
          +    drm_sysfs_reset_event(&adev->ddev,
          event_ctx->pid, event_ctx->flags);
          <br>
          +}
          <br>
          +
          <br>
            static void amdgpu_device_xgmi_reset_func(struct work_struct
          *__work)
          <br>
            {
          <br>
                struct amdgpu_device *adev =
          <br>
          @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct
          amdgpu_device *adev,
          <br>
                          amdgpu_device_delay_enable_gfx_off);
          <br>
                INIT_WORK(&adev->xgmi_reset_work,
          amdgpu_device_xgmi_reset_func);
          <br>
          +    INIT_WORK(&adev->gpu_reset_event_work,
          amdgpu_device_reset_event_func);
          <br>
                adev->gfx.gfx_off_req_count = 1;
          <br>
                adev->pm.ac_power = power_supply_is_system_supplied()
          > 0;
          <br>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>