<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2022-05-19 03:58, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:3abdd2a3-2cb8-fc8f-fd5d-a8a7cd5fc828@gmail.com">
      
      <br>
      <br>
      <div class="moz-cite-prefix">Am 18.05.22 um 16:24 schrieb Andrey
        Grodzovsky:<br>
      </div>
      <blockquote type="cite" cite="mid:ce60a983-9906-e33f-a2cc-6fedb958a124@amd.com">
        <p><br>
        </p>
        <div class="moz-cite-prefix">On 2022-05-18 02:07, Christian
          König wrote:<br>
        </div>
        <blockquote type="cite" cite="mid:1a7fd05f-490b-9999-5f0b-e84af26504a9@amd.com">Am
          17.05.22 um 21:20 schrieb Andrey Grodzovsky: <br>
          <blockquote type="cite">Problem: <br>
            During hive reset caused by command timing out on a ring <br>
            extra resets are generated by triggered by KFD which is <br>
            unable to accesses registers on the resetting ASIC. <br>
            <br>
            Fix: Rework GPU reset to actively stop any pending reset <br>
            works while another in progress. <br>
            <br>
            v2: Switch from generic list as was in v1[1] to eplicit <br>
            stopping of each reset request from each reset source <br>
            per each request submitter. <br>
          </blockquote>
          <br>
          Looks mostly good to me. <br>
          <br>
          Apart from the naming nit pick on patch #1 the only thing I
          couldn't of hand figure out is why you are using a delayed
          work everywhere instead of a just a work item. <br>
          <br>
          That needs a bit further explanation what's happening here. <br>
          <br>
          Christian. <br>
        </blockquote>
        <p><br>
        </p>
        <p>Check APIs for cancelling work vs. delayed work -</p>
        <p>For work_struct the only public API is this - <a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fkernel%2Fworkqueue.c%23L3214&data=05%7C01%7Candrey.grodzovsky%40amd.com%7C29fac8e733b14ee2551b08da396d5093%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885438944206302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6FMDy2apJCIkLEW8DE1t7sAfnDIJ%2Fm%2BCGo7ME5nRMzI%3D&reserved=0" originalsrc="https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214" shash="fuy0PYQymaWnVrdD3sOhZLnqjUKO8VSpEEYhcosSkT1HuW7cRvY6RZF5ybgsnzOmJ2tMlglk7Oa+SzXu4sO2Hho0Krmu5/IcsAbeEADZ+W7J3SuYSBJaCLwwpSn/EH7pfpKGeHIglBMQ4PLkmcISc9J4wguUPxaiHx3FX92n6yM=" moz-do-not-send="true">https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214</a>
          - blocking cancel.</p>
        <p>For delayed_work we have both blocking and non blocking
          public APIs - <br>
        </p>
        <p><a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fkernel%2Fworkqueue.c%23L3295&data=05%7C01%7Candrey.grodzovsky%40amd.com%7C29fac8e733b14ee2551b08da396d5093%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885438944206302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ci%2BVEDdbi7ub%2B9p4y5tth7SZvjBiRCkBm4%2FrhDavEnM%3D&reserved=0" originalsrc="https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295" shash="fFYuA/PjoBFLlKdXj0Oivc/vDEX78KLK90RDFaoCYwWlUbXrgZ+MqhJvXMet0hRhM8A8FFmMxWRW3rnjTRbt0OsdqQ0/fFn86hFR79AjYwtUlpraZMYQU5WE9k5uSZ9B3rxdeGUqEPlt4cSszC+YfhAi/t8wjdHq/U/skYPttIc=" moz-do-not-send="true">https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295</a></p>
        <p><a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fkernel%2Fworkqueue.c%23L3295&data=05%7C01%7Candrey.grodzovsky%40amd.com%7C29fac8e733b14ee2551b08da396d5093%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885438944206302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ci%2BVEDdbi7ub%2B9p4y5tth7SZvjBiRCkBm4%2FrhDavEnM%3D&reserved=0" originalsrc="https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295" shash="fFYuA/PjoBFLlKdXj0Oivc/vDEX78KLK90RDFaoCYwWlUbXrgZ+MqhJvXMet0hRhM8A8FFmMxWRW3rnjTRbt0OsdqQ0/fFn86hFR79AjYwtUlpraZMYQU5WE9k5uSZ9B3rxdeGUqEPlt4cSszC+YfhAi/t8wjdHq/U/skYPttIc=" moz-do-not-send="true">https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295</a></p>
        <p>I prefer not to go now into convincing core kernel people of
          exposing another interface for our own sake - from my past
          experience API changes in core code has slim chances and a lot
          of time spent on back and forth arguments.</p>
        <p>"<span style="color: rgb(32, 33, 36); font-family: arial,
            sans-serif; font-size: 16px; font-style: normal;
            font-variant-ligatures: normal; font-variant-caps: normal;
            letter-spacing: normal; orphans: 2; text-align: left;
            text-indent: 0px; text-transform: none; white-space: normal;
            widows: 2; word-spacing: 0px; -webkit-text-stroke-width:
            0px; background-color: rgb(255, 255, 255);
            text-decoration-thickness: initial; text-decoration-style:
            initial; text-decoration-color: initial;">If the mountain
            will not come to Muhammad, then Muhammad must go to the
            mountain" ;)</span><b style="color: rgb(32, 33, 36);
            font-family: arial, sans-serif; font-size: 16px; font-style:
            normal; font-variant-ligatures: normal; font-variant-caps:
            normal; letter-spacing: normal; orphans: 2; text-align:
            left; text-indent: 0px; text-transform: none; white-space:
            normal; widows: 2; word-spacing: 0px;
            -webkit-text-stroke-width: 0px; background-color: rgb(255,
            255, 255); text-decoration-thickness: initial;
            text-decoration-style: initial; text-decoration-color:
            initial;"><br>
          </b></p>
      </blockquote>
      <br>
      Ah, good point. The cancel_work() function was removed a few years
      ago:<br>
      <br>
      commit 6417250d3f894e66a68ba1cd93676143f2376a6f<br>
      Author: Stephen Hemminger <a class="moz-txt-link-rfc2396E" href="mailto:stephen@networkplumber.org" moz-do-not-send="true"><stephen@networkplumber.org></a><br>
      Date:   Tue Mar 6 19:34:42 2018 -0800<br>
      <br>
          workqueue: remove unused cancel_work()<br>
          <br>
          Found this by accident.<br>
          There are no usages of bare cancel_work() in current kernel
      source.<br>
          <br>
          Signed-off-by: Stephen Hemminger <a class="moz-txt-link-rfc2396E" href="mailto:stephen@networkplumber.org" moz-do-not-send="true"><stephen@networkplumber.org></a><br>
          Signed-off-by: Tejun Heo <a class="moz-txt-link-rfc2396E" href="mailto:tj@kernel.org" moz-do-not-send="true"><tj@kernel.org></a><br>
      <br>
      <br>
      Maybe just revert that patch, export the function and use it. I
      think there is plenty of justification for this.<br>
      <br>
      Thanks,<br>
      Christian.<br>
    </blockquote>
    <p><br>
    </p>
    <p>Ok - i will send them a patch - let's see what they say.</p>
    <p>Andrey</p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:3abdd2a3-2cb8-fc8f-fd5d-a8a7cd5fc828@gmail.com"> <br>
      <blockquote type="cite" cite="mid:ce60a983-9906-e33f-a2cc-6fedb958a124@amd.com">
        <p><b style="color: rgb(32, 33, 36); font-family: arial,
            sans-serif; font-size: 16px; font-style: normal;
            font-variant-ligatures: normal; font-variant-caps: normal;
            letter-spacing: normal; orphans: 2; text-align: left;
            text-indent: 0px; text-transform: none; white-space: normal;
            widows: 2; word-spacing: 0px; -webkit-text-stroke-width:
            0px; background-color: rgb(255, 255, 255);
            text-decoration-thickness: initial; text-decoration-style:
            initial; text-decoration-color: initial;"> </b></p>
        <p>Andrey</p>
        <blockquote type="cite" cite="mid:1a7fd05f-490b-9999-5f0b-e84af26504a9@amd.com"> <br>
          <blockquote type="cite"> <br>
            [1] - <a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20220504161841.24669-1-andrey.grodzovsky%40amd.com%2F&data=05%7C01%7Candrey.grodzovsky%40amd.com%7C29fac8e733b14ee2551b08da396d5093%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885438944206302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NVufR4wmuP%2B6hE6VitPr5Jti1%2FcgxpRywcGpzqMIEnk%3D&reserved=0" originalsrc="https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzovsky@amd.com/" shash="zrNOsjV9bngBrJJI2hBLdeJZ3W0TApHJpzwO8AyVkzTTrFUsORyPq9+qnes8eYlLlYNF5eGF9xy6/lRZDZg25nd1A25eFM6woda0KbYtDrSCvmcplvwsLXCpw36JGS+4wi9Jxt52Jj8Gyh9Hu/QQh2nUCgPIelzGU/jQCFIfZk4=" moz-do-not-send="true">https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzovsky@amd.com/</a><br>
            <br>
            Andrey Grodzovsky (7): <br>
               drm/amdgpu: Cache result of last reset at reset domain
            level. <br>
               drm/amdgpu: Switch to delayed work from work_struct. <br>
               drm/admgpu: Serialize RAS recovery work directly into
            reset domain <br>
                 queue. <br>
               drm/amdgpu: Add delayed work for GPU reset from debugfs <br>
               drm/amdgpu: Add delayed work for GPU reset from kfd. <br>
               drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to
            <br>
                 amdgpu_device_gpu_recover <br>
               drm/amdgpu: Stop any pending reset if another in
            progress. <br>
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 +- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +++++- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 + <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62
            +++++++++++----------- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 19 ++++++- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 10 ++-- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h    |  2 +- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  1 + <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  5 +- <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  2 +- <br>
              drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  6 +-- <br>
              drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  6 +-- <br>
              drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  6 +-- <br>
              14 files changed, 87 insertions(+), 54 deletions(-) <br>
            <br>
          </blockquote>
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>