<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">That's a good point as well. Maybe we
      should have separate timeouts for gfx and compute?<br>
      <br>
      Something like 5 seconds for gfx and 1 minute (or even longer) for
      compute?<br>
      <br>
      Anyway, I agree that we can worry about that later on. The patch is
      Reviewed-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com">&lt;christian.koenig@amd.com&gt;</a> for
      now.<br>
      <br>
      Regards,<br>
      Christian.<br>
      <br>
      On 20.03.2018 at 15:16, Deucher, Alexander wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:DM5PR12MB1820FEE50DE4EBD1E44B676BF7AB0@DM5PR12MB1820.namprd12.prod.outlook.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
      <div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;"
        dir="ltr">
        <p style="margin-top:0;margin-bottom:0">My concern was that
          compute will always have the timeout disabled with no way to
          override it even if you enable GPU reset.  I guess we can
          address that down the road.</p>
        <p style="margin-top:0;margin-bottom:0"><br>
        </p>
        <p style="margin-top:0;margin-bottom:0">Acked-by: Alex Deucher
          <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com">&lt;alexander.deucher@amd.com&gt;</a><br>
        </p>
      </div>
      <hr style="display:inline-block;width:98%" tabindex="-1">
      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
          face="Calibri, sans-serif" color="#000000"><b>From:</b>
          Koenig, Christian<br>
          <b>Sent:</b> Tuesday, March 20, 2018 6:14:29 AM<br>
          <b>To:</b> Quan, Evan; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
          <b>Cc:</b> Deucher, Alexander<br>
          <b>Subject:</b> Re: [PATCH] drm/amdgpu: disable job timeout on
          GPU reset disabled</font>
        <div> </div>
      </div>
      <div class="BodyFragment"><font size="2"><span
            style="font-size:11pt;">
            <div class="PlainText">Hi Evan,<br>
              <br>
              that one is perfect if you ask me. I was just reading up on the
              history of <br>
              that patch. Alex, what was your concern with it?<br>
              <br>
              Regarding printing this as an error, that's a really good
              point as well. We <br>
              should probably reduce it to a warning or even info
              severity.<br>
              <br>
              Regards,<br>
              Christian.<br>
              <br>
              On 20.03.2018 at 03:11, Quan, Evan wrote:<br>
              > Hi Christian,<br>
              ><br>
              > The messages printed on timeout are errors, not just
              warnings, although we did not see any real problem (for the
              dgemm special case). That's why we call it confusing.<br>
              > And I suppose you want a fix like my previous
              patch (see attachment).<br>
              ><br>
              > Regards,<br>
              > Evan<br>
              >> -----Original Message-----<br>
              >> From: Christian König [<a
                href="mailto:ckoenig.leichtzumerken@gmail.com"
                moz-do-not-send="true">mailto:ckoenig.leichtzumerken@gmail.com</a>]<br>
              >> Sent: Monday, March 19, 2018 5:42 PM<br>
              >> To: Quan, Evan <a class="moz-txt-link-rfc2396E" href="mailto:Evan.Quan@amd.com">&lt;Evan.Quan@amd.com&gt;</a>;
              <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
              >> Cc: Deucher, Alexander
              <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com">&lt;Alexander.Deucher@amd.com&gt;</a><br>
              >> Subject: Re: [PATCH] drm/amdgpu: disable job
              timeout on GPU reset<br>
              >> disabled<br>
              >><br>
              >> On 19.03.2018 at 07:08, Evan Quan wrote:<br>
              >>> Under some heavy compute workloads
              (e.g. the dgemm test), the<br>
              >>> ASIC takes more than 10 seconds to finish a
              single dispatched job, which<br>
              >>> triggers the timeout. The resulting messages are confusing
              although they do not seem to<br>
              >>> indicate any real problem.<br>
              >>> As a quick workaround, we choose to disable the
              timeout when GPU reset is<br>
              >>> disabled.<br>
              >> NAK, I enabled those warnings intentionally even
              when GPU recovery is<br>
              >> disabled, to have a hint in the logs about what goes
              wrong.<br>
              >><br>
              >> Please only increase the timeout for the compute
              queues and/or add a<br>
              >> separate timeout for them.<br>
              >><br>
              >> Regards,<br>
              >> Christian.<br>
              >><br>
              >><br>
              >>> Change-Id:
              I3a95d856ba4993094dc7b6269649e470c5b053d2<br>
              >>> Signed-off-by: Evan Quan
              <a class="moz-txt-link-rfc2396E" href="mailto:evan.quan@amd.com">&lt;evan.quan@amd.com&gt;</a><br>
              >>> ---<br>
              >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
              | 7 +++++++<br>
              >>>    1 file changed, 7 insertions(+)<br>
              >>><br>
              >>> diff --git
              a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
              >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
              >>> index 8bd9c3f..9d6a775 100644<br>
              >>> ---
              a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
              >>> +++
              b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
              >>> @@ -861,6 +861,13 @@ static void<br>
              >> amdgpu_device_check_arguments(struct
              amdgpu_device *adev)<br>
              >>>              amdgpu_lockup_timeout = 10000;<br>
              >>>      }<br>
              >>><br>
              >>> +   /*<br>
              >>> +    * Disable timeout when GPU reset is
              disabled to avoid confusing<br>
              >>> +    * timeout messages in the kernel log.<br>
              >>> +    */<br>
              >>> +   if (amdgpu_gpu_recovery == 0 ||
              amdgpu_gpu_recovery == -1)<br>
              >>> +           amdgpu_lockup_timeout = INT_MAX;<br>
              >>> +<br>
              >>>      adev->firmware.load_type =
              amdgpu_ucode_get_load_type(adev,<br>
              >> amdgpu_fw_load_type);<br>
              >>>    }<br>
              >>><br>
              <br>
            </div>
          </span></font></div>
    </blockquote>
    <br>
  </body>
</html>