<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">That's a good point as well. Maybe we
should have separate timeouts for gfx and compute?<br>
<br>
Something like 5 seconds for gfx and 1 minute (or even longer) for
compute?<br>
<br>
Anyway, I agree that we can worry about that later on. The patch is
Reviewed-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com">&lt;christian.koenig@amd.com&gt;</a> for
now.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 20.03.2018 at 15:16, Deucher, Alexander wrote:<br>
</div>
<blockquote type="cite"
cite="mid:DM5PR12MB1820FEE50DE4EBD1E44B676BF7AB0@DM5PR12MB1820.namprd12.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;"
dir="ltr">
<p style="margin-top:0;margin-bottom:0">My concern was that
compute will always have the timeout disabled with no way to
override it even if you enable GPU reset. I guess we can
address that down the road.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Acked-by: Alex Deucher
<a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com">&lt;alexander.deucher@amd.com&gt;</a><br>
</p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
Koenig, Christian<br>
<b>Sent:</b> Tuesday, March 20, 2018 6:14:29 AM<br>
<b>To:</b> Quan, Evan; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<b>Cc:</b> Deucher, Alexander<br>
<b>Subject:</b> Re: [PATCH] drm/amdgpu: disable job timeout on
GPU reset disabled</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span
style="font-size:11pt;">
<div class="PlainText">Hi Evan,<br>
<br>
That one is perfect if you ask me. I was just reading up on the
history of <br>
that patch. Alex, what was your concern with it?<br>
<br>
Regarding printing this as error, that's a really good
point as well. We <br>
should probably reduce it to a warning or even info
severity.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 20.03.2018 at 03:11, Quan, Evan wrote:<br>
> Hi Christian,<br>
><br>
> The messages printed on timeout are errors, not just
warnings, although we did not see any real problem (for the
dgemm special case). That's why we call it confusing.<br>
> And I suppose you want a fix like my previous
patch (see attachment).<br>
><br>
> Regards,<br>
> Evan<br>
>> -----Original Message-----<br>
>> From: Christian König [<a
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true">mailto:ckoenig.leichtzumerken@gmail.com</a>]<br>
>> Sent: Monday, March 19, 2018 5:42 PM<br>
>> To: Quan, Evan <a class="moz-txt-link-rfc2396E" href="mailto:Evan.Quan@amd.com">&lt;Evan.Quan@amd.com&gt;</a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
>> Cc: Deucher, Alexander
<a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com">&lt;Alexander.Deucher@amd.com&gt;</a><br>
>> Subject: Re: [PATCH] drm/amdgpu: disable job
timeout on GPU reset<br>
>> disabled<br>
>><br>
>> On 19.03.2018 at 07:08, Evan Quan wrote:<br>
>>> Under some heavy compute workloads (e.g. the
dgemm test), the ASIC takes<br>
>>> more than 10 seconds to finish a single
dispatched job, which<br>
>>> triggers the timeout. This is quite confusing,
although it does not seem to<br>
>>> cause any real problems.<br>
>>> As a quick workaround, disable the
timeout when GPU reset is<br>
>>> disabled.<br>
>> NAK, I enabled those warnings intentionally, even
when GPU recovery is<br>
>> disabled, to have a hint in the logs about what goes
wrong.<br>
>><br>
>> Please only increase the timeout for the compute
queue and/or add a<br>
>> separate timeout for them.<br>
>><br>
>> Regards,<br>
>> Christian.<br>
>><br>
>><br>
>>> Change-Id:
I3a95d856ba4993094dc7b6269649e470c5b053d2<br>
>>> Signed-off-by: Evan Quan
<a class="moz-txt-link-rfc2396E" href="mailto:evan.quan@amd.com">&lt;evan.quan@amd.com&gt;</a><br>
>>> ---<br>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
| 7 +++++++<br>
>>> 1 file changed, 7 insertions(+)<br>
>>><br>
>>> diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>>> index 8bd9c3f..9d6a775 100644<br>
>>> ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>>> +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
>>> @@ -861,6 +861,13 @@ static void<br>
>> amdgpu_device_check_arguments(struct
amdgpu_device *adev)<br>
>>> amdgpu_lockup_timeout = 10000;<br>
>>> }<br>
>>><br>
>>> + /*<br>
>>> + * Disable timeout when GPU reset is
disabled to avoid confusing<br>
>>> + * timeout messages in the kernel log.<br>
>>> + */<br>
>>> + if (amdgpu_gpu_recovery == 0 ||
amdgpu_gpu_recovery == -1)<br>
>>> + amdgpu_lockup_timeout = INT_MAX;<br>
>>> +<br>
>>> adev->firmware.load_type =
amdgpu_ucode_get_load_type(adev,<br>
>> amdgpu_fw_load_type);<br>
>>> }<br>
>>><br>
<br>
</div>
</span></font></div>
</blockquote>
<br>
</body>
</html>