[PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

Quan, Evan Evan.Quan at amd.com
Tue Mar 20 02:11:24 UTC 2018


Hi Christian,

The messages printed on timeout are errors, not just warnings, although we did not see any real problem (for the dgemm special case). That is why we call it confusing.
And I suppose you want a fix like my previous patch (see attachment).
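
To make the idea of a separate timeout for compute queues concrete, here is a rough standalone sketch. It is not the attached patch and not driver code: every `sketch_*` name is hypothetical, the 10000 ms default mirrors the `amdgpu_lockup_timeout` value visible in the quoted diff, and treating a non-positive compute value as "no timeout" is an assumption made only for illustration.

```c
#include <limits.h>

/* Hypothetical ring classes; the real driver has its own ring type enum. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE,
	SKETCH_RING_SDMA,
};

/* Hypothetical tunables, in milliseconds. */
struct sketch_timeouts {
	int default_ms;	/* plays the role of amdgpu_lockup_timeout */
	int compute_ms;	/* separate value for compute queues; <= 0 means no timeout */
};

/*
 * Pick the job timeout for a ring. Compute rings get their own value so
 * that long-running dispatches do not trip the default timeout, while
 * all other rings keep reporting hangs as before.
 */
static int sketch_ring_timeout_ms(const struct sketch_timeouts *t,
				  enum sketch_ring_type type)
{
	if (type == SKETCH_RING_COMPUTE) {
		if (t->compute_ms <= 0)
			return INT_MAX;	/* effectively disables the timeout */
		return t->compute_ms;
	}
	return t->default_ms;
}
```

With `default_ms = 10000` and a larger `compute_ms`, only compute rings would get the longer limit, so timeout messages on the other rings would still show up in the log.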

Regards,
Evan
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> Sent: Monday, March 19, 2018 5:42 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled
> 
> Am 19.03.2018 um 07:08 schrieb Evan Quan:
> > Under some heavy compute workloads (the dgemm test), it takes the
> > ASIC more than 10 seconds to finish a single dispatched job, which
> > triggers the timeout. This is quite confusing, although it does not
> > seem to cause any real problems.
> > As a quick workaround, we choose to disable the timeout when GPU reset
> > is disabled.
> 
> NAK, I enabled those warnings intentionally, even when GPU recovery is
> disabled, to have a hint in the logs about what goes wrong.
> 
> Please only increase the timeout for the compute queues and/or add a
> separate timeout for them.
> 
> Regards,
> Christian.
> 
> 
> >
> > Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> > Signed-off-by: Evan Quan <evan.quan at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 8bd9c3f..9d6a775 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
> >   		amdgpu_lockup_timeout = 10000;
> >   	}
> >
> > +	/*
> > +	 * Disable timeout when GPU reset is disabled to avoid confusing
> > +	 * timeout messages in the kernel log.
> > +	 */
> > +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> > +		amdgpu_lockup_timeout = INT_MAX;
> > +
> >   	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
> >   }
> >

-------------- next part --------------
An embedded message was scrubbed...
From: "Quan, Evan" <Evan.Quan at amd.com>
Subject: [PATCH] drm/amdgpu: no job timeout setting on compute queues
Date: Fri, 16 Mar 2018 04:52:32 +0000
Size: 4757
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180320/cb42d918/attachment.mht>

