[PATCH] drm/amdgpu: support gpu recovery tests on compute rings

Quan, Evan Evan.Quan at amd.com
Fri Apr 26 08:20:10 UTC 2019


My concern is that there is already a module parameter named "lockup_timeout":
parm:           lockup_timeout:GPU lockup timeout in ms > 0 (default 10000) (int)

Adding one more "timeout" parameter seems redundant.
It would also make the description of "lockup_timeout" (which reads as if it applies to all jobs) no longer match its real effect (it affects only non-compute jobs).

A better way would be to rename "lockup_timeout" to something like "non-compute lockup_timeout". But I do not think we can change an existing module parameter. Right?
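
For reference, here is a rough sketch of how I read the proposal quoted below (a separate millisecond timeout for compute, where 0 means no timeout unless we are under SRIOV). It is plain user-space C, not driver code. All names in it (sketch_pick_timeout_ms, compute_timeout_ms, the SKETCH_* identifiers) are hypothetical, values stay in milliseconds instead of going through msecs_to_jiffies(), and falling back to "lockup_timeout" for the SRIOV case is my assumption, since the proposal does not spell that part out.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT, i.e. "wait forever". */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical, reduced set of ring types for illustration only. */
enum sketch_ring_type { SKETCH_RING_GFX, SKETCH_RING_COMPUTE };

/*
 * Hypothetical helper: pick the job timeout (in ms) for a ring.
 *  - non-compute rings always use lockup_timeout_ms;
 *  - compute rings use compute_timeout_ms when it is non-zero;
 *  - compute_timeout_ms == 0 means no timeout on bare metal and,
 *    as an assumption, lockup_timeout_ms under SRIOV.
 */
static long sketch_pick_timeout_ms(enum sketch_ring_type type, bool is_sriov_vf,
				   int lockup_timeout_ms, int compute_timeout_ms)
{
	if (type != SKETCH_RING_COMPUTE)
		return lockup_timeout_ms;

	if (compute_timeout_ms != 0)
		return compute_timeout_ms;

	return is_sriov_vf ? (long)lockup_timeout_ms : SKETCH_MAX_SCHEDULE_TIMEOUT;
}
```

With this shape, the description of "lockup_timeout" would still be inaccurate for compute rings, which is the mismatch I am worried about.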

Regards,
Evan
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Christian König
> Sent: Friday, April 26, 2019 3:34 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Xu, Feifei <Feifei.Xu at amd.com>; Cui, Flora <Flora.Cui at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: support gpu recovery tests on compute
> rings
> 
> On 26.04.19 at 09:24, Evan Quan wrote:
> > A new module parameter is added for determining whether or not to
> > enforce timeout on compute jobs.
> 
> Can we rework that a bit and instead of a bool have a separate millisecond
> timeout for compute?
> 
> E.g. default is 0 and that means MAX_SCHEDULE_TIMEOUT unless we are
> under SRIOV.
> Any other value is just the timeout in milliseconds.
> 
> Christian.
> 
> >
> > Change-Id: If14b75977312e42dac0431072456e5b69cf1bc2f
> > Signed-off-by: Evan Quan <evan.quan at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h       | 1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 8 ++++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++-
> >   3 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index e16dcee2bf75..ee624d993df7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -166,6 +166,7 @@ extern int amdgpu_si_support;
> >   #ifdef CONFIG_DRM_AMDGPU_CIK
> >   extern int amdgpu_cik_support;
> >   #endif
> > +extern bool amdgpu_compute_timeout_enforced;
> >
> >   #define AMDGPU_VM_MAX_NUM_CTX			4096
> >   #define AMDGPU_SG_THRESHOLD			(256*1024*1024)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 13a68f62bcc8..91de3e90fae9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -140,6 +140,7 @@ struct amdgpu_mgpu_info mgpu_info = {
> >   };
> >   int amdgpu_ras_enable = -1;
> >   uint amdgpu_ras_mask = 0xffffffff;
> > +bool amdgpu_compute_timeout_enforced = false;
> >
> >   /**
> >    * DOC: vramlimit (int)
> > @@ -234,6 +235,13 @@ module_param_named(msi, amdgpu_msi, int, 0444);
> >   MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms > 0 (default 10000)");
> >   module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444);
> >
> > +/**
> > + * DOC: compute_timeout_enforced (bool)
> > + * Whether or not to enforce timeout on compute jobs (1 = enable, 0 = disable). The default is 0.
> > + */
> > +MODULE_PARM_DESC(compute_timeout_enforced, "Enforce timeout on compute jobs (1 = enable, 0 = disable (default))");
> > +module_param_named(compute_timeout_enforced,
> > +amdgpu_compute_timeout_enforced, bool, 0444);
> > +
> >   /**
> >    * DOC: dpm (int)
> >    * Override for dynamic power management setting (1 = enable, 0 = disable). The default is -1 (auto).
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 4dee2326b29c..4adffad04dbc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -453,7 +453,8 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> >   	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
> >   		/* for non-sriov case, no timeout enforce on compute ring */
> >   		if ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
> > -				&& !amdgpu_sriov_vf(ring->adev))
> > +				&& !amdgpu_sriov_vf(ring->adev)
> > +				&& !amdgpu_compute_timeout_enforced)
> >   			timeout = MAX_SCHEDULE_TIMEOUT;
> >   		else
> >   			timeout = msecs_to_jiffies(amdgpu_lockup_timeout);
> 

