[PATCH] drm/amdgpu: support gpu recovery tests on compute rings
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Apr 26 08:33:27 UTC 2019
On 26.04.19 at 10:20, Quan, Evan wrote:
> My concern is there is already one module parameter "lockup_timeout".
> parm: lockup_timeout:GPU lockup timeout in ms > 0 (default 10000) (int)
>
> Adding one more "timeout" seems redundant.
> It would also make the description of "lockup_timeout" (which reads as applying to all jobs) no longer match its real effect (it affects only non-compute jobs).
>
> A better way would be to rename "lockup_timeout" to "non-compute lockup_timeout". But I do not think we can change an existing module parameter. Right?
No, that's fine. Module parameters are not part of the API that needs
to stay backward compatible.
Maybe use compute_lockup_timeout and other_lockup_timeout or something
similar?
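
To make the suggestion concrete, here is a minimal userspace sketch of how two such parameters could be resolved into a per-ring timeout. It is not the actual amdgpu code. The names compute_lockup_timeout and other_lockup_timeout are only the ones floated above, ring_timeout() and msecs_to_jiffies_sketch() are hypothetical helpers, HZ is an assumed tick rate, and the SR-IOV fallback to the non-compute value is an assumption. The KIQ ring is left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-ins for kernel definitions so the sketch builds in userspace. */
#define HZ 250                          /* assumed tick rate, illustration only */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX   /* same meaning as in the kernel: wait forever */

enum ring_type { RING_GFX, RING_COMPUTE };

/* Hypothetical module parameters, both in milliseconds. */
static int compute_lockup_timeout = 0;      /* 0 = no timeout unless under SR-IOV */
static int other_lockup_timeout = 10000;    /* mirrors the current lockup_timeout default */

/* Simplified replacement for msecs_to_jiffies(): rounds up to whole ticks. */
static long msecs_to_jiffies_sketch(unsigned int ms)
{
	return (long)(((unsigned long long)ms * HZ + 999) / 1000);
}

/* Pick the scheduler timeout (in jiffies) for a ring. */
static long ring_timeout(enum ring_type type, bool sriov_vf)
{
	if (type != RING_COMPUTE)
		return msecs_to_jiffies_sketch(other_lockup_timeout);

	/* Any non-zero value is taken as the compute timeout in ms. */
	if (compute_lockup_timeout > 0)
		return msecs_to_jiffies_sketch(compute_lockup_timeout);

	/*
	 * 0 means unlimited on bare metal. Under SR-IOV a timeout is still
	 * enforced; falling back to the non-compute value is an assumption.
	 */
	return sriov_vf ? msecs_to_jiffies_sketch(other_lockup_timeout)
			: MAX_SCHEDULE_TIMEOUT;
}
```

With the defaults above, compute rings on bare metal keep today's behaviour (no timeout), and a recovery test would only need to load the module with a non-zero compute value.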
Regards,
Christian.
>
> Regards,
> Evan
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> Christian König
>> Sent: Friday, April 26, 2019 3:34 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Xu, Feifei <Feifei.Xu at amd.com>; Cui, Flora <Flora.Cui at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: support gpu recovery tests on compute
>> rings
>>
>> On 26.04.19 at 09:24, Evan Quan wrote:
>>> A new module parameter is added for determining whether or not to
>>> enforce timeout on compute jobs.
>> Can we rework that a bit and instead of a bool have a separate millisecond
>> timeout for compute?
>>
>> E.g. default is 0 and that means MAX_SCHEDULE_TIMEOUT unless we are
>> under SRIOV.
>> Any other value is just the timeout in milliseconds.
>>
>> Christian.
>>
>>> Change-Id: If14b75977312e42dac0431072456e5b69cf1bc2f
>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++-
>>> 3 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index e16dcee2bf75..ee624d993df7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -166,6 +166,7 @@ extern int amdgpu_si_support;
>>> #ifdef CONFIG_DRM_AMDGPU_CIK
>>> extern int amdgpu_cik_support;
>>> #endif
>>> +extern bool amdgpu_compute_timeout_enforced;
>>>
>>> #define AMDGPU_VM_MAX_NUM_CTX 4096
>>> #define AMDGPU_SG_THRESHOLD (256*1024*1024)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 13a68f62bcc8..91de3e90fae9 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -140,6 +140,7 @@ struct amdgpu_mgpu_info mgpu_info = {
>>> };
>>> int amdgpu_ras_enable = -1;
>>> uint amdgpu_ras_mask = 0xffffffff;
>>> +bool amdgpu_compute_timeout_enforced = false;
>>>
>>> /**
>>> * DOC: vramlimit (int)
>>> @@ -234,6 +235,13 @@ module_param_named(msi, amdgpu_msi, int, 0444);
>>> MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms > 0 (default 10000)");
>>> module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444);
>>>
>>> +/**
>>> + * DOC: compute_timeout_enforced (bool)
>>> + * Whether or not to enforce timeout on compute jobs (1 = enable, 0 = disable). The default is 0.
>>> + */
>>> +MODULE_PARM_DESC(compute_timeout_enforced, "Enforce timeout on compute jobs (1 = enable, 0 = disable (default))");
>>> +module_param_named(compute_timeout_enforced, amdgpu_compute_timeout_enforced, bool, 0444);
>>> +
>>> /**
>>> * DOC: dpm (int)
>>> * Override for dynamic power management setting (1 = enable, 0 = disable). The default is -1 (auto).
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index 4dee2326b29c..4adffad04dbc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -453,7 +453,8 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>> if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>>> /* for non-sriov case, no timeout enforce on compute ring */
>>> if ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
>>> - && !amdgpu_sriov_vf(ring->adev))
>>> + && !amdgpu_sriov_vf(ring->adev)
>>> + && !amdgpu_compute_timeout_enforced)
>>> timeout = MAX_SCHEDULE_TIMEOUT;
>>> else
>>> timeout = msecs_to_jiffies(amdgpu_lockup_timeout);
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx