[PATCH v3] drm/amdgpu: Increase KIQ invalidate_tlbs timeout
Jay Cornwall
jay.cornwall at amd.com
Wed Apr 2 16:33:59 UTC 2025
On 4/2/2025 02:37, Christian König wrote:
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ffca74a476da..3cdb5f8325aa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -356,7 +356,6 @@ enum amdgpu_kiq_irq {
>> AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
>> AMDGPU_CP_KIQ_IRQ_LAST
>> };
>> -#define SRIOV_USEC_TIMEOUT 1200000 /* wait 12 * 100ms for SRIOV */
>> #define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */
>> #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */
>> #define MAX_KIQ_REG_TRY 1000
>
> Unrelated to this patch here, but defines like those *must* have an AMDGPU_ prefix.
>
> Please fix in a follow up patch.
Sure. The deeper problem behind these macros is that the polling logic
is duplicated across several different files.
We could instead move this code into amdgpu_fence_wait_polling. All
clients would then abort early on in_reset or in_interrupt. There are a
couple of users with different timeouts (adev->usec_timeout and a
hard-coded 2100ms) which could be unified or retained with a fixed 5ms
polling interval.
adev->usec_timeout is too low for this particular system under load.
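To make the consolidation idea above concrete, here is a rough userspace sketch of a single polling helper with a fixed 5ms interval, a caller-supplied timeout, and an early abort on reset or interrupt context. It is not the amdgpu code: struct poll_ctx, poll_fence(), fence_signaled() and POLL_INTERVAL_USEC are hypothetical names made up for illustration, the reset/interrupt flags stand in for the in_reset and in_interrupt checks, and time is only counted rather than actually slept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; not the amdgpu API. */
struct poll_ctx {
	bool in_reset;                 /* models the in_reset check */
	bool in_interrupt;             /* models the in_interrupt check */
	uint32_t signaled_after_polls; /* fake fence signals after this many polls */
	uint32_t polls;                /* number of polls performed so far */
};

enum poll_result { POLL_SIGNALED, POLL_TIMEOUT, POLL_ABORTED };

#define POLL_INTERVAL_USEC 5000u /* fixed 5ms polling interval (assumption) */

/* Fake fence check: counts the poll and reports whether the fence signaled. */
static bool fence_signaled(struct poll_ctx *ctx)
{
	return ctx->polls++ >= ctx->signaled_after_polls;
}

/*
 * One shared polling loop. Callers only choose the timeout; the polling
 * interval and the early-abort rules are the same for everyone.
 */
static enum poll_result poll_fence(struct poll_ctx *ctx, uint32_t timeout_usec)
{
	uint32_t waited = 0;

	for (;;) {
		if (fence_signaled(ctx))
			return POLL_SIGNALED;
		/* Do not keep waiting during a reset or in interrupt context. */
		if (ctx->in_reset || ctx->in_interrupt)
			return POLL_ABORTED;
		if (waited >= timeout_usec)
			return POLL_TIMEOUT;
		/* A real driver would delay or sleep here; the sketch only counts. */
		waited += POLL_INTERVAL_USEC;
	}
}
```

With a helper shaped like this, the users with different timeouts would differ only in the timeout_usec argument they pass, which is the point of unifying them.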