[PATCH v3] drm/amdgpu: Increase KIQ invalidate_tlbs timeout

Jay Cornwall jay.cornwall at amd.com
Wed Apr 2 16:33:59 UTC 2025


On 4/2/2025 02:37, Christian König wrote:

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ffca74a476da..3cdb5f8325aa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -356,7 +356,6 @@ enum amdgpu_kiq_irq {
>>   	AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
>>   	AMDGPU_CP_KIQ_IRQ_LAST
>>   };
>> -#define SRIOV_USEC_TIMEOUT  1200000 /* wait 12 * 100ms for SRIOV */
>>   #define MAX_KIQ_REG_WAIT       5000 /* in usecs, 5ms */
>>   #define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
>>   #define MAX_KIQ_REG_TRY 1000
> 
> Unrelated to this patch here, but defines like those *must* have an AMDGPU_ prefix.
> 
> Please fix in a follow up patch.

Sure. A deeper problem, and the reason these macros exist, is that the
polling logic is duplicated across several different files.

We could instead move this code into amdgpu_fence_wait_polling. All
clients would then abort early when in_reset is set or when called from
interrupt context (in_interrupt). A couple of users have different
timeouts (adev->usec_timeout and a hard-coded 2100 ms); these could
either be unified or kept per caller, polling at a fixed 5 ms interval
in both cases.
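A rough userspace model of the consolidation proposed above may help
make it concrete. It is only a sketch under stated assumptions, not the
amdgpu implementation: struct fake_ring, poll_fence_seq() and
POLL_INTERVAL_USEC are hypothetical names, the in_reset/in_irq flags
stand in for the driver's reset state and the kernel's in_interrupt(),
and the loop advances the fence counter itself to simulate hardware.
It shows one shared loop with a fixed 5 ms interval, an early abort for
reset or interrupt context, and a per-caller timeout so callers could
keep their current limits or share one.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: the fixed polling interval suggested in the mail (5 ms). */
#define POLL_INTERVAL_USEC 5000L

/* Hypothetical stand-in for the ring/device state a real helper reads. */
struct fake_ring {
	uint32_t fence_seq;        /* last sequence number "signalled" */
	uint32_t advance_per_poll; /* simulated hardware progress per poll */
	bool in_reset;             /* models the driver's in-reset state */
	bool in_irq;               /* models in_interrupt() */
};

static void sleep_usec(long usec)
{
	struct timespec ts = { usec / 1000000L, (usec % 1000000L) * 1000L };

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical unified poll: returns true once fence_seq has reached
 * wait_seq, false on timeout or when waiting is not allowed.
 * timeout_usec is per caller (e.g. a usec_timeout value or 2100000).
 */
static bool poll_fence_seq(struct fake_ring *ring, uint32_t wait_seq,
			   long timeout_usec)
{
	long waited = 0;

	assert(timeout_usec >= 0);

	for (;;) {
		/* Wrap-safe comparison of 32-bit sequence numbers. */
		if ((int32_t)(ring->fence_seq - wait_seq) >= 0)
			return true;

		/* Abort early: no sleeping in IRQ context or during reset. */
		if (ring->in_reset || ring->in_irq)
			return false;

		if (waited >= timeout_usec)
			return false;

		sleep_usec(POLL_INTERVAL_USEC);
		waited += POLL_INTERVAL_USEC;
		ring->fence_seq += ring->advance_per_poll; /* simulate HW */
	}
}
```

A caller that today uses the hard-coded 2100 ms limit would pass
2100000; unifying the timeouts would then just mean every caller passes
the same value, while the interval and abort logic live in one place.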

adev->usec_timeout is too low for this particular system under load.

