[PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)

Felix Kuehling felix.kuehling at amd.com
Fri Mar 2 20:47:42 UTC 2018


On 2018-03-02 04:29 AM, Liu, Monk wrote:
> in_atomic() is not encouraged to be used to judge if sleep is possible; see its macro definition:
>
> #define in_atomic() (preempt_count() != 0)

OK. But my point is still that you're not testing the right thing when
you check in_interrupt(). The comment before the in_atomic macro
definition states the limitations and says "do not use in driver code".
Unfortunately it doesn't suggest any alternative. I think in_interrupt
is actually worse, because it misses even more cases than in_atomic.

Regards,
  Felix

>
>
> /Monk
>
> -----Original Message-----
> From: Kuehling, Felix 
> Sent: 2018年3月1日 23:50
> To: amd-gfx at lists.freedesktop.org; Liu, Monk <Monk.Liu at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)
>
> On 2018-02-28 02:27 AM, Monk Liu wrote:
>> Sometimes the GPU is switched to other VFs and won't switch back
>> soon, so the KIQ reg access will not signal within a short period.
>> Instead of busy-waiting a long time (MAX_KIQ_REG_WAIT) and returning
>> a timeout, we can instead sleep 5ms and try again later (in non-IRQ
>> context).
>>
>> And since the wait in kiq_rreg/wreg is a busy wait, MAX_KIQ_REG_WAIT
>> shouldn't be set to a long time; 10ms is more appropriate.
>>
>> If the GPU is already in reset, don't retry the KIQ reg access;
>> otherwise it would hang forever because the KIQ is usually already
>> dead at that point.
>>
>> v2:
>> replace schedule() with msleep() for the wait
>>
>> Change-Id: I8fc807ce85a8d30d2b50153f3f3a6eda344ef994
>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index b832651..1672f5b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -22,7 +22,7 @@
>>   */
>>  
>>  #include "amdgpu.h"
>> -#define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
>> +#define MAX_KIQ_REG_WAIT	10000 /* in usecs, 10ms */
>>  
>>  uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
>>  {
>> @@ -152,9 +152,14 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>>  	amdgpu_ring_commit(ring);
>>  	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>  
>> +retry_read:
>>  	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>>  	if (r < 1) {
>>  		DRM_ERROR("wait for kiq fence error: %ld\n", r);
>> +		if (!in_interrupt() && !adev->in_gpu_reset) {
> You should check in_atomic here. Because it's invalid to sleep in atomic context (e.g. while holding a spin lock) even when not in an interrupt.
> This seems to happen a lot for indirect register access, e.g.
> soc15_pcie_rreg.
>
> Regards,
>   Felix
>
>> +			msleep(5);
>> +			goto retry_read;
>> +		}
>>  		return ~0;
>>  	}
>>  	val = adev->wb.wb[adev->virt.reg_val_offs];
>> @@ -179,9 +184,15 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>>  	amdgpu_ring_commit(ring);
>>  	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>  
>> +retry_write:
>>  	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>> -	if (r < 1)
>> +	if (r < 1) {
>>  		DRM_ERROR("wait for kiq fence error: %ld\n", r);
>> +		if (!in_interrupt() && !adev->in_gpu_reset) {
>> +			msleep(5);
>> +			goto retry_write;
>> +		}
>> +	}
>>  }
>>  
>>  /**