[PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)

Thu Mar 1 08:42:32 UTC 2018

Sounds reasonable.

I would add a define for the 5ms timeout, with that done the patch is 
Reviewed-by: Christian König <christian.koenig at amd.com> as well.

Regards,
Christian.

Am 01.03.2018 um 09:40 schrieb Ding, Pixel:
> Reviewed-by: Pixel Ding <Pixel.Ding at amd.com>
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 01/03/2018, 4:21 PM, "amd-gfx on behalf of Liu, Monk" <amd-gfx-bounces at lists.freedesktop.org on behalf of Monk.Liu at amd.com> wrote:
>
>      Hi Christian
>      
>      KIQ is used by kernel, and we submit commands on it without gpu scheduler,
>      In sriov env, register access must go through KIQ in non-exclusive mode because we don’t want one VF access MMIO during the period that this VF is not occupying
>      GPU resource,
>      
>      But since now we use busy polling wait for the fence of KIQ, this way it is too latency to let KIQ wait before KIQ job finish, but since KIQ is also part of CP
>      So it may serve other VFs, and that cause a situation that KIQ job in certain VF may need at least 1~2 seconds, which is not good to use busy wait,
>      
>      So my patch changed something to let this busy wait bail out for a moment when not in IRQ context, another concern is if KIQ is in current VF, but current
>      VF already die, then all KIQ job will not signal, so it's still need this bail out and retry scheme to make sure CPU not always in that busy polling status
>      
>      Thanks
>      /Monk
>      
>      -----Original Message-----
>      From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>      Sent: 2018年3月1日 16:13
>      To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>      Subject: Re: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)
>      
>       From a coding style perspective a define for the 5ms timeout would be nice to have and it might be better to use a do { } while instead of the goto, but those are really minor issues.
>      
>      But from the technical perspective I can't fully judge if that is a good idea or not cause I'm not so deeply know how the KIQ works.
>      
>      Christian.
>      
>      Am 01.03.2018 um 06:52 schrieb Liu, Monk:
>      > Anyone review this ?
>      >
>      > -----Original Message-----
>      > From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>      > Of Monk Liu
>      > Sent: 2018年2月28日 15:28
>      > To: amd-gfx at lists.freedesktop.org
>      > Cc: Liu, Monk <Monk.Liu at amd.com>
>      > Subject: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)
>      >
>      > sometimes GPU is switched to other VFs and won't swich back soon, so
>      > the kiq reg access will not signal within a short period, instead of
>      > busy waiting a long time(MAX_KEQ_REG_WAIT) and returning TMO we can
>      > istead sleep 5ms and try again later (non irq context)
>      >
>      > And since the waiting in kiq_r/weg is busy wait, so MAX_KIQ_REG_WAIT shouldn't set to a long time, set it to 10ms is more appropriate.
>      >
>      > if gpu already in reset state, don't retry the KIQ reg access otherwise it would always hang because KIQ was already die usually.
>      >
>      > v2:
>      > replace schedule() with msleep() for the wait
>      >
>      > Change-Id: I8fc807ce85a8d30d2b50153f3f3a6eda344ef994
>      > Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>      > ---
>      >   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 15 +++++++++++++--
>      >   1 file changed, 13 insertions(+), 2 deletions(-)
>      >
>      > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>      > b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>      > index b832651..1672f5b 100644
>      > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>      > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>      > @@ -22,7 +22,7 @@
>      >    */
>      >
>      >   #include "amdgpu.h"
>      > -#define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
>      > +#define MAX_KIQ_REG_WAIT	10000 /* in usecs, 10ms */
>      >
>      >   uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)  { @@ -152,9 +152,14 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>      >   	amdgpu_ring_commit(ring);
>      >   	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>      >
>      > +retry_read:
>      >   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>      >   	if (r < 1) {
>      >   		DRM_ERROR("wait for kiq fence error: %ld\n", r);
>      > +		if (!in_interrupt() && !adev->in_gpu_reset) {
>      > +			msleep(5);
>      > +			goto retry_read;
>      > +		}
>      >   		return ~0;
>      >   	}
>      >   	val = adev->wb.wb[adev->virt.reg_val_offs];
>      > @@ -179,9 +184,15 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>      >   	amdgpu_ring_commit(ring);
>      >   	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>      >
>      > +retry_write:
>      >   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>      > -	if (r < 1)
>      > +	if (r < 1) {
>      >   		DRM_ERROR("wait for kiq fence error: %ld\n", r);
>      > +		if (!in_interrupt() && !adev->in_gpu_reset) {
>      > +			msleep(5);
>      > +			goto retry_write;
>      > +		}
>      > +	}
>      >   }
>      >
>      >   /**
>      > --
>      > 2.7.4
>      >
>      > _______________________________________________
>      > amd-gfx mailing list
>      > amd-gfx at lists.freedesktop.org
>      > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>      > _______________________________________________
>      > amd-gfx mailing list
>      > amd-gfx at lists.freedesktop.org
>      > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>      
>      _______________________________________________
>      amd-gfx mailing list
>      amd-gfx at lists.freedesktop.org
>      https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>      
>