[PATCH] drm/amdgpu: refine kiq access register

Tao, Yintian Yintian.Tao at amd.com
Wed Apr 22 11:49:36 UTC 2020


Hi  Christian


Can you help answer the questions below? Thanks in advance.
-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com> 
Sent: 2020年4月22日 19:03
To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

Am 22.04.20 um 11:29 schrieb Yintian Tao:
> According to the current kiq access register method, there will be 
> race condition when using KIQ to read register if multiple clients 
> want to read at same time just like the expample below:
> 1. client-A start to read REG-0 throguh KIQ 2. client-A poll the 
> seqno-0 3. client-B start to read REG-1 through KIQ 4. client-B poll 
> the seqno-1 5. the kiq complete these two read operation 6. client-A 
> to read the register at the wb buffer and
>     get REG-1 value
>
> And if there are multiple clients to frequently write registers 
> through KIQ which may raise the KIQ ring buffer overwritten problem.
>
> Therefore, allocate fixed number wb slot for rreg use and limit the 
> submit number which depends on the kiq ring_size in order to prevent 
> the overwritten problem.
>
> v2: directly use amdgpu_device_wb_get() for each read instead
>      of to reserve fixde number slot.
>      if there is no enough kiq ring buffer or rreg slot then
>      directly print error log and return instead of busy waiting

I would split that into three patches. One for each problem we have here:

1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
[yttao]: Do you mean that we need to use spin_lock_irqsave for the functions just like kgd_hiq_mqd_load()?

2. Prevent the overrung of the KIQ. Please drop the approach with the atomic here. Instead just add a amdgpu_fence_wait_polling() into
amdgpu_fence_emit_polling() as I discussed with Monk.
[yttao]: Sorry, I can't get your original idea for the amdgpu_fence_wait_polling(). Can you give more details about it? Thanks in advance.

"That is actually only a problem because the KIQ uses polling waits.

See amdgpu_fence_emit() waits for the oldest possible fence to be signaled before emitting a new one.

I suggest that we do the same in amdgpu_fence_emit_polling(). A one liner like the following should be enough:

amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, timeout);"
[yttao]: there is no usage of num_fences_mask at kiq fence polling, the num_fences_mask is only effective at dma_fence architecture.
		If I understand correctly, do you want the protype code below? If the protype code is wrong, can you help give one sample? Thanks in advance.

int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s) 
{
        uint32_t seq;

        if (!s)
                return -EINVAL;
+		amdgpu_fence_wait_polling(ring, seq, timeout); 
        seq = ++ring->fence_drv.sync_seq;
        amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
                        ¦      seq, 0); 

        *s = seq;

        return 0;
}




3. Use amdgpu_device_wb_get() each time we need to submit a read.
[yttao]: yes, I will do it.

Regards,
Christian.

>
> Signed-off-by: Yintian Tao <yttao at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  8 +-
>   .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    | 13 ++-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 83 +++++++++++++++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |  5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        |  8 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |  8 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 35 +++++---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         | 13 ++-
>   12 files changed, 167 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 4e1d4cfe7a9f..1157c1a0b888 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct amdgpu_cs_parser *p,
>   /*
>    * Writeback
>    */
> -#define AMDGPU_MAX_WB 128	/* Reserve at most 128 WB slots for amdgpu-owned rings. */
> +#define AMDGPU_MAX_WB 256	/* Reserve at most 256 WB slots for amdgpu-owned rings. */
>   
>   struct amdgpu_wb {
>   	struct amdgpu_bo	*wb_obj;
> @@ -1028,6 +1028,12 @@ bool amdgpu_device_has_dc_support(struct 
> amdgpu_device *adev);
>   
>   int emu_soc_asic_init(struct amdgpu_device *adev);
>   
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> +			unsigned long *flags);
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long 
> +*flags);
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs); 
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs);
>   /*
>    * Registers read & write functions.
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 691c89705bcd..a65d6a1abc04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -309,9 +309,11 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   			    uint32_t doorbell_off)
>   {
>   	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   	struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>   	struct v10_compute_mqd *m;
>   	uint32_t mec, pipe;
> +	unsigned long flags = 0;
>   	int r;
>   
>   	m = get_mqd(mqd);
> @@ -324,13 +326,19 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   	pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>   		 mec, pipe, queue_id);
>   
> -	spin_lock(&adev->gfx.kiq.ring_lock);
> +	r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto out_unlock;
> +	}
> +
>   	r = amdgpu_ring_alloc(kiq_ring, 7);
>   	if (r) {
>   		pr_err("Failed to alloc KIQ (%d).\n", r);
>   		goto out_unlock;
>   	}
>   
> +	amdgpu_gfx_kiq_consume(kiq, NULL);
>   	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>   	amdgpu_ring_write(kiq_ring,
>   			  PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */ @@ -350,8 
> +358,9 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   	amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>   	amdgpu_ring_commit(kiq_ring);
>   
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
>   out_unlock:
> -	spin_unlock(&adev->gfx.kiq.ring_lock);
> +	amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
>   	release_queue(kgd);
>   
>   	return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2ac5e7..4435bd716edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -307,9 +307,11 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   			    uint32_t doorbell_off)
>   {
>   	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   	struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>   	struct v9_mqd *m;
>   	uint32_t mec, pipe;
> +	unsigned long flags = 0;
>   	int r;
>   
>   	m = get_mqd(mqd);
> @@ -322,13 +324,19 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   	pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>   		 mec, pipe, queue_id);
>   
> -	spin_lock(&adev->gfx.kiq.ring_lock);
> +	r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto out_unlock;
> +	}
> +
>   	r = amdgpu_ring_alloc(kiq_ring, 7);
>   	if (r) {
>   		pr_err("Failed to alloc KIQ (%d).\n", r);
>   		goto out_unlock;
>   	}
>   
> +	amdgpu_gfx_kiq_consume(kiq, NULL);
>   	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>   	amdgpu_ring_write(kiq_ring,
>   			  PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */ @@ -348,8 
> +356,9 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>   	amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>   	amdgpu_ring_commit(kiq_ring);
>   
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
>   out_unlock:
> -	spin_unlock(&adev->gfx.kiq.ring_lock);
> +	amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
>   	release_queue(kgd);
>   
>   	return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index ea576b4260a4..c0dc7f1849c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -295,6 +295,43 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device *adev,
>   	return -EINVAL;
>   }
>   
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> +			unsigned long *flags)
> +{
> +	struct amdgpu_wb *wb = &(kiq->ring.adev)->wb;
> +
> +	spin_lock_irqsave(&kiq->ring_lock, *flags);
> +	if ((atomic64_read(&kiq->submit_avail) > 0) &&
> +	    (read ? find_first_zero_bit(wb->used, wb->num_wb) <
> +	     wb->num_wb : 1)) {
> +		return 0;
> +	} else {
> +		spin_unlock_irqrestore(&kiq->ring_lock, *flags);
> +		pr_err("critical! too more kiq accessers\n");
> +		return -EDEADLK;
> +	}
> +}
> +
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long 
> +*flags) {
> +	spin_unlock_irqrestore(&kiq->ring_lock, *flags); }
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs) {
> +	atomic64_dec(&kiq->submit_avail);
> +	if (offs)
> +		amdgpu_device_wb_get(kiq->ring.adev, offs);
> +
> +}
> +
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs) {
> +	atomic64_inc(&kiq->submit_avail);
> +	if (offs)
> +		amdgpu_device_wb_free(kiq->ring.adev, *offs); }
> +
>   int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>   			     struct amdgpu_ring *ring,
>   			     struct amdgpu_irq_src *irq)
> @@ -304,10 +341,6 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device 
> *adev,
>   
>   	spin_lock_init(&kiq->ring_lock);
>   
> -	r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
> -	if (r)
> -		return r;
> -
>   	ring->adev = NULL;
>   	ring->ring_obj = NULL;
>   	ring->use_doorbell = true;
> @@ -325,13 +358,15 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>   			     AMDGPU_RING_PRIO_DEFAULT);
>   	if (r)
>   		dev_warn(adev->dev, "(%d) failed to init kiq ring\n", r);
> +	else
> +		atomic64_set(&kiq->submit_avail, ring->ring_size / 4 /
> +			     (ring->funcs->align_mask + 1));
>   
>   	return r;
>   }
>   
>   void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)
>   {
> -	amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
>   	amdgpu_ring_fini(ring);
>   }
>   
> @@ -671,19 +706,25 @@ int amdgpu_gfx_cp_ecc_error_irq(struct amdgpu_device *adev,
>   uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>   {
>   	signed long r, cnt = 0;
> -	unsigned long flags;
> -	uint32_t seq;
> +	unsigned long flags = 0;
> +	uint32_t seq, reg_val_offs = 0, value = 0;
>   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   	struct amdgpu_ring *ring = &kiq->ring;
>   
>   	BUG_ON(!ring->funcs->emit_rreg);
>   
> -	spin_lock_irqsave(&kiq->ring_lock, flags);
> +	r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto kiq_read_exit;
> +	}
> +
> +	amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>   	amdgpu_ring_alloc(ring, 32);
> -	amdgpu_ring_emit_rreg(ring, reg);
> +	amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
>   	amdgpu_fence_emit_polling(ring, &seq);
>   	amdgpu_ring_commit(ring);
> -	spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +	amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -707,9 +748,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>   	if (cnt > MAX_KIQ_REG_TRY)
>   		goto failed_kiq_read;
>   
> -	return adev->wb.wb[kiq->reg_val_offs];
> +	mb();
> +	value = adev->wb.wb[reg_val_offs];
> +	amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +	return value;
>   
>   failed_kiq_read:
> +	amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +kiq_read_exit:
>   	pr_err("failed to read reg:%x\n", reg);
>   	return ~0;
>   }
> @@ -717,19 +763,25 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>   void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>   {
>   	signed long r, cnt = 0;
> -	unsigned long flags;
> +	unsigned long flags = 0;
>   	uint32_t seq;
>   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   	struct amdgpu_ring *ring = &kiq->ring;
>   
>   	BUG_ON(!ring->funcs->emit_wreg);
>   
> -	spin_lock_irqsave(&kiq->ring_lock, flags);
> +	r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto kiq_write_exit;
> +	}
> +
> +	amdgpu_gfx_kiq_consume(kiq, NULL);
>   	amdgpu_ring_alloc(ring, 32);
>   	amdgpu_ring_emit_wreg(ring, reg, v);
>   	amdgpu_fence_emit_polling(ring, &seq);
>   	amdgpu_ring_commit(ring);
> -	spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +	amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -754,8 +806,11 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>   	if (cnt > MAX_KIQ_REG_TRY)
>   		goto failed_kiq_write;
>   
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
>   	return;
>   
>   failed_kiq_write:
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
> +kiq_write_exit:
>   	pr_err("failed to write reg:%x\n", reg);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 634746829024..ff7597ca6cad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -96,6 +96,7 @@ struct kiq_pm4_funcs {
>   	int invalidate_tlbs_size;
>   };
>   
> +#define MAX_KIQ_RREG_NUM 64
>   struct amdgpu_kiq {
>   	u64			eop_gpu_addr;
>   	struct amdgpu_bo	*eop_obj;
> @@ -103,7 +104,7 @@ struct amdgpu_kiq {
>   	struct amdgpu_ring	ring;
>   	struct amdgpu_irq_src	irq;
>   	const struct kiq_pm4_funcs *pmf;
> -	uint32_t			reg_val_offs;
> +	atomic64_t		submit_avail;
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index f61664ee4940..137d3d2b46e8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
>   	void (*end_use)(struct amdgpu_ring *ring);
>   	void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>   	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> -	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
> +	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> +			  uint32_t reg_val_offs);
>   	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>   	void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>   			      uint32_t val, uint32_t mask); @@ -265,7 +266,7 @@ struct 
> amdgpu_ring {
>   #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>   #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>   #define amdgpu_ring_emit_cntxcntl(r, d) 
> (r)->funcs->emit_cntxcntl((r), (d)) -#define amdgpu_ring_emit_rreg(r, 
> d) (r)->funcs->emit_rreg((r), (d))
> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), 
> +(d), (o))
>   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>   #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>   #define amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) 
> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) diff 
> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 8c10084f44ef..a4c3f284691c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -56,13 +56,19 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   	unsigned long flags;
>   	uint32_t seq;
>   
> -	spin_lock_irqsave(&kiq->ring_lock, flags);
> +	r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto failed_exit;
> +	}
> +
> +	amdgpu_gfx_kiq_consume(kiq, NULL);
>   	amdgpu_ring_alloc(ring, 32);
>   	amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>   					    ref, mask);
>   	amdgpu_fence_emit_polling(ring, &seq);
>   	amdgpu_ring_commit(ring);
> -	spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +	amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -80,9 +86,12 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   	if (cnt > MAX_KIQ_REG_TRY)
>   		goto failed_kiq;
>   
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
>   	return;
>   
>   failed_kiq:
> +	amdgpu_gfx_kiq_restore(kiq, NULL);
> +failed_exit:
>   	pr_err("failed to write reg %x wait reg %x\n", reg0, reg1);
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0a03e2ad5d95..7853dbc3c8bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -7594,10 +7594,10 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct amdgpu_ring *ring, bool start,
>   	amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1));
>   }
>   
> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +				     uint32_t reg_val_offs)
>   {
>   	struct amdgpu_device *adev = ring->adev;
> -	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>   	amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>   	amdgpu_ring_write(ring, 0 |	/* src: register*/
> @@ -7606,9 +7606,9 @@ static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>   	amdgpu_ring_write(ring, reg);
>   	amdgpu_ring_write(ring, 0);
>   	amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   	amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   }
>   
>   static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, 
> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index fc6c2f2bc76c..bdbd92d4fe45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -6383,10 +6383,10 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>   		ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
>   }
>   
> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +				    uint32_t reg_val_offs)
>   {
>   	struct amdgpu_device *adev = ring->adev;
> -	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>   	amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>   	amdgpu_ring_write(ring, 0 |	/* src: register*/
> @@ -6395,9 +6395,9 @@ static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>   	amdgpu_ring_write(ring, reg);
>   	amdgpu_ring_write(ring, 0);
>   	amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   	amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   }
>   
>   static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, 
> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 84fcf842316d..293219452c0f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -4042,14 +4042,21 @@ static int gfx_v9_0_soft_reset(void *handle)
>   static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>   {
>   	signed long r, cnt = 0;
> -	unsigned long flags;
> -	uint32_t seq;
> +	unsigned long flags = 0;
> +	uint32_t seq, reg_val_offs;
> +	uint64_t value = 0;
>   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   	struct amdgpu_ring *ring = &kiq->ring;
>   
>   	BUG_ON(!ring->funcs->emit_rreg);
>   
> -	spin_lock_irqsave(&kiq->ring_lock, flags);
> +	r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> +	if (r) {
> +		pr_err("failed to lock kiq\n");
> +		goto failed_kiq_exit;
> +	}
> +
> +	amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>   	amdgpu_ring_alloc(ring, 32);
>   	amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>   	amdgpu_ring_write(ring, 9 |	/* src: register*/
> @@ -4059,12 +4066,12 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>   	amdgpu_ring_write(ring, 0);
>   	amdgpu_ring_write(ring, 0);
>   	amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   	amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   	amdgpu_fence_emit_polling(ring, &seq);
>   	amdgpu_ring_commit(ring);
> -	spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +	amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>   	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -4088,10 +4095,14 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>   	if (cnt > MAX_KIQ_REG_TRY)
>   		goto failed_kiq_read;
>   
> -	return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
> -		(uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
> +	value = (uint64_t)adev->wb.wb[reg_val_offs] |
> +		(uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL;
>   
> +	amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +	return value;
>   failed_kiq_read:
> +	amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +failed_kiq_exit:
>   	pr_err("failed to read gpu clock\n");
>   	return ~0;
>   }
> @@ -5482,10 +5493,10 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>   		ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
>   }
>   
> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +				    uint32_t reg_val_offs)
>   {
>   	struct amdgpu_device *adev = ring->adev;
> -	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>   	amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>   	amdgpu_ring_write(ring, 0 |	/* src: register*/
> @@ -5494,9 +5505,9 @@ static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>   	amdgpu_ring_write(ring, reg);
>   	amdgpu_ring_write(ring, 0);
>   	amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   	amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -				kiq->reg_val_offs * 4));
> +				reg_val_offs * 4));
>   }
>   
>   static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, 
> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 30b75d79efdb..ab960b052c0d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -415,6 +415,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   {
>   	int vmid, i;
>   	signed long r;
> +	unsigned long flags = 0;
>   	uint32_t seq;
>   	uint16_t queried_pasid;
>   	bool ret;
> @@ -422,20 +423,28 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>   	if (amdgpu_emu_mode == 0 && ring->sched.ready) {
> -		spin_lock(&adev->gfx.kiq.ring_lock);
> +		r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +		if (r) {
> +			pr_err("failed to lock kiq\n");
> +			return -ETIME;
> +		}
> +
>   		/* 2 dwords flush + 8 dwords fence */
>   		amdgpu_ring_alloc(ring, kiq->pmf->invalidate_tlbs_size + 8);
> +		amdgpu_gfx_kiq_consume(kiq, NULL);
>   		kiq->pmf->kiq_invalidate_tlbs(ring,
>   					pasid, flush_type, all_hub);
>   		amdgpu_fence_emit_polling(ring, &seq);
>   		amdgpu_ring_commit(ring);
> -		spin_unlock(&adev->gfx.kiq.ring_lock);
> +		amdgpu_gfx_kiq_unlock(kiq, &flags);
>   		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>   		if (r < 1) {
>   			DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> +			amdgpu_gfx_kiq_restore(kiq, false);
>   			return -ETIME;
>   		}
>   
> +		amdgpu_gfx_kiq_restore(kiq, NULL);
>   		return 0;
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index fecdbc471983..d14fd56a959e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -589,6 +589,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   {
>   	int vmid, i;
>   	signed long r;
> +	unsigned long flags = 0;
>   	uint32_t seq;
>   	uint16_t queried_pasid;
>   	bool ret;
> @@ -613,7 +614,13 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   		if (vega20_xgmi_wa)
>   			ndw += kiq->pmf->invalidate_tlbs_size;
>   
> -		spin_lock(&adev->gfx.kiq.ring_lock);
> +		r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +		if (r) {
> +			pr_err("failed to lock kiq\n");
> +			return -ETIME;
> +		}
> +
> +		amdgpu_gfx_kiq_consume(kiq, NULL);
>   		/* 2 dwords flush + 8 dwords fence */
>   		amdgpu_ring_alloc(ring, ndw);
>   		if (vega20_xgmi_wa)
> @@ -623,13 +630,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   					pasid, flush_type, all_hub);
>   		amdgpu_fence_emit_polling(ring, &seq);
>   		amdgpu_ring_commit(ring);
> -		spin_unlock(&adev->gfx.kiq.ring_lock);
> +		amdgpu_gfx_kiq_unlock(kiq, &flags);
>   		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>   		if (r < 1) {
>   			DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> +			amdgpu_gfx_kiq_restore(kiq, NULL);
>   			return -ETIME;
>   		}
>   
> +		amdgpu_gfx_kiq_restore(kiq, NULL);
>   		return 0;
>   	}
>   



More information about the amd-gfx mailing list