[PATCH] drm/amdgpu: refine kiq access register
Tao, Yintian
Yintian.Tao at amd.com
Wed Apr 22 11:49:36 UTC 2020
Hi Christian
Can you help answer the questions below? Thanks in advance.
-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Wednesday, April 22, 2020 19:03
To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
On 22.04.20 at 11:29, Yintian Tao wrote:
> With the current KIQ register access method, there is a race
> condition when multiple clients use KIQ to read registers at the
> same time, for example:
> 1. client-A starts to read REG-0 through KIQ
> 2. client-A polls seqno-0
> 3. client-B starts to read REG-1 through KIQ
> 4. client-B polls seqno-1
> 5. the KIQ completes both read operations
> 6. client-A reads the register value from the wb buffer and
>    gets the REG-1 value
>
> In addition, if multiple clients frequently write registers through
> KIQ, the KIQ ring buffer may be overwritten.
>
> Therefore, allocate a fixed number of wb slots for register reads and
> limit the number of submissions based on the KIQ ring_size in order
> to prevent the ring buffer from being overwritten.
>
> v2: directly use amdgpu_device_wb_get() for each read instead
> of reserving a fixed number of slots.
> If there is not enough KIQ ring buffer space or no free rreg slot,
> directly print an error log and return instead of busy-waiting.
I would split that into three patches. One for each problem we have here:
1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
[yttao]: Do you mean that we need to use spin_lock_irqsave() in functions like kgd_hiq_mqd_load()?
2. Prevent the overrun of the KIQ. Please drop the approach with the atomic here. Instead just add an amdgpu_fence_wait_polling() into
amdgpu_fence_emit_polling() as I discussed with Monk.
[yttao]: Sorry, I don't fully understand your idea for amdgpu_fence_wait_polling(). Can you give more details about it? Thanks in advance.
"That is actually only a problem because the KIQ uses polling waits.
See amdgpu_fence_emit() waits for the oldest possible fence to be signaled before emitting a new one.
I suggest that we do the same in amdgpu_fence_emit_polling(). A one liner like the following should be enough:
amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, timeout);"
[yttao]: num_fences_mask is not used in KIQ fence polling; it is only effective in the dma_fence framework.
If I understand correctly, is the prototype code below what you want? If the prototype is wrong, can you help give one sample? Thanks in advance.
int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s)
{
uint32_t seq;
if (!s)
return -EINVAL;
+ amdgpu_fence_wait_polling(ring, seq, timeout);
seq = ++ring->fence_drv.sync_seq;
amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr, seq, 0);
*s = seq;
return 0;
}
3. Use amdgpu_device_wb_get() each time we need to submit a read.
[yttao]: yes, I will do it.
Regards,
Christian.
>
> Signed-off-by: Yintian Tao <yttao at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 +-
> .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 13 ++-
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 13 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 83 +++++++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 35 +++++---
> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 13 ++-
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 13 ++-
> 12 files changed, 167 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 4e1d4cfe7a9f..1157c1a0b888 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct amdgpu_cs_parser *p,
> /*
> * Writeback
> */
> -#define AMDGPU_MAX_WB 128 /* Reserve at most 128 WB slots for amdgpu-owned rings. */
> +#define AMDGPU_MAX_WB 256 /* Reserve at most 256 WB slots for amdgpu-owned rings. */
>
> struct amdgpu_wb {
> struct amdgpu_bo *wb_obj;
> @@ -1028,6 +1028,12 @@ bool amdgpu_device_has_dc_support(struct
> amdgpu_device *adev);
>
> int emu_soc_asic_init(struct amdgpu_device *adev);
>
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> + unsigned long *flags);
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long *flags);
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs);
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs);
> /*
> * Registers read & write functions.
> */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 691c89705bcd..a65d6a1abc04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -309,9 +309,11 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> uint32_t doorbell_off)
> {
> struct amdgpu_device *adev = get_amdgpu_device(kgd);
> + struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
> struct v10_compute_mqd *m;
> uint32_t mec, pipe;
> + unsigned long flags = 0;
> int r;
>
> m = get_mqd(mqd);
> @@ -324,13 +326,19 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
> mec, pipe, queue_id);
>
> - spin_lock(&adev->gfx.kiq.ring_lock);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto out_unlock;
> + }
> +
> r = amdgpu_ring_alloc(kiq_ring, 7);
> if (r) {
> pr_err("Failed to alloc KIQ (%d).\n", r);
> goto out_unlock;
> }
>
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
> amdgpu_ring_write(kiq_ring,
> PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */
> @@ -350,8 +358,9 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
> amdgpu_ring_commit(kiq_ring);
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> out_unlock:
> - spin_unlock(&adev->gfx.kiq.ring_lock);
> + amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
> release_queue(kgd);
>
> return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2ac5e7..4435bd716edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -307,9 +307,11 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> uint32_t doorbell_off)
> {
> struct amdgpu_device *adev = get_amdgpu_device(kgd);
> + struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
> struct v9_mqd *m;
> uint32_t mec, pipe;
> + unsigned long flags = 0;
> int r;
>
> m = get_mqd(mqd);
> @@ -322,13 +324,19 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
> mec, pipe, queue_id);
>
> - spin_lock(&adev->gfx.kiq.ring_lock);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto out_unlock;
> + }
> +
> r = amdgpu_ring_alloc(kiq_ring, 7);
> if (r) {
> pr_err("Failed to alloc KIQ (%d).\n", r);
> goto out_unlock;
> }
>
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
> amdgpu_ring_write(kiq_ring,
> PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */
> @@ -348,8 +356,9 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
> amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
> amdgpu_ring_commit(kiq_ring);
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> out_unlock:
> - spin_unlock(&adev->gfx.kiq.ring_lock);
> + amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
> release_queue(kgd);
>
> return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index ea576b4260a4..c0dc7f1849c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -295,6 +295,43 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device *adev,
> return -EINVAL;
> }
>
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> + unsigned long *flags)
> +{
> + struct amdgpu_wb *wb = &(kiq->ring.adev)->wb;
> +
> + spin_lock_irqsave(&kiq->ring_lock, *flags);
> + if ((atomic64_read(&kiq->submit_avail) > 0) &&
> + (read ? find_first_zero_bit(wb->used, wb->num_wb) <
> + wb->num_wb : 1)) {
> + return 0;
> + } else {
> + spin_unlock_irqrestore(&kiq->ring_lock, *flags);
> + pr_err("critical! too many KIQ accessors\n");
> + return -EDEADLK;
> + }
> +}
> +
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long *flags)
> +{
> + spin_unlock_irqrestore(&kiq->ring_lock, *flags);
> +}
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs)
> +{
> + atomic64_dec(&kiq->submit_avail);
> + if (offs)
> + amdgpu_device_wb_get(kiq->ring.adev, offs);
> +}
> +
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs)
> +{
> + atomic64_inc(&kiq->submit_avail);
> + if (offs)
> + amdgpu_device_wb_free(kiq->ring.adev, *offs);
> +}
> +
> int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
> struct amdgpu_ring *ring,
> struct amdgpu_irq_src *irq)
> @@ -304,10 +341,6 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device
> *adev,
>
> spin_lock_init(&kiq->ring_lock);
>
> - r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
> - if (r)
> - return r;
> -
> ring->adev = NULL;
> ring->ring_obj = NULL;
> ring->use_doorbell = true;
> @@ -325,13 +358,15 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
> AMDGPU_RING_PRIO_DEFAULT);
> if (r)
> dev_warn(adev->dev, "(%d) failed to init kiq ring\n", r);
> + else
> + atomic64_set(&kiq->submit_avail, ring->ring_size / 4 /
> + (ring->funcs->align_mask + 1));
>
> return r;
> }
>
> void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)
> {
> - amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
> amdgpu_ring_fini(ring);
> }
>
> @@ -671,19 +706,25 @@ int amdgpu_gfx_cp_ecc_error_irq(struct amdgpu_device *adev,
> uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
> {
> signed long r, cnt = 0;
> - unsigned long flags;
> - uint32_t seq;
> + unsigned long flags = 0;
> + uint32_t seq, reg_val_offs = 0, value = 0;
> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> struct amdgpu_ring *ring = &kiq->ring;
>
> BUG_ON(!ring->funcs->emit_rreg);
>
> - spin_lock_irqsave(&kiq->ring_lock, flags);
> + r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto kiq_read_exit;
> + }
> +
> + amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
> amdgpu_ring_alloc(ring, 32);
> - amdgpu_ring_emit_rreg(ring, reg);
> + amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock_irqrestore(&kiq->ring_lock, flags);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
>
> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>
> @@ -707,9 +748,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
> if (cnt > MAX_KIQ_REG_TRY)
> goto failed_kiq_read;
>
> - return adev->wb.wb[kiq->reg_val_offs];
> + mb();
> + value = adev->wb.wb[reg_val_offs];
> + amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> + return value;
>
> failed_kiq_read:
> + amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +kiq_read_exit:
> pr_err("failed to read reg:%x\n", reg);
> return ~0;
> }
> @@ -717,19 +763,25 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
> void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
> {
> signed long r, cnt = 0;
> - unsigned long flags;
> + unsigned long flags = 0;
> uint32_t seq;
> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> struct amdgpu_ring *ring = &kiq->ring;
>
> BUG_ON(!ring->funcs->emit_wreg);
>
> - spin_lock_irqsave(&kiq->ring_lock, flags);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto kiq_write_exit;
> + }
> +
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> amdgpu_ring_alloc(ring, 32);
> amdgpu_ring_emit_wreg(ring, reg, v);
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock_irqrestore(&kiq->ring_lock, flags);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
>
> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>
> @@ -754,8 +806,11 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
> if (cnt > MAX_KIQ_REG_TRY)
> goto failed_kiq_write;
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return;
>
> failed_kiq_write:
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> +kiq_write_exit:
> pr_err("failed to write reg:%x\n", reg);
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 634746829024..ff7597ca6cad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -96,6 +96,7 @@ struct kiq_pm4_funcs {
> int invalidate_tlbs_size;
> };
>
> +#define MAX_KIQ_RREG_NUM 64
> struct amdgpu_kiq {
> u64 eop_gpu_addr;
> struct amdgpu_bo *eop_obj;
> @@ -103,7 +104,7 @@ struct amdgpu_kiq {
> struct amdgpu_ring ring;
> struct amdgpu_irq_src irq;
> const struct kiq_pm4_funcs *pmf;
> - uint32_t reg_val_offs;
> + atomic64_t submit_avail;
> };
>
> /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index f61664ee4940..137d3d2b46e8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
> void (*end_use)(struct amdgpu_ring *ring);
> void (*emit_switch_buffer) (struct amdgpu_ring *ring);
> void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> - void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
> + void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> + uint32_t reg_val_offs);
> void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
> uint32_t val, uint32_t mask);
> @@ -265,7 +266,7 @@ struct amdgpu_ring {
> #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
> #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
> #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> -#define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
> #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
> #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
> #define amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 8c10084f44ef..a4c3f284691c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -56,13 +56,19 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
> unsigned long flags;
> uint32_t seq;
>
> - spin_lock_irqsave(&kiq->ring_lock, flags);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto failed_exit;
> + }
> +
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> amdgpu_ring_alloc(ring, 32);
> amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
> ref, mask);
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock_irqrestore(&kiq->ring_lock, flags);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
>
> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>
> @@ -80,9 +86,12 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
> if (cnt > MAX_KIQ_REG_TRY)
> goto failed_kiq;
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return;
>
> failed_kiq:
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> +failed_exit:
> pr_err("failed to write reg %x wait reg %x\n", reg0, reg1);
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0a03e2ad5d95..7853dbc3c8bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -7594,10 +7594,10 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct amdgpu_ring *ring, bool start,
> amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1));
> }
>
> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> + uint32_t reg_val_offs)
> {
> struct amdgpu_device *adev = ring->adev;
> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
> amdgpu_ring_write(ring, 0 | /* src: register*/
> @@ -7606,9 +7606,9 @@ static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> amdgpu_ring_write(ring, reg);
> amdgpu_ring_write(ring, 0);
> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> }
>
> static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index fc6c2f2bc76c..bdbd92d4fe45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -6383,10 +6383,10 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
> ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
> }
>
> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> + uint32_t reg_val_offs)
> {
> struct amdgpu_device *adev = ring->adev;
> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
> amdgpu_ring_write(ring, 0 | /* src: register*/
> @@ -6395,9 +6395,9 @@ static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> amdgpu_ring_write(ring, reg);
> amdgpu_ring_write(ring, 0);
> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> }
>
> static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 84fcf842316d..293219452c0f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -4042,14 +4042,21 @@ static int gfx_v9_0_soft_reset(void *handle)
> static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
> {
> signed long r, cnt = 0;
> - unsigned long flags;
> - uint32_t seq;
> + unsigned long flags = 0;
> + uint32_t seq, reg_val_offs;
> + uint64_t value = 0;
> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> struct amdgpu_ring *ring = &kiq->ring;
>
> BUG_ON(!ring->funcs->emit_rreg);
>
> - spin_lock_irqsave(&kiq->ring_lock, flags);
> + r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + goto failed_kiq_exit;
> + }
> +
> + amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
> amdgpu_ring_alloc(ring, 32);
> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
> amdgpu_ring_write(ring, 9 | /* src: register*/
> @@ -4059,12 +4066,12 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
> amdgpu_ring_write(ring, 0);
> amdgpu_ring_write(ring, 0);
> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock_irqrestore(&kiq->ring_lock, flags);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
>
> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>
> @@ -4088,10 +4095,14 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
> if (cnt > MAX_KIQ_REG_TRY)
> goto failed_kiq_read;
>
> - return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
> - (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
> + value = (uint64_t)adev->wb.wb[reg_val_offs] |
> + (uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL;
>
> + amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> + return value;
> failed_kiq_read:
> + amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +failed_kiq_exit:
> pr_err("failed to read gpu clock\n");
> return ~0;
> }
> @@ -5482,10 +5493,10 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
> ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
> }
>
> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> + uint32_t reg_val_offs)
> {
> struct amdgpu_device *adev = ring->adev;
> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
> amdgpu_ring_write(ring, 0 | /* src: register*/
> @@ -5494,9 +5505,9 @@ static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
> amdgpu_ring_write(ring, reg);
> amdgpu_ring_write(ring, 0);
> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> - kiq->reg_val_offs * 4));
> + reg_val_offs * 4));
> }
>
> static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 30b75d79efdb..ab960b052c0d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -415,6 +415,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> {
> int vmid, i;
> signed long r;
> + unsigned long flags = 0;
> uint32_t seq;
> uint16_t queried_pasid;
> bool ret;
> @@ -422,20 +423,28 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> if (amdgpu_emu_mode == 0 && ring->sched.ready) {
> - spin_lock(&adev->gfx.kiq.ring_lock);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + return -ETIME;
> + }
> +
> /* 2 dwords flush + 8 dwords fence */
> amdgpu_ring_alloc(ring, kiq->pmf->invalidate_tlbs_size + 8);
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> kiq->pmf->kiq_invalidate_tlbs(ring,
> pasid, flush_type, all_hub);
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock(&adev->gfx.kiq.ring_lock);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
> r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> if (r < 1) {
> DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return -ETIME;
> }
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index fecdbc471983..d14fd56a959e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -589,6 +589,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> {
> int vmid, i;
> signed long r;
> + unsigned long flags = 0;
> uint32_t seq;
> uint16_t queried_pasid;
> bool ret;
> @@ -613,7 +614,13 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> if (vega20_xgmi_wa)
> ndw += kiq->pmf->invalidate_tlbs_size;
>
> - spin_lock(&adev->gfx.kiq.ring_lock);
> + r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> + if (r) {
> + pr_err("failed to lock kiq\n");
> + return -ETIME;
> + }
> +
> + amdgpu_gfx_kiq_consume(kiq, NULL);
> /* 2 dwords flush + 8 dwords fence */
> amdgpu_ring_alloc(ring, ndw);
> if (vega20_xgmi_wa)
> @@ -623,13 +630,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> pasid, flush_type, all_hub);
> amdgpu_fence_emit_polling(ring, &seq);
> amdgpu_ring_commit(ring);
> - spin_unlock(&adev->gfx.kiq.ring_lock);
> + amdgpu_gfx_kiq_unlock(kiq, &flags);
> r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> if (r < 1) {
> DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return -ETIME;
> }
>
> + amdgpu_gfx_kiq_restore(kiq, NULL);
> return 0;
> }
>