[PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
Liu, Shaoyun
Shaoyun.Liu at amd.com
Wed Apr 22 15:51:03 UTC 2020
[AMD Official Use Only - Internal Distribution Only]
OK, I see, the submission it self be signaled so the ring space for this submission will be re-use by other submission , but the CPU still not read the out put value yet.
Thanks
Shaoyun.liu
-----Original Message-----
From: Tao, Yintian <Yintian.Tao at amd.com>
Sent: Wednesday, April 22, 2020 11:47 AM
To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Shaoyun <Shaoyun.Liu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
Add more
Especially for the multi-VF environment, we have to wait through msleep() instead udeay.
Because the max udelay time is 15VF * 6ms(world-switch) = 90ms.
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao, Yintian
Sent: 2020年4月22日 23:43
To: Liu, Shaoyun <Shaoyun.Liu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
Hi Shaoyun
No, the second patch can't solve this rare case because only Slot-D is signaled and the Slot-A can be overwritten.
The second patch think the sequence is signaled the Slot-A buffer can be freed.
if you store the output value in each ring buffer itself , each kiq operation will be atomic and self contain .
[yttao]: If we wan to really make the kiq operation be atomic then we have to do the things below:
spin_lock_irqsave(&kiq->ring_lock, flags); .....
Fulfill the command buffer
.....
if (r < 1 && (adev->in_gpu_reset || in_interrupt()))
goto failed_kiq_write;
might_sleep();
while (r < 1 && cnt++ < MAX_KIQ_REG_TRY) {
msleep(MAX_KIQ_REG_BAILOUT_INTERVAL); /* here will break atomic and we need directly use udealy*/
r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
}
spin_lock_irqrestore(&kiq->ring_lock, flags);
Best Regards
Yintian Tao
-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu at amd.com>
Sent: 2020年4月22日 23:35
To: Tao, Yintian <Yintian.Tao at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
[AMD Official Use Only - Internal Distribution Only]
This is the issue you try to solve with your second patch (protect kiq overrun) . For current patch , if you store the output value in each ring buffer itself , each kiq operation will be atomic and self contain .
Shaoyun.liu
-----Original Message-----
From: Tao, Yintian <Yintian.Tao at amd.com>
Sent: Wednesday, April 22, 2020 11:00 AM
To: Koenig, Christian <Christian.Koenig at amd.com>; Liu, Shaoyun <Shaoyun.Liu at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
Hi Shaoyun
There is one rare corner case which will raise problem when using ring buffer to store value.
It is assumed there are only total four slots at KIQ ring buffer.
And these four slots are fulfilled with command to read registers. Slot-A Slot-B Slot-C Slot-D
And they are waiting for the sequence fences to be signaled. Here, there is one new command to write register to be submitted
1. Slot-A under msleep not to read register 2. Slot-B under msleep not to read register 3. Slot-C under msleep not to read register.
4. Slot-D happen to find the sequence signaled and here the new write command will overwrite the Slot-A contents.
Best Regards
Yintian Tao
-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: 2020年4月22日 22:52
To: Liu, Shaoyun <Shaoyun.Liu at amd.com>; Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: request reg_val_offs each kiq read reg
Hi Shaoyun,
the ring buffer is usually filled with command and not read results.
Allocating extra space would only work if we use the special NOP command and that is way more complicated and fragile than just using the wb functions which where made for this stuff.
Regards,
Christian.
Am 22.04.20 um 16:48 schrieb Liu, Shaoyun:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi ,Yintian & Christian
> I still don't understand why we need this complicated change here . Why can not just allocate few more extra space in the ring for each read and use the space to store the output value ?
>
> Regards
> Shaoyun.liu
>
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Christian König
> Sent: Wednesday, April 22, 2020 8:42 AM
> To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>;
> Kuehling, Felix <Felix.Kuehling at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: request reg_val_offs each kiq read
> reg
>
> Am 22.04.20 um 14:36 schrieb Yintian Tao:
>> According to the current kiq read register method, there will be race
>> condition when using KIQ to read register if multiple clients want to
>> read at same time just like the expample below:
>> 1. client-A start to read REG-0 throguh KIQ 2. client-A poll the
>> seqno-0 3. client-B start to read REG-1 through KIQ 4. client-B poll
>> the seqno-1 5. the kiq complete these two read operation 6. client-A
>> to read the register at the wb buffer and
>> get REG-1 value
>>
>> Therefore, use amdgpu_device_wb_get() to request reg_val_offs for
>> each kiq read register.
>>
>> v2: fix the error remove
>>
>> Signed-off-by: Yintian Tao <yttao at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 19 ++++++++++-------
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 -
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++--
>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 +++---
>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 7 +++---
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 27 ++++++++++++++++--------
>> 7 files changed, 41 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 4e1d4cfe7a9f..7ee5a4da398a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct amdgpu_cs_parser *p,
>> /*
>> * Writeback
>> */
>> -#define AMDGPU_MAX_WB 128 /* Reserve at most 128 WB slots for amdgpu-owned rings. */
>> +#define AMDGPU_MAX_WB 256 /* Reserve at most 256 WB slots for amdgpu-owned rings. */
>>
>> struct amdgpu_wb {
>> struct amdgpu_bo *wb_obj;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index ea576b4260a4..d5a59d7c48d6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -304,10 +304,6 @@ int amdgpu_gfx_kiq_init_ring(struct
>> amdgpu_device *adev,
>>
>> spin_lock_init(&kiq->ring_lock);
>>
>> - r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
>> - if (r)
>> - return r;
>> -
>> ring->adev = NULL;
>> ring->ring_obj = NULL;
>> ring->use_doorbell = true;
>> @@ -331,7 +327,6 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device
>> *adev,
>>
>> void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)
>> {
>> - amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
>> amdgpu_ring_fini(ring);
>> }
>>
>> @@ -672,15 +667,20 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>> {
>> signed long r, cnt = 0;
>> unsigned long flags;
>> - uint32_t seq;
>> + uint32_t seq, reg_val_offs = 0, value = 0;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> struct amdgpu_ring *ring = &kiq->ring;
>>
>> BUG_ON(!ring->funcs->emit_rreg);
>>
>> spin_lock_irqsave(&kiq->ring_lock, flags);
>> + if (amdgpu_device_wb_get(adev, ®_val_offs)) {
>> + spin_unlock_irqrestore(&kiq->ring_lock, flags);
>> + pr_err("critical bug! too more kiq readers\n");
> Typo here, this should probably read "too many kiq readers".
>
> But I don't think we need the error message here anyway, the failed_kiq_read label also prints an error.
>
> With that fixed the patch is Reviewed-by: Christian König <christian.koenig at amd.com>.
>
> Thanks,
> Christian.
>
>> + goto failed_kiq_read;
>> + }
>> amdgpu_ring_alloc(ring, 32);
>> - amdgpu_ring_emit_rreg(ring, reg);
>> + amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
>> amdgpu_fence_emit_polling(ring, &seq);
>> amdgpu_ring_commit(ring);
>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -707,7 +707,10
>> @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>> if (cnt > MAX_KIQ_REG_TRY)
>> goto failed_kiq_read;
>>
>> - return adev->wb.wb[kiq->reg_val_offs];
>> + mb();
>> + value = adev->wb.wb[reg_val_offs];
>> + amdgpu_device_wb_free(adev, reg_val_offs);
>> + return value;
>>
>> failed_kiq_read:
>> pr_err("failed to read reg:%x\n", reg); diff --git
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> index 634746829024..ee698f0246d8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> @@ -103,7 +103,6 @@ struct amdgpu_kiq {
>> struct amdgpu_ring ring;
>> struct amdgpu_irq_src irq;
>> const struct kiq_pm4_funcs *pmf;
>> - uint32_t reg_val_offs;
>> };
>>
>> /*
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index f61664ee4940..137d3d2b46e8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
>> void (*end_use)(struct amdgpu_ring *ring);
>> void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>> void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>> - void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>> + void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>> + uint32_t reg_val_offs);
>> void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>> void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>> uint32_t val, uint32_t mask); @@ -265,7 +266,7 @@ struct
>> amdgpu_ring {
>> #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>> #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>> #define amdgpu_ring_emit_cntxcntl(r, d)
>> (r)->funcs->emit_cntxcntl((r), (d)) -#define amdgpu_ring_emit_rreg(r,
>> d) (r)->funcs->emit_rreg((r), (d))
>> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r),
>> +(d), (o))
>> #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>> #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>> #define amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m)
>> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) diff
>> --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 0a03e2ad5d95..0a3d3bc9d441 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -7594,7 +7594,8 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct amdgpu_ring *ring, bool start,
>> amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1));
>> }
>>
>> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint32_t reg_val_offs)
>> {
>> struct amdgpu_device *adev = ring->adev;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; @@ -7606,9 +7607,9 @@
>> static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index fc6c2f2bc76c..ccfa65c3cb5a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -6383,7 +6383,8 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>> ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
>> }
>>
>> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint32_t reg_val_offs)
>> {
>> struct amdgpu_device *adev = ring->adev;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; @@ -6395,9 +6396,9 @@
>> static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 84fcf842316d..0b706f908d9a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -4043,13 +4043,19 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>> {
>> signed long r, cnt = 0;
>> unsigned long flags;
>> - uint32_t seq;
>> + uint32_t seq, reg_val_offs = 0;
>> + uint64_t value = 0;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> struct amdgpu_ring *ring = &kiq->ring;
>>
>> BUG_ON(!ring->funcs->emit_rreg);
>>
>> spin_lock_irqsave(&kiq->ring_lock, flags);
>> + if (amdgpu_device_wb_get(adev, ®_val_offs)) {
>> + spin_unlock_irqrestore(&kiq->ring_lock, flags);
>> + pr_err("critical bug! too more kiq readers\n");
>> + goto failed_kiq_read;
>> + }
>> amdgpu_ring_alloc(ring, 32);
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 9 | /* src: register*/
>> @@ -4059,9 +4065,9 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> amdgpu_fence_emit_polling(ring, &seq);
>> amdgpu_ring_commit(ring);
>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -4088,8
>> +4094,11 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct
>> +amdgpu_device *adev)
>> if (cnt > MAX_KIQ_REG_TRY)
>> goto failed_kiq_read;
>>
>> - return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
>> - (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
>> + mb();
>> + value = (uint64_t)adev->wb.wb[reg_val_offs] |
>> + (uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL;
>> + amdgpu_device_wb_free(adev, reg_val_offs);
>> + return value;
>>
>> failed_kiq_read:
>> pr_err("failed to read gpu clock\n"); @@ -5482,10 +5491,10 @@
>> static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>> ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
>> }
>>
>> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint32_t reg_val_offs)
>> {
>> struct amdgpu_device *adev = ring->adev;
>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 0 | /* src: register*/
>> @@ -5494,9 +5503,9 @@ static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg,
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CSh
> aoyun.Liu%40amd.com%7C9d960d5be3f141f44f0608d7e6ba8962%7C3dd8961fe4884
> e608e11a82d994e183d%7C0%7C0%7C637231561176171819&sdata=OnDtxn%2FTP
> nmzigVKJEAF37fQdUcfDMWB1xcbrQbDcIc%3D&reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cyintian.tao%40amd.com%7C28627a9c362245c1969d08d7e6d3d568%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637231670346872304&sdata=o0%2FdGt1sqir2eVGTf1JdamDSNvJnbMY5WvyxWa1mUUU%3D&reserved=0
More information about the amd-gfx
mailing list