[PATCH] drm/amdgpu: refine kiq read register
Tao, Yintian
Yintian.Tao at amd.com
Fri Apr 17 11:39:51 UTC 2020
Hi Christian
Can you help give more details about how this spm trace works
After review the gfx_v9_0_update_spm_vmid function, I think it is some confused.
For example:
It is assumed that there are two gfx job which can be submitted to gfx ring.
When second gfx job is submitted, the vmid of first gfx job write to mmRLC_SPM_MC_CNTL may be overwritten by the second gfx job vmid.
I am not sure whether it will raise problem.
Best Regards
Yintian Tao
-----Original Message-----
From: Liu, Monk <Monk.Liu at amd.com>
Sent: 2020年4月17日 17:40
To: Koenig, Christian <Christian.Koenig at amd.com>; Tao, Yintian <Yintian.Tao at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Ming, Davis <Davis.Ming at amd.com>; Jiang, Jerry (SW) <Jerry.Jiang at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: refine kiq read register
Hi Christian
mmRLC_SPM_MC_CNTL
this register is a RLC register, with my understanding it is PF&VF share register, and I did experiment proved it:
1) write abc to it in PF
2) read it from VF, it shows abc
3) write ff to it in VF, read it, it is still abc
So this register with current policy (L1) is a VF read, PF write register, and this register is physically shared among PF/VF
We should not even try to write it in VF side, no matter CPU or KIQ (KIQ write within VF role will also be blocked by the L1 policy)
From what I can see so far: we need to drop this feature for SRIOV, or we need to change Policy
+ at Ming, Davis and @Jiang, Jerry (SW) for awareness
DRM-NEXT kernel branch has a new feature to massively use KIQ to read/write this register " mmRLC_SPM_MC_CNTL" which is a PF w/r bug VF R only register.
We need to figure out what should we do on it
I will talk to UMD guys later (they initiated this feature in our kernel driver ) _____________________________________
Monk Liu|GPU Virtualization Team |AMD
-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Friday, April 17, 2020 5:14 PM
To: Liu, Monk <Monk.Liu at amd.com>; Tao, Yintian <Yintian.Tao at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq read register
> Dynamic alloc each time doing KIQ reg read is a overkill to me
Yeah, that is a rather good argument.
> Now we do KIQ read and write *every time* we do amdgpu_vm_flush
> (omg... what's this ??)
That is updating the VMID used for the SPM trace. And yes this read/modify/write is most likely not a good idea, we should rather just write the value we want to have or don't use the KIQ here.
Most likely the later because IIRC this is a per VF register.
Christian.
Am 17.04.20 um 11:06 schrieb Liu, Monk:
> Christian
>
> See we wanted to map the ring buffers read only and USWC for some time.
> That would result in either not working driver or rather crappy performance.
> <<
>
> For KIQ the ring buffer wouldn't be read only ... should be cacheable
> type
>
> Dynamic alloc each time doing KIQ reg read is a overkill to me, leverage ring buffer is a high efficient way.
>
> Besides looks now the KIQ register reading is really massive, check this code:
>
> 4949 static void gfx_v9_0_update_spm_vmid(struct amdgpu_device *adev,
> unsigned vmid)
> 4950 {
> 4951 u32 data;
> 4952
> 4953 data = RREG32_SOC15(GC, 0, mmRLC_SPM_MC_CNTL);
> 4954
> 4955 data &= ~RLC_SPM_MC_CNTL__RLC_SPM_VMID_MASK;
> 4956 data |= (vmid & RLC_SPM_MC_CNTL__RLC_SPM_VMID_MASK) << RLC_SPM_MC_CNTL__RLC_SPM_VMID__SHIFT;
> 4957
> 4958 WREG32_SOC15(GC, 0, mmRLC_SPM_MC_CNTL, data);
> 4959 }
>
> Now we do KIQ read and write *every time* we do amdgpu_vm_flush
> (omg... what's this ??)
>
>
>
> _____________________________________
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Friday, April 17, 2020 4:59 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Tao, Yintian <Yintian.Tao at amd.com>;
> Kuehling, Felix <Felix.Kuehling at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: refine kiq read register
>
> Looks like a rather important bug fix to me, but I'm not sure if writing the value into the ring buffer is a good idea.
>
> See we wanted to map the ring buffers read only and USWC for some time.
> That would result in either not working driver or rather crappy performance.
>
> Can't we just call amdgpu_device_wb_get() in amdgpu_device_wb_get() instead and allocate the wb address dynamically?
>
> Regards,
> Christian.
>
> Am 17.04.20 um 09:01 schrieb Liu, Monk:
>> The change Looks good with me, you can put my RB to your patch .
>>
>> Since this patch impact on general logic (not SRIOV only) I would
>> like you wait a little longer for @Kuehling, Felix and @Deucher,
>> Alexander and @Koenig, Christian @Zhang, Hawking
>>
>> If any of them gave you a RB I think we can go this way
>>
>> _____________________________________
>> Monk Liu|GPU Virtualization Team |AMD
>>
>>
>> -----Original Message-----
>> From: Yintian Tao <yttao at amd.com>
>> Sent: Friday, April 17, 2020 2:53 PM
>> To: Liu, Monk <Monk.Liu at amd.com>
>> Cc: amd-gfx at lists.freedesktop.org; Tao, Yintian <Yintian.Tao at amd.com>
>> Subject: [PATCH] drm/amdgpu: refine kiq read register
>>
>> According to the current kiq read register method, there will be race condition when using KIQ to read register if multiple clients want to read at same time just like the expample below:
>> 1. client-A start to read REG-0 throguh KIQ 2. client-A poll the seqno-0 3. client-B start to read REG-1 through KIQ 4. client-B poll the seqno-1 5. the kiq complete these two read operation 6. client-A to read the register at the wb buffer and
>> get REG-1 value
>>
>> Therefore, directly make kiq write the register value at the ring buffer then there will be no race condition for the wb buffer.
>>
>> v2: supply the read_clock and move the reg_val_offs back
>>
>> Signed-off-by: Yintian Tao <yttao at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 11 ++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++--
>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 14 +++++-------
>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 14 +++++-------
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 28 ++++++++++++------------
>> 6 files changed, 33 insertions(+), 40 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index ea576b4260a4..4e1c0239e561 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -304,10 +304,6 @@ int amdgpu_gfx_kiq_init_ring(struct
>> amdgpu_device *adev,
>>
>> spin_lock_init(&kiq->ring_lock);
>>
>> - r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
>> - if (r)
>> - return r;
>> -
>> ring->adev = NULL;
>> ring->ring_obj = NULL;
>> ring->use_doorbell = true;
>> @@ -331,7 +327,6 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device
>> *adev,
>>
>> void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring) {
>> - amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
>> amdgpu_ring_fini(ring);
>> }
>>
>> @@ -675,12 +670,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>> uint32_t seq;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> struct amdgpu_ring *ring = &kiq->ring;
>> + uint64_t reg_val_offs = 0;
>>
>> BUG_ON(!ring->funcs->emit_rreg);
>>
>> spin_lock_irqsave(&kiq->ring_lock, flags);
>> amdgpu_ring_alloc(ring, 32);
>> - amdgpu_ring_emit_rreg(ring, reg);
>> + reg_val_offs = (ring->wptr & ring->buf_mask) + 30;
>> + amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
>> amdgpu_fence_emit_polling(ring, &seq);
>> amdgpu_ring_commit(ring);
>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -707,7 +704,7 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>> if (cnt > MAX_KIQ_REG_TRY)
>> goto failed_kiq_read;
>>
>> - return adev->wb.wb[kiq->reg_val_offs];
>> + return ring->ring[reg_val_offs];
>>
>> failed_kiq_read:
>> pr_err("failed to read reg:%x\n", reg); diff --git
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> index 634746829024..ee698f0246d8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> @@ -103,7 +103,6 @@ struct amdgpu_kiq {
>> struct amdgpu_ring ring;
>> struct amdgpu_irq_src irq;
>> const struct kiq_pm4_funcs *pmf;
>> - uint32_t reg_val_offs;
>> };
>>
>> /*
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index f61664ee4940..a3d88f2aa9f4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
>> void (*end_use)(struct amdgpu_ring *ring);
>> void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>> void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>> - void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>> + void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>> + uint64_t reg_val_offs);
>> void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>> void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>> uint32_t val, uint32_t mask); @@ -265,7 +266,7 @@ struct
>> amdgpu_ring { #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r)) #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>> #define amdgpu_ring_emit_cntxcntl(r, d)
>> (r)->funcs->emit_cntxcntl((r), (d)) -#define amdgpu_ring_emit_rreg(r,
>> d) (r)->funcs->emit_rreg((r), (d))
>> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r),
>> +(d),
>> +(o))
>> #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r),
>> (d), (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m)
>> (r)->funcs->emit_reg_wait((r), (d), (v), (m)) #define
>> amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m)
>> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) diff
>> --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 0a03e2ad5d95..7c9a5e440509 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -7594,21 +7594,19 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct amdgpu_ring *ring, bool start,
>> amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1)); }
>>
>> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint64_t reg_val_offs)
>> {
>> - struct amdgpu_device *adev = ring->adev;
>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> -
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 0 | /* src: register*/
>> (5 << 8) | /* dst: memory */
>> (1 << 20)); /* write confirm */
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index fc6c2f2bc76c..8e7eee7838e0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -6383,21 +6383,19 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>> ring->ring[offset] = (ring->ring_size >> 2) - offset + cur; }
>>
>> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint64_t reg_val_offs)
>> {
>> - struct amdgpu_device *adev = ring->adev;
>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> -
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 0 | /* src: register*/
>> (5 << 8) | /* dst: memory */
>> (1 << 20)); /* write confirm */
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 84fcf842316d..ff279b1f5c24 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -4046,11 +4046,13 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>> uint32_t seq;
>> struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> struct amdgpu_ring *ring = &kiq->ring;
>> + uint64_t reg_val_offs = 0;
>>
>> BUG_ON(!ring->funcs->emit_rreg);
>>
>> spin_lock_irqsave(&kiq->ring_lock, flags);
>> amdgpu_ring_alloc(ring, 32);
>> + reg_val_offs = (ring->wptr & ring->buf_mask) + 30;
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 9 | /* src: register*/
>> (5 << 8) | /* dst: memory */
>> @@ -4058,10 +4060,10 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>> (1 << 20)); /* write confirm */
>> amdgpu_ring_write(ring, 0);
>> amdgpu_ring_write(ring, 0);
>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> amdgpu_fence_emit_polling(ring, &seq);
>> amdgpu_ring_commit(ring);
>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -4088,8 +4090,8 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>> if (cnt > MAX_KIQ_REG_TRY)
>> goto failed_kiq_read;
>>
>> - return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
>> - (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
>> + return (uint64_t)ring->ring[reg_val_offs] |
>> + (uint64_t)ring->ring[reg_val_offs + 1 ] << 32ULL;
>>
>> failed_kiq_read:
>> pr_err("failed to read gpu clock\n"); @@ -5482,21 +5484,19 @@
>> static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>> ring->ring[offset] = (ring->ring_size>>2) - offset + cur; }
>>
>> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring,
>> uint32_t reg)
>> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>> + uint64_t reg_val_offs)
>> {
>> - struct amdgpu_device *adev = ring->adev;
>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>> -
>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> amdgpu_ring_write(ring, 0 | /* src: register*/
>> (5 << 8) | /* dst: memory */
>> (1 << 20)); /* write confirm */
>> amdgpu_ring_write(ring, reg);
>> amdgpu_ring_write(ring, 0);
>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> - kiq->reg_val_offs * 4));
>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr +
>> + reg_val_offs * 4));
>> }
>>
>> static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring,
>> uint32_t reg,
>> --
>> 2.17.1
>>
More information about the amd-gfx
mailing list