[PATCH v2 2/2] drm/amdgpu: Don't flush HDP on A+A

Christian König ckoenig.leichtzumerken at gmail.com
Tue Jun 1 17:38:01 UTC 2021


Am 01.06.21 um 16:45 schrieb Eric Huang:
>
> On 2021-06-01 3:05 a.m., Christian König wrote:
>> Am 01.06.21 um 02:06 schrieb Eric Huang:
>>> With XGMI connection flushing HDP on PCIe is unnecessary,
>>> it is also to optimize memory allocation latency.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h | 1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 3 ++-
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 1 +
>>>   drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c   | 3 +++
>>>   4 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h
>>> index 7ec99d591584..1ca23f2f51d2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h
>>> @@ -44,6 +44,7 @@ struct amdgpu_hdp {
>>>       struct ras_common_if            *ras_if;
>>>       const struct amdgpu_hdp_funcs        *funcs;
>>>       const struct amdgpu_hdp_ras_funcs    *ras_funcs;
>>> +    bool    no_flush;
>>>   };
>>>     int amdgpu_hdp_ras_late_init(struct amdgpu_device *adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index aaa2574ce9bc..f31eae2931f3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -226,7 +226,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
>>> unsigned num_ibs,
>>>       if (!(adev->flags & AMD_IS_APU))
>>>   #endif
>>>       {
>>> -        if (ring->funcs->emit_hdp_flush)
>>> +        if (ring->funcs->emit_hdp_flush &&
>>> +            !adev->hdp.no_flush)
>>
>> This still emits the flush through MMIO.
>
> As matter of fact, it doesn't, because amdgpu_asic_flush_hdp will 
> check the flag again in hdp_v4_0.c. I even think the check here is 
> unnecessary for previous asic, because asic other than Aldeberan A+A 
> has to flush tlbs according to hardware specific.

And exactly that is what I'm trying to explain here.

Skipping HDP flushing is something we do for APUs for over 10 years now, 
that is pretty common on other ASICs and not Aldeberan specific at all.

Aldeberan is just the first dGPU we skip this flushing because it has an 
XGMI connection to the CPU.

>>
>> What you need to do is to initialize the hdp.no_flush field for all 
>> asics and architectures and then use that here in the if above this one.
>
> I don't understand why it should be for all asics. Currently only 
> Aldeberan A+A need no TLB flush, we don't have to consider other 
> asics. And hdp.no_flush is a common flag for all asics, Is there an 
> issue in other asics for this flag on tlb flush?

See the code in the IB handling as well. What you should do is to 
generalize this code and not re-invent the wheel specific for Aldeberan.

In other words modify or extend the existing code which skips HDP 
flushing/invalidation.

Cleaning that up by putting the no_flush into the hdp structure sounds 
like a good start to me.

>>
>>> amdgpu_ring_emit_hdp_flush(ring);
>>>           else
>>>               amdgpu_asic_flush_hdp(adev, ring);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 2749621d5f63..6e1eab615914 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -1223,6 +1223,7 @@ static int gmc_v9_0_early_init(void *handle)
>>>           adev->gmc.xgmi.supported = true;
>>>           adev->gmc.xgmi.connected_to_cpu =
>>> adev->smuio.funcs->is_host_gpu_xgmi_supported(adev);
>>> +        adev->hdp.no_flush = adev->gmc.xgmi.connected_to_cpu;
>>>       }
>>>         gmc_v9_0_set_gmc_funcs(adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
>>> index 74b90cc2bf48..e1b2face8656 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
>>> @@ -40,6 +40,9 @@
>>>   static void hdp_v4_0_flush_hdp(struct amdgpu_device *adev,
>>>                   struct amdgpu_ring *ring)
>>>   {
>>> +    if (adev->hdp.no_flush)
>>> +        return;
>>> +
>>
>> Just to be clear once more, this approach is a NAK.
>>
>> Checks like this should not be in the hardware specific function.
>
> As I mention above, TLB flush should be specific for Asic for my opinion.

No, that is certainly wrong. This is btw not related to the TLB in any 
way, the HDP is part of the PCIe bus interface.

Regards,
Christian.

>
> Regards,
> Eric
>>
>> Regards,
>> Christian.
>>
>>>       if (!ring || !ring->funcs->emit_wreg)
>>>           WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + 
>>> KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
>>>       else
>>
>



More information about the amd-gfx mailing list