[PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish

Christian König ckoenig.leichtzumerken at gmail.com
Fri Sep 15 10:19:13 UTC 2023


Am 15.09.23 um 10:53 schrieb Lang Yu:
> On 09/14/ , Felix Kuehling wrote:
>> On 2023-09-14 10:02, Christian König wrote:
> Do we still need to use a legacy flush to emulate a heavyweight flush
> if we don't use SVM? And can I push this now?

Felix needs to decide that. From what I understand, the KFD needs
heavyweight flushes for secure SVM operation.

If heavyweight flushes are buggy, papering over that by using legacy
flushes is only a mediocre workaround.
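
For reference, the flush types in question are roughly these (from
memory, so double-check kgd_kfd_interface.h before relying on this):

    enum TLB_FLUSH_TYPE {
        TLB_FLUSH_LEGACY = 0,
        TLB_FLUSH_LIGHTWEIGHT,
        TLB_FLUSH_HEAVYWEIGHT
    };

The patch below effectively clamps everything to TLB_FLUSH_LEGACY (0)
on cyan_skilfish.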

Regards,
Christian.

>
> Regards,
> Lang
>
>
>>> Am 14.09.23 um 15:59 schrieb Felix Kuehling:
>>>> On 2023-09-14 9:39, Christian König wrote:
>>>>> Is a single legacy flush sufficient to emulate a heavyweight
>>>>> flush as well?
>>>>>
>>>>> On previous generations we needed to issue at least two legacy
>>>>> flushes for this.
>>>> I assume you are referring to the Vega20 XGMI workaround. That is a
>>>> very different issue. Because PTEs would be cached in L2, we had to
>>>> always use a heavy-weight flush that also flushed the L2 cache, and
>>>> follow that with another legacy flush to deal with race conditions
>>>> where stale PTEs could be re-fetched from L2 before the L2 flush was
>>>> complete.
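>>>>
>>>> Roughly, that sequence looked like this (hypothetical helper names,
>>>> not the actual gmc_v9_0 code):
>>>>
>>>>     /* PTEs may be cached in L2, so use a heavy-weight flush that
>>>>      * also flushes the L2 cache */
>>>>     emit_tlb_flush(ring, vmid, TLB_FLUSH_HEAVYWEIGHT);
>>>>
>>>>     /* a stale PTE could be re-fetched from L2 before the L2 flush
>>>>      * completed, so chase it with a legacy flush */
>>>>     emit_tlb_flush(ring, vmid, TLB_FLUSH_LEGACY);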
>>> No, we also have another (badly documented) workaround which issues a
>>> legacy flush before each heavy-weight flush on some hw generations. See
>>> my TLB flush cleanup patches.
>>>
>>>> A heavy-weight flush guarantees that there are no more possible
>>>> memory accesses using the old PTEs. With physically addressed caches
>>>> on GFXv9, that includes a cache flush, because the address translation
>>>> happened before putting data into the cache. I think the address
>>>> translation and cache architecture works differently on GFXv10. So
>>>> maybe the cache flush is not required here.
>>>>
>>>> But even then a legacy flush probably allows in-flight memory
>>>> accesses with old physical addresses to complete after the TLB
>>>> flush. So there is a small risk of corrupting memory that the GPU
>>>> was assumed to no longer access. Or, when using IOMMU device
>>>> isolation, it would result in IOMMU faults if the DMA mappings are
>>>> invalidated slightly too early.
>>> Mhm, that's quite bad. Any idea how to avoid that?
>> A few ideas
>>
>>   * Add an arbitrary delay and hope that it is longer than the FIFOs in
>>     the HW
>>   * Execute an atomic operation to memory on some GPU engine that could
>>     act as a fence; maybe just a RELEASE_MEM on the CP to some writeback
>>     location would do the job (see the sketch below)
>>   * If needed, RELEASE_MEM could also perform a cache flush
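>>
>> A rough sketch of the RELEASE_MEM idea (hypothetical helper names; the
>> real code would use the CP packet emission helpers of the ASIC in
>> question):
>>
>>     /* after requesting the invalidation ... */
>>     u64 seq = next_fence_seq(adev);
>>
>>     /* RELEASE_MEM drains outstanding memory accesses (and can
>>      * optionally flush/invalidate caches) before writing seq to
>>      * the writeback location */
>>     emit_release_mem(ring, wb_gpu_addr, seq, /* cache_flush */ true);
>>
>>     /* only report the flush as done once the fence value lands */
>>     wait_for_writeback_value(adev, wb_cpu_addr, seq, timeout);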
>>
>> Regards,
>>    Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>>    Felix
>>>>
>>>>
>>>>> And please don't push before getting an rb from Felix as well.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>> Am 14.09.23 um 11:23 schrieb Lang Yu:
>>>>>> cyan_skilfish has problems with other flush types.
>>>>>>
>>>>>> v2: fix incorrect ternary conditional operator usage. (Yifan)
>>>>>>
>>>>>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>>>>>> Cc: <stable at vger.kernel.org> # v5.15+
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
>>>>>>    1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> index d3da13f4c80e..c6d11047169a 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> @@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid,
>>>>>>  {
>>>>>>      bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
>>>>>>      struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
>>>>>> -    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
>>>>>> +    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
>>>>>> +              (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0);
>>>>>>      u32 tmp;
>>>>>>      /* Use register 17 for GART */
>>>>>>      const unsigned int eng = 17;
>>>>>> @@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
>>>>>>      int r;
>>>>>>
>>>>>> +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
>>>>>> +
>>>>>>      /* flush hdp cache */
>>>>>>      adev->hdp.funcs->flush_hdp(adev, NULL);
>>>>>>
>>>>>> @@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>>>>>>      struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
>>>>>>      struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
>>>>>>
>>>>>> +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
>>>>>> +
>>>>>>      if (amdgpu_emu_mode == 0 && ring->sched.ready) {
>>>>>>          spin_lock(&adev->gfx.kiq[0].ring_lock);
>>>>>>          /* 2 dwords flush + 8 dwords fence */
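>>>>>
>>>>> Nit: the same clamp is now repeated in three places; an untested
>>>>> idea (hypothetical helper, not part of the patch) would be to
>>>>> centralize it:
>>>>>
>>>>>     /* cyan_skilfish only handles legacy (type 0) invalidations
>>>>>      * reliably, so clamp everything else to a legacy flush */
>>>>>     static inline uint32_t
>>>>>     gmc_v10_0_effective_flush_type(struct amdgpu_device *adev,
>>>>>                                    uint32_t flush_type)
>>>>>     {
>>>>>         return (adev->asic_type != CHIP_CYAN_SKILLFISH) ?
>>>>>                 flush_type : 0;
>>>>>     }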


