[PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish

Thu Sep 14 14:02:56 UTC 2023

Am 14.09.23 um 15:59 schrieb Felix Kuehling:
>
> On 2023-09-14 9:39, Christian König wrote:
>> Is a single legacy flush sufficient to emulate an heavyweight flush 
>> as well?
>>
>> On previous generations we needed to issue at least two legacy 
>> flushes for this.
> I assume you are referring to the Vega20 XGMI workaround. That is a 
> very different issue. Because PTEs would be cached in L2, we had to 
> always use a heavy-weight flush that would also flush the L2 cache as 
> well, and follow that with another legacy flush to deal with race 
> conditions where stale PTEs could be re-fetched from L2 before the L2 
> flush was complete.

No, we also have another (badly documented) workaround which issues a 
legacy flush before each heavy weight on some hw generations. See the my 
TLB flush cleanup patches.

>
> A heavy-weight flush guarantees that there are no more possible memory 
> accesses using the old PTEs. With physically addressed caches on GFXv9 
> that includes a cache flush because the address translation happened 
> before putting data into the cache. I think the address translation 
> and cache architecture works differently on GFXv10. So maybe the 
> cache-flush is not required here.
>
> But even then a legacy flush probably allows for in-flight memory 
> accesses with old physical addresses to complete after the TLB flush. 
> So there is a small risk of memory corruption that was assumed to not 
> be accessed by the GPU any more. Or when using IOMMU device isolation 
> it would result in IOMMU faults if the DMA mappings are invalidated 
> slightly too early.

Mhm, that's quite bad. Any idea how to avoid that?

Regards,
Christian.

>
> Regards,
>   Felix
>
>
>>
>> And please don't push before getting an rb from Felix as well.
>>
>> Regards,
>> Christian.
>>
>>
>> Am 14.09.23 um 11:23 schrieb Lang Yu:
>>> cyan_skilfish has problems with other flush types.
>>>
>>> v2: fix incorrect ternary conditional operator usage.(Yifan)
>>>
>>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>>> Cc: <stable at vger.kernel.org> # v5.15+
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> index d3da13f4c80e..c6d11047169a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> @@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct 
>>> amdgpu_device *adev, uint32_t vmid,
>>>   {
>>>       bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, 
>>> vmhub);
>>>       struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
>>> -    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, 
>>> flush_type);
>>> +    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
>>> +              (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type 
>>> : 0);
>>>       u32 tmp;
>>>       /* Use register 17 for GART */
>>>       const unsigned int eng = 17;
>>> @@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct 
>>> amdgpu_device *adev, uint32_t vmid,
>>>         int r;
>>>   +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? 
>>> flush_type : 0;
>>> +
>>>       /* flush hdp cache */
>>>       adev->hdp.funcs->flush_hdp(adev, NULL);
>>>   @@ -426,6 +429,8 @@ static int 
>>> gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>>>       struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
>>>       struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
>>>   +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? 
>>> flush_type : 0;
>>> +
>>>       if (amdgpu_emu_mode == 0 && ring->sched.ready) {
>>>           spin_lock(&adev->gfx.kiq[0].ring_lock);
>>>           /* 2 dwords flush + 8 dwords fence */
>>