[PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish
Felix Kuehling
felix.kuehling at amd.com
Thu Sep 14 14:17:59 UTC 2023
On 2023-09-14 10:02, Christian König wrote:
>
>
> Am 14.09.23 um 15:59 schrieb Felix Kuehling:
>>
>> On 2023-09-14 9:39, Christian König wrote:
>>> Is a single legacy flush sufficient to emulate a heavyweight flush
>>> as well?
>>>
>>> On previous generations we needed to issue at least two legacy
>>> flushes for this.
>> I assume you are referring to the Vega20 XGMI workaround. That is a
>> very different issue. Because PTEs would be cached in L2, we had to
>> always use a heavy-weight flush that would flush the L2 cache as
>> well, and follow that with another legacy flush to deal with race
>> conditions where stale PTEs could be re-fetched from L2 before the L2
>> flush was complete.
>
> No, we also have another (badly documented) workaround which issues a
> legacy flush before each heavy-weight flush on some hw generations. See
> my TLB flush cleanup patches.
>
>>
>> A heavy-weight flush guarantees that there are no more possible
>> memory accesses using the old PTEs. With physically addressed caches
>> on GFXv9 that includes a cache flush because the address translation
>> happened before putting data into the cache. I think the address
>> translation and cache architecture works differently on GFXv10. So
>> maybe the cache-flush is not required here.
>>
>> But even then a legacy flush probably allows in-flight memory
>> accesses using the old physical addresses to complete after the TLB
>> flush. So there is a small risk of corrupting memory that was assumed
>> to no longer be accessed by the GPU. Or, when using IOMMU device
>> isolation, it would result in IOMMU faults if the DMA mappings are
>> invalidated slightly too early.
>
> Mhm, that's quite bad. Any idea how to avoid that?
A few ideas:
* Add an arbitrary delay and hope that it is longer than the FIFOs in
the HW
* Execute an atomic operation to memory on some GPU engine that could
act as a fence; maybe just a RELEASE_MEM on the CP to some writeback
location would do the job
* If needed, RELEASE_MEM could also perform a cache flush
Regards,
Felix
>
> Regards,
> Christian.
>
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> And please don't push before getting an rb from Felix as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> Am 14.09.23 um 11:23 schrieb Lang Yu:
>>>> cyan_skilfish has problems with other flush types.
>>>>
>>>> v2: fix incorrect ternary conditional operator usage. (Yifan)
>>>>
>>>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>>>> Cc: <stable at vger.kernel.org> # v5.15+
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> index d3da13f4c80e..c6d11047169a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> @@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid,
>>>>  {
>>>>  	bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
>>>>  	struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
>>>> -	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
>>>> +	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
>>>> +			(adev->asic_type != CHIP_CYAN_SKILFISH) ? flush_type : 0);
>>>>  	u32 tmp;
>>>>  	/* Use register 17 for GART */
>>>>  	const unsigned int eng = 17;
>>>> @@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
>>>>  	int r;
>>>>  
>>>> +	flush_type = (adev->asic_type != CHIP_CYAN_SKILFISH) ? flush_type : 0;
>>>> +
>>>>  	/* flush hdp cache */
>>>>  	adev->hdp.funcs->flush_hdp(adev, NULL);
>>>> @@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>>>>  	struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
>>>>  	struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
>>>>  
>>>> +	flush_type = (adev->asic_type != CHIP_CYAN_SKILFISH) ? flush_type : 0;
>>>> +
>>>>  	if (amdgpu_emu_mode == 0 && ring->sched.ready) {
>>>>  		spin_lock(&adev->gfx.kiq[0].ring_lock);
>>>>  		/* 2 dwords flush + 8 dwords fence */
>>>
>