[RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available
Christian König
christian.koenig at amd.com
Tue Jul 2 08:10:36 UTC 2024
Am 02.07.24 um 10:08 schrieb Christian König:
> Am 02.07.24 um 03:46 schrieb Icenowy Zheng:
>> 在 2024-07-01星期一的 13:40 +0200,Christian König写道:
>>> Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
>>>> 于 2024年6月30日 GMT+08:00 03:57:47,Jiaxun Yang
>>>> <jiaxun.yang at flygoat.com> 写道:
>>>>> 在2024年6月29日六月 上午6:22,Icenowy Zheng写道:
>>>>> [...]
>>>>>> @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
>>>>>> ttm_buffer_object *bo,
>>>>>> struct ttm_resource *res,
>>>>>> caching = res->bus.caching;
>>>>>> }
>>>>>>
>>>>>> + /* Downgrade cached mapping for non-snooping devices */
>>>>>> + if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>>>>> + caching = ttm_write_combined;
>>>>> Hi Icenowy,
>>>>>
>>>>> Thanks for your patch! You saved many non-coh PCIe host
>>>>> implementations a day!.
>>> Ah, wait a second.
>>>
>>> Such a thing as non-coherent PCIe implementation doesn't exist. The
>>> PCIe
>>> specification makes it mandatory for memory access to be cache
>>> coherent.
>> Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X addendum
>> 1.0, none of this explicitly requires the PCIe controller and the CPU
>> being fully coherent. The PCI-X spec even says "Note that PCI-X, like
>> conventional PCI, does not require systems to support coherent caches
>> for addresses accessed by PCI-X requesters".
>
> See the very first PCI specification, AGP 2.0 and the PCIe extension
> for non-snooped access.
>
> Originally it wasn't well defined what the PCI 1.0 spec meant with
> coherency (e.g. snooping vs uncached).
>
> AGP was the first specification which explicitly defined that all AGP
> memory accesses must be non-snooped and all PCI accesses must snoop
> the CPU caches.
>
> PCIe then had an extension which defined the "No Snooping Attribute"
> to allow emulating the AGP behavior.
>
> For the current PCIe 6.1 specification the non-snoop extension was
> merged into the base specification.
>
> Here see section "2.2.6.5 No Snoop Attribute", e.g. "Hardware enforced
> cache coherency expected"
>
> As well as the notes under section 7.5.3.4 Device Control Register:
>
> Enable No Snoop - If this bit is Set, the Function is permitted to Set
> the No Snoop bit in the Requester
> Attributes of transactions it initiates that do not require hardware
> enforced cache coherency (see Section 2.2.6.5 ).
>
> To summarize it: Not snooping caches is an extension, snooping caches
> is mandatory.
>
>> In addition, in the perspective of Linux, I think bypassing CPU cache
>> of shared memory is considered as coherent access too, see
>> dma_alloc_coherent() function's naming.
>
> Yes that's correct, but this is for platform devices. E.g. other I/O
> from drivers who doesn't need to work with malloced system memory for
> example.
>
> We have quite a bunch of V4L, sound and I also think network devices
> which work like that. But those are non-PCI devices.
>
>>> There are a bunch of non-compliant PCIe implementations which have
>>> broken cache coherency, but those explicitly violate the
>>> specification
>>> and because of that are not supported.
>> Regardless of it violating the spec or not, these devices work with
>> Linux subsystems that use dma_alloc_coherent to allocate DMA buffers
>> (which is the most common case), and GPU drivers just give out cryptic
>> error messages like "ring gfx test failed" without any mention of
>> coherency issues at all, which makes the fact that Linux DRM/TTM
>> subsystem currently requires PCIe snooping CPU cache more obscure.
>
> No, they don't even remotely work. You just got very basic tests working.
>
> Both the Vulkan as well as the OpenGL specification require that you
> can import "normal" malloced() system memory into the GPU driver.
>
> This is not possible without a cache coherent platform architecture.
> So you can't fully support those APIs.
>
> We exercised this quite extensively already and even have a
> confirmation from ARM engineers that the approach of attaching just
> any PCIe root to an ARM IP core is not supported from their side.
>
> And if I'm not completely mistaken the RISC-V specification was also
> updated to disallow stuff like this.
Googling helps: https://github.com/riscv/riscv-platform-specs/issues/83
Regards,
Christian.
>
> So yes you can have boards which implement non-snooped PCIe, but you
> get exactly zero support from hardware vendors to run software on it.
>
> Regards,
> Christian.
>
>>> Regards,
>>> Christian.
>>>
>>>>> Unfortunately I don't think we can safely ttm_cached to
>>>>> ttm_write_comnined, we've
>>>>> had enough drama with write combine behaviour on all different
>>>>> platforms.
>>>>>
>>>>> See drm_arch_can_wc_memory in drm_cache.h.
>>>>>
>>>> Yes this really sounds like an issue.
>>>>
>>>> Maybe the behavior of ttm_write_combined should furtherly be
>>>> decided
>>>> by drm_arch_can_wc_memory() in case of quirks?
>>>>
>>>>> Thanks
>>>>>
>>>>>> +
>>>>>> return ttm_prot_from_caching(caching, tmp);
>>>>>> }
>>>>>> EXPORT_SYMBOL(ttm_io_prot);
>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> index 7b00ddf0ce49f..3335df45fba5e 100644
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct
>>>>>> ttm_tt *ttm,
>>>>>> enum ttm_caching caching,
>>>>>> unsigned long extra_pages)
>>>>>> {
>>>>>> + /* Downgrade cached mapping for non-snooping devices */
>>>>>> + if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>>>>> + caching = ttm_write_combined;
>>>>>> +
>>>>>> ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
>>>>>> PAGE_SHIFT) + extra_pages;
>>>>>> ttm->page_flags = page_flags;
>>>>>> ttm->dma_address = NULL;
>>>>>> diff --git a/include/drm/ttm/ttm_caching.h
>>>>>> b/include/drm/ttm/ttm_caching.h
>>>>>> index a18f43e93abab..f92d7911f50e4 100644
>>>>>> --- a/include/drm/ttm/ttm_caching.h
>>>>>> +++ b/include/drm/ttm/ttm_caching.h
>>>>>> @@ -47,7 +47,8 @@ enum ttm_caching {
>>>>>>
>>>>>> /**
>>>>>> * @ttm_cached: Fully cached like normal system memory,
>>>>>> requires that
>>>>>> - * devices snoop the CPU cache on accesses.
>>>>>> + * devices snoop the CPU cache on accesses. Downgraded
>>>>>> to
>>>>>> + * ttm_write_combined when the snooping capaiblity is
>>>>>> missing.
>>>>>> */
>>>>>> ttm_cached
>>>>>> };
>>>>>> --
>>>>>> 2.45.2
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20240702/a4f1b226/attachment-0001.htm>
More information about the dri-devel
mailing list