<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
On 02.07.24 at 03:46, Icenowy Zheng wrote:<br>
<blockquote type="cite" cite="mid:6303afecce2dff9e7d30f67e0a74205256e0a524.camel@icenowy.me">
<pre class="moz-quote-pre" wrap="">On Monday, 2024-07-01 at 13:40 +0200, Christian König wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 29.06.24 at 22:51, Icenowy Zheng wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On June 30, 2024 at 03:57:47 GMT+08:00, Jiaxun Yang
<a class="moz-txt-link-rfc2396E" href="mailto:jiaxun.yang@flygoat.com"><jiaxun.yang@flygoat.com></a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On June 29, 2024 at 6:22 AM, Icenowy Zheng wrote:
[...]
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">@@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
ttm_buffer_object *bo,
struct ttm_resource *res,
caching = res->bus.caching;
}
+ /* Downgrade cached mapping for non-snooping devices */
+ if (!bo->bdev->dma_coherent && caching == ttm_cached)
+ caching = ttm_write_combined;
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Hi Icenowy,
Thanks for your patch! You saved the day for many non-coherent PCIe host
implementations!
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Ah, wait a second.
Such a thing as non-coherent PCIe implementation doesn't exist. The
PCIe
specification makes it mandatory for memory access to be cache
coherent.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Really? I looked at the PCIe 2.0 spec, the PCI 3.0 spec and the PCI-X addendum
1.0; none of these explicitly requires the PCIe controller and the CPU
to be fully coherent. The PCI-X spec even says "Note that PCI-X, like
conventional PCI, does not require systems to support coherent caches
for addresses accessed by PCI-X requesters".</pre>
</blockquote>
<br>
See the very first PCI specification, AGP 2.0 and the PCIe extension
for non-snooped access.<br>
<br>
Originally it wasn't well defined what the PCI 1.0 spec meant by
coherency (e.g. snooping vs. uncached).<br>
<br>
AGP was the first specification which explicitly defined that all
AGP memory accesses must be non-snooped and all PCI accesses must
snoop the CPU caches.<br>
<br>
PCIe then had an extension which defined the "No Snooping Attribute"
to allow emulating the AGP behavior.<br>
<br>
In the current PCIe 6.1 specification the No Snoop extension has been
merged into the base specification.<br>
<br>
See section "2.2.6.5 No Snoop Attribute" there, e.g. "Hardware
enforced cache coherency expected"<br>
<br>
As well as the notes under section 7.5.3.4 Device Control Register:<br>
<br>
Enable No Snoop - If this bit is Set, the Function is permitted to
Set the No Snoop bit in the Requester Attributes of transactions it
initiates that do not require hardware enforced cache coherency (see
Section 2.2.6.5).<br>
<br>
To summarize it: Not snooping caches is an extension, snooping
caches is mandatory.<br>
<br>
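As a rough illustration of that rule (a sketch only, not code from any
driver): the helper names below are hypothetical, and the only value taken
from the specification is bit 11 of the Device Control register, which
Linux exposes as PCI_EXP_DEVCTL_NOSNOOP_EN. Snooping is the default, and a
request may only skip it when the bit is Set and the Function asks for it.<br>
<pre>
```c
/*
 * Bit 11 of the PCIe Device Control register, "Enable No Snoop".
 * Hypothetical local name; Linux calls it PCI_EXP_DEVCTL_NOSNOOP_EN.
 */
#define DEVCTL_ENABLE_NO_SNOOP 0x0800u

/* Returns 1 if the Function is permitted to set No Snoop on its requests. */
static int may_set_no_snoop(unsigned int devctl)
{
	return (devctl & DEVCTL_ENABLE_NO_SNOOP) != 0;
}

/*
 * Returns 1 if a request has to be snooped. Snooping is the default;
 * it is skipped only when the Function is both permitted to use
 * No Snoop and actually requests it for this transaction.
 */
static int request_is_snooped(unsigned int devctl, int wants_no_snoop)
{
	return !(may_set_no_snoop(devctl) && wants_no_snoop);
}
```
</pre>
With the bit Clear, request_is_snooped() returns 1 no matter what the
Function would like to do.<br>
<br>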
<blockquote type="cite" cite="mid:6303afecce2dff9e7d30f67e0a74205256e0a524.camel@icenowy.me">
<pre class="moz-quote-pre" wrap="">In addition, from the perspective of Linux, I think bypassing the CPU cache
for shared memory is considered coherent access too; see the naming of the
dma_alloc_coherent() function.</pre>
</blockquote>
<br>
Yes, that's correct, but this is for platform devices, e.g. other I/O
from drivers which don't need to work with malloced system
memory.<br>
<br>
We have quite a few V4L, sound and, I think, also network devices
which work like that. But those are non-PCI devices.<br>
<br>
<blockquote type="cite" cite="mid:6303afecce2dff9e7d30f67e0a74205256e0a524.camel@icenowy.me">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
There are a bunch of non-compliant PCIe implementations which have
broken cache coherency, but those explicitly violate the
specification
and because of that are not supported.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Regardless of it violating the spec or not, these devices work with
Linux subsystems that use dma_alloc_coherent to allocate DMA buffers
(which is the most common case), and GPU drivers just give out cryptic
error messages like "ring gfx test failed" without any mention of
coherency issues at all, which makes the fact that Linux DRM/TTM
subsystem currently requires PCIe snooping CPU cache more obscure.</pre>
</blockquote>
<br>
No, they don't even remotely work. You just got very basic tests
working.<br>
<br>
Both the Vulkan and the OpenGL specifications require that you
can import "normal" malloc()ed system memory into the GPU driver.<br>
<br>
This is not possible without a cache coherent platform architecture.
So you can't fully support those APIs.<br>
<br>
We exercised this quite extensively already and even have a
confirmation from ARM engineers that the approach of attaching just
any PCIe root to an ARM IP core is not supported from their side.<br>
<br>
And if I'm not completely mistaken the RISC-V specification was also
updated to disallow stuff like this.<br>
<br>
So yes, you can have boards which implement non-snooped PCIe, but you
get exactly zero support from hardware vendors for running software on
them.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:6303afecce2dff9e7d30f67e0a74205256e0a524.camel@icenowy.me">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Regards,
Christian.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Unfortunately I don't think we can safely downgrade ttm_cached to
ttm_write_combined; we've
had enough drama with write-combine behaviour across different
platforms.
See drm_arch_can_wc_memory in drm_cache.h.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Yes this really sounds like an issue.
Maybe the behavior of ttm_write_combined should additionally be
decided
by drm_arch_can_wc_memory() in case of quirks?
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Thanks
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">+
return ttm_prot_from_caching(caching, tmp);
}
EXPORT_SYMBOL(ttm_io_prot);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
b/drivers/gpu/drm/ttm/ttm_tt.c
index 7b00ddf0ce49f..3335df45fba5e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct
ttm_tt *ttm,
enum ttm_caching caching,
unsigned long extra_pages)
{
+ /* Downgrade cached mapping for non-snooping devices */
+ if (!bo->bdev->dma_coherent && caching == ttm_cached)
+ caching = ttm_write_combined;
+
ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
PAGE_SHIFT) + extra_pages;
ttm->page_flags = page_flags;
ttm->dma_address = NULL;
diff --git a/include/drm/ttm/ttm_caching.h
b/include/drm/ttm/ttm_caching.h
index a18f43e93abab..f92d7911f50e4 100644
--- a/include/drm/ttm/ttm_caching.h
+++ b/include/drm/ttm/ttm_caching.h
@@ -47,7 +47,8 @@ enum ttm_caching {
/**
* @ttm_cached: Fully cached like normal system memory,
requires that
- * devices snoop the CPU cache on accesses.
+ * devices snoop the CPU cache on accesses. Downgraded
to
+ * ttm_write_combined when the snooping capability is
missing.
*/
ttm_cached
};
--
2.45.2
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
<br>
</body>
</html>