radeon ring 0 test failed on arm64
Peter Geis
pgwipeout at gmail.com
Thu Mar 17 12:21:22 UTC 2022
On Thu, Mar 17, 2022 at 5:15 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Hi Peter,
>
> Am 17.03.22 um 01:14 schrieb Peter Geis:
> > Good Evening,
> >
> > I apologize for raising this email chain from the dead, but there have
> > been some developments that have introduced even more questions.
> > I've looped the Rockchip mailing list into this too, as this affects
> > rk356x, and likely the upcoming rk3588 if [1] is to be believed.
> >
> > TLDR for those not familiar: It seems the rk356x series (and possibly
> > the rk3588) were built without any outer coherent cache.
> > This means (unless Rockchip wants to clarify here) devices such as the
> > ITS and PCIe cannot utilize cache snooping.
>
> well, as far as I know that is a clear violation of the PCIe specification.
>
> Coherent access to system memory is simply a must have.
>From what I've read of the Arm documentation on the AXI bus, this is
supposed to be implemented by design as well.
>
> > This is based on the results of the email chain [2].
> >
> > The new circumstances are as follows:
> > The RPi CM4 Adventure Team as I've taken to calling them has been
> > attempting to get a dGPU working with the very broken Broadcom
> > controller in the RPi CM4.
> > Recently they acquired a SoQuartz rk3566 module which is pin
> > compatible with the CM4, and have taken to trying it out as well.
> >
> > This is how I got involved.
> > It seems they found a trivial way to force the Radeon R600 driver to
> > use Non-Cached memory for everything.
>
> Yeah, you basically just force it into AGP mode :)
>
> There is just absolutely no guarantee that this works reliable.
Ah, that makes sense.
>
> > This single line change, combined with using memset_io instead of
> > memset, allows the ring tests to pass and the card probes successfully
> > (minus the DMA limitations of the rk356x due to the 32 bit
> > interconnect).
> > I discovered using this method that we start having unaligned io
> > memory access faults (bus errors) when running glmark2-drm (running
> > glmark2 directly was impossible, as both X and Wayland crashed too
> > early).
> > I traced this to using what I thought at the time was an unsafe memcpy
> > in the mesa stack.
> > Rewriting this function to force aligned writes solved the problem and
> > allows glmark2-drm to run to completion.
> > With some extensive debugging, I found about half a dozen memcpy
> > functions in mesa that if forced to be aligned would allow Wayland to
> > start, but with hilarious display corruption (see [3]. [4]).
> > The CM4 team is convinced this is an issue with memcpy in glibc, but
> > I'm not convinced it's that simple.
>
> Yes exactly that.
>
> Both OpenGL and Vulkan allow the application to mmap() device memory and
> do any memory access they want with that.
>
> This means that changing memcpy is just a futile effort, it's still
> possible for the application to make an unaligned memory access and that
> is perfectly valid.
I was afraid of that and it reflects what I see with X11's behavior.
>
> > On my two hour drive in to work this morning, I got to thinking.
> > If this was an memcpy fault, this would be universally broken on arm64
> > which is obviously not the case.
> > So I started thinking, what is different here than with systems known to work:
> > 1. No IOMMU for the PCIe controller.
> > 2. The Outer Cache Issue.
>
> Oh, very good point. I would be interested in that as answer as well.
>
> Regards,
> Christian.
>
> >
> > Robin:
> > My questions for you, since you're the smartest person I know about
> > arm64 memory management:
> > Could cache snooping permit unaligned accesses to IO to be safe?
> > Or
> > Is it the lack of an IOMMU that's causing the alignment faults to become fatal?
> > Or
> > Am I insane here?
> >
> > Rockchip:
> > Please update on the status for the Outer Cache errata for ITS services.
> > Please provide an answer to the errata of the PCIe controller, in
> > regard to cache snooping and buffering, for both the rk356x and the
> > upcoming rk3588.
> >
> > [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FJeffyCN%2Fmirrors%2Fcommit%2F0b985f29304dcb9d644174edacb67298e8049d4f&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ae2dfa3e8ec4a765f8a08da07ab1cb2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637830728762044450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ZL3jA2VrnynWbUdFG6naaqrZqcnKRq338n%2Bj50DRa74%3D&reserved=0
> > [2] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F871rbdt4tu.wl-maz%40kernel.org%2FT%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ae2dfa3e8ec4a765f8a08da07ab1cb2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637830728762044450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=QZy%2Bt%2Fus5f3yxwrHmXpzerXngPpKp3i9ZsF1UJ%2BHvlU%3D&reserved=0
> > [3] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcdn.discordapp.com%2Fattachments%2F926487797844541510%2F953414755970850816%2Funknown.png&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ae2dfa3e8ec4a765f8a08da07ab1cb2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637830728762044450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=c29bc87hxyIvnsBK3Fo7FbF7RwJcFr%2FjgBrLIiBb%2FyY%3D&reserved=0
> > [4] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcdn.discordapp.com%2Fattachments%2F926487797844541510%2F953424952042852422%2Funknown.png&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ae2dfa3e8ec4a765f8a08da07ab1cb2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637830728762044450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fwygTk%2BDzdla67rdAYb44vlivlby9lFwtcgjLfJEH4A%3D&reserved=0
> >
> > Thank you everyone for your time.
> >
> > Very Respectfully,
> > Peter Geis
> >
> > On Wed, May 26, 2021 at 7:21 AM Christian König
> > <christian.koenig at amd.com> wrote:
> >> Hi Robin,
> >>
> >> Am 26.05.21 um 12:59 schrieb Robin Murphy:
> >>> On 2021-05-26 10:42, Christian König wrote:
> >>>> Hi Robin,
> >>>>
> >>>> Am 25.05.21 um 22:09 schrieb Robin Murphy:
> >>>>> On 2021-05-25 14:05, Alex Deucher wrote:
> >>>>>> On Tue, May 25, 2021 at 8:56 AM Peter Geis <pgwipeout at gmail.com>
> >>>>>> wrote:
> >>>>>>> On Tue, May 25, 2021 at 8:47 AM Alex Deucher
> >>>>>>> <alexdeucher at gmail.com> wrote:
> >>>>>>>> On Tue, May 25, 2021 at 8:42 AM Peter Geis <pgwipeout at gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>> Good Evening,
> >>>>>>>>>
> >>>>>>>>> I am stress testing the pcie controller on the rk3566-quartz64
> >>>>>>>>> prototype SBC.
> >>>>>>>>> This device has 1GB available at <0x3 0x00000000> for the PCIe
> >>>>>>>>> controller, which makes a dGPU theoretically possible.
> >>>>>>>>> While attempting to light off a HD7570 card I manage to get a
> >>>>>>>>> modeset
> >>>>>>>>> console, but ring0 test fails and disables acceleration.
> >>>>>>>>>
> >>>>>>>>> Note, we do not have UEFI, so all PCIe setup is from the Linux
> >>>>>>>>> kernel.
> >>>>>>>>> Any insight you can provide would be much appreciated.
> >>>>>>>> Does your platform support PCIe cache coherency with the CPU? I.e.,
> >>>>>>>> does the CPU allow cache snoops from PCIe devices? That is required
> >>>>>>>> for the driver to operate.
> >>>>>>> Ah, most likely not.
> >>>>>>> This issue has come up already as the GIC isn't permitted to snoop on
> >>>>>>> the CPUs, so I doubt the PCIe controller can either.
> >>>>>>>
> >>>>>>> Is there no way to work around this or is it dead in the water?
> >>>>>> It's required by the pcie spec. You could potentially work around it
> >>>>>> if you can allocate uncached memory for DMA, but I don't think that is
> >>>>>> possible currently. Ideally we'd figure out some way to detect if a
> >>>>>> particular platform supports cache snooping or not as well.
> >>>>> There's device_get_dma_attr(), although I don't think it will work
> >>>>> currently for PCI devices without an OF or ACPI node - we could
> >>>>> perhaps do with a PCI-specific wrapper which can walk up and defer
> >>>>> to the host bridge's firmware description as necessary.
> >>>>>
> >>>>> The common DMA ops *do* correctly keep track of per-device coherency
> >>>>> internally, but drivers aren't supposed to be poking at that
> >>>>> information directly.
> >>>> That sounds like you underestimate the problem. ARM has unfortunately
> >>>> made the coherency for PCI an optional IP.
> >>> Sorry to be that guy, but I'm involved a lot internally with our
> >>> system IP and interconnect, and I probably understand the situation
> >>> better than 99% of the community ;)
> >> I need to apologize, didn't realized who was answering :)
> >>
> >> It just sounded to me that you wanted to suggest to the end user that
> >> this is fixable in software and I really wanted to avoid even more
> >> customers coming around asking how to do this.
> >>
> >>> For the record, the SBSA specification (the closet thing we have to a
> >>> "system architecture") does require that PCIe is integrated in an
> >>> I/O-coherent manner, but we don't have any control over what people do
> >>> in embedded applications (note that we don't make PCIe IP at all, and
> >>> there is plenty of 3rd-party interconnect IP).
> >> So basically it is not the fault of the ARM IP-core, but people are just
> >> stitching together PCIe interconnect IP with a core where it is not
> >> supposed to be used with.
> >>
> >> Do I get that correctly? That's an interesting puzzle piece in the picture.
> >>
> >>>> So we are talking about a hardware limitation which potentially can't
> >>>> be fixed without replacing the hardware.
> >>> You expressed interest in "some way to detect if a particular platform
> >>> supports cache snooping or not", by which I assumed you meant a
> >>> software method for the amdgpu/radeon drivers to call, rather than,
> >>> say, a website that driver maintainers can look up SoC names on. I'm
> >>> saying that that API already exists (just may need a bit more work).
> >>> Note that it is emphatically not a platform-level thing since
> >>> coherency can and does vary per device within a system.
> >> Well, I think this is not something an individual driver should mess
> >> with. What the driver should do is just express that it needs coherent
> >> access to all of system memory and if that is not possible fail to load
> >> with a warning why it is not possible.
> >>
> >>> I wasn't suggesting that Linux could somehow make coherency magically
> >>> work when the signals don't physically exist in the interconnect - I
> >>> was assuming you'd merely want to do something like throw a big
> >>> warning and taint the kernel to help triage bug reports. Some drivers
> >>> like ahci_qoriq and panfrost simply need to know so they can program
> >>> their device to emit the appropriate memory attributes either way, and
> >>> rely on the DMA API to hide the rest of the difference, but if you
> >>> want to treat non-coherent use as unsupported because it would require
> >>> too invasive changes that's fine by me.
> >> Yes exactly that please. I mean not sure how panfrost is doing it, but
> >> at least the Vulkan userspace API specification requires devices to have
> >> coherent access to system memory.
> >>
> >> So even if I would want to do this it is simply not possible because the
> >> application doesn't tell the driver which memory is accessed by the
> >> device and which by the CPU.
> >>
> >> Christian.
> >>
> >>> Robin.
>
More information about the amd-gfx
mailing list