radeon ring 0 test failed on arm64
Christian König
christian.koenig at amd.com
Fri Mar 18 08:35:31 UTC 2022
Am 18.03.22 um 08:51 schrieb Kever Yang:
>
> On 2022/3/17 20:19, Peter Geis wrote:
>> On Wed, Mar 16, 2022 at 11:08 PM Kever Yang
>> <kever.yang at rock-chips.com> wrote:
>>> Hi Peter,
>>>
>>> On 2022/3/17 08:14, Peter Geis wrote:
>>>> Good Evening,
>>>>
>>>> I apologize for raising this email chain from the dead, but there have
>>>> been some developments that have introduced even more questions.
>>>> I've looped the Rockchip mailing list into this too, as this affects
>>>> rk356x, and likely the upcoming rk3588 if [1] is to be believed.
>>>>
>>>> TLDR for those not familiar: It seems the rk356x series (and possibly
>>>> the rk3588) were built without any outer coherent cache.
>>>> This means (unless Rockchip wants to clarify here) devices such as the
>>>> ITS and PCIe cannot utilize cache snooping.
>>>> This is based on the results of the email chain [2].
>>>>
>>>> The new circumstances are as follows:
>>>> The RPi CM4 Adventure Team as I've taken to calling them has been
>>>> attempting to get a dGPU working with the very broken Broadcom
>>>> controller in the RPi CM4.
>>>> Recently they acquired a SoQuartz rk3566 module which is pin
>>>> compatible with the CM4, and have taken to trying it out as well.
>>>>
>>>> This is how I got involved.
>>>> It seems they found a trivial way to force the Radeon R600 driver to
>>>> use Non-Cached memory for everything.
>>>> This single line change, combined with using memset_io instead of
>>>> memset, allows the ring tests to pass and the card probes successfully
>>>> (minus the DMA limitations of the rk356x due to the 32 bit
>>>> interconnect).
>>>> I discovered using this method that we start having unaligned io
>>>> memory access faults (bus errors) when running glmark2-drm (running
>>>> glmark2 directly was impossible, as both X and Wayland crashed too
>>>> early).
>>>> I traced this to using what I thought at the time was an unsafe memcpy
>>>> in the mesa stack.
>>>> Rewriting this function to force aligned writes solved the problem and
>>>> allows glmark2-drm to run to completion.
>>>> With some extensive debugging, I found about half a dozen memcpy
>>>> functions in mesa that if forced to be aligned would allow Wayland to
>>>> start, but with hilarious display corruption (see [3]. [4]).
>>>> The CM4 team is convinced this is an issue with memcpy in glibc, but
>>>> I'm not convinced it's that simple.
>>>>
>>>> On my two hour drive in to work this morning, I got to thinking.
>>>> If this was an memcpy fault, this would be universally broken on arm64
>>>> which is obviously not the case.
>>>> So I started thinking, what is different here than with systems
>>>> known to work:
>>>> 1. No IOMMU for the PCIe controller.
>>>> 2. The Outer Cache Issue.
>>>>
>>>> Robin:
>>>> My questions for you, since you're the smartest person I know about
>>>> arm64 memory management:
>>>> Could cache snooping permit unaligned accesses to IO to be safe?
>>>> Or
>>>> Is it the lack of an IOMMU that's causing the ali gnment faults to
>>>> become fatal?
>>>> Or
>>>> Am I insane here?
>>>>
>>>> Rockchip:
>>>> Please update on the status for the Outer Cache errata for ITS
>>>> services.
>>> Our SoC design team has double check with ARM GIC/ITS IP team for many
>>> times, and the GITS_CBASER
>>> of GIC600 IP does not support hardware bind or config to a fix
>>> value, so
>>> they insist this is an IP
>>> limitation instead of a SoC bug, software should take care of it :(
>>> I will check again if we can provide errata for this issue.
>> Thanks. This is necessary as the mbi-alias provides an imperfect
>> implementation of the ITS and causes certain PCIe cards (eg x520 Intel
>> 10G NIC) to misbehave.
>>
>>>> Please provide an answer to the errata of the PCIe controller, in
>>>> regard to cache snooping and buffering, for both the rk356x and the
>>>> upcoming rk3588.
>>>
>>> Sorry, what is this?
>> Part of the ITS bug is it expects to be cache coherent with the CPU
>> cluster by design.
>> Due to the rk356x being implemented without an outer accessible cache,
>> the ITS and other devices that require cache coherency (PCIe for
>> example) crash in fun ways.
> Then this is still the ITS issue, not PCIe issue.
> PCIe is a peripheral bus controller like USB and other device, the
> driver should maintain the "cache coherency" if there is any, and
> there is no requirement for hardware cache coherency between PCIe and CPU.
Well then I suggest to re-read the PCIe specification.
Cache coherency is defined as mandatory there. Non-cache coherency is an
optional feature.
See section 2.2.6.5 in the PCIe 2.0 specification for a good example.
Regards,
Christian.
>
> We didn't see any transfer error on rk356x PCIe till now, we can take
> a look if it's easy to reproduce.
>
> Thanks,
> - Kever
>
>
>> This means that rk356x cannot implement a specification compliant ITS
>> or PCIe.
>> >From the rk3588 source dump it appears it was produced without an
>> outer accessible cache, which means if true it also will be unable to
>> use any PCIe cards that implement cache coherency as part of their
>> design.
>>
>>>
>>> Thanks,
>>> - Kever
>>>> [1]
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FJeffyCN%2Fmirrors%2Fcommit%2F0b985f29304dcb9d644174edacb67298e8049d4f&data=04%7C01%7Cchristian.koenig%40amd.com%7C8bdb8c3a6a2e4643bbfd08da08b42da4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637831867224766930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=LcwZggIwIqjvzjDH2DUnIDwxsgk7WmhE9LK13knx36E%3D&reserved=0
>>>> [2]
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F871rbdt4tu.wl-maz%40kernel.org%2FT%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7C8bdb8c3a6a2e4643bbfd08da08b42da4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637831867224766930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fXALLO1EnGi2s8pClt6aMrUlzqDy2KDO8wzpi033qtU%3D&reserved=0
>>>> [3]
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcdn.discordapp.com%2Fattachments%2F926487797844541510%2F953414755970850816%2Funknown.png&data=04%7C01%7Cchristian.koenig%40amd.com%7C8bdb8c3a6a2e4643bbfd08da08b42da4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637831867224766930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=tx%2Bw9ayScUTftjWAFL0GY%2FADQswxEJGRUhgxDw2TSzQ%3D&reserved=0
>>>> [4]
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcdn.discordapp.com%2Fattachments%2F926487797844541510%2F953424952042852422%2Funknown.png&data=04%7C01%7Cchristian.koenig%40amd.com%7C8bdb8c3a6a2e4643bbfd08da08b42da4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637831867224766930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8VXuZvQAhD%2FsQBJ6WEXe0YElD6wCI675oxqHesKhclY%3D&reserved=0
>>>>
>>>> Thank you everyone for your time.
>>>>
>>>> Very Respectfully,
>>>> Peter Geis
>>>>
>>>> On Wed, May 26, 2021 at 7:21 AM Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>> Hi Robin,
>>>>>
>>>>> Am 26.05.21 um 12:59 schrieb Robin Murphy:
>>>>>> On 2021-05-26 10:42, Christian König wrote:
>>>>>>> Hi Robin,
>>>>>>>
>>>>>>> Am 25.05.21 um 22:09 schrieb Robin Murphy:
>>>>>>>> On 2021-05-25 14:05, Alex Deucher wrote:
>>>>>>>>> On Tue, May 25, 2021 at 8:56 AM Peter Geis <pgwipeout at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> On Tue, May 25, 2021 at 8:47 AM Alex Deucher
>>>>>>>>>> <alexdeucher at gmail.com> wrote:
>>>>>>>>>>> On Tue, May 25, 2021 at 8:42 AM Peter Geis
>>>>>>>>>>> <pgwipeout at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Good Evening,
>>>>>>>>>>>>
>>>>>>>>>>>> I am stress testing the pcie controller on the rk3566-quartz64
>>>>>>>>>>>> prototype SBC.
>>>>>>>>>>>> This device has 1GB available at <0x3 0x00000000> for the PCIe
>>>>>>>>>>>> controller, which makes a dGPU theoretically possible.
>>>>>>>>>>>> While attempting to light off a HD7570 card I manage to get a
>>>>>>>>>>>> modeset
>>>>>>>>>>>> console, but ring0 test fails and disables acceleration.
>>>>>>>>>>>>
>>>>>>>>>>>> Note, we do not have UEFI, so all PCIe setup is from the Linux
>>>>>>>>>>>> kernel.
>>>>>>>>>>>> Any insight you can provide would be much appreciated.
>>>>>>>>>>> Does your platform support PCIe cache coherency with the
>>>>>>>>>>> CPU? I.e.,
>>>>>>>>>>> does the CPU allow cache snoops from PCIe devices? That is
>>>>>>>>>>> required
>>>>>>>>>>> for the driver to operate.
>>>>>>>>>> Ah, most likely not.
>>>>>>>>>> This issue has come up already as the GIC isn't permitted to
>>>>>>>>>> snoop on
>>>>>>>>>> the CPUs, so I doubt the PCIe controller can either.
>>>>>>>>>>
>>>>>>>>>> Is there no way to work around this or is it dead in the water?
>>>>>>>>> It's required by the pcie spec. You could potentially work
>>>>>>>>> around it
>>>>>>>>> if you can allocate uncached memory for DMA, but I don't think
>>>>>>>>> that is
>>>>>>>>> possible currently. Ideally we'd figure out some way to
>>>>>>>>> detect if a
>>>>>>>>> particular platform supports cache snooping or not as well.
>>>>>>>> There's device_get_dma_attr(), although I don't think it will work
>>>>>>>> currently for PCI devices without an OF or ACPI node - we could
>>>>>>>> perhaps do with a PCI-specific wrapper which can walk up and defer
>>>>>>>> to the host bridge's firmware description as necessary.
>>>>>>>>
>>>>>>>> The common DMA ops *do* correctly keep track of per-device
>>>>>>>> coherency
>>>>>>>> internally, but drivers aren't supposed to be poking at that
>>>>>>>> information directly.
>>>>>>> That sounds like you underestimate the problem. ARM has
>>>>>>> unfortunately
>>>>>>> made the coherency for PCI an optional IP.
>>>>>> Sorry to be that guy, but I'm involved a lot internally with our
>>>>>> system IP and interconnect, and I probably understand the situation
>>>>>> better than 99% of the community ;)
>>>>> I need to apologize, didn't realized who was answering :)
>>>>>
>>>>> It just sounded to me that you wanted to suggest to the end user that
>>>>> this is fixable in software and I really wanted to avoid even more
>>>>> customers coming around asking how to do this.
>>>>>
>>>>>> For the record, the SBSA specification (the closet thing we have
>>>>>> to a
>>>>>> "system architecture") does require that PCIe is integrated in an
>>>>>> I/O-coherent manner, but we don't have any control over what
>>>>>> people do
>>>>>> in embedded applications (note that we don't make PCIe IP at all,
>>>>>> and
>>>>>> there is plenty of 3rd-party interconnect IP).
>>>>> So basically it is not the fault of the ARM IP-core, but people
>>>>> are just
>>>>> stitching together PCIe interconnect IP with a core where it is not
>>>>> supposed to be used with.
>>>>>
>>>>> Do I get that correctly? That's an interesting puzzle piece in the
>>>>> picture.
>>>>>
>>>>>>> So we are talking about a hardware limitation which potentially
>>>>>>> can't
>>>>>>> be fixed without replacing the hardware.
>>>>>> You expressed interest in "some way to detect if a particular
>>>>>> platform
>>>>>> supports cache snooping or not", by which I assumed you meant a
>>>>>> software method for the amdgpu/radeon drivers to call, rather than,
>>>>>> say, a website that driver maintainers can look up SoC names on. I'm
>>>>>> saying that that API already exists (just may need a bit more work).
>>>>>> Note that it is emphatically not a platform-level thing since
>>>>>> coherency can and does vary per device within a system.
>>>>> Well, I think this is not something an individual driver should mess
>>>>> with. What the driver should do is just express that it needs
>>>>> coherent
>>>>> access to all of system memory and if that is not possible fail to
>>>>> load
>>>>> with a warning why it is not possible.
>>>>>
>>>>>> I wasn't suggesting that Linux could somehow make coherency
>>>>>> magically
>>>>>> work when the signals don't physically exist in the interconnect - I
>>>>>> was assuming you'd merely want to do something like throw a big
>>>>>> warning and taint the kernel to help triage bug reports. Some
>>>>>> drivers
>>>>>> like ahci_qoriq and panfrost simply need to know so they can program
>>>>>> their device to emit the appropriate memory attributes either
>>>>>> way, and
>>>>>> rely on the DMA API to hide the rest of the difference, but if you
>>>>>> want to treat non-coherent use as unsupported because it would
>>>>>> require
>>>>>> too invasive changes that's fine by me.
>>>>> Yes exactly that please. I mean not sure how panfrost is doing it,
>>>>> but
>>>>> at least the Vulkan userspace API specification requires devices
>>>>> to have
>>>>> coherent access to system memory.
>>>>>
>>>>> So even if I would want to do this it is simply not possible
>>>>> because the
>>>>> application doesn't tell the driver which memory is accessed by the
>>>>> device and which by the CPU.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Robin.
>>> _______________________________________________
>>> Linux-rockchip mailing list
>>> Linux-rockchip at lists.infradead.org
>>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.infradead.org%2Fmailman%2Flistinfo%2Flinux-rockchip&data=04%7C01%7Cchristian.koenig%40amd.com%7C8bdb8c3a6a2e4643bbfd08da08b42da4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637831867224766930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=F77FbO3SqslbzKu2%2FnjRLrQF45kljtD3%2FAEXEFd7NQs%3D&reserved=0
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20220318/c2eda7a7/attachment-0001.htm>
More information about the amd-gfx
mailing list