[PATCH] drm/[amdgpu|radeon]: fix memset on io mem

Robin Murphy robin.murphy at arm.com
Fri Dec 18 15:19:20 UTC 2020


On 2020-12-18 14:33, Christian König wrote:
> Am 18.12.20 um 15:17 schrieb Robin Murphy:
>> On 2020-12-17 14:02, Christian König wrote:
>>> [SNIP]
>>> Do you have some background why some ARM boards fail with that?
>>>
>>> We had a couple of reports that memset/memcpy fail in userspace 
>>> (usually the system just spontaneously reboots or becomes 
>>> unresponsive), but so far nobody has been able to tell us why.
>>
>> Part of it is that Arm doesn't really have an ideal memory type for 
>> mapping RAM behind PCI (much like we also struggle with the vague 
>> expectations of what write-combine might mean beyond x86). Device 
>> memory can be relaxed to allow gathering, reordering and 
>> write-buffering, but is still a bit too restrictive in other ways - 
>> aligned, non-speculative, etc. - for something that's really just RAM 
>> and expected to be usable as such. Thus to map PCI memory as 
>> "write-combine" we use Normal non-cacheable, which means the CPU MMU 
>> is going to allow software to do all the things it might expect of 
>> RAM, but we're now at the mercy of the menagerie of interconnects and 
>> PCI implementations out there.
> 
> I see. As far as I know we already correctly map the RAM from the GPU as 
> "write-combine".
> 
>> Atomic operations, for example, *might* be resolved by the CPU 
>> coherency mechanism or in the interconnect, such that the PCI host 
>> bridge only sees regular loads and stores, but more often than not 
>> they'll just result in an atomic transaction going all the way to the 
>> host bridge. A super-duper-clever host bridge implementation might 
>> even support that, but the vast majority are likely to just reject it 
>> as invalid.
> 
> Support for atomics is actually specified by a PCIe extension. As far 
> as I know that extension is even necessary for full KFD support on AMD 
> and full CUDA support for NVIDIA GPUs.
> 
>>
>> Similarly, unaligned accesses, cache line fills/evictions, and such 
>> will often work, since they're essentially just larger read/write 
>> bursts, but some host bridges can be picky and might reject access 
>> sizes they don't like (there's at least one where even 64-bit accesses 
>> don't work. On a 64-bit system...)
> 
> This is what breaks our neck here. We need 64-bit writes on 64-bit 
> systems to end up as one 64-bit write at the hardware and not two 
> 32-bit writes, otherwise the doorbells won't work correctly.

Just to clarify, that particular case *is* considered catastrophically 
broken ;)

In general you can assume that on AArch64, any aligned 64-bit load or 
store is atomic (64-bit accesses on 32-bit Arm are less well-defined, 
but hopefully nobody cares by now).
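
To make the distinction concrete, here is a minimal sketch in plain C, 
not the actual patch. Both helper names (doorbell_write64, fill_io32) 
are hypothetical, and it assumes the pointers are naturally aligned. The 
first helper issues one aligned 64-bit store, which a compiler targeting 
AArch64 would normally emit as a single store instruction; the second 
fills a region using only aligned 32-bit stores, rather than leaving the 
access sizes to an optimised memset(). In the kernel you would reach for 
the existing writeq() and memset_io() accessors instead of open-coding 
this.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: ring a doorbell with one aligned 64-bit store.
 * The volatile qualifier stops the compiler from eliding or merging the
 * access; 'db' is assumed to be 8-byte aligned.
 */
static inline void doorbell_write64(volatile uint64_t *db, uint64_t val)
{
	*db = val;
}

/*
 * Hypothetical helper: fill 'count' 32-bit words with 'val' using only
 * aligned 32-bit stores, so the access size seen by the bus is fixed by
 * the source code rather than chosen by a memset() implementation.
 * 'dst' is assumed to be 4-byte aligned.
 */
static void fill_io32(volatile uint32_t *dst, uint32_t val, size_t count)
{
	for (size_t i = 0; i < count; i++)
		dst[i] = val;
}
```

Whether a given host bridge accepts even these accesses still depends on 
the implementation; the sketch only controls what the CPU is asked to 
issue.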

> Larger writes are pretty much unproblematic; for P2P our bus interface 
> even supports really large multi-byte transfers.
> 
>> If an invalid transaction does reach the host bridge, it's going to 
>> come back to the CPU as an external abort. If we're really lucky that 
>> could be taken synchronously, attributable to a specific instruction, 
>> and just oops/SIGBUS the relevant kernel/userspace thread. Often 
>> though, (particularly with big out-of-order CPUs) it's likely to be 
>> asynchronous and no longer attributable, and thus taken as an SError 
>> event, which in general roughly translates to "part of the SoC has 
>> fallen off". The only reasonable response we have to that is to panic 
>> the system.
> 
> Yeah, that sounds exactly like what we see on some of the ARM boards out 
> there. At least we have an explanation for that behavior now.
> 
> Going to talk about this with our hardware engineers. We might be able 
> to work around some of that stuff, but that is rather tricky to get 
> working under those conditions.

Yeah, unfortunately there's no easy way to judge the quality of any 
given SoC's PCI implementation until you throw your required traffic at 
it and things either break or don't...

Cheers,
Robin.

