[PATCH] drm/radeon: Fix screen corruption (v2)

Thu Dec 15 11:53:55 UTC 2022

On 2022-12-15 11:40, Luben Tuikov wrote:
> On 2022-12-15 06:27, Christian König wrote:
>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>> On 2022-12-15 04:46, Christian König wrote:
>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy at arm.com>
>>>>>>>> wrote:
>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>>
>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>>> AGP chip,
>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>>> the last
>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>>> use GFP_DMA32
>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>>> loss of
>>>>>>>>> cache coherency.
>>>>>>>>>
>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>>> alias
>>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>>
>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>> GFP_DMA32 at
>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>>> slipping up.
>>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>>> who actually still exercise that functionality.
>>>>>>
>>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>>> whether the right state is propagated to all the places where they
>>>>>>> might eventually be mapped somewhere.
>>>>>> The tt object has the caching state for the pages and
>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>>> userspace/vmalloc mappings.
>>>>>>
>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>> Well I would rather say that dma_addressing_limited() works, but the
>>>> default value from dma_get_required_mask() is broken.
>>>>
>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
>>
>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>> addressable memory (27 bits set)? Or is there another F missing?
> 
> Yeah, I'm missing an F--it is correctly described at the top of the thread above,
> i.e. in the commit of v2 patch.
> 
> 0x7FFF_FFFF, which seems correct, no?
> 
>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>>>
>>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>>> hardware.
>>>>
>>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>>> Not from my side, but fixing the underlying issues would be better I think.
>>>>
>>> Have they been identified?
>>
>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the
>> issue somehow.
> 
> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
> code trying to understand what we do when GFP_DMA32 is not set, and the immediate
> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
> (Then I got down to the caching attributes...)
> 
> It's be nice if we can find the actual issue--what else would it show us that needs fixing...?
> 
> So what do we do with this patch?
> 
> Shouldn't leave it in a limbo--some OSes ship their kernel
> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
> reverted.

Removing dma_addressing_limited() is still wrong, for the reasons given 
in that commit. What we need is an *additional* condition that 
encapsulates "also pass use_dma32 for AGP devices because it avoids some 
weird coherency issue with 32-bit highmem that isn't worth trying to 
debug further".

Thanks,
Robin.