[PATCH] drm/radeon: Fix screen corruption (v2)

Thu Dec 15 09:46:12 UTC 2022

Am 15.12.22 um 10:08 schrieb Luben Tuikov:
> On 2022-12-15 03:07, Christian König wrote:
>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy at arm.com>
>>>> wrote:
>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>
>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>> AGP chip,
>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>> 0x7FFFFFF, and
>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>> the last
>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>> use GFP_DMA32
>>>>>> when allocating DMA buffers.
>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>> going to the wrong place, that is also equally consistent with a
>>>>> loss of
>>>>> cache coherency.
>>>>>
>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>> alias
>>>>> memory to a second physical address, so I could well believe that
>>>>> something somewhere in the driver stack needs to perform some cache
>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>> and thus going wrong for highmem pages.
>>>>>
>>>>> So as I said before, I really think this is not about using
>>>>> GFP_DMA32 at
>>>>> all, but about *not* using GFP_HIGHUSER.
>>>> One of the wonderful features of AGP is that it has to be used with
>>>> uncached memory.  The aperture basically just provides a remapping of
>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>> has to jump through quite a few hoops to get uncached memory in the
>>>> first place, so it's likely that that somehow isn't compatible with
>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>> I guess in principle yes, if you're careful not to use regular
>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>> slipping up.
>> I theory we should do exactly that in TTM, but we have very few users
>> who actually still exercise that functionality.
>>
>>> Working backwards from primitives like set_memory_uc(), I see various
>>> paths in TTM where manipulating the caching state is skipped for
>>> highmem pages, but I wouldn't even know where to start looking for
>>> whether the right state is propagated to all the places where they
>>> might eventually be mapped somewhere.
>> The tt object has the caching state for the pages and
>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>> userspace/vmalloc mappings.
>>
> The point of this patch is that dma_addressing_limited() is unsuitable as
> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())

Well I would rather say that dma_addressing_limited() works, but the 
default value from dma_get_required_mask() is broken.

32 bits only work with bounce buffers and we can't use those on graphics 
hardware.

> Is there an objection to this patch, if it fixes the screen corruption?

Not from my side, but fixing the underlying issues would be better I think.

> Or does TTM need fixing, in that what we really need is to specify whether
> caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
> called from ttm_device_init(), called from radeon_ttm_init.c)?

Could be, but it's more likely that the problem is in the DMA layer 
because we fail to recognize that the device can't access all of the memory.

Regards,
Christian.

>
> Regards,
> Luben
>