[PATCH] drm/radeon: Fix screen corruption (v2)

Fri Jan 20 04:31:59 UTC 2023

On 2023-01-19 11:56, Krylov Michael wrote:
> On Thu, 15 Dec 2022 07:07:33 -0500
> Luben Tuikov <luben.tuikov at amd.com> wrote:
> 
>> On 2022-12-15 06:53, Robin Murphy wrote:
>>> On 2022-12-15 11:40, Luben Tuikov wrote:
>>>> On 2022-12-15 06:27, Christian König wrote:
>>>>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>>>>> On 2022-12-15 04:46, Christian König wrote:
>>>>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy
>>>>>>>>>>> <robin.murphy at arm.com> wrote:
>>>>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP
>>>>>>>>>>>>> chips.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On older systems with little memory, for instance 1.5
>>>>>>>>>>>>> GiB, using an AGP chip,
>>>>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask
>>>>>>>>>>>>> is 0x7FFFFFF, and
>>>>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF
>>>>>>>>>>>>> < 0x7FFFFFFF, false. As such the result of this static
>>>>>>>>>>>>> inline isn't suitable for the last
>>>>>>>>>>>>> argument to ttm_device_init()--it simply needs to now
>>>>>>>>>>>>> whether to use GFP_DMA32
>>>>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>>>>> This sounds wrong to me. If the issues happen on systems
>>>>>>>>>>>> without PAE it clearly can't have anything to with the
>>>>>>>>>>>> actual DMA address size. Not to mention that AFAICS 32-bit
>>>>>>>>>>>> x86 doesn't even have ZONE_DMA32, so GFP_DMA32 would be
>>>>>>>>>>>> functionally meaningless anyway. Although the reported
>>>>>>>>>>>> symptoms initially sounded like they could be caused by
>>>>>>>>>>>> DMA going to the wrong place, that is also equally
>>>>>>>>>>>> consistent with a loss of cache coherency.
>>>>>>>>>>>>
>>>>>>>>>>>> My (limited) understanding of AGP is that the GART can
>>>>>>>>>>>> effectively alias
>>>>>>>>>>>> memory to a second physical address, so I could well
>>>>>>>>>>>> believe that something somewhere in the driver stack needs
>>>>>>>>>>>> to perform some cache maintenance to avoid coherency
>>>>>>>>>>>> issues, and that in these particular setups whatever that
>>>>>>>>>>>> is might be assuming the memory is direct-mapped and thus
>>>>>>>>>>>> going wrong for highmem pages.
>>>>>>>>>>>>
>>>>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>>>>> GFP_DMA32 at
>>>>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>>>>> One of the wonderful features of AGP is that it has to be
>>>>>>>>>>> used with uncached memory.  The aperture basically just
>>>>>>>>>>> provides a remapping of physical pages into a linear
>>>>>>>>>>> aperture that you point the GPU at.  TTM has to jump
>>>>>>>>>>> through quite a few hoops to get uncached memory in the
>>>>>>>>>>> first place, so it's likely that that somehow isn't
>>>>>>>>>>> compatible with HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>>>>> I guess in principle yes, if you're careful not to use
>>>>>>>>>> regular kmap()/kmap_atomic(), and always use
>>>>>>>>>> pgprot_noncached() for userspace/vmalloc mappings, but
>>>>>>>>>> clearly that leaves lots of scope for slipping up.
>>>>>>>>> I theory we should do exactly that in TTM, but we have very
>>>>>>>>> few users who actually still exercise that functionality.
>>>>>>>>>
>>>>>>>>>> Working backwards from primitives like set_memory_uc(), I
>>>>>>>>>> see various paths in TTM where manipulating the caching
>>>>>>>>>> state is skipped for highmem pages, but I wouldn't even know
>>>>>>>>>> where to start looking for whether the right state is
>>>>>>>>>> propagated to all the places where they might eventually be
>>>>>>>>>> mapped somewhere.
>>>>>>>>> The tt object has the caching state for the pages and
>>>>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co
>>>>>>>>> for the userspace/vmalloc mappings.
>>>>>>>>>
>>>>>>>> The point of this patch is that dma_addressing_limited() is
>>>>>>>> unsuitable as the last parameter to ttm_pool_init(), since if
>>>>>>>> it is "false"--as it is in this particular case--then TTM ends
>>>>>>>> up using HIGHUSER, and we get the screen corruption.
>>>>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>>>>> Well I would rather say that dma_addressing_limited() works,
>>>>>>> but the default value from dma_get_required_mask() is broken.
>>>>>>>
>>>>>> dma_get_required_mask() for his setup of 1.5 GiB of memory
>>>>>> returns 0x7FFFFFF.
>>>>>
>>>>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>>>>> addressable memory (27 bits set)? Or is there another F missing?
>>>>
>>>> Yeah, I'm missing an F--it is correctly described at the top of
>>>> the thread above, i.e. in the commit of v2 patch.
>>>>
>>>> 0x7FFF_FFFF, which seems correct, no?
>>>>
>>>>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in
>>>>>> radeon_device_init().
>>>>>>
>>>>>>> 32 bits only work with bounce buffers and we can't use those on
>>>>>>> graphics hardware.
>>>>>>>
>>>>>>>> Is there an objection to this patch, if it fixes the screen
>>>>>>>> corruption?
>>>>>>> Not from my side, but fixing the underlying issues would be
>>>>>>> better I think.
>>>>>>>
>>>>>> Have they been identified?
>>>>>
>>>>> I'm not 100% sure. I think by using GFP_DMA32 we just work around
>>>>> the issue somehow.
>>>>
>>>> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at
>>>> the DRM code trying to understand what we do when GFP_DMA32 is not
>>>> set, and the immediate thing I see is that we set GFP_HIGHUSER
>>>> when use_dma32 is unset in the device struct. (Then I got down to
>>>> the caching attributes...)
>>>>
>>>> It's be nice if we can find the actual issue--what else would it
>>>> show us that needs fixing...?
>>>>
>>>> So what do we do with this patch?
>>>>
>>>> Shouldn't leave it in a limbo--some OSes ship their kernel
>>>> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with
>>>> addressing limitations") wholly reverted.
>>>
>>> Removing dma_addressing_limited() is still wrong, for the reasons
>>> given in that commit. What we need is an *additional* condition
>>> that encapsulates "also pass use_dma32 for AGP devices because it
>>> avoids some weird coherency issue with 32-bit highmem that isn't
>>> worth trying to debug further".
>>
>> Yes, you had a patch earlier which did exactly that--why not push
>> that patch?
>>
>> Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the
>> device's mask is 0xFFFF_FFFF, shouldn't we use GFP_DMA32, instead of
>> GFP_HIGHUSER?
>>
>> Regards,
>> Luben
>>
> 
> Sorry for being pushy, but given that we are so close to the finish, is
> there any chance that one of the variants will be merged to the kernel
> sources any time soon and if so, can I help with testing? I would really
> benefit from this patch making it into Debian 12.

Well, there's a couple of patches addressing this problem here in this thread.
If consensus is reached, perhaps one of them could be picked up by upstream.
-- 
Regards,
Luben