[PATCH] drm/amdgpu: Raven: don't allow mixing GTT and VRAM

Thu Jul 17 15:34:27 UTC 2025

On 17.07.25 16:58, Alex Deucher wrote:
> On Wed, Jul 16, 2025 at 8:13 PM Brian Geffon <bgeffon at google.com> wrote:
>> On Wed, Jul 16, 2025 at 5:03 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>>> On Wed, Jul 16, 2025 at 12:40 PM Brian Geffon <bgeffon at google.com> wrote:
>>>> On Wed, Jul 16, 2025 at 12:33 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>>>>> On Wed, Jul 16, 2025 at 12:18 PM Brian Geffon <bgeffon at google.com> wrote:
>>>>>>
>>>>>> Commit 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
>>>>>> allowed newer ASICs to mix GTT and VRAM. That change also noted that
>>>>>> some older boards, such as Stoney and Carrizo, do not support this.
>>>>>> It appears that at least one additional ASIC, Raven, does not support
>>>>>> this either.
>>>>>>
>>>>>> We observed this issue when migrating a device from a 5.4 to 6.6 kernel
>>>>>> and have confirmed that Raven also needs to be excluded from mixing GTT
>>>>>> and VRAM.
>>>>>
>>>>> Can you elaborate a bit on what the problem is?  For carrizo and
>>>>> stoney this is a hardware limitation (all display buffers need to be
>>>>> in GTT or VRAM, but not both).  Raven and newer don't have this
>>>>> limitation and we tested raven pretty extensively at the time.
>>>>
>>>> Thanks for taking the time to look. We have automated testing and a
>>>> few igt gpu tools tests failed. After debugging, we found that
>>>> commit 81d0bcf99009 is what introduced the failures on this hardware
>>>> on 6.1+ kernels. The specific tests that fail are kms_async_flips and
>>>> kms_plane_alpha_blend. Excluding Raven from this sharing of GTT and
>>>> VRAM buffers resolves the issue.
>>>
>>> + Harry and Leo
>>>
>>> This sounds like the memory placement issue we discussed last week.
>>> In that case, the issue is related to where the buffer ends up when we
>>> try to do an async flip.  In that case, we can't do an async flip
>>> without a full modeset if the buffer locations are different than the
>>> last modeset because we need to update more than just the buffer base
>>> addresses.  This change works around that limitation by always forcing
>>> display buffers into VRAM or GTT.  Adding raven to this case may fix
>>> those tests but will make the overall experience worse because we'll
>>> end up effectively not being able to fully utilize both gtt and
>>> vram for display, which would reintroduce all of the problems fixed by
>>> 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)").
>>
>> Thanks Alex. The thing is, we only observe this on Raven boards. Why
>> would only Raven be impacted by this? It would seem that all devices
>> would have this issue, no? Also, I'm not familiar with how
> 
> It depends on memory pressure and available memory in each pool.
> E.g., initially the display buffer is in VRAM when the initial mode
> set happens.  The watermarks, etc. are set for that scenario.  One of
> the next frames ends up in a pool different than the original.  Now
> the buffer is in GTT.  The async flip interface does a fast validation
> to try and flip as soon as possible, but that validation fails because
> the watermarks need to be updated which requires a full modeset.
> 
> It's tricky to fix because you don't want to use the worst case
> watermarks all the time because that will limit the number of available
> display options and you don't want to force everything to a particular
> memory pool because that will limit the amount of memory that can be
> used for display (which is what the patch in question fixed).  Ideally
> the caller would do a test commit before the page flip to determine
> whether or not it would succeed before issuing it and then we'd have
> some feedback mechanism to tell the caller that the commit would fail
> due to buffer placement so it would do a full modeset instead.  We
> discussed this feedback mechanism last week at the display hackfest.

(A separate test commit may not buy anything; the compositor can just try the real commit and react to errors.)

Most compositors won't want to set the DRM_MODE_ATOMIC_ALLOW_MODESET flag for a "simple flip", since it could result in user-visible artifacts such as the display intermittently blanking.

If the driver can make it work without user-visible artifacts (e.g. by reprogramming watermarks), it should just do so without DRM_MODE_ATOMIC_ALLOW_MODESET. If not, it should return an error (and possibly more information via the future mechanism).

P.S. Without DRM_MODE_PAGE_FLIP_ASYNC, the driver must always be able to flip at least the primary plane (it can require disabling overlay planes) without DRM_MODE_ATOMIC_ALLOW_MODESET, or the compositor could end up in a corner it can't get out of.
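
To make the compositor side of this concrete, here is a minimal sketch of the "just try it and react to errors" approach. It is only an illustration under stated assumptions: commit_fn, flip_primary, struct mock_driver and mock_commit are hypothetical names, with commit_fn standing in for a real atomic commit call such as libdrm's drmModeAtomicCommit(). The flag values are copied from the DRM uapi header so that the sketch builds without libdrm. Neither attempt ever sets DRM_MODE_ATOMIC_ALLOW_MODESET.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the DRM uapi header (drm_mode.h); defined here only so the
 * sketch compiles without libdrm installed. */
#ifndef DRM_MODE_PAGE_FLIP_EVENT
#define DRM_MODE_PAGE_FLIP_EVENT      0x01
#define DRM_MODE_PAGE_FLIP_ASYNC      0x02
#define DRM_MODE_ATOMIC_NONBLOCK      0x0200
#define DRM_MODE_ATOMIC_ALLOW_MODESET 0x0400
#endif

/* Hypothetical stand-in for the real commit call: returns 0 on success or a
 * negative errno value on failure. */
typedef int (*commit_fn)(void *ctx, uint32_t flags);

enum flip_result { FLIP_ASYNC, FLIP_SYNC, FLIP_FAILED };

/* Try an async flip first; if the driver rejects it, retry as a normal
 * (vblank-synchronized) flip. ALLOW_MODESET is never requested. */
static enum flip_result flip_primary(commit_fn commit, void *ctx, int want_async)
{
    const uint32_t base = DRM_MODE_PAGE_FLIP_EVENT | DRM_MODE_ATOMIC_NONBLOCK;

    assert(commit != NULL);

    if (want_async && commit(ctx, base | DRM_MODE_PAGE_FLIP_ASYNC) == 0)
        return FLIP_ASYNC;

    /* Async was not requested or was rejected: fall back to a sync flip. */
    if (commit(ctx, base) == 0)
        return FLIP_SYNC;

    return FLIP_FAILED;
}

/* Hypothetical mock driver, used only to exercise the fallback path. */
struct mock_driver {
    int reject_async;     /* non-zero: fail any commit that asks for async */
    unsigned calls;       /* number of commits attempted */
    uint32_t last_flags;  /* flags of the most recent attempt */
};

static int mock_commit(void *ctx, uint32_t flags)
{
    struct mock_driver *drv = ctx;

    drv->calls++;
    drv->last_flags = flags;
    if (drv->reject_async && (flags & DRM_MODE_PAGE_FLIP_ASYNC))
        return -EINVAL;
    return 0;
}
```

With a mock driver that rejects async flips, flip_primary(mock_commit, &drv, 1) makes two attempts and ends up with a synchronous flip. A real compositor would additionally have to build the atomic request, handle the flip event, and decide what to do on FLIP_FAILED (e.g. disable overlay planes and retry); none of that is shown here.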

-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast