[PATCH 3/6] amd/display: fail on cursor plane without an underlying plane

Kazlauskas, Nicholas nicholas.kazlauskas at amd.com
Mon Mar 8 20:38:48 UTC 2021


On 2021-03-08 3:18 p.m., Daniel Vetter wrote:
> On Fri, Mar 5, 2021 at 10:24 AM Michel Dänzer <michel at daenzer.net> wrote:
>>
>> On 2021-03-04 7:26 p.m., Kazlauskas, Nicholas wrote:
>>> On 2021-03-04 10:35 a.m., Michel Dänzer wrote:
>>>> On 2021-03-04 4:09 p.m., Kazlauskas, Nicholas wrote:
>>>>> On 2021-03-04 4:05 a.m., Michel Dänzer wrote:
>>>>>> On 2021-03-03 8:17 p.m., Daniel Vetter wrote:
>>>>>>> On Wed, Mar 3, 2021 at 5:53 PM Michel Dänzer <michel at daenzer.net>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Moreover, in the same scenario plus an overlay plane enabled with a
>>>>>>>> HW cursor compatible format, if the FB bound to the overlay plane is
>>>>>>>> destroyed, the common DRM code will attempt to disable the overlay
>>>>>>>> plane, but dm_check_crtc_cursor will reject that now. I can't
>>>>>>>> remember
>>>>>>>> exactly what the result is, but AFAIR it's not pretty.
>>>>>>>
>>>>>>> CRTC gets disabled instead. That's why we went with the "always
>>>>>>> require primary plane" hack. I think the only solution here would be
>>>>>>> to enable the primary plane (but not in userspace-visible state, so
>>>>>>> this needs to be done in the dc derived state objects only) that scans
>>>>>>> out black any time we're in such a situation with cursor with no
>>>>>>> planes.
>>>>>>
>>>>>> This is about a scenario described by Nicholas earlier:
>>>>>>
>>>>>> Cursor Plane - ARGB8888
>>>>>>
>>>>>> Overlay Plane - ARGB8888 Desktop/UI with cutout for video
>>>>>>
>>>>>> Primary Plane - NV12 video
>>>>>>
>>>>>> And destroying the FB bound to the overlay plane. The fallback to
>>>>>> disable
>>>>>> the CRTC in atomic_remove_fb only kicks in for the primary plane, so it
>>>>>> wouldn't in this case and would fail. Which would in turn trigger the
>>>>>> WARN in drm_framebuffer_remove (and leave the overlay plane scanning
>>>>>> out
>>>>>> from freed memory?).
>>>>>>
>>>>>>
>>>>>> The cleanest solution might be not to allow any formats incompatible
>>>>>> with
>>>>>> the HW cursor for the primary plane.
>>>>>
>>>>> Legacy X userspace doesn't use overlays but Chrome OS does.
>>>>>
>>>>> This would regress ChromeOS MPO support because it relies on the NV12
>>>>> video plane being on the bottom.
>>>>
>>>> Could it use the NV12 overlay plane below the ARGB primary plane?
>>>
>>> Plane ordering was previously undefined in DRM so we have userspace that
>>> assumes overlays are on top.
>>
>> They can still be by default?
>>
>>> Today we have the z-order property in DRM that defines where it is in
>>> the stack, so technically it could but we'd also be regressing existing
>>> behavior on Chrome OS today.
>>
>> That's unfortunate, but might be the least bad choice overall.
>>
>> BTW, doesn't Chrome OS try to disable the ARGB overlay plane while there are no UI elements to display? If it does, this series might break it anyway (if the cursor plane can be enabled while the ARGB overlay plane is off).
>>
>>
>>>>> When ChromeOS disables MPO it doesn't do it plane by plane, it does it
>>>>> in one go from NV12+ARGB8888 -> ARGB88888.
>>>>
>>>> Even so, we cannot expect all user space to do the same, and we cannot
>>>> allow any user space to trigger a WARN and scanout from freed memory.
>>>
>>> The WARN doesn't trigger because there's still a reference on the FB -
>>
>> The WARN triggers if atomic_remove_fb returns an error, which is the case if it can't disable an overlay plane. I actually hit this with IGT tests while working on b836a274b797 "drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is" (I initially tried allowing the cursor plane to be enabled together with an overlay plane while the primary plane is off).
>>
>>> the reference held by DRM since it's still scanning out the overlay.
>>> Userspace can't reclaim this memory with another buffer allocation
>>> because it's still in use.
>>
>> Good point, so at least there's no scanout of freed memory. Even so, the overlay plane continues displaying contents which user space apparently doesn't want to be displayed anymore.
> 
> Hm I do wonder how much we need to care for this. If you use planes,
> you better use TEST_ONLY in atomic to it's full extend (including
> cursor, if that's a real plane, which it is for every driver except
> msm/mdp4). If userspace screws this up and worse, shuts of planes with
> an RMFB, I think it's not entirely unreasonable to claim that it
> should keep the pieces.
> 
> So maybe we should refine the WARN_ON to not trigger if other planes
> than crtc->primary and crtc->cursor are enabled right now?
> 
>>> It's a little odd that a disable commit can fail, but I don't think
>>> there's anything in DRM core that specifies that this can't happen for
>>> planes.
>>
>> I'd say it's more than just a little odd. :) Being unable to disable an overlay plane seems very surprising, and could make it tricky for user space (not to mention core DRM code like atomic_remove_fb) to find a solution.
>>
>> I'd suggest the amdgpu DM code should rather virtualize the KMS API planes somehow such that an overlay plane can always be disabled. While this might incur some short-term pain, it will likely save more pain overall in the long term.
> 
> Yeah I think this amd dc cursor problem is the first case where
> removing a plane can make things worse.
> 
> Since the hw is what it is, can't we put a transparent plane with
> cursor compatible format in for the case where stuff would fail? So
> not fully virtualize the planes (since I don't see how that helps),
> but just keeping the plane going underneath it all.
> 
> I think that's also what Ville did for i915/gen2, which has the
> requirement that the primary plane must always be on iirc.
> 
> Ofc since amd display doesn't go through pagetables this needs some
> vram, but maybe you can use the scalers to make the requirement a bit
> less drastic.
> -Daniel >

The cursor framebuffer would have to be used as the pipe's primary 
framebuffer in this case because the hardware has nothing underneath to 
scan out along with it.

I'm not sure the atomic async update interface would work well in the 
virtualization case if we had to fallback from using the regular cursor 
programming to using full pipe programming.

This would burn extra power from the secondary pipe but it would also 
add cursor stuttering into the mix because async updates would be blocked.

My preference would be to continue to reject the commits and not 
implement the fallback path because of how suboptimal the whole thing 
is, but this whole thing is just kind of a mess.

Regards,
Nicholas Kazlauskas


More information about the amd-gfx mailing list