[PATCH 4/5] drm/damage-helper: Do partial updates if framebuffer has not been changed

Wed Sep 21 10:18:59 UTC 2022

On Wed, Sep 21, 2022 at 11:39:51AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 21.09.22 um 10:37 schrieb Ville Syrjälä:
> > On Wed, Sep 21, 2022 at 09:59:10AM +0200, Thomas Zimmermann wrote:
> >> Hi Ville
> >>
> >> Am 20.09.22 um 16:31 schrieb Ville Syrjälä:
> >>> On Tue, Sep 20, 2022 at 03:56:18PM +0200, Thomas Zimmermann wrote:
> >>>> Set partial updates on a plane if the framebuffer has not been changed
> >>>> on an atomic commit. If such a plane has damage clips, the driver will
> >>>> use them; otherwise the update is effectively empty. Planes that change
> >>>> their framebuffer still perform a full update.
> >>>
> >>> I have a feeling you're sort of papering over some inefficient
> >>> userspace that's asking updates on planes that don't actually
> >>> need them. I'm not sure adding more code for that is a particularly
> >>> great idea. Wouldn't it be better to just fix the userspace code?
> >>
> >> Some more context might be in order:
> >>
> >> The ast driver currently uses VRAM helpers, which leads to many
> >> problems; including blank screens from the low amount of video memory.
> >> The best solution is to switch SHMEM with BOs on system meory. The video
> >> memory is only the internal buffer for scanout. This update involves a
> >> mempcy from the BO to video memory.
> >>
> >> Ast's hardware is really slow, so it makes sense to reduce the updates
> >> to video memory to a minimum. There's support for cursor planes, which
> >> really makes a difference here.
> >>
> >> But doing any cursor-plane update (e.g., moving, animation) with SHMEM
> >> and the current damage helpers always results in a full-screen memcpy
> >> from the BO to the video memory for the primary plane. As the ast
> >> hardware is slow, performance goes down to a an estimated 5 frame per
> >> seconds. After moving the mouse, one can watch the mouse cursor follow
> >> along the screen for the next seconds. Userspace sends the movement
> >> information and DRM slowly processes them. The same can be observed for
> >> cursor animation.
> >>
> >> The problem is that each change to the cursor plane results in an
> >> atomic_update call for the primary plane. Not having damage information
> >> for the plane just means 'update everything'. Not a problem if redrawing
> >> consists of reprogramming the scanout address. For a memcpy it's not
> >> feasible.
> >>
> >> So can this be fixed in userspace? No realistically IMHO. I've seen this
> >> problem on Weston, Wayland-Gnome and X11-Gnome. And they are all
> >> problematic in their own way. (That's why there are multiple patches
> >> needed.) For example, X11 uses the legacy mouse ioctl, which one of the
> >> patches fixes. The other userspace needs the other heuristics. A
> >> potential userspace fix would mean to always set empty-damage
> >> information on all planes that don't get an update. And I don't consider
> >> X11 fixable TBH.
> >>
> >>>
> >>> Any property change on the plane could need a full plane
> >>> update as well (eg. some color mangement stuff etc.). So you'd
> >>> have to keep adding exceptions to that list whenever new
> >>> properties are added.
> >>
> >> It's not about the occasional change of a property. It's the constant
> >> sluggish redraw when the interface is supposed to be snappy, such as
> >> mouse interaction.
> >>
> >>>
> >>> And I think the semantics of the ioctl(s) has so far been that
> >>> basically any page flip (whether or not it actually changes
> >>> the fb) still ends up doing whatever flushing is needed to
> >>> guarantee the fb contents are up to date on the screen (if
> >>> someone sneaked in some front buffer rendering in between).
> >>> Ie. you could even use basically a nop page flip in place
> >>> of dirtyfb.
> >>
> >> That's why full updates are still the default. Only in cases where it's
> >> save to avoid an update of unaffected planes, we do so.
> > 
> > Umm. Maybe I misread the patch in my haste but it seemed
> > that you consider the thing undamaged if the fb pointer
> > was unchanged. That goes against what I wrote above.
> 
> I try to resolve that in patch 5. The fb_dirty flag signals that the 
> framebuffer's content changed in some way. Patches 4 and 5 could be seen 
> as one change, but that might overload the resulting patch. (Maybe it's 
> not well designed overall. :/)
> 
> > 
> > Though I don't really know if a there is software relying on
> > that behaviuor. I suppose one idea could be to keep that
> > behaviour for the legacy ioctls but not for atomic. Ee. any
> > fb directly specified in a legacy setcrtc/setplane/pageflip
> > is always considered fully damaged. But including an the same
> 
> Assuming the specified fb to be damaged seems like a non-issue to me.
> 
> The problem is with the other framebuffers: if userspace sends us a 
> CURSOR_MOVE ioctl, we can safely assume the cursor fb to be damaged. But 
> we don't want the primary plane's fb to be damaged. Or else we'd redraw 
> the full primary plane.
> 
> 
> > fb in the atomic ioctl does not imply damage. That would
> > mean atomic has to rely on specifying damage explicitly, and
> > any userspace that doesn't do that will be broken.
> 
> The problem again is not in the damage information on planes that 
> legitimately need a redraw. It's all the other planes that are being 
> redrawn as well. Or is there some scenario that I don't see?

I thought we're talking about eg. a cursor update that also
includes the primary plane because apparently userspace is lazy.

I think what you're saying is that currently there is no
damage information for the primary plane so you're attempting
to infer it based on whether the fb property changed or not.

And what I was saying is that IIRC historically specifying
the same fb again has still implied full damage. Changing
that behaviour may or may not break exising userspace.

> 
> > 
> > Or we could introduce a client cap for it I guess, but that
> > would require (minimal) userspace changes.
> > 
> >>
> >> I know that we don't give performance guarantees to userspace. But using
> >> cursor/overlay planes should be faster then not using them. I think
> >> that's a reasonable assumption for userspace to make.
> >>
> >>>
> >>> Another thing the ioctls have always done is actually perform
> >>> the commit whether anything changed or nor. That is, they
> >>> still do all the same the same vblanky stuff and the commit
> >>> takes the same amount of time. Not sure if your idea is
> >>> to potentially short circuit the entire thing and make it
> >>> take no time at all?
> >>
> >> The patches don't change the overall commit logics. All they do is to
> >> set damage updates to a size of 0 if a plane reliably does not need an
> >> update.
> > 
> > What I gathered is that you seemed to only consider the fb
> > contents as something that needs damage handling. That is not
> > the case for stuff like eDP PSR and DSI command mode displays
> > where we need to upload a new full frame whenever also the
> > non-damaged fb contents would get transformed by the hardware
> > on the way to the remote frambuffer. That would mean any change
> > in eg. scanout coordinates, color management, etc.
> 
> There is support for changing scanout coordinates in 
> drm_atomic_helper_damage_iter_init() in patch 2. In general, maybe the 
> heuristic needs to be stricter to only handle updates that only change 
> damage.
> 
> For now, the problem only happens after converting ast to SHMEM. All the 
> other SHMEM-based drivers use a single plane where the problem doesn't 
> happen; or only reprogram the scanout address, which is fast. If the 
> damage-handling logic imposes problems on other drivers, some of it 
> could possibly be moved into ast itself.

Maybe we have two clearly separate classes of uses case:
1. devices where only damage to the fb contents matter (eg. some kind of
   shadow fb that is the same size as the real fb).
2. devices where everything about the scanout process matters (eg. PSR)
?

Maybe there should be helpers to deal with just the first case,
and then some more helpers (or just driver code) to pile the rest
on top as well when needed?

-- 
Ville Syrjälä
Intel