[PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment
Thomas Zimmermann
tzimmermann at suse.de
Mon Jan 20 07:58:20 UTC 2025
Hi
Am 18.01.25 um 03:37 schrieb Marek Olšák:
[...]
>
> 3) Implementing DRM_FORMAT_MOD_LINEAR as having 256B pitch and offset
> alignment. This is what we do today. Even if Intel and some AMD chips
> can do 64B or 128B alignment, they overalign to 256B. With so many
> AMD+NV laptops out there, NV is probably next, unless they already do
> this in the closed source driver.
The dumb-buffer series currently being discussed on dri-devel also
touches handling of scanline pitches. THe actual value varies with each
driver. Should dumb buffers use a default pitch alignment of 256 on al
hardware?
Best regards
Thomas
>
> Marek
>
> On Fri, Jan 17, 2025 at 9:18 AM Simona Vetter <simona.vetter at ffwll.ch>
> wrote:
>
> On Wed, Jan 15, 2025 at 12:20:07PM +0000, Daniel Stone wrote:
> > On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo at gmail.com> wrote:
> > > On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone
> <daniel at fooishbar.org> wrote:
> > >> AMD hardware is the only hardware I know of which doesn't support
> > >> overaligning. Say (not hypothetically) we have a GPU and a
> display
> > >> controller which have a minimum pitch alignment of 32 bytes, no
> > >> minimum height alignment, minimum 32-byte offset alignment,
> minimum
> > >> pitch of 32 bytes, and minimum image size of 32 bytes.
> > >>
> > >> To be maximally compatible, we'd have to expose 28 (pitch
> align) * 32
> > >> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min
> size) ==
> > >> 19668992 individual modifiers when queried, which is 150MB
> per format
> > >> just to store the list of modifiers.
> > >
> > > Maximum compatibility is not required nor expected.
> > >
> > > In your case, only 1 linear modifier would be added for that
> driver, which is: [5 / 0 / 5 / 5 / 5]
> > >
> > > Then if, and only if, compatibility with other devices is
> desired, the driver developer could look at drivers of those other
> devices and determine which other linear modifiers to add. Ideally
> it would be just 1, so there would be a total of 2.
> >
> > Mali (actually two DRM drivers and sort of three Mesa drivers)
> can be
> > paired with any one of 11 KMS drivers (really 12 given that one is a
> > very independent subdriver), and something like 20 different codecs
> > (at least 12 different vendors; I didn't bother counting the actual
> > subdrivers which are all quite different). The VeriSilicon Hantro G2
> > codec driver is shipped by five (that we know of) vendors who
> all have
> > their own KMS drivers. One of those is in the Rockchip RK3588, which
> > (don't ask me why) ships six different codec blocks, with three
> > different drivers, from two different vendors - that's before
> you even
> > get to things like the ISP and NPU which really need to be sharing
> > buffers properly without copies.
> >
> > So yeah, working widely without having to encode specific knowledge
> > everywhere isn't a nice-to-have, it's a hard baseline requirement.
> >
> > >> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps
> from detecting whether 2 devices have 0 compatible memory layouts,
> which is a useful thing to know.
> > >>
> > >> I get the point, but again, we have the exact same problem
> today with
> > >> placement, i.e. some devices require buffers to be in or not
> be in
> > >> VRAM or GTT or sysram for some uses, and some devices require
> physical
> > >> contiguity. Solving that problem would require an additional
> 4 bits,
> > >> which brings us to 2.3GB of modifiers per format with the current
> > >> scheme. Not super viable.
> > >
> > > Userspace doesn't determine placement. The kernel memory
> management can move buffers between heaps to accommodate sharing
> between devices as needed. This is a problem in which userspace
> has no say.
> >
> > It really does though!
> >
> > None of these devices use TTM with placement moves, and doing that
> > isn't a fix either. Embedded systems have so low memory
> bandwidth that
> > the difference between choosing the wrong placement and moving it
> > later vs. having the right placement to begin with is the difference
> > between 'this does not work' and 'great, I can ship this'. Which is
> > great if you're a consultancy trying to get paid, but tbh I'd rather
> > work on more interesting things.
> >
> > So yeah, userspace does very much choose the placement. On most
> > drivers, this is either by 'knowing' which device to allocate
> from, or
> > passing a flag to your allocation ioctl. For newer drivers though,
> > there's the dma-heap allocation mechanism which is now upstream and
> > the blessed path, for which userspace needs to explicitly know the
> > desired placement (and must, because fixing it up later is a
> > non-starter).
> >
> > Given that we need to keep LINEAR ~forever for ABI reasons, and
> > because there's no reasonably workable alternative, let's
> abandon the
> > idea of abandoning LINEAR, and try to work with out-of-band
> signalling
> > instead.
> >
> > One idea is to actually pursue the allocator idea and express this
> > properly through constraints. I'd be super in favour of this,
> > unsurprisingly, because it allows us to solve a whole pile of other
> > problems, rather than the extremely narrow AMD/Intel interop case.
> >
> > Another idea for the out-of-band signalling would be to add
> > information-only modifiers, like
> > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or
> > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't
> really
> > work at all with how people actually use modifiers: as the doc
> > describes, userspace takes and intersects the declared modifier
> lists
> > and passes the result through. The intersection of LINEAR+EQ256 and
> > LINEAR+GE32 is LINEAR, so a userspace that follows the rules
> will just
> > drop the hints on the floor and pick whatever linear allocation it
> > feels like.
>
> Yeah I think latest when we also take into account logical image
> size (not
> just pitch) with stuff like it needs to be aligned to 2 pixels in both
> directions just using modifiers falls apart.
>
> And the problem with linear, unlike device modifiers is that we
> can't just
> throw up our hands and enumerate the handful of formats in actual
> use for
> interop. There's so many produces and consumers of linera buffers
> (Daniel's list above missed camera/image processors) that save
> assumption
> is that anything really can happen.
>
> > I think I've just talked myself into the position that passing
> > allocator constraints together with modifiers is the only way to
> > actually solve this problem, at least without creating the sort of
> > technical debt that meant we spent years fixing up implicit/explicit
> > modifier interactions when it really should've just been adding a
> > !)@*(#$ u64 next to the u32.
>
> Yeah probably.
>
> Otoh I know inertia, so I am tempted to go with the oddball
> LINEAR_VEDNOR_A_VENDOR_B_INTEROP thing and stretch the runway for
> a bit.
> And we just assign those as we go as a very special thing, and the
> drivers
> that support it would prefer it above just LINEAR if there's no other
> common format left.
>
> Also makes it really obvious what all userspace/kernel driver enabling
> would be needed to justify such a modifier.
> -Sima
>
> >
> > Cheers,
> > Daniel
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
More information about the dri-devel
mailing list