[PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment
Simona Vetter
simona.vetter at ffwll.ch
Mon Jan 20 18:41:36 UTC 2025
On Mon, Jan 20, 2025 at 08:58:20AM +0100, Thomas Zimmermann wrote:
> Hi
>
>
> Am 18.01.25 um 03:37 schrieb Marek Olšák:
> [...]
> >
> > 3) Implementing DRM_FORMAT_MOD_LINEAR as having 256B pitch and offset
> > alignment. This is what we do today. Even if Intel and some AMD chips
> > can do 64B or 128B alignment, they overalign to 256B. With so many
> > AMD+NV laptops out there, NV is probably next, unless they already do
> > this in the closed source driver.
I don't think this works, or at least not any better than the current
linear modifier. There's way too many users of that thing out there that I
think you can realistically redefine it.
Adding new linear modifiers and then preferring those above the old LINEAR
(if there is one left after all the wittling down to a common set) is I
think the only option that really works to fix something.
> The dumb-buffer series currently being discussed on dri-devel also touches
> handling of scanline pitches. THe actual value varies with each driver.
> Should dumb buffers use a default pitch alignment of 256 on al hardware?
If you go with new modifiers then there could be shared code that dtrt by
just looking at the modifier list.
-Sima
> Best regards
> Thomas
>
> >
> > Marek
> >
> > On Fri, Jan 17, 2025 at 9:18 AM Simona Vetter <simona.vetter at ffwll.ch>
> > wrote:
> >
> > On Wed, Jan 15, 2025 at 12:20:07PM +0000, Daniel Stone wrote:
> > > On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo at gmail.com> wrote:
> > > > On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone
> > <daniel at fooishbar.org> wrote:
> > > >> AMD hardware is the only hardware I know of which doesn't support
> > > >> overaligning. Say (not hypothetically) we have a GPU and a
> > display
> > > >> controller which have a minimum pitch alignment of 32 bytes, no
> > > >> minimum height alignment, minimum 32-byte offset alignment,
> > minimum
> > > >> pitch of 32 bytes, and minimum image size of 32 bytes.
> > > >>
> > > >> To be maximally compatible, we'd have to expose 28 (pitch
> > align) * 32
> > > >> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min
> > size) ==
> > > >> 19668992 individual modifiers when queried, which is 150MB
> > per format
> > > >> just to store the list of modifiers.
> > > >
> > > > Maximum compatibility is not required nor expected.
> > > >
> > > > In your case, only 1 linear modifier would be added for that
> > driver, which is: [5 / 0 / 5 / 5 / 5]
> > > >
> > > > Then if, and only if, compatibility with other devices is
> > desired, the driver developer could look at drivers of those other
> > devices and determine which other linear modifiers to add. Ideally
> > it would be just 1, so there would be a total of 2.
> > >
> > > Mali (actually two DRM drivers and sort of three Mesa drivers)
> > can be
> > > paired with any one of 11 KMS drivers (really 12 given that one is a
> > > very independent subdriver), and something like 20 different codecs
> > > (at least 12 different vendors; I didn't bother counting the actual
> > > subdrivers which are all quite different). The VeriSilicon Hantro G2
> > > codec driver is shipped by five (that we know of) vendors who
> > all have
> > > their own KMS drivers. One of those is in the Rockchip RK3588, which
> > > (don't ask me why) ships six different codec blocks, with three
> > > different drivers, from two different vendors - that's before
> > you even
> > > get to things like the ISP and NPU which really need to be sharing
> > > buffers properly without copies.
> > >
> > > So yeah, working widely without having to encode specific knowledge
> > > everywhere isn't a nice-to-have, it's a hard baseline requirement.
> > >
> > > >> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps
> > from detecting whether 2 devices have 0 compatible memory layouts,
> > which is a useful thing to know.
> > > >>
> > > >> I get the point, but again, we have the exact same problem
> > today with
> > > >> placement, i.e. some devices require buffers to be in or not
> > be in
> > > >> VRAM or GTT or sysram for some uses, and some devices require
> > physical
> > > >> contiguity. Solving that problem would require an additional
> > 4 bits,
> > > >> which brings us to 2.3GB of modifiers per format with the current
> > > >> scheme. Not super viable.
> > > >
> > > > Userspace doesn't determine placement. The kernel memory
> > management can move buffers between heaps to accommodate sharing
> > between devices as needed. This is a problem in which userspace
> > has no say.
> > >
> > > It really does though!
> > >
> > > None of these devices use TTM with placement moves, and doing that
> > > isn't a fix either. Embedded systems have so low memory
> > bandwidth that
> > > the difference between choosing the wrong placement and moving it
> > > later vs. having the right placement to begin with is the difference
> > > between 'this does not work' and 'great, I can ship this'. Which is
> > > great if you're a consultancy trying to get paid, but tbh I'd rather
> > > work on more interesting things.
> > >
> > > So yeah, userspace does very much choose the placement. On most
> > > drivers, this is either by 'knowing' which device to allocate
> > from, or
> > > passing a flag to your allocation ioctl. For newer drivers though,
> > > there's the dma-heap allocation mechanism which is now upstream and
> > > the blessed path, for which userspace needs to explicitly know the
> > > desired placement (and must, because fixing it up later is a
> > > non-starter).
> > >
> > > Given that we need to keep LINEAR ~forever for ABI reasons, and
> > > because there's no reasonably workable alternative, let's
> > abandon the
> > > idea of abandoning LINEAR, and try to work with out-of-band
> > signalling
> > > instead.
> > >
> > > One idea is to actually pursue the allocator idea and express this
> > > properly through constraints. I'd be super in favour of this,
> > > unsurprisingly, because it allows us to solve a whole pile of other
> > > problems, rather than the extremely narrow AMD/Intel interop case.
> > >
> > > Another idea for the out-of-band signalling would be to add
> > > information-only modifiers, like
> > > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or
> > > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't
> > really
> > > work at all with how people actually use modifiers: as the doc
> > > describes, userspace takes and intersects the declared modifier
> > lists
> > > and passes the result through. The intersection of LINEAR+EQ256 and
> > > LINEAR+GE32 is LINEAR, so a userspace that follows the rules
> > will just
> > > drop the hints on the floor and pick whatever linear allocation it
> > > feels like.
> >
> > Yeah I think latest when we also take into account logical image
> > size (not
> > just pitch) with stuff like it needs to be aligned to 2 pixels in both
> > directions just using modifiers falls apart.
> >
> > And the problem with linear, unlike device modifiers is that we
> > can't just
> > throw up our hands and enumerate the handful of formats in actual
> > use for
> > interop. There's so many produces and consumers of linera buffers
> > (Daniel's list above missed camera/image processors) that save
> > assumption
> > is that anything really can happen.
> >
> > > I think I've just talked myself into the position that passing
> > > allocator constraints together with modifiers is the only way to
> > > actually solve this problem, at least without creating the sort of
> > > technical debt that meant we spent years fixing up implicit/explicit
> > > modifier interactions when it really should've just been adding a
> > > !)@*(#$ u64 next to the u32.
> >
> > Yeah probably.
> >
> > Otoh I know inertia, so I am tempted to go with the oddball
> > LINEAR_VEDNOR_A_VENDOR_B_INTEROP thing and stretch the runway for
> > a bit.
> > And we just assign those as we go as a very special thing, and the
> > drivers
> > that support it would prefer it above just LINEAR if there's no other
> > common format left.
> >
> > Also makes it really obvious what all userspace/kernel driver enabling
> > would be needed to justify such a modifier.
> > -Sima
> >
> > >
> > > Cheers,
> > > Daniel
> >
> > -- Simona Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> >
>
> --
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the dri-devel
mailing list