[PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

Mon Jan 20 07:58:20 UTC 2025

Hi

Am 18.01.25 um 03:37 schrieb Marek Olšák:
[...]
>
> 3) Implementing DRM_FORMAT_MOD_LINEAR as having 256B pitch and offset 
> alignment. This is what we do today. Even if Intel and some AMD chips 
> can do 64B or 128B alignment, they overalign to 256B. With so many 
> AMD+NV laptops out there, NV is probably next, unless they already do 
> this in the closed source driver.

The dumb-buffer series currently being discussed on dri-devel also 
touches handling of scanline pitches. THe actual value varies with each 
driver.  Should dumb buffers use a default pitch alignment of 256 on al 
hardware?

Best regards
Thomas

>
> Marek
>
> On Fri, Jan 17, 2025 at 9:18 AM Simona Vetter <simona.vetter at ffwll.ch> 
> wrote:
>
>     On Wed, Jan 15, 2025 at 12:20:07PM +0000, Daniel Stone wrote:
>     > On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo at gmail.com> wrote:
>     > > On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone
>     <daniel at fooishbar.org> wrote:
>     > >> AMD hardware is the only hardware I know of which doesn't support
>     > >> overaligning. Say (not hypothetically) we have a GPU and a
>     display
>     > >> controller which have a minimum pitch alignment of 32 bytes, no
>     > >> minimum height alignment, minimum 32-byte offset alignment,
>     minimum
>     > >> pitch of 32 bytes, and minimum image size of 32 bytes.
>     > >>
>     > >> To be maximally compatible, we'd have to expose 28 (pitch
>     align) * 32
>     > >> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min
>     size) ==
>     > >> 19668992 individual modifiers when queried, which is 150MB
>     per format
>     > >> just to store the list of modifiers.
>     > >
>     > > Maximum compatibility is not required nor expected.
>     > >
>     > > In your case, only 1 linear modifier would be added for that
>     driver, which is: [5 / 0 / 5 / 5 / 5]
>     > >
>     > > Then if, and only if, compatibility with other devices is
>     desired, the driver developer could look at drivers of those other
>     devices and determine which other linear modifiers to add. Ideally
>     it would be just 1, so there would be a total of 2.
>     >
>     > Mali (actually two DRM drivers and sort of three Mesa drivers)
>     can be
>     > paired with any one of 11 KMS drivers (really 12 given that one is a
>     > very independent subdriver), and something like 20 different codecs
>     > (at least 12 different vendors; I didn't bother counting the actual
>     > subdrivers which are all quite different). The VeriSilicon Hantro G2
>     > codec driver is shipped by five (that we know of) vendors who
>     all have
>     > their own KMS drivers. One of those is in the Rockchip RK3588, which
>     > (don't ask me why) ships six different codec blocks, with three
>     > different drivers, from two different vendors - that's before
>     you even
>     > get to things like the ISP and NPU which really need to be sharing
>     > buffers properly without copies.
>     >
>     > So yeah, working widely without having to encode specific knowledge
>     > everywhere isn't a nice-to-have, it's a hard baseline requirement.
>     >
>     > >> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps
>     from detecting whether 2 devices have 0 compatible memory layouts,
>     which is a useful thing to know.
>     > >>
>     > >> I get the point, but again, we have the exact same problem
>     today with
>     > >> placement, i.e. some devices require buffers to be in or not
>     be in
>     > >> VRAM or GTT or sysram for some uses, and some devices require
>     physical
>     > >> contiguity. Solving that problem would require an additional
>     4 bits,
>     > >> which brings us to 2.3GB of modifiers per format with the current
>     > >> scheme. Not super viable.
>     > >
>     > > Userspace doesn't determine placement. The kernel memory
>     management can move buffers between heaps to accommodate sharing
>     between devices as needed. This is a problem in which userspace
>     has no say.
>     >
>     > It really does though!
>     >
>     > None of these devices use TTM with placement moves, and doing that
>     > isn't a fix either. Embedded systems have so low memory
>     bandwidth that
>     > the difference between choosing the wrong placement and moving it
>     > later vs. having the right placement to begin with is the difference
>     > between 'this does not work' and 'great, I can ship this'. Which is
>     > great if you're a consultancy trying to get paid, but tbh I'd rather
>     > work on more interesting things.
>     >
>     > So yeah, userspace does very much choose the placement. On most
>     > drivers, this is either by 'knowing' which device to allocate
>     from, or
>     > passing a flag to your allocation ioctl. For newer drivers though,
>     > there's the dma-heap allocation mechanism which is now upstream and
>     > the blessed path, for which userspace needs to explicitly know the
>     > desired placement (and must, because fixing it up later is a
>     > non-starter).
>     >
>     > Given that we need to keep LINEAR ~forever for ABI reasons, and
>     > because there's no reasonably workable alternative, let's
>     abandon the
>     > idea of abandoning LINEAR, and try to work with out-of-band
>     signalling
>     > instead.
>     >
>     > One idea is to actually pursue the allocator idea and express this
>     > properly through constraints. I'd be super in favour of this,
>     > unsurprisingly, because it allows us to solve a whole pile of other
>     > problems, rather than the extremely narrow AMD/Intel interop case.
>     >
>     > Another idea for the out-of-band signalling would be to add
>     > information-only modifiers, like
>     > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or
>     > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't
>     really
>     > work at all with how people actually use modifiers: as the doc
>     > describes, userspace takes and intersects the declared modifier
>     lists
>     > and passes the result through. The intersection of LINEAR+EQ256 and
>     > LINEAR+GE32 is LINEAR, so a userspace that follows the rules
>     will just
>     > drop the hints on the floor and pick whatever linear allocation it
>     > feels like.
>
>     Yeah I think latest when we also take into account logical image
>     size (not
>     just pitch) with stuff like it needs to be aligned to 2 pixels in both
>     directions just using modifiers falls apart.
>
>     And the problem with linear, unlike device modifiers is that we
>     can't just
>     throw up our hands and enumerate the handful of formats in actual
>     use for
>     interop. There's so many produces and consumers of linera buffers
>     (Daniel's list above missed camera/image processors) that save
>     assumption
>     is that anything really can happen.
>
>     > I think I've just talked myself into the position that passing
>     > allocator constraints together with modifiers is the only way to
>     > actually solve this problem, at least without creating the sort of
>     > technical debt that meant we spent years fixing up implicit/explicit
>     > modifier interactions when it really should've just been adding a
>     > !)@*(#$ u64 next to the u32.
>
>     Yeah probably.
>
>     Otoh I know inertia, so I am tempted to go with the oddball
>     LINEAR_VEDNOR_A_VENDOR_B_INTEROP thing and stretch the runway for
>     a bit.
>     And we just assign those as we go as a very special thing, and the
>     drivers
>     that support it would prefer it above just LINEAR if there's no other
>     common format left.
>
>     Also makes it really obvious what all userspace/kernel driver enabling
>     would be needed to justify such a modifier.
>     -Sima
>
>     >
>     > Cheers,
>     > Daniel
>
>     -- 
>     Simona Vetter
>     Software Engineer, Intel Corporation
>     http://blog.ffwll.ch
>

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)