VM madvise semantics

Wed Jun 4 15:39:04 UTC 2025

On Wed, 4 Jun 2025 at 17:30, Thomas Hellström
<thomas.hellstrom at linux.intel.com> wrote:
>
> On Wed, 2025-06-04 at 16:32 +0200, Simona Vetter wrote:
> > On Wed, Jun 04, 2025 at 02:57:50PM +0200, Thomas Hellström wrote:
> > > Hi!
> > >
> > > I'm starting an email thread to move forward the questions Himal had
> > > on the madvise semantics:
> > >
> > > 1) Whether to support an array of ops?
> > >
> > > IMO the VM_BIND implementation got very complicated due to this and
> > > the needed rollback support. Since splitting a single op can turn it
> > > into multiple ops, that may have been hard to avoid there. Can we
> > > avoid supporting an array of ops for madvise? If so I'd vote for a
> > > single op.
> >
> > Just saw this fly by and I'm wondering why you even want rollback
> > support?
> >
> > Looking at core mm and the various m* syscalls that manipulate VMAs,
> > they just bail out if things fail halfway through. Or at least they
> > don't make any guarantees, and userspace gets to keep the pieces.
> >
> > And I think those are the semantics we want, because making global
> > promises around rollback means we need atomicity, which means
> > fine-grained locking is out. And that doesn't sound like the right
> > way to design this stuff to me?
> >
> > Or is there some userspace requirement that wants rollback for some
> > strange reason?
>
> So the rollback was in the context of submitting an array of multiple
> VM_BINDs in a single IOCTL where the nth one failed. That makes it
> troublesome for user-space to recover, in particular when a signal was
> delivered and the IOCTL is supposed to be restartable without
> user-space modifying the arguments.
>
> At that time, after long discussions and looking at how, for example,
> the write() syscall behaves (when -EINTR happens, no write is done and
> the syscall can be restarted from scratch), we decided that if a
> multiple-bind hit an -EINTR, we'd need to roll back, and since we
> already had a rollback we'd do the same for all errors. IIRC blocking
> signals wasn't an option. But agreed, this puts us in a situation where
> fine-grained locking is out with that interface, at least with multiple
> binds or ops that generate multiple sub-operations.

Hm yeah, signal restarting sounds like a good reason to support
rollback. Or I guess it's even more reason to design the interface so
that rollback doesn't become nasty.

I guess one other option would be to make madvise strictly idempotent.
At that point you don't need to roll back for signals; restarting the
ioctl just repeats a bit of work.
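
To make the idempotent option concrete, here is a rough user-space
sketch. Everything in it is hypothetical (struct toy_vm, toy_madvise(),
the simulated signal via interrupt_at); it is a toy model of the idea,
not the Xe uAPI or any proposed implementation. The op sets an attribute
on a page range, returns -EINTR partway through without undoing
anything, and the caller restarts it with unmodified arguments:

```c
#include <errno.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical per-page attribute table standing in for VM state. */
struct toy_vm {
    int attr[NPAGES];
    int pages_written; /* total page updates, including redone ones */
    int interrupt_at;  /* inject one fake signal before this page, -1 = never */
};

/*
 * Toy single-op madvise: set attr on pages [start, start + count).
 * Writing a value a page already has is harmless, so the op is
 * idempotent. On the simulated signal it returns -EINTR and leaves
 * the pages updated so far as they are; nothing is rolled back.
 */
static int toy_madvise(struct toy_vm *vm, size_t start, size_t count, int attr)
{
    if (start > NPAGES || count > NPAGES - start)
        return -EINVAL;

    for (size_t i = start; i < start + count; i++) {
        if (vm->interrupt_at == (int)i) {
            vm->interrupt_at = -1; /* the fake signal fires only once */
            return -EINTR;
        }
        vm->attr[i] = attr;
        vm->pages_written++;
    }
    return 0;
}

/* Restart with the same arguments for as long as the op is interrupted. */
static int toy_madvise_restart(struct toy_vm *vm, size_t start, size_t count,
                               int attr)
{
    int ret;

    do {
        ret = toy_madvise(vm, start, count, attr);
    } while (ret == -EINTR);

    return ret;
}
```

The only cost of the interruption in this model is that the pages
handled before the signal get written a second time; the final state is
the same as for an uninterrupted call.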

> So my suggestion here is to avoid the array for madvise, also avoiding
> rollbacks if possible.

Makes sense.
-Sima

>
> /Thomas
>
>
>
> >
> > Cheers, Sima
> >
> > > 2) Purgeability. If it's not implemented yet we shouldn't merge a
> > > uapi for it, but we should ensure that it would at least be possible
> > > moving forward.
> > >
> > > 3) Multi-device. In the spirit of the above, I guess it makes sense,
> > > if UMD wants to select between VRAM and SRAM now, to merge an
> > > interface that supports *only* that. When we add multi-device
> > > support we could perhaps add another op, or an extension, to support
> > > the agreed multi-device semantics. Even if this means rolling back
> > > to Himal's original suggestion of a preferred placement UAPI.
> > >
> > > Thoughts?
> > >
> > > Thomas
> > >
> >
>

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch