VM madvise semantics

Wed Jun 4 15:30:16 UTC 2025

On Wed, 2025-06-04 at 16:32 +0200, Simona Vetter wrote:
> On Wed, Jun 04, 2025 at 02:57:50PM +0200, Thomas Hellström wrote:
> > Hi!
> > 
> > I'm starting an email thread to move forward the questions Himal had
> > on the madvise semantics:
> > 
> > 1) Whether to support an array of ops? 
> > 
> > IMO the VM_BIND implementation got very complicated due to this and
> > the needed rollback support. Perhaps due to single op splitting
> > turning to multiple ops it would've been hard to avoid that. Can we
> > avoid supporting an array of ops for madvise? If so I'd vote for
> > single op.
> 
> Just seen this fly by and wondering why you even want rollback support?
> 
> Looking at core mm and the various m* syscalls that manipulate vma, they
> just bail out if things fail halfway through. Or at least don't make any
> guarantees, userspace gets to keep the pieces.
> 
> And I think that's the semantics we want, because making global promises
> around rollback means we need atomicity, which means fine-grained
> locking is out. And that doesn't sound like the right way to design this
> stuff to me?
> 
> Or is there some userspace requirement that wants rollback for some
> strange reason?

So the rollback came up in the context of submitting an array of multiple
VM_BINDs in a single IOCTL where the nth one fails. That makes it
troublesome for user-space to recover, in particular when a signal is
delivered and the IOCTL is supposed to be restartable without any
user-space modification of the arguments.

At that time, after long discussions and looking at how, for example, the
write() syscall behaves (when -EINTR is returned, no write has been done
and the syscall can be restarted from scratch), we decided that if a
multiple-bind hit an -EINTR, we'd need to roll back, and since we then
already had a rollback we'd do the same for all errors. IIRC blocking
signals wasn't an option. But agreed, this puts us in a situation where
fine-grained locking is out with that interface, at least with multiple
binds or ops that generate multiple sub-operations.
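
To illustrate the all-or-nothing array semantics described above, here is
a toy user-space model. It is only a sketch, not the actual driver code:
struct toy_vm, struct toy_op, toy_bind(), toy_unbind() and
toy_bind_array() are all names made up for this example, and the
"signal" is simulated with a counter.

```c
#include <assert.h>
#include <errno.h>

#define NSLOTS 8

/* Hypothetical toy model: a "VM" is just a set of bound slots. */
struct toy_vm {
	int bound[NSLOTS];
	int interrupt_after;	/* binds before a simulated signal, -1 = none */
};

struct toy_op {
	int slot;
};

static int toy_bind(struct toy_vm *vm, const struct toy_op *op)
{
	if (vm->interrupt_after == 0) {
		vm->interrupt_after = -1;	/* the signal fires only once */
		return -EINTR;
	}
	if (vm->interrupt_after > 0)
		vm->interrupt_after--;

	if (op->slot < 0 || op->slot >= NSLOTS || vm->bound[op->slot])
		return -EINVAL;

	vm->bound[op->slot] = 1;
	return 0;
}

static void toy_unbind(struct toy_vm *vm, const struct toy_op *op)
{
	/* Rollback only ever undoes binds that this call performed. */
	assert(vm->bound[op->slot]);
	vm->bound[op->slot] = 0;
}

/*
 * Apply all ops or none: if op i fails, undo ops i-1..0 in reverse
 * order so the caller can restart with unmodified arguments.
 */
static int toy_bind_array(struct toy_vm *vm, const struct toy_op *ops, int n)
{
	for (int i = 0; i < n; i++) {
		int err = toy_bind(vm, &ops[i]);

		if (err) {
			while (--i >= 0)
				toy_unbind(vm, &ops[i]);
			return err;
		}
	}
	return 0;
}
```

Note that the whole array has to be applied (or unwound) as one unit,
which is what rules out fine-grained locking in this model.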

So my suggestion here is to avoid the array for madvise, also avoiding
rollbacks if possible.
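
For contrast, a single op without rollback could look roughly like the
toy below, following the "bail out halfway, userspace keeps the pieces"
behaviour Sima describes for the core mm syscalls. Again only a sketch
under assumptions: struct toy_range, toy_advise() and the "locked" flag
are invented for illustration and say nothing about the eventual uapi.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical per-page attribute store standing in for VMA attributes. */
struct toy_range {
	int attr[TOY_NPAGES];
	int locked[TOY_NPAGES];	/* nonzero: page rejects attribute changes */
};

/*
 * Apply one attribute to pages [start, end). On the first failure the
 * error is returned and earlier pages keep the new attribute; nothing
 * is undone.
 */
static int toy_advise(struct toy_range *r, size_t start, size_t end, int attr)
{
	assert(r != NULL);

	if (start > end || end > TOY_NPAGES)
		return -EINVAL;

	for (size_t i = start; i < end; i++) {
		if (r->locked[i])
			return -EBUSY;
		r->attr[i] = attr;
	}
	return 0;
}
```

Since nothing is unwound, each page could in principle be handled under
its own lock in this model.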

/Thomas

> 
> Cheers, Sima
> 
> > 2) Purgeability. If it's not implemented yet we shouldn't merge an
> > uapi for it, but ensure that it would at least be possible moving
> > forward. 
> > 
> > 3) Multi-device. In the spirit of the above I guess it makes sense if
> > UMD wants to select between VRAM and SRAM now to merge an interface
> > that supports *only* that. When we add multi-device support we could
> > perhaps add another op, or an extension to support agreed
> > multi-device semantics. Even if this means rolling back to Himal's
> > original suggestion of a preferred placement UAPI.
> > 
> > Thoughts?
> > 
> > Thomas
> > 
>