simplify the mdev interface v6

Eric Farman farman at linux.ibm.com
Wed Jul 20 02:41:49 UTC 2022


On Tue, 2022-07-19 at 13:49 -0400, Eric Farman wrote:
> On Tue, 2022-07-19 at 09:26 -0600, Alex Williamson wrote:
> > On Tue, 19 Jul 2022 16:49:28 +0200
> > Christoph Hellwig <hch at lst.de> wrote:
> > 
> > > On Mon, Jul 18, 2022 at 10:01:40PM -0400, Eric Farman wrote:
> > > > I'll get the problem with struct subchannel [1] sorted out in the
> > > > next couple of days. This series breaks vfio-ccw in its current
> > > > form (see reply to patch 14), but even with that addressed the
> > > > placement of all these other mdev structs needs to be handled
> > > > differently.
> > > 
> > > Alex, any preference if I should just fix the number instances
> > > checking with either an incremental patch or a resend, or wait for
> > > this ccw rework?
> > 
> > Since it's the last patch, let's at least just respin that patch
> > rather than break and fix.  I'd like to make sure Eric is ok to shift
> > around structures as a follow-up or make a proposal how this series
> > should change though.
> 
> I'd hoped to have that proposal today, but I don't have much confidence
> in it yet as this series (with the fix on the last patch) is still
> crashing my system. Will get something out as soon as I'm able.

The solution I envision thus far does two things:

 - Move the struct mdev_parent and its friends out of struct
   subchannel, and into struct vfio_ccw_private. This struct is
   allocated just prior to the call to mdev_register_device/_parent,
   and released with the mdev_unregister. It's also a device-specific
   struct linked from the device-agnostic subchannel.
 - Add a kref to struct vfio_ccw_private. The mdev_parent currently has
   one, which is now unnecessary since it's embedded in another struct,
   but it leaves vfio_ccw_private rather racy.
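
To make the shape of those two items concrete, here is a rough userspace
sketch. It is not the actual patch: every mock_* name and both
private_* helpers are invented stand-ins for illustration, and the
plain int counter only mimics what a real (atomic) kref would do. The
idea is just that the parent is embedded in the private struct, the
subchannel merely points at it, and the last put frees the whole thing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; the kernel structs look different. */
struct mock_kref { int count; };            /* real kref is atomic */
struct mock_mdev_parent { int registered; };
struct mock_subchannel { void *drvdata; };  /* device-agnostic side */

/* Device-specific struct: owns the refcount and embeds the parent. */
struct mock_ccw_private {
	struct mock_kref ref;
	struct mock_mdev_parent parent;     /* embedded, no own refcount */
	struct mock_subchannel *sch;
};

/* Recover the enclosing struct from a pointer to an embedded member. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int mock_released;                   /* counts frees, demo only */

/* Allocate the private struct just before "registering" the parent. */
static struct mock_ccw_private *private_alloc(struct mock_subchannel *sch)
{
	struct mock_ccw_private *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->ref.count = 1;                   /* initial reference */
	p->sch = sch;
	sch->drvdata = p;                   /* link from the subchannel */
	p->parent.registered = 1;           /* stands in for registration */
	return p;
}

/* Take an extra reference, e.g. while a remove path is still running. */
static void private_get(struct mock_ccw_private *p)
{
	p->ref.count++;
}

/* Drop a reference; returns 1 if this put freed the struct. */
static int private_put(struct mock_ccw_private *p)
{
	if (--p->ref.count > 0)
		return 0;
	p->parent.registered = 0;           /* stands in for unregister */
	p->sch->drvdata = NULL;             /* unlink before freeing */
	free(p);
	mock_released++;
	return 1;
}
```

With something along these lines, a path that still holds a reference
would keep the private struct (and the parent embedded in it) alive
until its own put, rather than relying on the parent's refcount.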

I suspect the second item (or something similar) is needed anyway,
because Alex's tree + this series crashes frequently in (usually)
mdev_remove. I haven't found an explanation for how we get into this
state, but admittedly didn't spend a lot of time on those crashes since
the proposed changes to struct subchannel are a non-starter. The other
crashes were always in something that's almost certainly a victim of
something else, like kmalloc-related stuff in net/skbuff.

With the above, the crashes out of the vfio-ccw stack disappear, and
things work a bit better. But those random kmalloc-related crashes
persist. I guess I'll pick those up tomorrow.

Eric



More information about the intel-gvt-dev mailing list