[Intel-gfx] [PATCH V3 4/7] mdev: introduce device specific ops

Parav Pandit parav at mellanox.com
Thu Oct 17 16:29:47 UTC 2019



> -----Original Message-----
> From: Alex Williamson <alex.williamson at redhat.com>
> Sent: Wednesday, October 16, 2019 5:38 PM
> To: Parav Pandit <parav at mellanox.com>
> Cc: Cornelia Huck <cohuck at redhat.com>; Jason Wang
> <jasowang at redhat.com>; kvm at vger.kernel.org; linux-s390 at vger.kernel.org;
> linux-kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; intel-
> gfx at lists.freedesktop.org; intel-gvt-dev at lists.freedesktop.org;
> kwankhede at nvidia.com; mst at redhat.com; tiwei.bie at intel.com;
> virtualization at lists.linux-foundation.org; netdev at vger.kernel.org;
> maxime.coquelin at redhat.com; cunming.liang at intel.com;
> zhihong.wang at intel.com; rob.miller at broadcom.com; xiao.w.wang at intel.com;
> haotian.wang at sifive.com; zhenyuw at linux.intel.com; zhi.a.wang at intel.com;
> jani.nikula at linux.intel.com; joonas.lahtinen at linux.intel.com;
> rodrigo.vivi at intel.com; airlied at linux.ie; daniel at ffwll.ch;
> farman at linux.ibm.com; pasic at linux.ibm.com; sebott at linux.ibm.com;
> oberpar at linux.ibm.com; heiko.carstens at de.ibm.com; gor at linux.ibm.com;
> borntraeger at de.ibm.com; akrowiak at linux.ibm.com; freude at linux.ibm.com;
> lingshan.zhu at intel.com; Ido Shamay <idos at mellanox.com>;
> eperezma at redhat.com; lulu at redhat.com; christophe.de.dinechin at gmail.com;
> kevin.tian at intel.com
> Subject: Re: [PATCH V3 4/7] mdev: introduce device specific ops
> 
> On Wed, 16 Oct 2019 20:48:06 +0000
> Parav Pandit <parav at mellanox.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson at redhat.com> On Wed, 16 Oct
> > > 2019 15:31:25 +0000 Parav Pandit <parav at mellanox.com> wrote:
> > > > > From: Cornelia Huck <cohuck at redhat.com> Parav Pandit
> > > > > <parav at mellanox.com> wrote:
> > > > > > > From: Alex Williamson <alex.williamson at redhat.com> On Tue,
> > > > > > > 15 Oct 2019 20:17:01 +0800 Jason Wang <jasowang at redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On 2019/10/15 下午6:41, Cornelia Huck wrote:
> > > > > > > > > Apologies if that has already been discussed, but do we
> > > > > > > > > want a
> > > > > > > > > 1:1 relationship between id and ops, or can different
> > > > > > > > > devices with the same id register different ops?
> > > > > > > >
> > > > > > > >
> > > > > > > > I think we have a N:1 mapping between id and ops, e.g we
> > > > > > > > want both virtio-mdev and vhost-mdev use a single set of device
> ops.
> > > > > > >
> > > > > > > The contents of the ops structure is essentially defined by
> > > > > > > the id, which is why I was leaning towards them being defined
> together.
> > > > > > > They are effectively interlocked, the id defines which mdev
> "endpoint"
> > > > > > > driver is loaded and that driver requires mdev_get_dev_ops()
> > > > > > > to return the structure required by the driver.  I wish
> > > > > > > there was a way we could incorporate type checking here.  We
> > > > > > > toyed with the idea of having the class in the same
> > > > > > > structure as the ops, but I think this approach was chosen
> > > > > > > for simplicity.  We could still do
> > > something like:
> > > > > > >
> > > > > > > int mdev_set_class_struct(struct device *dev, const struct
> > > > > > > mdev_class_struct *class);
> > > > > > >
> > > > > > > struct mdev_class_struct {
> > > > > > > 	u16	id;
> > > > > > > 	union {
> > > > > > > 		struct vfio_mdev_ops vfio_ops;
> > > > > > > 		struct virtio_mdev_ops virtio_ops;
> > > > > > > 	};
> > > > > > > };
> > > > > > >
> > > > > > > Maybe even:
> > > > > > >
> > > > > > > struct vfio_mdev_ops *mdev_get_vfio_ops(struct mdev_device
> > > > > > > *mdev)
> > > {
> > > > > > > 	BUG_ON(mdev->class.id != MDEV_ID_VFIO);
> > > > > > > 	return &mdev->class.vfio_ops; }
> > > > > > >
> > > > > > > The match callback would of course just use the mdev->class.id
> value.
> > > > > > > Functionally equivalent, but maybe better type characteristics.
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > >
> > > > > > We have 3 use cases of mdev.
> > > > > > 1. current mdev binding to vfio_mdev 2. mdev binding to virtio 3.
> > > > > > mdev binding to mlx5_core without dev_ops
> > > > > >
> > > > > > Also
> > > > > > (a) a given parent may serve multiple types of classes in future.
> > > > > > (b) number of classes may not likely explode, they will be
> > > > > > handful of them. (vfio_mdev, virtio)
> > > > > >
> > > > > > So, instead of making copies of this dev_ops pointer in each
> > > > > > mdev, it is better
> > > > > to keep const multiple ops in their parent device.
> > > > > > Something like below,
> > > > > >
> > > > > > struct mdev_parent {
> > > > > > 	[..]
> > > > > > 	struct mdev_parent_ops *parent_ops; /* create, remove */
> > > > > > 	struct vfio_mdev_ops *vfio_ops; /* read,write, ioctl etc */
> > > > > > 	struct virtio_mdev_ops *virtio_ops; /* virtio ops */ };
> > > > >
> > > > > That feels a bit odd. Why should the parent carry pointers to
> > > > > every possible version of ops?
> > > > >
> > > > How many are we expecting? I envisioned handful of them.
> > > > It carries because parent is few, mdevs are several hundreds.
> > > > It makes sense to keep few copies, instead of several hundred
> > > > copies and it doesn't need to setup on every mdev creation.
> > >
> > > It does need setup on every mdev creation, it's just a matter of the
> > > scope, 'id and ops' vs 'id only' vs 'ops with implicit id'.  The
> > > other argument is assuming a space vs time trade-off that I'm having
> > > a hard time judging is necessarily the correct approach.  We
> > > potentially have better data locality in the mdev device structure
> > > vs the parent.  The caching of the ops structure itself is separate from how
> we get to it.
> > > We might have hundreds of pointers to those ops structure, but the
> > > space trade-off might we worth it if they're on the same cacheline
> > > as the mdev device itself vs the indirection via the parent.
> > >
> > > I see a couple other drawbacks to the parent hosted ops pointers as
> > > well.  First, it imposes that per parent there can only be one
> > > device ops structure per class id, but who's to say that different types of
> mdev devices for a given parent all
> > > make the same callbacks into the parent.
> > We should have driver who intent to use different device ops for each
> > device with single parent that supports this claim.
> 
> Why?  Are we not allowed to identify restrictions implied by a given proposal if
> we don't yet have a user?  I can't subscribe to that.
> 
Of course we can identify the restrictions.
But if there is no current user, who is it restricting?
A future use case? Why can't that change be done when an actual user needs it?

> >  For instance, for a vfio-mdev we
> > > already support the concept of an iommu backing device which makes
> > > the type1 iommu code behave a little differently.  Those differences
> > > might be sufficient that the parent driver would register a
> > > different device ops structure for an iommu backed mdev
> > > vs a non-iommu backed device.
> > I am not sure if this is really worth it.
> > Which driver should I look which has if-else conditions sprinkled in
> > these callbacks for different iommu types? If majority code is same,
> > adding few branches looks ok vs creating new ops all together. So I
> > need to educate myself first with the driver which desires this. Any
> > pointers?
> 
> While the iommu backed vfio-mdevs is real, the example that a parent driver
> might choose to register different device ops based on that is theoretical.
> Parent drivers don't have that option today, but as we're making the device ops
> more modular and have stumbled onto this benefit of per device ops, perhaps it
> might be useful.  The "is it worth it"
> question can also be asked of the claimed benefits of a set of shared devices
> ops per parent.
> 
> > > The other
> > > drawback is that it implies a binary difference in all mdev parent
> > > drivers to add any new device ids.  I know we don't guarantee binary
> > > compatibility, but it's rather ugly.
> > >
> > Yeah, we don't support and there is no requirement for binary
> > compatibility.
> >
> > > Overall, I guess I tend to prefer Connie's proposal, the class id
> > > and structure are tied together and the parent driver is only
> > > responsible for one of them, the class id is hidden away in
> > > mdev-core and the mdev driver itself.
> > I am fine with Cornelia's approach.
> > It comes with small cost of additional symbols and it is probably ok.
> > I just find it over engineered given handful of dev ops types.
> 
> If the device ops types are limited, then so are the additional symbols.  Those
> symbols also add a degree of explicitness to the interface 
Explicitness is good to avoid error.
Such explicitness exist in both the methods if ops are per parent or per device.

(ie. register this
> device as vfio-mdev with this set of vfio-mdev-ops, versus register this device as
> vfio-mdev... which uses the vfio-mdev-ops over in the parent ops structure).  I
> don't really see what's over-engineered about former. 
Vendor driver is expected to setup mdev_device ops field during create() callback for every device vs setting up once during parent setup time.
Currently we have only 3 types of ops, but there is no single driver identified who intent to set more than one type per parent for two different device.
And for this theoretical use case we are adding 3 exported symbols for setting it.

Anyways, I am ok with this approach too, I just want to see APIs evolving based on actual use case.

> I like to think of Rusty's
> old interface guidelines, particularly the one about making it difficult to use
> incorrectly for these sorts of interfaces.  Thanks,
If the call is made only one time per parent, compare to 100 times per mdev, how can it go wrong or how is it difficult to use?

> 
> Alex


More information about the Intel-gfx mailing list