[Intel-gfx] [PATCH V3 4/7] mdev: introduce device specific ops
Jason Wang
jasowang at redhat.com
Thu Oct 17 08:30:43 UTC 2019
On 2019/10/17 上午12:53, Alex Williamson wrote:
> On Wed, 16 Oct 2019 15:31:25 +0000
> Parav Pandit <parav at mellanox.com> wrote:
>
>>> -----Original Message-----
>>> From: Cornelia Huck <cohuck at redhat.com>
>>> Sent: Wednesday, October 16, 2019 3:53 AM
>>> To: Parav Pandit <parav at mellanox.com>
>>> Cc: Alex Williamson <alex.williamson at redhat.com>; Jason Wang
>>> <jasowang at redhat.com>; kvm at vger.kernel.org; linux-s390 at vger.kernel.org;
>>> linux-kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; intel-
>>> gfx at lists.freedesktop.org; intel-gvt-dev at lists.freedesktop.org;
>>> kwankhede at nvidia.com; mst at redhat.com; tiwei.bie at intel.com;
>>> virtualization at lists.linux-foundation.org; netdev at vger.kernel.org;
>>> maxime.coquelin at redhat.com; cunming.liang at intel.com;
>>> zhihong.wang at intel.com; rob.miller at broadcom.com; xiao.w.wang at intel.com;
>>> haotian.wang at sifive.com; zhenyuw at linux.intel.com; zhi.a.wang at intel.com;
>>> jani.nikula at linux.intel.com; joonas.lahtinen at linux.intel.com;
>>> rodrigo.vivi at intel.com; airlied at linux.ie; daniel at ffwll.ch;
>>> farman at linux.ibm.com; pasic at linux.ibm.com; sebott at linux.ibm.com;
>>> oberpar at linux.ibm.com; heiko.carstens at de.ibm.com; gor at linux.ibm.com;
>>> borntraeger at de.ibm.com; akrowiak at linux.ibm.com; freude at linux.ibm.com;
>>> lingshan.zhu at intel.com; Ido Shamay <idos at mellanox.com>;
>>> eperezma at redhat.com; lulu at redhat.com; christophe.de.dinechin at gmail.com;
>>> kevin.tian at intel.com
>>> Subject: Re: [PATCH V3 4/7] mdev: introduce device specific ops
>>>
>>> On Wed, 16 Oct 2019 05:50:08 +0000
>>> Parav Pandit <parav at mellanox.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>>> -----Original Message-----
>>>>> From: Alex Williamson <alex.williamson at redhat.com>
>>>>> Sent: Tuesday, October 15, 2019 12:27 PM
>>>>> To: Jason Wang <jasowang at redhat.com>
>>>>> Cc: Cornelia Huck <cohuck at redhat.com>; kvm at vger.kernel.org; linux-
>>>>> s390 at vger.kernel.org; linux-kernel at vger.kernel.org; dri-
>>>>> devel at lists.freedesktop.org; intel-gfx at lists.freedesktop.org;
>>>>> intel-gvt- dev at lists.freedesktop.org; kwankhede at nvidia.com;
>>>>> mst at redhat.com; tiwei.bie at intel.com;
>>>>> virtualization at lists.linux-foundation.org;
>>>>> netdev at vger.kernel.org; maxime.coquelin at redhat.com;
>>>>> cunming.liang at intel.com; zhihong.wang at intel.com;
>>>>> rob.miller at broadcom.com; xiao.w.wang at intel.com;
>>>>> haotian.wang at sifive.com; zhenyuw at linux.intel.com;
>>>>> zhi.a.wang at intel.com; jani.nikula at linux.intel.com;
>>>>> joonas.lahtinen at linux.intel.com; rodrigo.vivi at intel.com;
>>>>> airlied at linux.ie; daniel at ffwll.ch; farman at linux.ibm.com;
>>>>> pasic at linux.ibm.com; sebott at linux.ibm.com; oberpar at linux.ibm.com;
>>>>> heiko.carstens at de.ibm.com; gor at linux.ibm.com;
>>>>> borntraeger at de.ibm.com; akrowiak at linux.ibm.com;
>>>>> freude at linux.ibm.com; lingshan.zhu at intel.com; Ido Shamay
>>>>> <idos at mellanox.com>; eperezma at redhat.com; lulu at redhat.com; Parav
>>>>> Pandit <parav at mellanox.com>; christophe.de.dinechin at gmail.com;
>>>>> kevin.tian at intel.com
>>>>> Subject: Re: [PATCH V3 4/7] mdev: introduce device specific ops
>>>>>
>>>>> On Tue, 15 Oct 2019 20:17:01 +0800
>>>>> Jason Wang <jasowang at redhat.com> wrote:
>>>>>
>>>>>> On 2019/10/15 下午6:41, Cornelia Huck wrote:
>>>>>>> On Fri, 11 Oct 2019 16:15:54 +0800 Jason Wang
>>>>>>> <jasowang at redhat.com> wrote:
>>>
>>>>>>>> @@ -167,9 +176,10 @@ register itself with the mdev core driver::
>>>>>>>> extern int mdev_register_device(struct device *dev,
>>>>>>>> const struct
>>>>>>>> mdev_parent_ops *ops);
>>>>>>>>
>>>>>>>> -It is also required to specify the class_id through::
>>>>>>>> +It is also required to specify the class_id and device
>>>>>>>> +specific ops
>>>>> through::
>>>>>>>> - extern int mdev_set_class(struct device *dev, u16 id);
>>>>>>>> + extern int mdev_set_class(struct device *dev, u16 id,
>>>>>>>> + const void *ops);
>>>>>>> Apologies if that has already been discussed, but do we want a
>>>>>>> 1:1 relationship between id and ops, or can different devices
>>>>>>> with the same id register different ops?
>>>>>>
>>>>>> I think we have a N:1 mapping between id and ops, e.g we want both
>>>>>> virtio-mdev and vhost-mdev use a single set of device ops.
>>>>> The contents of the ops structure is essentially defined by the id,
>>>>> which is why I was leaning towards them being defined together.
>>>>> They are effectively interlocked, the id defines which mdev "endpoint"
>>>>> driver is loaded and that driver requires mdev_get_dev_ops() to
>>>>> return the structure required by the driver. I wish there was a way
>>>>> we could incorporate type checking here. We toyed with the idea of
>>>>> having the class in the same structure as the ops, but I think this
>>>>> approach was chosen for simplicity. We could still do something like:
>>>>>
>>>>> int mdev_set_class_struct(struct device *dev, const struct
>>>>> mdev_class_struct *class);
>>>>>
>>>>> struct mdev_class_struct {
>>>>> u16 id;
>>>>> union {
>>>>> struct vfio_mdev_ops vfio_ops;
>>>>> struct virtio_mdev_ops virtio_ops;
>>>>> };
>>>>> };
>>>>>
>>>>> Maybe even:
>>>>>
>>>>> struct vfio_mdev_ops *mdev_get_vfio_ops(struct mdev_device *mdev) {
>>>>> BUG_ON(mdev->class.id != MDEV_ID_VFIO);
>>>>> return &mdev->class.vfio_ops;
>>>>> }
>>>>>
>>>>> The match callback would of course just use the mdev->class.id value.
>>>>> Functionally equivalent, but maybe better type characteristics.
>>>>> Thanks,
>>>>>
>>>>> Alex
>>>> We have 3 use cases of mdev.
>>>> 1. current mdev binding to vfio_mdev
>>>> 2. mdev binding to virtio
>>>> 3. mdev binding to mlx5_core without dev_ops
>>>>
>>>> Also
>>>> (a) a given parent may serve multiple types of classes in future.
>>>> (b) number of classes may not likely explode, they will be handful of
>>>> them. (vfio_mdev, virtio)
>>>>
>>>> So, instead of making copies of this dev_ops pointer in each mdev, it is better
>>> to keep const multiple ops in their parent device.
>>>> Something like below,
>>>>
>>>> struct mdev_parent {
>>>> [..]
>>>> struct mdev_parent_ops *parent_ops; /* create, remove */
>>>> struct vfio_mdev_ops *vfio_ops; /* read,write, ioctl etc */
>>>> struct virtio_mdev_ops *virtio_ops; /* virtio ops */ };
>>> That feels a bit odd. Why should the parent carry pointers to every possible
>>> version of ops?
>>>
>> How many are we expecting? I envisioned handful of them.
>> It carries because parent is few, mdevs are several hundreds.
>> It makes sense to keep few copies, instead of several hundred copies
>> and it doesn't need to setup on every mdev creation.
> It does need setup on every mdev creation, it's just a matter of the
> scope, 'id and ops' vs 'id only' vs 'ops with implicit id'. The other
> argument is assuming a space vs time trade-off that I'm having a hard
> time judging is necessarily the correct approach. We potentially have
> better data locality in the mdev device structure vs the parent. The
> caching of the ops structure itself is separate from how we get to it.
> We might have hundreds of pointers to those ops structure, but the
> space trade-off might we worth it if they're on the same cacheline as
> the mdev device itself vs the indirection via the parent.
>
> I see a couple other drawbacks to the parent hosted ops pointers as
> well. First, it imposes that per parent there can only be one device
> ops structure per class id, but who's to say that different types of
> mdev devices for a given parent all make the same callbacks into the
> parent. For instance, for a vfio-mdev we already support the concept
> of an iommu backing device which makes the type1 iommu code behave a
> little differently. Those differences might be sufficient that the
> parent driver would register a different device ops structure for an
> iommu backed mdev vs a non-iommu backed device. The other drawback is
> that it implies a binary difference in all mdev parent drivers to add
> any new device ids. I know we don't guarantee binary compatibility,
> but it's rather ugly.
>
> Overall, I guess I tend to prefer Connie's proposal, the class id and
> structure are tied together and the parent driver is only responsible
> for one of them, the class id is hidden away in mdev-core and the mdev
> driver itself.
Will go this way.
>
>>>> const struct vfio_mdev_ops *mdev_get_vfio_ops(struct mdev_parent
>>>> *parent); const struct virtio_mdev_ops *mdev_get_virtio_ops(struct
>>>> mdev_parent *parent);
>>>>
>>>> This way,
>>>> (a) we have strong type check support
>>>> (b) ops pointer is not duplicated across several hundred mdev
>>>> devices, and don't have to set on every mdev creation
>>>> (c) all 3 classes of mdev are supported
>>>> (d) one extra symbol table entry used per ops type, but there are
>>>> not
>>> expected to grow a lot.
>>>> (e) multiple classes per single parent is still supported
>>>> (f) still extendible for multiple classes (well defined classes =
>>>> vfio, virtio, and vendor class)
>>> Yet another suggestion: have the class id derive from the function
>>> you use to set up the ops.
>>>
>>> void mdev_set_vfio_ops(struct mdev_device *mdev, const struct
>>> vfio_mdev_ops *vfio_ops) {
>>> mdev->device_ops = vfio_ops;
>>> mdev->class_id = MDEV_ID_VFIO;
>>> }
>>>
>>> void mdev_set_virtio_ops(struct mdev_device *mdev, const struct
>>> virtio_mdev_ops *virtio_ops) {
>>> mdev->device_ops = virtio_ops;
>>> mdev->class_id = MDEV_ID_VIRTIO;
>>> }
>>>
>>> void mdev_set_vhost_ops(struct mdev_device *mdev, const struct
>>> virtio_mdev_ops *virtio_ops) {
>>> mdev->device_ops = virtio_ops;
>>> mdev->class_id = MDEV_ID_VHOST;
>>> }
>>>
>>> void mdev_set_vendor_ops(struct mdev_device *mdev) /* no ops */ {
>>> mdev->class_id = MDEV_ID_VENDOR;
>>> }
> One further step towards making this hard to use incorrectly might be
> to return an error if class_id is already set. Thanks,
>
> Alex
I will add a BUG_ON() when class_id has already set.
Thanks
More information about the Intel-gfx
mailing list