New subsystem for acceleration devices
Oded Gabbay
oded.gabbay at gmail.com
Sun Aug 7 11:25:33 UTC 2022
On Sun, Aug 7, 2022 at 9:43 AM Oded Gabbay <oded.gabbay at gmail.com> wrote:
>
> On Fri, Aug 5, 2022 at 3:22 AM Jason Gunthorpe <jgg at nvidia.com> wrote:
> >
> > On Thu, Aug 04, 2022 at 08:48:28PM +0300, Oded Gabbay wrote:
> >
> > > > The flip is true of DRM - DRM is pretty general. I bet I could
> > > > implement an RDMA device under DRM - but that doesn't mean it should
> > > > be done.
> > > >
> > > > My biggest concern is that this subsystem not turn into a back door
> > > > for a bunch of abuse of kernel APIs going forward. Though things
> > > > are
> > >
> > > How do you suggest to make sure it won't happen ?
> >
> > Well, if you launch the subsystem then it is your job to make sure it
> > doesn't happen - or endure the complaints :)
> Understood, I'll make sure there is no room for complaints.
> >
> > Accelerators have this nasty tendancy to become co-designed with their
> > SOCs in nasty intricate ways and then there is a desire to punch
> > through all the inconvenient layers to make it go faster.
> >
> > > > better now, we still see this in DRM where expediency or performance
> > > > justifies hacky shortcuts instead of good in-kernel architecture. At
> > > > least DRM has reliable and experienced review these days.
> > > Definitely. DRM has some parts that are really well written. For
> > > example, the whole char device handling with sysfs/debugfs and managed
> > > resources code.
> >
> > Arguably this should all be common code in the driver core/etc - what
> > value is a subsystem adding beyond that besides using it properly?
>
> I mainly see two things here:
>
> 1. If there is a subsystem which is responsible for creating and
> exposing the device character files, then there should be some code
> that connects between each device driver to that subsystem.
> i.e. There should be functions that each driver should call from its
> probe and release callback functions.
>
> Those functions should take care of the following:
> - Create metadata for the device, the device's minor(s) and the
> driver's ioctls table and driver's callback for file operations (both
> are common for all the driver's devices). Save all that metadata with
> proper locking.
> - Create the device char files themselves and supply file operations
> that will be called per each open/close/mmap/etc.
> - Keep track of all these objects' lifetime in regard to the device
> driver's lifetime, with proper handling for release.
> - Add common handling and entries of sysfs/debugfs for these devices
> with the ability for each device driver to add their own unique
> entries.
>
> 2. I think that you underestimate (due to your experience) the "using
> it properly" part... It is not so easy to do this properly for
> inexperienced kernel people. If we provide all the code I mentioned
> above, the device driver writer doesn't need to be aware of all these
> kernel APIs.
>
Two more points I thought of as examples for added value to be done in
common code:
1. Common code for handling dma-buf. Very similar to what was added a
year ago to rdma. This can be accompanied by a common ioctl to export
and import a dma-buf.
2. Common code to handle drivers that want to allow a single user at a
time to run open the device char file.
Oded
> >
> > It would be nice to at least identify something that could obviously
> > be common, like some kind of enumeration and metadata kind of stuff
> > (think like ethtool, devlink, rdma tool, nvemctl etc)
> Definitely. I think we can have at least one ioctl that will be common
> to all drivers from the start.
> A kind of information retrieval ioctl. There are many information
> points that I'm sure are common to most accelerators. We have an
> extensive information ioctl in the habanalabs driver and most of the
> information there is not habana specific imo.
> >
> > > I think that it is clear from my previous email what I intended to
> > > provide. A clean, simple framework for devices to register with, get
> > > services for the most basic stuff such as device char handling,
> > > sysfs/debugfs.
> >
> > This should all be trivially done by any driver using the core codes,
> > if you see gaps I'd ask why not improve the core code?
> >
> > > Later on, add more simple stuff such as memory manager
> > > and uapi for memory handling. I guess someone can say all that exists
> > > in drm, but like I said it exists in other subsystems as well.
> >
> > This makes sense to me, all accelerators need a way to set a memory
> > map, but on the other hand we are doing some very nifty stuff in this
> > area with iommufd and it might be very interesting to just have the
> > accelerator driver link to that API instead of building yet another
> > copy of pin_user_pages() code.. Especially with PASID likely becoming
> > part of any accelerator toolkit.
> Here I disagree with you. First of all, there are many relatively
> simple accelerators, especially in edge, where PASID is really not
> relevant.
> Second, even for the more sophisticated PCIe/CXL-based ones, PASID is
> not mandatory and I suspect that it won't be in 100% of those devices.
> But definitely that should be an alternative to the "classic" way of
> handling dma'able memory (pin_user_pages()).
> >
> > > I want to be perfectly honest and say there is nothing special here
> > > for AI. It's actually the opposite, it is a generic framework for
> > > compute only. Think of it as an easier path to upstream if you just
> > > want to do compute acceleration. Maybe in the future it will be more,
> > > but I can't predict the future.
> >
> > I can't either, and to be clear I'm only questioning the merit of a
> > "subsystem" eg with a struct class, rigerous uAPI, etc.
> >
> > The idea that these kinds of accel drivers deserve specialized review
> > makes sense to me, even if they remain as unorganized char
> > devices. Just understand that is what you are signing up for :)
> I understand. That's why I'm taking all your points very seriously.
> This is not a decision that should be taken lightly and I want to be
> sure most agree this is the correct way forward.
> My next step is to talk to Dave about it in-depth. In his other email
> he wrote some interesting ideas which I want to discuss with him.
>
> Maybe this is something that should be discussed in the kernel summit ?
>
> >
> > Jason
More information about the dri-devel
mailing list