[Linaro-mm-sig] [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

Tue Jul 6 13:45:36 UTC 2021

On Tue, Jul 6, 2021 at 4:17 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Tue, Jul 6, 2021 at 2:46 PM Oded Gabbay <oded.gabbay at gmail.com> wrote:
> >
> > On Tue, Jul 6, 2021 at 3:23 PM Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote:
> > > > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote:
> > > > > > Greg, I hope this will be good enough for you to merge this code.
> > > > >
> > > > > So we're officially going to use dri-devel for technical details review
> > > > > and then Greg for merging so we don't have to deal with other merge
> > > > > criteria dri-devel folks have?
> > > > >
> > > > > I don't expect anything less by now, but it does make the original claim
> > > > > that drivers/misc will not step all over accelerator folks a complete
> > > > > farce under the totally-not-a-gpu banner.
> > > > >
> > > > > This essentially means that for any other accelerator stack that doesn't
> > > > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses
> > > > > other gpu driver stuff, you can just send it to Greg and it's good to go.
> > > > >
> > > > > There's quite a lot of these floating around actually (and many do have
> > > > > semi-open runtimes, like habanalabs have now too, just not open enough to
> > > > > be actually useful). It's going to be absolutely lovely having to explain
> > > > > to these companies in background chats why habanalabs gets away with their
> > > > > stack and they don't.
> > > >
> > > > FYI, I fully agree with Daniel here.  Habanalabs needs to open up their
> > > > runtime if they want to push any additional feature in the kernel.
> > > > The current situation is not sustainable.
> > Well, that's like, your opinion...
> >
> > >
> > > Before anyone replies: The runtime is open, the compiler is still closed.
> > > This has become the new default for accel driver submissions, I think
> > > mostly because all the interesting bits for non-3d accelerators are in the
> > > accel ISA, and no longer in the runtime. So vendors are fairly happy to
> > > throw in the runtime as a freebie.
> > >
> > > It's still incomplete, and it's still useless if you want to actually hack
> > > on the driver stack.
> > > -Daniel
> > > --
> > I don't understand what's not sustainable here.
> >
> > There is zero code inside the driver that communicates or interacts
> > with our TPC code (TPC is the Tensor Processing Core).
> > Even submitting work to the TPC is done via a generic queue
> > interface. And that queue IP is common between all our engines
> > (TPC/DMA/NIC). The driver provides all the specs of that queue IP,
> > because the driver's code is handling that queue. But why is the TPC
> > compiler code even relevant here?
>
> Can I use the hw how it's intended to be used without it?
You can use the h/w with the userspace stack we are providing in our
github repos + website.
Part of the userspace stack is open sourced, part is closed source.
And I'm actively working on opening up more stuff as we go along.

>
> If the answer is no, then essentially what you're doing with your
> upstream driver is getting all the benefits of an upstream driver,
> while upstream gets nothing. We can't use your stack, not as-is. Sure
> we can use the queue, but we can't actually submit anything
> interesting. And I'm pretty sure the point of your hw is to do more
> than submit no-op packets to a queue.
>
> This is an "I want to have my cake and eat it too" approach to upstreaming,
> and it's a totally fine attitude to have, but if you don't see why
> there's maybe a different side to it then I don't get what you're
> arguing. Upstream isn't a free lunch.
>
> Frankly I'm starting to assume you're arguing this all in bad faith
> just because habanalabs doesn't want to actually have an open driver
> stack, so any attack is good, no matter what. Which is also what
> everyone else does who submits their accel driver to upstream, and
> which gets us back to the starting point of this sub-thread of me
> really appreciating how this will improve background discussions going
> forward for everyone.
>
> Like if the requirement for accel drivers truly is that you can submit
> a dummy command to the queues then I have about 5-10 drivers at least
> I could merge instantly. For something like the intel gpu driver it
> would be about 50 lines of code (including all the structure boilerplate
> the ioctls require) in userspace to submit a dummy queue command.
> GPU and accel vendors would really love that, because it would allow
> them to freeload on upstream and do essentially nothing in return.
>
> And we'd end up with an unmaintainable disaster of a gpu or well
> accelerator subsystem because there's nothing you can change or
> improve because all the really useful bits of the stack are closed.
> And ofc that's not any company's problem anymore, so ofc you with the
> habanalabs hat on don't care and call this *extreme*.
>
> > btw, you can today see our TPC code at
> > https://github.com/HabanaAI/Habana_Custom_Kernel
> > There is a link there to the TPC user guide and link to download the
> > LLVM compiler.
>
> I got stuck clicking links before I found the source for that llvm
> compiler. Can you give me a direct link to the repo with sourcecode
> instead please?
The source code for the LLVM compiler is not available yet. That's one
of the parts I'm working on getting in the open.
Having said that, I don't think (and I'm not alone at this) that this
should be a pre-requirement for upstreaming kernel drivers of any
type.
And we had this discussion in the past, I'm sure we are both tired of
repeating ourselves.

>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch