Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

Wed Jan 24 18:26:38 UTC 2024

On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand <faith at gfxstrand.net> wrote:
>
> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca <jose.fonseca at broadcom.com> wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or provide an informed opinion on the path forward, but I'd like to share my backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers shouldn't reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what enabled to great simplifications -- but it also became a straight-jacket, as GPUs didn't stand still, and sooner or later the see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and practical.    Take components like gallium's draw and other util modules. A driver can choose to use them or not.  One could fork them within Mesa source tree, and only the drivers that opt-in into the fork would need to be tested/adapted/etc
> >
> > On the flip side, Vulkan API is already a pretty low level HW abstraction.  It's also very flexible and extensible, so it's hard to provide a watertight abstraction underneath it without either taking the lowest common denominator, or having lots of optional bits of functionality governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.

Yes, that makes sense. When we were writing the initial components for
gallium (draw and cso) I really liked the general concept and thought
about trying to reuse them in the old, non-gallium Mesa drivers but
the obstacle was that there was no common interface to lay them on.
Using GL to implement GL was silly and using Vulkan to implement
Vulkan is not much better.

Having said that my general thoughts on GPU abstractions largely match
what Jose has said. To me it's a question of whether a clean
abstraction:
- on top of which you can build an entire GPU driver toolkit (i.e. all
the components and helpers)
- that makes it trivial to figure up what needs to be done to write a
new driver and makes bootstrapping a new driver a lot simpler
- that makes it easier to reason about cross hardware concepts (it's a
lot easier to understand the entirety of the ecosystem if every driver
is not doing something unique to implement similar functionality)
is worth more than almost exponentially increasing the difficulty of:
- advancing the ecosystem (i.e. it might be easier to understand but
it's way harder to create clean abstractions across such different
hardware).
- driver maintenance (i.e. there will be a constant stream of
regressions hitting your driver as a result of other people working on
their drivers)
- general development (i.e. bug fixes/new features being held back
because they break some other driver)

Some of those can certainly be titled one way or the other, e.g. the
driver maintenance con be somewhat eased by requiring that every
driver working on top of the new abstraction has to have a stable
Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
things need to be reasoned about. In my experience abstractions never
have uniform support because some people will value cons of them more
than they value the pros. So the entire process requires some very
steadfast individuals to keep going despite hearing that the effort is
dumb, at least until the benefits of the new approach are impossible
to deny. So you know... "how much do you believe in this approach
because some days will suck and you can't give up" ;) is probably the
question.

z