Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

Fri Jan 19 17:01:35 UTC 2024

Yeah, this one's gonna hit Phoronix...

When we started writing Vulkan drivers back in the day, there was this
notion that Vulkan was a low-level API that directly targets hardware.
Vulkan drivers were these super thin things that just blasted packets
straight into the hardware. What little code was common was small and
pretty easy to just copy+paste around. It was a nice thought...

What's happened in the intervening 8 years is that Vulkan has grown. A lot.

We already have several places where we're doing significant layering.
It started with sharing the WSI code and some Python for generating
dispatch tables. Later we added common synchronization code and a few
vkFoo2 wrappers. Then render passes and...

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024

That's been my project the last couple weeks: a common VkPipeline
implementation built on top of an ESO-like (VK_EXT_shader_object)
interface. The big deviation this MR makes from prior art is that I
make no attempt at pretending it's a layered implementation. The
vtable for shader
objects looks like ESO but takes its own path when it's useful to do
so. For instance, shader creation always consumes NIR and a handful of
lowering passes are run for you. It's no st_glsl_to_nir but it is a
bit opinionated. Also, a few of the bits that are missing from ESO
such as robustness have been added to the interface.
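
To make the shape of that concrete, here is a rough sketch in C. Every
name in it (toy_ir, toy_shader_ops, toy_runtime_create_shader, and so
on) is made up for illustration and is not taken from the MR or from
Mesa. It only assumes what's described above: the runtime owns the
creation path, runs some lowering, and then hands already-lowered IR
to a driver-provided vtable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a NIR shader; this is NOT Mesa's nir_shader. */
struct toy_ir {
   int num_instrs;
   bool lowered;
};

/* Hypothetical driver-facing vtable, loosely shaped like shader objects. */
struct toy_shader_ops {
   /* The driver only ever sees IR the runtime has already lowered. */
   int (*compile)(const struct toy_ir *ir, void **out_shader);
   void (*destroy)(void *shader);
};

/* Runtime-side lowering that every driver gets without asking for it. */
static void
toy_runtime_lower(struct toy_ir *ir)
{
   ir->lowered = true;
}

/* The creation path lives in the runtime, not in the driver. */
static int
toy_runtime_create_shader(const struct toy_shader_ops *ops,
                          struct toy_ir *ir, void **out_shader)
{
   if (ops == NULL || ops->compile == NULL)
      return -1;

   toy_runtime_lower(ir);
   return ops->compile(ir, out_shader);
}

/* A pretend driver back-end: it relies on the runtime's lowering. */
static int
toy_drv_compile(const struct toy_ir *ir, void **out_shader)
{
   static int dummy_binary;

   assert(ir->lowered);
   *out_shader = &dummy_binary;
   return 0;
}

static void
toy_drv_destroy(void *shader)
{
   (void)shader; /* Nothing to free in this sketch. */
}

static const struct toy_shader_ops toy_drv_ops = {
   .compile = toy_drv_compile,
   .destroy = toy_drv_destroy,
};
```

Calling toy_runtime_create_shader(&toy_drv_ops, &ir, &shader) is the
only way in, which is the "opinionated" part: the driver can't opt out
of the lowering the runtime chooses to run.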

In my mind, this marks a pretty fundamental shift in how the Vulkan
runtime works. Previously, everything was
designed to be a toolbox where you can kind of pick and choose what
you want to use. Also, everything at least tried to act like a layer
where you still implemented Vulkan but you could leave out bits like
render passes if you implemented the new thing and were okay with the
layer. With the ESO code, you implement something that isn't Vulkan
entrypoints and the actual entrypoints live in the runtime. This lets
us expand and adjust the interface as needed for our purposes as well
as sanitize certain things even in the modern API.

The result is that NVK is starting to feel like a gallium driver. 🙃

So here's the question: do we like this? Do we want to push in this
direction? Should we start making more things work more this way? I'm
not looking for MRs just yet nor do I have more reworks directly
planned. I'm more looking for thoughts and opinions as to how the
various Vulkan driver teams feel about this. We'll leave the detailed
planning for the Mesa issue tracker.

It's worth noting that, even though I said we've tried to keep things
layerish, there are other parts of the runtime that look like this.
The synchronization code is a good example. The vk_sync interface is
pretty significantly different from the Vulkan objects it's used to
implement. That's worked out pretty well, IMO. With something as
complicated as pipelines or synchronization, trying to keep the
illusion of a layer just isn't practical.
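
As a toy illustration of a back-end interface that doesn't look like
the API objects built on it, here is a sketch in C. It is not the real
vk_sync interface; all toy_* names are hypothetical, and the choice of
a single monotonic counter as the back-end primitive is an assumption
made for the example. The runtime side then builds both a fence-like
and a timeline-like object on that one primitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical back-end primitive: a monotonically increasing counter. */
struct toy_sync {
   uint64_t value;
};

/* Hypothetical back-end vtable; deliberately not fence/semaphore shaped. */
struct toy_sync_ops {
   void (*signal)(struct toy_sync *sync, uint64_t value);
   bool (*is_signaled)(const struct toy_sync *sync, uint64_t value);
};

static void
toy_counter_signal(struct toy_sync *sync, uint64_t value)
{
   /* Never move backwards. */
   if (value > sync->value)
      sync->value = value;
}

static bool
toy_counter_is_signaled(const struct toy_sync *sync, uint64_t value)
{
   return sync->value >= value;
}

static const struct toy_sync_ops toy_counter_ops = {
   .signal = toy_counter_signal,
   .is_signaled = toy_counter_is_signaled,
};

/* Runtime side: a binary, fence-like object is the counter at point 1. */
struct toy_fence {
   const struct toy_sync_ops *ops;
   struct toy_sync sync;
};

static void
toy_fence_init(struct toy_fence *f, const struct toy_sync_ops *ops)
{
   assert(ops != NULL);
   f->ops = ops;
   f->sync.value = 0;
}

static void
toy_fence_signal(struct toy_fence *f)
{
   f->ops->signal(&f->sync, 1);
}

static bool
toy_fence_status(const struct toy_fence *f)
{
   return f->ops->is_signaled(&f->sync, 1);
}

/* Runtime side: a timeline-like object exposes the counter directly. */
struct toy_timeline {
   const struct toy_sync_ops *ops;
   struct toy_sync sync;
};

static void
toy_timeline_init(struct toy_timeline *t, const struct toy_sync_ops *ops)
{
   assert(ops != NULL);
   t->ops = ops;
   t->sync.value = 0;
}

static void
toy_timeline_signal(struct toy_timeline *t, uint64_t point)
{
   t->ops->signal(&t->sync, point);
}

static bool
toy_timeline_reached(const struct toy_timeline *t, uint64_t point)
{
   return t->ops->is_signaled(&t->sync, point);
}
```

The driver in this sketch implements two functions and never sees a
fence or a semaphore; the API-shaped objects exist only on the runtime
side.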

So, do we like this? Should we be pushing more towards drivers being a
backend of the runtime instead of a user of it?

Now, before anyone asks, no, I don't really want to build a multi-API
abstraction with a Vulkan state tracker. If we were doing this 5 years
ago and Zink didn't already exist, one might be able to make an
argument for pushing in that direction. However, that would add a huge
amount of weight to the project and make it even harder to develop the
runtime than it already is and for little benefit at this point.

Here are a few other constraints on what I'm thinking:

1. I want it to still be possible for drivers to implement an
extension without piles of runtime plumbing or even bypass the runtime
on occasion as needed.

2. I don't want to recreate the gallium cap disaster. Drivers should
know exactly what they're advertising. We may want to have some
internal features or properties that are used by the runtime to make
decisions but they'll be in addition to the features and properties in
Vulkan.

3. We've got some meta stuff already but we probably want more.
However, I don't want to force meta on folks who don't want it.
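
For constraints 2 and 3, one possible shape is sketched below in C. It
is purely hypothetical: none of these toy_* structs or fields exist in
Mesa, and the specific hint (prefers_runtime_meta_copies) is invented
for the example. The only idea it tries to capture is that whatever the
runtime uses to make its own decisions sits next to the advertised
Vulkan features rather than replacing or rewriting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the driver advertises. The runtime reads this but never edits it. */
struct toy_vk_features {
   bool robust_buffer_access;
   bool shader_object;
};

/* Hypothetical runtime-only hints, kept separate from Vulkan features. */
struct toy_runtime_hints {
   bool prefers_runtime_meta_copies; /* opt-in, defaults to false */
};

struct toy_physical_device {
   struct toy_vk_features features;
   struct toy_runtime_hints hints;
};

enum toy_copy_path {
   TOY_COPY_DRIVER,
   TOY_COPY_RUNTIME_META,
};

/* The runtime picks a path from the hints only; meta is never forced. */
static enum toy_copy_path
toy_runtime_pick_copy_path(const struct toy_physical_device *pdev)
{
   assert(pdev != NULL);
   return pdev->hints.prefers_runtime_meta_copies ? TOY_COPY_RUNTIME_META
                                                  : TOY_COPY_DRIVER;
}

/* What gets reported to the app is exactly what the driver filled in. */
static bool
toy_runtime_reports_shader_object(const struct toy_physical_device *pdev)
{
   assert(pdev != NULL);
   return pdev->features.shader_object;
}
```

A driver that leaves the hints zero-initialized gets its own copy path
and no meta, and flipping a hint has no effect on what shows up in the
advertised features.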

The big thing here is that if we do this, I'm going to need help. I'm
happy to do a lot of the architectural work but drivers are going to
have to keep up with the changes and I can't take on the burden of
moving 8 different drivers forward. I can answer questions and maybe
help out a bit but the refactoring is going to be too much for one
person, even if that person is me.

Thoughts?

~Faith