Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

Mon Jan 22 15:10:17 UTC 2024

On Mon, Jan 22, 2024 at 7:20 AM Iago Toral <itoral at igalia.com> wrote:
>
> Hi Faith,
>
> thanks for starting the discussion, we had a bit of an internal chat
> here at Igalia to see where we all stand on this and I am sharing some
> initial thoughts/questions below:
>
> On Fri, 19-01-2024 at 11:01 -0600, Faith Ekstrand wrote:
>
> > Thoughts?
>
> We think it is fine if the Vulkan runtime implements its own internal
> API that doesn't match Vulkan's. If we are going down this path however
> we really want to make sure we have good documentation for it so it is
> clear how all that works without having to figure things out by looking
> at the code.

That's a reasonable request. We probably won't re-type the Vulkan spec
in comments but having the differences documented is reasonable. I'm
thinking of something like the level of documentation in
vk_graphics_state.

> For existing drivers we think it is a bit less clear whether the effort
> required to port is going to be worth it. If you end up having to throw
> away a lot of what you currently have that already works and in some
> cases might even be optimal for your platform it may be a hard ask.
> What are your thoughts on this? How much adoption would you be looking
> for from existing drivers?

That's a good question. One of the problems I'm already seeing is that
we have a bunch of common stuff which is in use in some drivers and
not in others and I generally don't know why. If there's something
problematic about it on some vendor's hardware, we should fix that. If
it's just that driver teams don't have the time for refactors, that's
a different issue. Unfortunately, I usually don't know which it is,
apart from one-off comments from a developer here and there.

And, yeah, I know it can be a lot of work.  Hopefully the work pays
off in the long run but short-term it's often hard to justify. :-/

> As new features are added to the runtime, we understand some of them
> could depend on other features and build on top of them, requiring
> drivers to adopt more of the common Vulkan runtime to keep benefiting
> from additional features. Is that how you see this, or would you
> expect many runtime features to remain independent from each other to
> facilitate driver opt-in on a need-by-need basis?

At a feature level, yes. However, one of the big things I'm struggling
with right now is layering issues where we really need to flip things
around from the driver calling into the runtime to the runtime calling
into the driver. One of the things I would LOVE to put in the runtime
is YCbCr emulation for drivers that don't natively have multi-plane
image support. However, that just isn't possible today thanks to the
way things are layered. In particular, we would need the runtime to be
able to make one `VkImage` contain multiple driver images and that's
just not possible as long as the driver is controlling image creation.
We also don't have enough visibility into descriptor sets. People have
also talked about trying to do a common ray-tracing implementation.
Unfortunately, I just don't see that happening with the current layer
model.
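
To make the image case above a bit more concrete, here is a rough
sketch of a runtime-owned image object that calls into the driver once
per plane. Every name in it (`sketch_image`, `sketch_driver_ops`,
`sketch_create_2plane_420`, and so on) is hypothetical and only there
for illustration; none of it is existing runtime code, and a real
design would also have to deal with memory binding, views, and
descriptors. It assumes a 2-plane 4:2:0 layout: a full-size
single-channel plane plus a half-size two-channel plane.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical; none of this is existing Mesa API. */
#define SKETCH_MAX_PLANES 3

/* The single-plane image a driver without multi-plane support understands. */
struct sketch_driver_image {
    uint32_t width, height, bytes_per_texel;
};

/* Entry points the runtime calls into the driver (the "flipped" layering). */
struct sketch_driver_ops {
    struct sketch_driver_image *(*create_image)(uint32_t width, uint32_t height,
                                                uint32_t bytes_per_texel);
    void (*destroy_image)(struct sketch_driver_image *image);
};

/* Runtime-owned object standing in for one VkImage made of several planes. */
struct sketch_image {
    uint32_t plane_count;
    struct sketch_driver_image *planes[SKETCH_MAX_PLANES];
};

/* The runtime controls the life cycle: it destroys every plane it created. */
void sketch_destroy_image(const struct sketch_driver_ops *drv,
                          struct sketch_image *image)
{
    if (image == NULL)
        return;
    for (uint32_t i = 0; i < image->plane_count; i++)
        drv->destroy_image(image->planes[i]);
    free(image);
}

/* Build a 2-plane 4:2:0 image out of two ordinary driver images. */
struct sketch_image *sketch_create_2plane_420(const struct sketch_driver_ops *drv,
                                              uint32_t width, uint32_t height)
{
    assert(drv && drv->create_image && drv->destroy_image);

    /* Plane 0: full-size luma. Plane 1: half-size, two chroma channels. */
    const struct sketch_driver_image layout[2] = {
        { width, height, 1 },
        { (width + 1) / 2, (height + 1) / 2, 2 },
    };

    struct sketch_image *image = calloc(1, sizeof(*image));
    if (image == NULL)
        return NULL;

    for (uint32_t i = 0; i < 2; i++) {
        struct sketch_driver_image *plane = drv->create_image(
            layout[i].width, layout[i].height, layout[i].bytes_per_texel);
        if (plane == NULL) {
            /* Frees whatever planes were created before the failure. */
            sketch_destroy_image(drv, image);
            return NULL;
        }
        image->planes[image->plane_count++] = plane;
    }
    return image;
}

/* Trivial stand-in "driver" so the sketch can be exercised on its own. */
static struct sketch_driver_image *example_create(uint32_t width, uint32_t height,
                                                  uint32_t bytes_per_texel)
{
    struct sketch_driver_image *image = malloc(sizeof(*image));
    if (image != NULL) {
        image->width = width;
        image->height = height;
        image->bytes_per_texel = bytes_per_texel;
    }
    return image;
}

static void example_destroy(struct sketch_driver_image *image)
{
    free(image);
}

const struct sketch_driver_ops sketch_example_driver = {
    .create_image = example_create,
    .destroy_image = example_destroy,
};
```

The point is only the direction of the calls: the runtime decides how
many driver images back the object and the driver just creates
whatever single-plane images it's asked for.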

Unfortunately, I don't have a lot of examples of what that would look
like without having written the code to do it. One thing I'm currently
thinking about is switching more objects to a kernel-style vtable
model like I did with `vk_pipeline` and `vk_shader` in the posted MR.
This puts the runtime in control of the object's life cycle and more
easily allows for multiple implementations of an object type. For
example, right now you can use the common implementation for graphics
and compute and roll your own `vk_pipeline` for ray-tracing. I realize
that doesn't
really apply to Raspberry Pi but it's an example of what flipping the
layering around looks like.
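
For anyone who hasn't seen the pattern, a minimal sketch of a vtable
model follows. The names (`sketch_pipeline`, `sketch_pipeline_ops`,
`sketch_runtime_bind`, and the rest) are made up for illustration and
are not the actual interface from the MR. The base object carries a
pointer to a table of functions, the runtime only ever goes through
that table, and a common implementation and a driver-specific one sit
side by side.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; not the actual vk_pipeline interface from the MR. */
struct sketch_pipeline;

/* Table of operations that each implementation fills in. */
struct sketch_pipeline_ops {
    void (*destroy)(struct sketch_pipeline *pipeline);
    int (*bind)(struct sketch_pipeline *pipeline);
};

/* Base object; implementations embed it as their first member. */
struct sketch_pipeline {
    const struct sketch_pipeline_ops *ops;
};

enum { SKETCH_BOUND_COMMON = 1, SKETCH_BOUND_DRIVER_RT = 2 };

/* Number of pipelines currently alive, tracked by the runtime. */
int sketch_live_pipelines;

/* Runtime entry points: dispatch through the table, never a concrete type. */
int sketch_runtime_bind(struct sketch_pipeline *pipeline)
{
    assert(pipeline && pipeline->ops);
    return pipeline->ops->bind(pipeline);
}

void sketch_runtime_destroy(struct sketch_pipeline *pipeline)
{
    if (pipeline == NULL)
        return;
    pipeline->ops->destroy(pipeline);
    sketch_live_pipelines--;
}

/* Runtime-side allocation helper shared by every implementation. */
static struct sketch_pipeline *sketch_alloc(const struct sketch_pipeline_ops *ops,
                                            size_t size)
{
    assert(size >= sizeof(struct sketch_pipeline));
    struct sketch_pipeline *pipeline = calloc(1, size);
    if (pipeline == NULL)
        return NULL;
    pipeline->ops = ops;
    sketch_live_pipelines++;
    return pipeline;
}

/* Implementation 1: a "common" graphics pipeline provided by the runtime. */
struct common_gfx_pipeline {
    struct sketch_pipeline base; /* first member, so the pointers coincide */
    int dynamic_state;
};

static void common_destroy(struct sketch_pipeline *pipeline) { free(pipeline); }
static int common_bind(struct sketch_pipeline *pipeline)
{
    (void)pipeline;
    return SKETCH_BOUND_COMMON;
}

static const struct sketch_pipeline_ops common_gfx_ops = {
    .destroy = common_destroy,
    .bind = common_bind,
};

struct sketch_pipeline *sketch_common_gfx_create(void)
{
    return sketch_alloc(&common_gfx_ops, sizeof(struct common_gfx_pipeline));
}

/* Implementation 2: a driver's own ray-tracing pipeline. */
struct driver_rt_pipeline {
    struct sketch_pipeline base; /* first member, so the pointers coincide */
    int stack_size;
};

static void driver_rt_destroy(struct sketch_pipeline *pipeline) { free(pipeline); }
static int driver_rt_bind(struct sketch_pipeline *pipeline)
{
    (void)pipeline;
    return SKETCH_BOUND_DRIVER_RT;
}

static const struct sketch_pipeline_ops driver_rt_ops = {
    .destroy = driver_rt_destroy,
    .bind = driver_rt_bind,
};

struct sketch_pipeline *sketch_driver_rt_create(void)
{
    return sketch_alloc(&driver_rt_ops, sizeof(struct driver_rt_pipeline));
}
```

The runtime code never needs to know which implementation it holds,
which is what lets it own creation and destruction.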

The other thing I've been realizing as I've been thinking about this
over the weekend is that, if this happens, we're likely heading
towards another gallium/classic split for a while. (Though hopefully
without the bad blood in the community that we had from gallium.) If
this plays out similarly to gallium/classic, a bunch of drivers will
remain classic, doing most things themselves and the new thing (which
really needs a name, BTW) will be driven by a small subset of drivers
and then other drivers get moved over as time allows. This isn't
necessarily a bad thing, it's just a recognition of how large-scale
changes tend to roll out within Mesa and the potential scope of a more
invasive runtime project.

Thinking of it this way would also give more freedom to the people
building the new thing to just build it without worrying about driver
porting and trying to do everything incrementally. If we do attempt
this, it needs to be done with a subset of drivers that is as
representative of the industry as possible so we don't screw anybody
over. I'm currently thinking NVK (1.3, all the features), AGX (all the
features but on shit hardware), and PanVK (low features). That won't
guarantee the perfect design for everyone, of course, but hopefully
it'd be enough to keep us from painting ourselves into too many
corners.

Unfortunately, it's all a bit nebulous at the moment. It's pretty
clear to me that there's a problem that needs to be solved but exactly
how we want to solve it is still a bit of a mystery. I'm mostly trying
to gauge interest at the moment.

~Faith