Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Faith Ekstrand
faith at gfxstrand.net
Wed Jan 24 15:26:05 UTC 2024
Jose,
Thanks for your thoughts!
On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca <jose.fonseca at broadcom.com> wrote:
>
> I don't know much about the current Vulkan driver internals to have or provide an informed opinion on the path forward, but I'd like to share my backwards looking perspective.
>
> Looking back, Gallium was two things effectively:
> (1) an abstraction layer, that's watertight (as in upper layers shouldn't reach through to lower layers)
> (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>
> (1) was of course important -- and the discipline it imposed is what enabled great simplifications -- but it also became a straitjacket, as GPUs didn't stand still, and sooner or later the see-every-hardware-as-the-same lens stops reflecting reality.
>
> If I had to pick one, I'd say that (2) is far more useful and practical. Take components like gallium's draw and other util modules. A driver can choose to use them or not. One could fork them within the Mesa source tree, and only the drivers that opt into the fork would need to be tested/adapted/etc.
>
> On the flip side, Vulkan API is already a pretty low level HW abstraction. It's also very flexible and extensible, so it's hard to provide a watertight abstraction underneath it without either taking the lowest common denominator, or having lots of optional bits of functionality governed by a myriad of caps like you alluded to.

There is a third thing that isn't really recognized in your description:

(3) A common "language" to talk about GPUs and data structures that
represent that language

This is precisely what the Vulkan runtime today doesn't have. Classic
meta sucked because we were trying to implement GL in GL. u_blitter,
on the other hand, is pretty fantastic because Gallium provides a much
more sane interface to write those common components in terms of.

So far, we've been trying to build those components in terms of the
Vulkan API itself with calls jumping back into the dispatch table to
try and get inside the driver. This is working but it's getting more
and more fragile the more tools we add to that box. A lot of what I
want to do with gallium2 or whatever we're calling it is to fix our
layering problems so that calls go in one direction and we can
untangle the jumble. I'm still not sure what I want that to look like
but I think I want it to look a lot like Vulkan, just with a handier
interface.
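
As a rough illustration of "calls go in one direction", a minimal C sketch might look like the following. Every name here (example_driver_ops, example_runtime_create_shader, toy_driver) is hypothetical and not taken from Mesa, and the small struct standing in for NIR is an assumption too. The runtime owns the entrypoint, does the common work, and then calls down through a driver vtable; nothing calls back up through a dispatch table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IR the runtime hands to the driver.
 * In the thread this role is played by NIR; this struct is only a toy. */
struct example_shader_ir {
    const char *name;
    int lowered; /* set by the runtime before the driver sees the shader */
};

/* Hypothetical driver vtable: the only path from the runtime into the driver. */
struct example_driver_ops {
    int (*compile_shader)(void *driver_data,
                          const struct example_shader_ir *ir);
};

struct example_device {
    const struct example_driver_ops *ops;
    void *driver_data;
};

/* Runtime-owned entrypoint: performs the common work, then calls down.
 * The driver never has to call back up into the runtime to finish the job. */
int example_runtime_create_shader(struct example_device *dev, const char *name)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->compile_shader != NULL);

    struct example_shader_ir ir = { .name = name, .lowered = 0 };
    ir.lowered = 1; /* placeholder for common lowering done by the runtime */

    return dev->ops->compile_shader(dev->driver_data, &ir);
}

/* Toy driver backend that just records what it was asked to do. */
struct toy_driver {
    int shaders_compiled;
    int saw_unlowered;
};

static int toy_compile_shader(void *driver_data,
                              const struct example_shader_ir *ir)
{
    struct toy_driver *drv = driver_data;
    if (!ir->lowered)
        drv->saw_unlowered++;
    drv->shaders_compiled++;
    return 0;
}

static const struct example_driver_ops toy_ops = {
    .compile_shader = toy_compile_shader,
};
```

A driver in this sketch fills in toy_ops and never implements the public entrypoint itself, which is the "backend of the runtime" shape rather than the "user of a toolbox" shape.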
~Faith

> Not sure how useful this is in practice to you, but the lesson from my POV is that opt-in reusable and shared libraries are always time well spent as they can bend and adapt with the times, whereas no-opt-out watertight abstractions inherently have a shelf life.
>
> Jose
>
> On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand <faith at gfxstrand.net> wrote:
>>
>> Yeah, this one's gonna hit Phoronix...
>>
>> When we started writing Vulkan drivers back in the day, there was this
>> notion that Vulkan was a low-level API that directly targets hardware.
>> Vulkan drivers were these super thin things that just blasted packets
>> straight into the hardware. What little code was common was small and
>> pretty easy to just copy+paste around. It was a nice thought...
>>
>> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>>
>> We already have several places where we're doing significant layering.
>> It started with sharing the WSI code and some Python for generating
>> dispatch tables. Later we added common synchronization code and a few
>> vkFoo2 wrappers. Then render passes and...
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>>
>> That's been my project the last couple weeks: A common VkPipeline
>> implementation built on top of an ESO-like interface. The big
>> deviation this MR makes from prior art is that I make no attempt at
>> pretending it's a layered implementation. The vtable for shader
>> objects looks like ESO but takes its own path when it's useful to do
>> so. For instance, shader creation always consumes NIR and a handful of
>> lowering passes are run for you. It's no st_glsl_to_nir but it is a
>> bit opinionated. Also, a few of the bits that are missing from ESO
>> such as robustness have been added to the interface.
>>
>> This marks a pretty fundamental shift in how the Vulkan
>> runtime works, at least in my mind. Previously, everything was
>> designed to be a toolbox where you can kind of pick and choose what
>> you want to use. Also, everything at least tried to act like a layer
>> where you still implemented Vulkan but you could leave out bits like
>> render passes if you implemented the new thing and were okay with the
>> layer. With the ESO code, you implement something that isn't Vulkan
>> entrypoints and the actual entrypoints live in the runtime. This lets
>> us expand and adjust the interface as needed for our purposes as well
>> as sanitize certain things even in the modern API.
>>
>> The result is that NVK is starting to feel like a gallium driver. 🙃
>>
>> So here's the question: do we like this? Do we want to push in this
>> direction? Should we start making more things work more this way? I'm
>> not looking for MRs just yet nor do I have more reworks directly
>> planned. I'm more looking for thoughts and opinions as to how the
>> various Vulkan driver teams feel about this. We'll leave the detailed
>> planning for the Mesa issue tracker.
>>
>> It's worth noting that, even though I said we've tried to keep things
>> layerish, there are other parts of the runtime that look like this.
>> The synchronization code is a good example. The vk_sync interface is
>> pretty significantly different from the Vulkan objects it's used to
>> implement. That's worked out pretty well, IMO. With things as
>> complicated as pipelines or synchronization, trying to keep the
>> illusion of a layer just isn't practical.
>>
>> So, do we like this? Should we be pushing more towards drivers being a
>> backend of the runtime instead of a user of it?
>>
>> Now, before anyone asks, no, I don't really want to build a multi-API
>> abstraction with a Vulkan state tracker. If we were doing this 5 years
>> ago and Zink didn't already exist, one might be able to make an
>> argument for pushing in that direction. However, that would add a huge
>> amount of weight to the project and make it even harder to develop the
>> runtime than it already is and for little benefit at this point.
>>
>> Here are a few other constraints on what I'm thinking:
>>
>> 1. I want it to still be possible for drivers to implement an
>> extension without piles of runtime plumbing or even bypass the runtime
>> on occasion as needed.
>>
>> 2. I don't want to recreate the gallium cap disaster. Drivers should
>> know exactly what they're advertising. We may want to have some
>> internal features or properties that are used by the runtime to make
>> decisions but they'll be in addition to the features and properties in
>> Vulkan.
>>
>> 3. We've got some meta stuff already but we probably want more.
>> However, I don't want to force meta on folks who don't want it.
>>
>> The big thing here is that if we do this, I'm going to need help. I'm
>> happy to do a lot of the architectural work but drivers are going to
>> have to keep up with the changes and I can't take on the burden of
>> moving 8 different drivers forward. I can answer questions and maybe
>> help out a bit but the refactoring is going to be too much for one
>> person, even if that person is me.
>>
>> Thoughts?
>>
>> ~Faith