Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

Faith Ekstrand faith at gfxstrand.net
Thu Jan 25 16:01:38 UTC 2024


On Thu, Jan 25, 2024 at 8:57 AM Jose Fonseca <jose.fonseca at broadcom.com>
wrote:

> > So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to try
> and get inside the driver. This is working but it's getting more and more
> fragile the more tools we add to that box. A lot of what I want to do with
> gallium2 or whatever we're calling it is to fix our layering problems so
> that calls go in one direction and we can untangle the jumble. I'm still
> not sure what I want that to look like but I think I want it to look a lot
> like Vulkan, just with a handier interface.
>
> That resonates with my experience.  For example, Gallium's draw module does
> some of this too -- it provides its own internal interfaces for drivers,
> but it also loops back into the top-level Gallium interface to set FS and
> rasterizer state -- and that has *always* been a source of grief.  Having
> control flow proceed through layers in one direction only seems an important
> principle to observe.  It's fine if the lower interface is the same
> interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude),
> but they shouldn't be the same exact entry-points/modules (i.e., no
> reentrancy/recursion).
>
> It's also worth considering that Vulkan extensibility could come in handy
> too in what you want to achieve.  For example, Mesa Vulkan drivers could
> have their own VK_MESA_internal_xxxx extensions that could be used by the
> shared Vulkan code to do lower level things.
>

We already do that for a handful of things. The fact that Vulkan doesn't
ever check the stuff in the pNext chain is really useful for that. 😅
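
For anyone who hasn't seen the pattern, here is a minimal sketch of how an internal struct can ride along in a pNext chain. Every name in it (base_in, internal_hint_mesa, the sType values, driver_create) is made up for illustration and is not a real Vulkan or Mesa identifier; base_in only mimics the sType/pNext header that Vulkan's VkBaseInStructure provides, so the sketch builds without the Vulkan headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the sType/pNext header every chained Vulkan struct starts
 * with. Hypothetical; real code would use VkBaseInStructure. */
struct base_in {
    int32_t sType;
    const struct base_in *pNext;
};

/* Hypothetical sType values, not real Vulkan or Mesa enums. */
enum {
    STYPE_PUBLIC_CREATE_INFO = 1,
    STYPE_INTERNAL_HINT_MESA = 1000999000,
};

/* A "public" create-info struct, as an application would fill it in. */
struct public_create_info {
    struct base_in base;
    uint32_t size;
};

/* An internal-only struct that shared runtime code could chain in. */
struct internal_hint_mesa {
    struct base_in base;
    uint32_t flags;
};

/* Walk the chain and return the first struct with a matching sType,
 * or NULL if there is none. Unknown structs are simply skipped. */
static const struct base_in *
find_in_chain(const struct base_in *start, int32_t sType)
{
    for (const struct base_in *s = start; s != NULL; s = s->pNext) {
        if (s->sType == sType)
            return s;
    }
    return NULL;
}

/* Driver-side consumer: uses the internal hint if the runtime chained one
 * in and falls back to a default of 0 otherwise. */
static uint32_t
driver_create(const struct public_create_info *info)
{
    const struct base_in *found =
        find_in_chain(info->base.pNext, STYPE_INTERNAL_HINT_MESA);
    if (found == NULL)
        return 0;
    /* base is the first member, so this cast recovers the full struct. */
    return ((const struct internal_hint_mesa *)found)->flags;
}
```

Shared code would point the public struct's pNext at an internal_hint_mesa before calling into the driver. A driver that doesn't know about the internal struct never looks for it, so nothing breaks.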

~Faith


> Jose
>
>
> On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand <faith at gfxstrand.net>
> wrote:
>
>> Jose,
>>
>> Thanks for your thoughts!
>>
>> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca <jose.fonseca at broadcom.com>
>> wrote:
>> >
>> > I don't know enough about the current Vulkan driver internals to have or
>> provide an informed opinion on the path forward, but I'd like to share my
>> backward-looking perspective.
>> >
>> > Looking back, Gallium was two things effectively:
>> > (1) an abstraction layer, that's watertight (as in upper layers
>> shouldn't reach through to lower layers)
>> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>> >
>> > (1) was of course important -- and the discipline it imposed is what
>> enabled great simplifications -- but it also became a straitjacket,
>> as GPUs didn't stand still, and sooner or later the
>> see-every-hardware-as-the-same lens stopped reflecting reality.
>> >
>> > If I had to pick one, I'd say that (2) is far more useful and
>> practical.  Take components like gallium's draw and other util modules. A
>> driver can choose to use them or not.  One could fork them within the Mesa
>> source tree, and only the drivers that opt into the fork would need to
>> be tested/adapted/etc.
>> >
>> > On the flip side, the Vulkan API is already a pretty low level HW
>> abstraction.  It's also very flexible and extensible, so it's hard to
>> provide a watertight abstraction underneath it without either taking the
>> lowest common denominator, or having lots of optional bits of functionality
>> governed by a myriad of caps like you alluded to.
>>
>> There is a third thing that isn't really recognized in your description:
>>
>> (3) A common "language" to talk about GPUs and data structures that
>> represent that language
>>
>> This is precisely what the Vulkan runtime today doesn't have. Classic
>> meta sucked because we were trying to implement GL in GL. u_blitter,
>> on the other hand, is pretty fantastic because Gallium provides a much
>> more sane interface to write those common components in terms of.
>>
>> So far, we've been trying to build those components in terms of the
>> Vulkan API itself with calls jumping back into the dispatch table to
>> try and get inside the driver. This is working but it's getting more
>> and more fragile the more tools we add to that box. A lot of what I
>> want to do with gallium2 or whatever we're calling it is to fix our
>> layering problems so that calls go in one direction and we can
>> untangle the jumble. I'm still not sure what I want that to look like
>> but I think I want it to look a lot like Vulkan, just with a handier
>> interface.
>>
>> ~Faith
>>
>> > Not sure how useful this is in practice to you, but the lesson from my
>> POV is that opt-in reusable and shared libraries are always time well spent
>> as they can bend and adapt with the times, whereas watertight abstractions
>> with no opt-out inherently have a shelf life.
>> >
>> > Jose
>> >
>> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand <faith at gfxstrand.net>
>> wrote:
>> >>
>> >> Yeah, this one's gonna hit Phoronix...
>> >>
>> >> When we started writing Vulkan drivers back in the day, there was this
>> >> notion that Vulkan was a low-level API that directly targets hardware.
>> >> Vulkan drivers were these super thin things that just blasted packets
>> >> straight into the hardware. What little code was common was small and
>> >> pretty easy to just copy+paste around. It was a nice thought...
>> >>
>> >> What's happened in the intervening 8 years is that Vulkan has grown. A
>> lot.
>> >>
>> >> We already have several places where we're doing significant layering.
>> >> It started with sharing the WSI code and some Python for generating
>> >> dispatch tables. Later we added common synchronization code and a few
>> >> vkFoo2 wrappers. Then render passes and...
>> >>
>> >> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>> >>
>> >> That's been my project the last couple weeks: A common VkPipeline
>> >> implementation built on top of an ESO-like interface. The big
>> >> deviation this MR makes from prior art is that I make no attempt at
>> >> pretending it's a layered implementation. The vtable for shader
>> >> objects looks like ESO but takes its own path when it's useful to do
>> >> so. For instance, shader creation always consumes NIR and a handful of
>> >> lowering passes are run for you. It's no st_glsl_to_nir but it is a
>> >> bit opinionated. Also, a few of the bits that are missing from ESO
>> >> such as robustness have been added to the interface.
>> >>
>> >> This marks a pretty fundamental shift in how the Vulkan
>> >> runtime works, at least in my mind. Previously, everything was
>> >> designed to be a toolbox where you can kind of pick and choose what
>> >> you want to use. Also, everything at least tried to act like a layer
>> >> where you still implemented Vulkan but you could leave out bits like
>> >> render passes if you implemented the new thing and were okay with the
>> >> layer. With the ESO code, you implement something that isn't Vulkan
>> >> entrypoints and the actual entrypoints live in the runtime. This lets
>> >> us expand and adjust the interface as needed for our purposes as well
>> >> as sanitize certain things even in the modern API.
>> >>
>> >> The result is that NVK is starting to feel like a gallium driver. 🙃
>> >>
>> >> So here's the question: do we like this? Do we want to push in this
>> >> direction? Should we start making more things work more this way? I'm
>> >> not looking for MRs just yet nor do I have more reworks directly
>> >> planned. I'm more looking for thoughts and opinions as to how the
>> >> various Vulkan driver teams feel about this. We'll leave the detailed
>> >> planning for the Mesa issue tracker.
>> >>
>> >> It's worth noting that, even though I said we've tried to keep things
>> >> layerish, there are other parts of the runtime that look like this.
>> >> The synchronization code is a good example. The vk_sync interface is
>> >> pretty significantly different from the Vulkan objects it's used to
>> >> implement. That's worked out pretty well, IMO. With things as
>> >> complicated as pipelines or synchronization, trying to keep the
>> >> illusion of a layer just isn't practical.
>> >>
>> >> So, do we like this? Should we be pushing more towards drivers being a
>> >> backend of the runtime instead of a user of it?
>> >>
>> >> Now, before anyone asks, no, I don't really want to build a multi-API
>> >> abstraction with a Vulkan state tracker. If we were doing this 5 years
>> >> ago and Zink didn't already exist, one might be able to make an
>> >> argument for pushing in that direction. However, that would add a huge
>> >> amount of weight to the project and make it even harder to develop the
>> >> runtime than it already is and for little benefit at this point.
>> >>
>> >> Here's a few other constraints on what I'm thinking:
>> >>
>> >> 1. I want it to still be possible for drivers to implement an
>> >> extension without piles of runtime plumbing or even bypass the runtime
>> >> on occasion as needed.
>> >>
>> >> 2. I don't want to recreate the gallium cap disaster. Drivers should
>> >> know exactly what they're advertising. We may want to have some
>> >> internal features or properties that are used by the runtime to make
>> >> decisions but they'll be in addition to the features and properties in
>> >> Vulkan.
>> >>
>> >> 3. We've got some meta stuff already but we probably want more.
>> >> However, I don't want to force meta on folks who don't want it.
>> >>
>> >> The big thing here is that if we do this, I'm going to need help. I'm
>> >> happy to do a lot of the architectural work but drivers are going to
>> >> have to keep up with the changes and I can't take on the burden of
>> >> moving 8 different drivers forward. I can answer questions and maybe
>> >> help out a bit but the refactoring is going to be too much for one
>> >> person, even if that person is me.
>> >>
>> >> Thoughts?
>> >>
>> >> ~Faith
>> >
>> >
>> > This electronic communication and the information and any files
>> transmitted with it, or attached to it, are confidential and are intended
>> solely for the use of the individual or entity to whom it is addressed and
>> may contain information that is confidential, legally privileged, protected
>> by privacy laws, or otherwise restricted from disclosure to anyone else. If
>> you are not the intended recipient or the person responsible for delivering
>> the e-mail to the intended recipient, you are hereby notified that any use,
>> copying, distributing, dissemination, forwarding, printing, or copying of
>> this e-mail is strictly prohibited. If you received this e-mail in error,
>> please return the e-mail to the sender, delete it from your computer, and
>> destroy any printed copy of it.
>>
>