[Mesa-dev] Mesa/Gallium overall design

Luca Barbieri luca at luca-barbieri.com
Mon Apr 12 01:22:23 PDT 2010


>> Well, there are a lot of things that Gallium doesn't do well compared
>> to other APIs, mostly OpenGL:
>> 1. Support for fixed function cards in some way, either:
>> 1a. (worse) New Gallium interfaces to pass fixed function pipeline
>> states, along with an auxiliary module to turn them into shaders
>> 1b. (better) An auxiliary module doing magic with LLVM to fit shaders
>> into the fixed function pipeline
>
> No. One of the entire design goals of Gallium is to provide a
> shaderful pipeline. If you wanna do it with register combiners, you
> could try, but frankly we've already talked this over and decided to
> not walk that plank.

I think this is a serious problem and this design decision should be
changed if possible.

The issue here is that this means the classic Mesa interfaces can
never be removed in favor of Gallium, which I think is going to
stifle the evolution of Mesa and make Gallium drivers slower, since
Mesa cannot be massively refactored into a leaner and faster state
tracker while support for the classic interfaces must be kept as
well.

Of course, this is technically hard, so it's not obvious whether
anyone will actually do it, but welcoming such a change would be a
first step.

The shaderful pipeline can be fully preserved with the shader-fitting
approach, or can become just one of two ways of specifying a pipeline.
You could for instance make shader CSOs represent either shaders or
full fixed function pipelines.
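
As a rough sketch of what I mean (all names here are made up just to
illustrate the idea, not a concrete interface proposal):

struct tgsi_token;   /* the existing TGSI shader representation */

enum pipe_program_type {
   PIPE_PROGRAM_TGSI,            /* a normal shader, as today */
   PIPE_PROGRAM_FIXED_FUNCTION   /* a fixed function pipeline description */
};

struct pipe_fixed_function_state {
   unsigned num_texture_stages;
   /* per-stage combiner setup, texgen, fog, etc. would go here */
};

struct pipe_program_state {
   enum pipe_program_type type;
   union {
      const struct tgsi_token *tokens;
      struct pipe_fixed_function_state ff;
   } u;
};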

>> 2. Support for setting individual states instead of full state
>> objects, for APIs like OpenGL where that works better
>
> State is collated. Are there really apps (or even serious use cases)
> where state is constantly in flux like this?

Not sure, but collating is quite inefficient in that case, since
both OpenGL and most GPUs use fine-grained states.
Of course DirectX (and some OpenGL extensions) have coarse-grained
states, and some GPUs do as well (e.g. nv50 texture state), so we
would need to support both.

Haven't really thought a lot about this, but I think the CSO module
could be made a helper that implements the new fine-grained state
interfaces on top of the existing CSO ones.
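
Something along these lines (a sketch with illustrative names only):
individual setters just update a shadow copy, and the coarse CSO is
only rebuilt and bound lazily at validation time.

struct blend_shadow {
   unsigned blend_enable : 1;
   unsigned colormask : 4;
   /* ...the rest of the coarse blend state... */
};

struct fine_blend_ctx {
   struct blend_shadow shadow;   /* accumulated fine-grained state */
   unsigned dirty;               /* set when shadow differs from the bound CSO */
};

static void fine_set_blend_enable(struct fine_blend_ctx *ctx, unsigned enable)
{
   enable = !!enable;   /* normalize for the 1-bit field comparison */
   if (ctx->shadow.blend_enable != enable) {
      ctx->shadow.blend_enable = enable;
      ctx->dirty = 1;
   }
}

static void fine_validate(struct fine_blend_ctx *ctx)
{
   if (ctx->dirty) {
      /* hash ctx->shadow and look it up in a CSO cache, creating and
       * binding the coarse state object on a miss, much like the
       * existing cso_context code already does for Mesa */
      ctx->dirty = 0;
   }
}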

>> 3. Immediate vertex submission
>
> Already addressed this. Doing it in-driver for the HW that supports it
> isn't that tough.

It is impossible to do in-driver for glBegin/glVertex/glEnd unless
those calls are passed directly to the Gallium driver.
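
For example, a driver could optionally expose something like this
(hypothetical names; drivers without a native vertex FIFO would leave
the hooks NULL and the state tracker would keep building vertex
buffers as it does today):

struct imm_context;   /* stands in for struct pipe_context */

struct imm_vtbl {
   void (*begin)(struct imm_context *ctx, unsigned prim);
   void (*vertex4f)(struct imm_context *ctx, unsigned attrib,
                    float x, float y, float z, float w);
   void (*end)(struct imm_context *ctx);
};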

>> 4. More powerful and better defined clear interface with scissor/MRT support
>
> I'm not sure how scissors fit in, other than that you probably have to
> hax them on your HW to work with clears, but this isn't really a
> problem any longer. If you want to involve e.g. MRTs in your clears,
> patch util_clear to do it. Also how is this a GL thing?

Because glClear respects the scissor test, and some hardware provides
a scissor-capable CLEAR as well, while other hardware may only offer
one or the other.
So IMHO there should be separate full_clear and clear_with_scissor
entry points, or something like that.
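
Roughly like this (names illustrative): a driver whose CLEAR ignores
the scissor implements full_clear natively and emulates
clear_with_scissor (e.g. with a quad), while a driver whose CLEAR
honors the scissor does the opposite.

struct clear_context;                  /* stands in for pipe_context */
union clear_color { float f[4]; };

struct clear_vtbl {
   /* clears the whole surface, ignoring any scissor state */
   void (*full_clear)(struct clear_context *ctx, unsigned buffers,
                      const union clear_color *color,
                      double depth, unsigned stencil);
   /* clears only the currently bound scissor rectangle */
   void (*clear_with_scissor)(struct clear_context *ctx, unsigned buffers,
                              const union clear_color *color,
                              double depth, unsigned stencil);
};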

>> 5. Perhaps in theory more powerful 2D interfaces (e.g. format
>> conversion, stretching, color masks, ROPs, Porter-Duff, etc), emulated
>> over 3D by the blitter module, to better implement 2D apis
>
> We talked several times of a new pipe interface for this stuff. For a
> majority of chipsets, the features you listed all require a 3D engine,
> but that doesn't preclude a new pipe built on pipe_context. I guess
> use cases would be nice before we go down this path; the only consumer
> of all these that I can think of is Xorg, and we've already got that
> covered.

Yes, this is not obviously worth the time to implement; I added it
for completeness.

However, note that older versions of Windows had a significant 2D
acceleration layer, which at least old chipsets (think
mach64/rage128) probably support in hardware, and on such ancient
chips it likely works better than going through the 3D engine (this
obviously depends on the aforementioned fixed function support).

>> 6. Better handling of shader linkage (I have some work on this)
>
> Is the link-on-render semantic not strong enough? I remember last time
> that your grievances were largely pointed at Mesa and GLSL; do we
> really need Gallium features for this?

It would be nice to have.
Since then, I added support for the current Gallium interface to
nv30/nv40, but implementing it that way is somewhat inefficient.
This is a complex topic though; Gallium can surely be improved here,
even if the current interface is mostly acceptable for the current
needs.

> On the other hand, Gallium should be permitted to fail shader
> compiles; most APIs permit this in one way or another.

Yes, this too.
In general, I think Gallium is missing a lot of "hardware
limitation" caps.

>> 7. Some broadly used and *good* way of managing graphics memory (e.g.
>> pipebuffer improved and widely adopted)
>
> Um. I'm probably opening a can of worms here, but this has nothing to
> do with GL.

Yes, this is a general concern.

>> 8. Conditionals/predicates in TGSI (for NV_*_program)
>
> Hm, I could have sworn we have all the useful conditionals. I know
> that some instructions were removed, but they were largely useless or
> redundant.

There is no support for 3-valued condition codes (less than, equal,
or greater), and I'm not sure how well the current predicate support
works or how well it is designed (nothing uses it AFAIK).
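
To make the distinction concrete, here is a tiny C model of what an
NV-style condition code stores, versus the single boolean that an
SLT/SGE-style comparison produces (purely illustrative):

enum cc { CC_LT, CC_EQ, CC_GT, CC_UN /* unordered, i.e. NaN */ };

static enum cc cc_from_compare(float a, float b)
{
   if (a != a || b != b)
      return CC_UN;
   if (a < b)
      return CC_LT;
   if (a == b)
      return CC_EQ;
   return CC_GT;
}

/* one stored CC answers all the derived tests; a boolean SLT result
 * only answers one of them */
static int cc_test_le(enum cc c) { return c == CC_LT || c == CC_EQ; }
static int cc_test_ne(enum cc c) { return c == CC_LT || c == CC_GT; }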

>> 9. Half float registers and precision specification in TGSI (for NV_*_program)
>
> I think this should go under a general conformance vs. performance vs.
> quality switch.

I mean allowing precision and datatypes to be specified on
individual instructions.
The nVidia OpenGL extensions allow this.
Not sure if GLSL somehow does too.
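
For illustration, this is roughly what NV_fragment_program exposes
with its R/H/X instruction suffixes; a hypothetical TGSI encoding
could carry something like this on each instruction (names made up):

enum inst_precision {
   PREC_DEFAULT,   /* whatever the implementation picks */
   PREC_FP32,      /* "R" suffix: full 32-bit float */
   PREC_FP16,      /* "H" suffix: half float */
   PREC_FX12       /* "X" suffix: 12-bit signed fixed point */
};

struct inst_token {
   unsigned opcode : 8;
   unsigned precision : 2;   /* enum inst_precision */
   /* ...operands encoded as today... */
};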

>> 10. Maybe an interface to explicitly set constants instead of dealing
>> with constant buffers (not sure about this, perhaps constant buffers
>> are fine everywhere)
>
> We talked about this already. Constant buffers aren't ideal on
> transitional hardware, but they work fine.

Not totally sure about this: it may actually work well enough.
This may need to be revisited in the future.

>
>> Of course there are also the missing features that DirectX 10/11 has, like:
>> 1. Mipmap generation
>
> SGIS_generate_mipmap isn't good enough? It's already implemented in
> Mesa. So only D3D 10+ trackers would benefit from this, and they
> already have it implemented.

But the Gallium driver can't implement it: the state tracker will do
it itself with the 3D engine even if the hardware has a better way
(e.g. a stretching 2D engine).
It's also unclear whether this matters; maybe the 3D engine is the
best choice everywhere (though e.g. the nVidia binary drivers use
the 2D engine AFAIK, which doesn't necessarily mean it's optimal).
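
What I have in mind is an optional driver hook, roughly like this
(hypothetical names; when the hook is absent or fails, the state
tracker would run its existing util_gen_mipmap-style 3D path):

struct mip_context;    /* stands in for pipe_context */
struct mip_resource;   /* stands in for pipe_texture */

struct mip_vtbl {
   /* returns nonzero on success; on failure (or if the hook is NULL)
    * the state tracker falls back to its own 3D-engine path */
   int (*generate_mipmap)(struct mip_context *ctx,
                          struct mip_resource *res,
                          unsigned first_level, unsigned last_level);
};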

>> 2. Stream out/transform feedback and DrawAuto
>
> GL_FEEDBACK modes? I'm not sure if any APIs have them all in a style
> that can be unified.

No idea, but they surely need to be supported.
BTW, it's not GL_FEEDBACK, it's transform feedback, which started as
an nVidia extension (NV_transform_feedback) and has been core since
OpenGL 3.0.

>> 4. Compute shaders
>
> We'll talk about this later. Suffice it to say that we more or less
> all agree pipe_context isn't good for this.

Possibly.

>
>> 5. Tessellation
>
> Geom shaders are already half-implemented, aren't they? You'd have to
> ask Zack about that, but ISTR that he's got them working.

It's different from geometry shaders: I mean GL4/DX11 tessellation,
i.e. tessellation control and tessellation evaluation shaders (hull
and domain shaders in DX11 terms).

>> 6. Multisampling, including alpha-to-coverage and all
>> hardware-specific tricks like CSAA
>
> Too much hardware-specific stuff in there. SSAA (straight-up
> supersampling) should be possible right now, but I think the current
> compromise of a single bit to request *some kind* of multisampled
> buffer is fine.

Uh?
Everything must be exposed, otherwise it's not possible to implement
all the extensions and functionality supported by the binary drivers.
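
For example, instead of a single bit, caps could describe what the
hardware actually offers, along these lines (all names hypothetical):

struct msaa_screen;   /* stands in for pipe_screen */

struct msaa_caps_vtbl {
   /* bitmask of supported sample counts for a format, e.g.
    * (1 << 2) | (1 << 4) | (1 << 8) for 2x/4x/8x */
   unsigned (*get_sample_counts)(struct msaa_screen *s, unsigned format);
   /* nonzero if this count is a coverage-sampled (CSAA-like) mode
    * rather than true multisampling */
   int (*is_coverage_mode)(struct msaa_screen *s, unsigned format,
                           unsigned samples);
};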

>> 7. 2D texture arrays
>
> I have no idea what these are.

Textures made of multiple 2D layers, but without filtering between
the layers as 3D textures do.
DirectX 10 hardware has them and usually implements cube maps with them.
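
To illustrate the difference (fields made up for the example):

struct tex_desc {
   unsigned width, height;
   unsigned depth_or_layers;   /* slices for 3D, layers for 2D array */
   unsigned is_array;          /* 1: 2D array texture, 0: 3D texture */
};

/* For a 3D texture the sampler filters between depth slices; for a
 * 2D array the layer is simply selected (round and clamp) and
 * filtering only happens within each 2D layer: */
static unsigned select_layer(const struct tex_desc *t, float coord)
{
   int layer = (int)(coord + 0.5f);   /* round to nearest layer */
   if (layer < 0)
      layer = 0;
   if ((unsigned)layer >= t->depth_or_layers)
      layer = (int)(t->depth_or_layers - 1);
   return (unsigned)layer;
}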

>> 8. Texture sampling in geometry shaders
>
> Wait, we can do that? Wicked. That'll be fun.

Yes, DirectX 10 has that.

>> 9. Indirect instanced drawing (see DX11 DrawInstancedIndirect)
>
> Ugh, moar D3D 11? Let's catch up to D3D 10 first.

Yeah, added for completeness.

>> 10. DX11 shader interfaces
>
> These differ significantly from what we've got? I don't know D3D 11 yet.

It's a strange C++-like interface/class mechanism DX11 adds to HLSL
to better support ubershaders; I'm not sure about the details.

>> 11. Selection of viewports/render target in 2D texture array from the
>> geometry shader
>
> I'm seeing a pattern here.

>> 12. More TGSI instructions (pointer load/stores, fp64, atomic ops,
>> shared memory, etc.)
>
> This isn't happening on the current generations of supported hardware,
> and it'll likely be delayed for a bit on newer stuff.

Uh?
GeForce GTX 2xx has all of them (hw supported by Gallium nv50).
GeForce GTX 480 has all of them, and they are also fast (hw
hopefully supported soon by Gallium nv50 or a new driver).
R800 probably does too.

> I think "it's going to suck on all cards and for all APIs" is a
> constant unchanged by the whims of driver developers and hardware
> manufacturers.

Uhm?
At least in theory and as a future direction, I think Gallium is not
supposed to suck at all: it should aim to be the best overall
infrastructure for all kinds of graphics needs on all platforms and
all hardware.
Practical considerations may limit that, of course.

>> Of course this may not get done due to not being worth the time to
>> implement it, but that's a different issue.
>
> No, that's the entire point. If we had the time to implement things,
> we wouldn't still be in the GL 2.x era.

Yes, sure.
However, agreeing that something should be done/is a good idea is the
first step towards someone doing it.

>> BTW, for instance, I sent a patch to change the Gallium sampler state
>> to support nearest-neighbor anisotropic filtering on nVidia cards (by
>> removing ANISO as a special filter), and it was merged some time ago,
>> so it seems this kind of thing is possible...
>
> I'm gonna point you to a discussion we had several weeks ago about
> GLSL linking, in which it was opined that some nVidia hardware lacked
> programmable swizzles and routing tables for linking shaders,
> requiring shader compilers to be augmented with linking and selection
> code to properly match outputs and inputs across shaders. Was a
> Gallium-level module implemented to perform the desired shader
> modifications, or was it done privately in the driver? Was the Gallium
> API changed as a result of the discussion?

I have already written some Gallium auxiliary code to do that (and
nvfx code using it), which I plan to propose at some point, along
with a better specification of the current Gallium rules (without
changing them).

Some parts must be driver-specific, since actually applying
relocations inherently depends on how the shader is encoded.
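
The shape of the code is roughly this (illustrative only): the
generic auxiliary part computes which slots need to be substituted
where, and the driver supplies a callback that knows how to patch
its own encoding:

struct link_reloc {
   unsigned location;   /* where in the driver's shader encoding */
   unsigned slot;       /* which input/output slot to substitute */
};

struct linked_shader {
   void *driver_shader;
   const struct link_reloc *relocs;
   unsigned num_relocs;
   /* driver-specific: patches one reloc into its own encoding */
   void (*apply_reloc)(void *driver_shader, const struct link_reloc *r);
};

static void apply_linkage(struct linked_shader *s)
{
   unsigned i;
   for (i = 0; i < s->num_relocs; i++)
      s->apply_reloc(s->driver_shader, &s->relocs[i]);
}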

Once I merge a lot of other unmerged stuff, I'll probably send this to
the ML too.

