[Mesa-dev] [PATCH 2/2] mesa: add hard limits for the number of varyings and uniforms for the linker

Wed Nov 23 10:25:10 PST 2011

On 11/22/2011 07:27 PM, Marek Olšák wrote:
> On Tue, Nov 22, 2011 at 11:11 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> All of this discussion is largely moot.  The failure that you're so angry
>> about was caused by a bug in the check, not by the check itself. That bug
>> has already been fixed (commit 151867b).
>>
>> The exact same check was previously performed in st_glsl_to_tgsi (or
>> ir_to_mesa), and the exact same set of shaders would have been rejected.
>> The check is now done in the linker instead.
>
> Actually, the bug only got my attention, and then I realized what is
> actually happening in the linker. I probably wouldn't even have noticed,
> because I no longer do any 3D on my laptop with r500. I gotta admit, I
> didn't know the checks were so... well, "not ready for a release", to
> say the least, and that's meant regardless of the bug.
>
> Let's analyze the situation a bit, with an open mind.
>
> The checks can be enabled for OpenGL ES 2.0 with no problem; we likely
> won't get a failure there.
>
> They can also be enabled for D3D10-level and later hardware, because
> its limits are pretty high and therefore unlikely to be exceeded. The
> problem is with D3D9-level hardware (probably relevant to the
> vmware driver too).

Let me paraphrase this a little bit in a way that I think concisely 
captures the intention:

     "We need to work really hard to make things work on older hardware."

I don't think anyone disagrees with that.  However, the solutions you 
have so far proposed to this problem have said:

     "We need to let anything through whether it will work or not."

Those are very different things.  We can have the first without the 
second.  I will fight very, very hard to not allow the second in any 
project with which I'm associated.

> We also have to consider that a lot of applications are now developed
> on D3D10-level or later hardware, and even though the expected
> hardware requirements for such an app are meant to be low, there can
> be, say, programming mistakes that raise the hardware requirements
> quite a lot. The app developer has no way to know about it, because it
> just works on his machine. For example, some compositing managers had
> such mistakes, and there's been a lot of whining about that on Phoronix.
>
> We also should take into account that hardly any app has a fallback if
> a shader program fails to link. VDrift has one, but it's the exception
> rather than the rule (VDrift is an interesting example, though; it
> falls back to fixed-function because Mesa is too strict about obeying
> specs, just that, really). Most apps just abort, crash, or
> completely ignore that linking failed and render garbage or nothing.
> Wine, our biggest user of Mesa, can't fail. D3D shaders must compile
> successfully or it's game over.
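On the application side, the check for a failed link is glGetProgramiv(program, GL_LINK_STATUS, &status) after glLinkProgram(). The sketch below shows the fallback pattern described above in plain C so that it runs without a GL context. It is an illustration under stated assumptions: try_link_glsl_program() and enum render_path are hypothetical stand-ins, not part of any real API, and the "link" here simply compares a varying count against a limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical rendering paths an application might support. */
enum render_path { PATH_GLSL, PATH_FIXED_FUNCTION };

/* Hypothetical stand-in for glLinkProgram() followed by
 * glGetProgramiv(prog, GL_LINK_STATUS, &status). To keep the sketch
 * runnable without a GL context, "linking" succeeds only when the
 * shader's varying count fits within the reported limit. */
static bool try_link_glsl_program(unsigned varyings_used, unsigned max_varyings)
{
    return varyings_used <= max_varyings;
}

/* Choose a rendering path, falling back to fixed-function instead of
 * aborting or rendering garbage when the shader program fails to link. */
static enum render_path choose_render_path(unsigned varyings_used,
                                           unsigned max_varyings)
{
    /* GL guarantees a nonzero minimum for this limit, so zero means the
     * value was never queried. */
    assert(max_varyings > 0);

    if (try_link_glsl_program(varyings_used, max_varyings))
        return PATH_GLSL;

    fprintf(stderr, "shader program failed to link (%u varyings, limit %u); "
                    "falling back to fixed-function rendering\n",
            varyings_used, max_varyings);
    return PATH_FIXED_FUNCTION;
}
```

For example, choose_render_path(9, 8) takes the fixed-function path and logs why, while choose_render_path(8, 8) keeps the GLSL path.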

Here's the deal about Wine and compositing (my spell checker always 
wants to make that word "composting") managers.  All of the 
closed-source driver makers have developer outreach programs that work 
closely with tier-1 developers to make sure their apps work and run 
well.  This is how they avoid a lot of these sorts of problems.  It's 
unreasonable to expect any developer to test their product on every 
piece of hardware.  We (the Mesa community) can't even manage that with 
our drivers.  What we can do is try to prevent app developers from 
shooting themselves in the foot.

We've had a lot more communication with the Wine developers over the 
last year or so, and things have gotten a lot better there.  We can and 
should be more proactive, but I'm not sure what form that should take. 
Pretty much everything we do with app developers is reactive.  Right? 
We only interact with them when they come to us because something 
doesn't work or they got a bug report from a user.

> Although the possibility of a linker failure is a nice feature in
> theory, the reality is nobody wants it, because it's the primary cause
> of apps aborting themselves or just rendering nothing (and, of course,
> everybody blames Mesa, or worse: Linux).

By this same logic, malloc should never return NULL because most apps 
can't handle it.  Instead it should mmap /dev/null and return a pointer 
to that.  That analogy isn't as far off as it may seem:  in both cases 
the underlying infrastructure has lied to the application that an 
operation succeeded, and it has given it a resource that it can't 
possibly use.

Surely nobody would suggest that glibc implement such a thing, much less 
actually implement it.  We shouldn't either.  In both cases, some deployed 
applications might run.  However, what happens to the poor schmuck writing 
an application that accidentally tries to allocate 5TB instead of 5MB?  He 
spends hours trying to figure out why all his reads of malloc'ed memory 
give zeros.
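In C terms, the behavior argued for here is that the failure stays visible at the point where it happens. A minimal sketch, assuming POSIX malloc semantics (errno set to ENOMEM on failure); checked_alloc() is a hypothetical helper name, not a glibc function:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate 'size' bytes for the purpose named by
 * 'what'. On failure, report the error where it happens and return NULL,
 * so the caller cannot mistake an impossible request for a usable buffer. */
static void *checked_alloc(size_t size, const char *what)
{
    assert(what != NULL);

    void *p = malloc(size);
    if (p == NULL) {
        /* POSIX requires malloc to set errno to ENOMEM on failure. */
        int err = errno;
        fprintf(stderr, "failed to allocate %zu bytes for %s: %s\n",
                size, what, strerror(err));
    }
    return p;
}
```

With a helper like this, an accidental 5TB request that the system refuses is reported at the call site, instead of surfacing hours later as reads that mysteriously return zeros.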

My day job is writing OpenGL drivers.  My evening job is teaching people 
how to write OpenGL applications.  I have seen people try to debug 
OpenGL code, and it's already a miserable process.  There are so many 
things that can lead to a mysterious black screen.  Adding another by 
lying to the developer doesn't do anyone any favors.  The code looks 
fine, the driver says it's fine, and it may even work fine on a 
different piece of hardware.  That developer will blame Mesa, Linux, or 
OpenGL and probably ragequit.

> There is quite a large possibility that if those linker checks were
> disabled, more apps would work, especially those where the limits are
> exceeded by only a little bit and the difference is eliminated by the
> driver. Sure, some apps would still be broken or render garbage, but
> it's either this or nothing, don't you think?

No, I don't think that at all.  I think we can have more shaders run 
within hardware limits without letting things through that cannot run. 
There have been proposals on IRC, in at least one of the bug reports, 
and in this e-mail thread about how we could achieve that.  It just 
requires some work.  Good engineering is a real hassle that way. :)
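As one illustration of counting more carefully without lying to the application, here is a minimal sketch that checks the limit only against varyings the fragment shader actually reads, since an output that is written but never read can be eliminated before it needs a hardware slot. The struct varying layout, the slot counts, and both function names are hypothetical; this is not Mesa's linker code and not necessarily one of the proposals mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one vertex-shader output (varying). */
struct varying {
    const char *name;
    unsigned slots;   /* vec4 slots the varying would occupy */
    bool read_by_fs;  /* true if the fragment shader reads it */
};

/* Count slots only for varyings that are actually consumed. Outputs the
 * fragment shader never reads are dead and can be eliminated first. */
static unsigned count_live_slots(const struct varying *v, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i].read_by_fs)
            total += v[i].slots;
    }
    return total;
}

/* Return true if the live varyings fit in max_slots. Otherwise write a
 * human-readable reason into msg and return false, so the program fails
 * to link with a useful message instead of being accepted and then
 * rendering garbage. */
static bool check_varying_limit(const struct varying *v, size_t n,
                                unsigned max_slots,
                                char *msg, size_t msg_size)
{
    assert(msg != NULL && msg_size > 0);

    unsigned used = count_live_slots(v, n);
    if (used > max_slots) {
        snprintf(msg, msg_size,
                 "shader uses %u varying slots, limit is %u",
                 used, max_slots);
        return false;
    }
    msg[0] = '\0';
    return true;
}
```

With three one-slot varyings of which only two are read, the program passes a limit of 2 and fails a limit of 1 with a message saying why.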