[Mesa-dev] [v2 6/6] mesa: OES_get_program_binary functionality

Wed Nov 6 11:24:40 PST 2013

On Wed, Nov 6, 2013 at 7:49 PM, Paul Berry <stereotype441 at gmail.com> wrote:
> On 6 November 2013 09:35, Erik Faye-Lund <kusmabite at gmail.com> wrote:
>>
>> Sorry for the late reply.
>>
>> On Fri, Nov 1, 2013 at 4:14 PM, Tapani <tapani.palli at intel.com> wrote:
>> > On 11/01/2013 03:31 PM, Erik Faye-Lund wrote:
>> >>
>> >>
>> >> Won't using the git-sha1 as a compatibility-criteria cause issues for
>> >> developers with local changes? I'm not so worried about this for
>> >> OES_get_program_binary itself, but once the shader-cache is in place
>> >> it sounds like a potential source of difficult to track down
>> >> misbehavior...
>> >>>>>
>> >>>>>
>> >>>>> I agree it might be too aggressive criteria but it is hard to come
>> >>>>> up
>> >>>>> with
>> >>>>> better and as simple.
>> >>>>
>> >>>> That's not my objection. My objection is that this might give
>> >>>> headaches for people with local modifications to the glsl-compiler.
>> >>>> Local modifications does not affect the git-sha1.
>> >>>
>> >>>
>> >>> For the automatic shader cache this headache could be helped a bit
>> >>> with a
>> >>> environment variable or drirc setting that can be used during
>> >>> development.
>> >>> On the other hand an automatic cache must work in a transparent way so
>> >>> it
>> >>> should be always able to recover when it fails, so one should only see
>> >>> it
>> >>> as
>> >>> 'slower than usual' (since recompilation/relink required) sort of
>> >>> behaviour.
>> >>> The WIP of the automatic cache I sent some time earlier also marked
>> >>> (renamed) these 'problematic' cached shaders so that they can be
>> >>> detected
>> >>> on
>> >>> further runs and cache can ignore those.
>> >>>
>> >>> I agree that it might become problematic, on the other hand it is also
>> >>> easy
>> >>> to just wipe ~/.cache/mesa and disable cache.
>> >>
>> >> That won't help for programs that maintain their own (explicit)
>> >> shader-cache, which was the intention of introducing binary shaders to
>> >> OpenGL ES in the first place.
>> >
>> >
>> > Ok, we are talking about the extension, I thought you referred to the
>> > automatic caching. For extension to work, we need at least more Piglit
>> > tests
>> > to ensure that it does not break.
>>
>> I was actually of talking about both. But for the caching, it's
>> probably more forgivable, as developers probably know they changed the
>> compiler and can step around it by flushing the cache. Especially if
>> the build time gets included, like Pauls suggested.
>>
>> > Of course every time you go and touch the
>> > code, some functionality may break, be it this extension or something
>> > else.
>>
>> That's completely irrelevant here. The rejection mechanism isn't
>> intended to catch bugs, but to detect intentional format changes. So
>> let's not confuse the issue.
>>
>> > I'm not sure if Chromium, Qt or other users expect glBinaryProgram call
>> > to
>> > always succeed, hopefully not.
>>
>> If they do, they're buggy. But there is a chance for that, yes. But
>> I'm actually talking about that they might get *told* that everything
>> went well, and still get a broken shader. Or even a crash. Of course,
>> this only applies when mesa is build with local modifications, but
>> that happens a *lot* while debugging application issues. Perhaps bugs
>> start to disappear, because applications take another non-buggy
>> code-path? It'll probably only affect developers, and not happen in
>> the wild. But I still don't think that's a good excuse.
>>
>> However, by a strict reading of the spec, I don't even yhink we're
>> allowed to break the shaders for just any reason. The wording of the
>> spec is "An implementation may reject a program binary if it
>> determines the program binary was produced by an incompatible or
>> outdated version of the compiler". The way I read that, changes that
>> doesn't modify the compiler aren't really allowed to reject previous
>> shaders. While diverging from the spec on this *might* not have many
>> real-world implications, at the very best your solution goes against
>> the intention of this rejection-mechanism.
>
>
> We have to regard any bug fix, feature improvement, or performance
> optimization that's applied to the parser, ast, optimization passes, or
> lowering passes as constituting a change to the compiler.

Not quite. Bug-fixes are fine; if the compiled shader was buggy,
you'll have a buggy binary shader, and it should be loaded with those
bugs intact. Feature improvements also does not matter: surely, a
pre-compiled shader won't use newer features. The same goes for
performance optimization; a program that pre-compiled a shader from an
old compiler surely won't expect to get a performance boost from new
optimization passes. If so, they'll use source shaders instead, and
pay the price of re-compiling. This is *exactly* what binary shaders
are for.

>  Otherwise when
> the user upgrades Mesa, any apps that use get_program_binary will fail to
> get the advantage of the upgraded code.

Yes, as I said, that's a part of the contract.

> As a practical matter, nearly every
> sMesa release makes changes to one of those components, even point releases.
> So allowing old program binaries in the event that the compiler hasn't
> changed really wouldn't produce any end user benefit.

Even if *some* modification happened to the compiler, that's not a
good reason to reject the binary shader IMO. Only *relevant* changes.
And it's not untypical for distro-maintainers to cherry-pick patches
from time to time. And then there's bleeding-edge users.

>  And it would risk big
> developer headaches if we ever make a change that affects compilation
> without realizing it.  IMHO the risks outweigh the benefits.

As someone who has gone down this path before; it's not as bad as you
make it to be. If you attack the problem from the right angle, that
is.

> And as far as whether we're strictly conforming to the spec, the spec is
> vague here since it doesn't define what constitutes the "version of the
> compiler" vs the "version of the driver".  I think it's an entirely
> defensible interpretation to consider the compiler version and the driver
> version to be the same thing; so whenever the driver is upgraded, the client
> application needs to be prepared for the possibility that the shader binary
> might be rejected.

Now you're making terminology up. The spec doesn't talk about the
driver version at all. And it's not unclear at all from the OpenGL
spec what the task of the compiler is, so no. I don't agree that it's
a defensible position in that regard. I'm not saying it isn't
defensible in *any* regard, but that's not it.

>> Storing halfway compiled code rather than fully compiled code seems
>> quite sketchy to me, as it introduces the need for compatibility at a
>> point where flexibility is needed. The result is a program-binary that
>> 1) must still perform register allocation and scheduling when loading,
>> so the time-saving is limited, and 2) fails to reload even for trivial
>> bugfixes in unrelated code. To me, it sounds like it would be better
>> to simply not have the extension exposed than this middle-ground. I'd
>> hate to see applications having to do vendor detection for mesa to
>> disable caching that turned out to be more of a burden than a gain.
>
>
> A key piece of discussion that I feel like we're missing is the fact that on
> i965 at least, the shader sometimes needs to be recompiled as a result of
> changes to GL state.  The get_program_binary spec requires these state-based
> recompiles to work properly, even if the GL state machine is placed in a
> state that hadn't been seen at the time the program binary was saved.

What kind of state are we talking about? I thought all GPU vendors had
learned from NV's GeForce FX fiasco to stay away from non-trivially
patchable state-dependeny shaders...

> Therefore, the program binary format needs to include the IR, since that's
> what we use as a starting point when we do state-based recompiles.

I think that's more an indication that i965 is a bad fit for such an
extension, rather than that the IR must be included. In fact, when we
defined the extension, discouraging state-dependent shaders were
explicitly one of the goals.

Fundamentally redefining what the extension does for *all*
mesa-drivers sounds to me more like intel policy than what's best for
everybody. If the binary format was a driver-specific blob container
like I suggested, intel could *still* serialize the IR into it. But
everyone else won't have to suffer from it.

> Given
> that the IR format is defined in core mesa, it makes sense that the
> serialization code for it should go in core mesa as well.
>
> I agree with you that in order to implement the full intent of the
> extension, this mechanism needs to also store the fully-compiled machine
> code for the shader.  I brought this up in my comments on patch 5/6, and it
> sounds like Tapani is on board with doing that, but he has some more
> research to do before he can implement that part of the feature.
>
> If in the future we have a back end that doesn't need to do state-based
> recompiles (or that can do state-based recompiles without needing access to
> the IR), then we can always add a mechanism that allows the back end to opt
> out of storing the IR in the shader binary.  But since Tapani is targeting
> i965, which needs the IR, I think it's reasonable to get that part of things
> working first.
>

I guess I don't understand why these needs to be two different things,
rather than en intel-specific implementation of some generic
mechanism.

Anyway, Tapani asked me to elaborate why I thought it was a bad idea
in the first place, and I did. Hopefully, I was clear enough.