[Mesa-dev] build failure: _mesa_BindAttribLocation vs _mesa_lookup_shader_program_err , GLuint vs GLhandleARB

Ian Romanick idr at freedesktop.org
Sun May 25 21:57:50 PDT 2014


On 05/24/2014 09:22 PM, Jeremy Huddleston Sequoia wrote:
> On May 24, 2014, at 19:55, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> 
>> Hi Jeremy,
>>
>> IIRC there was another location where the above typedef gave us the finger.
>> Not entirely sure what the conclusion on the topic was and I believe that some
>> of the patches did not get accepted as they would break our current libGL <>
>> DRI ABI. The discussion (starting with a few patches) is available in the ML
>> archives [1].
>>
>> -Emil
>>
>> [1] http://lists.freedesktop.org/archives/mesa-dev/2014-March/055617.html
> 
> Thanks for the pointer.  +Brian and Ian from the March thread.
> 
> As I understand it, the only platforms where fixing this could break
> the DRI ABI are the ones where GLhandleARB and GLuint do not have the
> same underlying type. The only platform where that is the case is
> darwin, which doesn't use that code (hence why I mentioned above that I
> wasn't concerned about fixing this breaking binary compatibility on
> darwin). Can someone explain how changing some GLuint types to
> GLhandleARB (or vice versa) could break ABI on other systems? I just
> don't see why that would be the case.
> 
> Ian said:
>> The problem is that drivers are built expecting that glCompileShader and
>> glCompileShaderARB are the same function.  As a result, the driver only
>> asks libGL the offset of one of those functions in the dispatch table,
>> and it only sets one pointer in the dispatch table.  Then an application
>> tries to call the "other" function, gets a NULL dispatch pointer, and
>> explodes.
> 
> That doesn't seem right to me.  Why would the driver only set one
> entry?  As it knows (or at least assumes) that both are the same, it
> seems understandable that it would just ask libGL for one of the
> functions, but it should set both entries in its dispatch table to
> that value.  Having a NULL entry for one of those functions seems
> like an obvious bug at the driver level.  Is the application layer
> really responsible for knowing about what aliasing is being done at
> the driver level?  That's a rather big violation of the abstraction
> that I'd expect to be present.

Re-re-re-re-hashing the old discussion... Every libGL that has ever
shipped on Linux from Mesa has only one dispatch table entry for both
the GLuint and GLhandle version of the functions.  There's only one
place for the driver to store a pointer, and shipping drivers only know
that they need to store one pointer.  Changing either libGL or the
driver will catastrophically break ABI with the other.
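
A heavily simplified model of what "one dispatch entry for both
versions" means may help here. Everything in it is hypothetical
(struct dispatch_table, lookup_offset, old_driver_init, call_gl are
illustrative names, not Mesa's real glapi interfaces); it only shows
that both spellings resolve to one offset, so a driver that fills one
slot serves both entrypoints.

```c
/* Hypothetical model of an aliased dispatch table; not Mesa's real code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*gl_proc)(unsigned int shader);

/* One slot per *offset*, not per entrypoint name. */
enum { OFFSET_CompileShader = 0, NUM_OFFSETS = 1 };

struct dispatch_table {
    gl_proc slots[NUM_OFFSETS];
};

/* libGL side: both spellings resolve to the same offset (aliasing). */
static int lookup_offset(const char *name)
{
    if (strcmp(name, "glCompileShader") == 0 ||
        strcmp(name, "glCompileShaderARB") == 0)
        return OFFSET_CompileShader;
    return -1;
}

static int compile_calls = 0;

/* Stand-in for the driver's implementation. */
static void driver_compile_shader(unsigned int shader)
{
    (void)shader;
    compile_calls++;
}

/* Driver side: asks libGL for one name and stores one pointer. */
static void old_driver_init(struct dispatch_table *t)
{
    int off = lookup_offset("glCompileShader");
    assert(off >= 0);
    t->slots[off] = driver_compile_shader;
}

/* Application side: either entrypoint jumps through the same slot.
 * The assert models the "NULL dispatch pointer, explodes" case. */
static void call_gl(const struct dispatch_table *t, const char *name,
                    unsigned int shader)
{
    int off = lookup_offset(name);
    assert(off >= 0 && t->slots[off] != NULL);
    t->slots[off](shader);
}
```

If libGL were changed to give glCompileShaderARB its own offset,
old_driver_init would still fill only one slot, and a call through the
other name would hit a NULL pointer. That is the ABI break described
above.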

There is a temptation to say that we should never have had any functions
alias each other in the dispatch table.  I can see some strong arguments
for that especially in light of this issue and a previous bug with
ARB_framebuffer_object vs EXT_framebuffer_object functions incorrectly
aliasing.  This was an intentional design choice that was made for GLX
in the Xserver.  The original design was for functions that had the same
GLX opcode to share the same dispatch.  In the server, it is impossible
to tell the difference between glTexImage3D and glTexImage3DEXT.  They
both just come in as opcode 4114.  Having multiple entrypoints only made
more work for everyone.
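
A rough sketch of why the server cannot tell the spellings apart: GLX
render commands are keyed by opcode, not by function name. Only the
value 4114 comes from the text above; the handler names and the switch
are hypothetical and are not the X server's actual code.

```c
/* Hypothetical opcode-keyed dispatch, loosely modeling a GLX server. */
#include <assert.h>
#include <stddef.h>

#define GLX_RENDER_OPCODE_TEXIMAGE3D 4114

typedef int (*render_handler)(const void *payload);

static int teximage3d_calls = 0;

/* One handler serves glTexImage3D and glTexImage3DEXT alike:
 * the wire protocol carries only the opcode, never the name. */
static int handle_teximage3d(const void *payload)
{
    (void)payload;
    return ++teximage3d_calls;
}

static render_handler lookup_render_handler(unsigned int opcode)
{
    switch (opcode) {
    case GLX_RENDER_OPCODE_TEXIMAGE3D:
        return handle_teximage3d;
    default:
        return NULL;
    }
}

/* Returns the handler's result, or -1 for an unknown opcode. */
static int dispatch_render(unsigned int opcode, const void *payload)
{
    render_handler h = lookup_render_handler(opcode);
    return h ? h(payload) : -1;
}
```

Given that model, a second dispatch entry for the EXT name would never
be reached from the server side, which is the "more work for everyone"
point.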

This was also at a time when it was common to have as many as four
different names (vendor, EXT, ARB, and "core") for the same function.
None of the code was generated by scripts, and api_exec.c wasn't
generated until about a year ago.  Each time a new spelling of the name
was added, someone had to remember to manually update some code.

Had GLX protocol been defined for the ARB functions in 2003, it would have:

a. Had 64-bits for the handles, and Mesa would have had multiple
dispatch entries.

b. Had 32-bits for the handles (as the GLX protocol added in 2009
does!), and Mesa would have still only had aliased entries.

> Also, in the earlier thread, Ian said, "I can't understand why we'd
> break our own ABI because of something silly that Apple did.  This
> feels like madness." ... if I recall, the issue wasn't that Apple did
> "something silly," the issue was that GLhandleARB was underspecified
> and different vendors implemented it differently.  Apple is no more
> "at fault" for making it sized to a pointer (which is actually much
> more "safe" given ambiguity) than Mesa is "at fault" for fixing it at
> a 32bit unsigned integer.  The real issue here is that mesa is mixing
> GLhandleARB and GLuint when it shouldn't be and has made other design
> decisions which make fixing bugs like this difficult.

The reason I believe Apple did something silly is that OpenGL 2.0, which
uses GLuint, shipped in October 2004... and in April 2005 Apple shipped
something that used void* sized GLhandleARB.  At least one person from
Apple was on the conference call when the decision was made
to change GLhandleARB to GLuint in the API, so it should not have been a
surprise that GLhandleARB was a dead end.

The Mesa implementation came even after that, and the implementer
decided to not have two separate entrypoints for no clear benefit.
That wasn't me, and I don't recall who or when it was.

*BUT* I don't think any of that matters. I think this can all be
resolved without having to break any ABI.  It will take a bit of
fairly unpleasant work, but I think it's doable.

1. Rename the existing Mesa functions with the OpenGL 2.0 names and
function signatures.  This will require changes to the XML so that the
api_exec.c generator script does the right thing.

2. Add new functions with the old names and the ARB function signatures.
 These functions ought to be completely trivial: they'll just do some
pointer casting and call the "other" functions.  They should probably be
wrapped in #ifdef __APPLE__ blocks.

3. Introduce new markup to the XML.  Maybe 'offset="assign_apple"' or
similar.  Modify the scripts that matter so that they will treat
functions marked "assign_apple" as "assign" when building on Apple
platforms.  On non-Apple platforms it is ignored.  This means the XML
processing code will have to correctly handle function entries that have
an alias="..." and an offset="assign_apple".

4. Pizza party.
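
Steps 1 through 3 might look roughly like the sketch below, under
stated assumptions. hypothetical_CompileShader and
hypothetical_CompileShaderARB stand in for the renamed Mesa function
and its ARB-signature wrapper; struct func_entry and needs_own_slot
only model the decision the XML scripts would make for an entry marked
offset="assign_apple". None of these names exist in Mesa, and the real
change would live in the XML-processing scripts, not in C.

```c
/* Hypothetical sketch of steps 1-3; names are illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned int GLuint;
#ifdef __APPLE__
typedef void *GLhandleARB;
#else
typedef unsigned int GLhandleARB;
#endif

static GLuint last_compiled = 0;

/* Step 1: the real implementation uses the OpenGL 2.0 name/signature. */
static void hypothetical_CompileShader(GLuint shader)
{
    last_compiled = shader;
}

#ifdef __APPLE__
/* Step 2: trivial ARB-signature wrapper that just casts and forwards.
 * Only compiled where GLhandleARB differs from GLuint. */
static void hypothetical_CompileShaderARB(GLhandleARB shaderObj)
{
    hypothetical_CompileShader((GLuint)(uintptr_t)shaderObj);
}
#endif

/* Step 3: model of the offset markup an XML entry might carry. */
enum offset_kind { OFFSET_FIXED, OFFSET_ASSIGN, OFFSET_ASSIGN_APPLE };

struct func_entry {
    const char *name;
    const char *alias;      /* NULL if the entry is not an alias */
    enum offset_kind kind;
};

/* Nonzero if the entry gets its own dispatch slot. */
static int needs_own_slot(const struct func_entry *f, int building_for_apple)
{
    if (f->alias == NULL)
        return 1;                  /* primary entry: always has a slot */
    if (f->kind == OFFSET_ASSIGN_APPLE)
        return building_for_apple; /* acts like "assign" only on Apple */
    return 0;                      /* plain alias: shares the slot */
}
```

With this model, non-Apple builds keep exactly the dispatch layout that
shipping libGL and drivers expect, while Apple builds get a separate
slot for the ARB entrypoint.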

There is an additional step that could perhaps split the dispatch entries
even on Linux, but it seems a little sketchy to me.  I haven't fully
thought it through, so it may not even work at all.  I think we may want
to have a "flag day" for libGL / driver ABI in the not too distant
future, so I think I'd rather put this on the list of things to change
at that time.

> --Jeremy


