[Mesa-dev] [PATCH] mesa: avoid pthread_mutexes when adjusting object refcounts

Wed Oct 19 22:34:12 UTC 2016

On 10/19/2016 12:58 PM, Matt Turner wrote:
> On Wed, Oct 19, 2016 at 12:06 PM, Jan Ziak <0xe2.0x9a.0x9b at gmail.com> wrote:
>> This patch removes locks around reference counts in favor of atomic inc/dec
>> operations on the refcounts. This makes the refcount adjustments much faster.
> 
> I had the same idea last year and sent a patch series ([PATCH 00/13]
> mesa: Locking improvements and optimizations [1]).
> 
> The part that corresponds to your patch are patches 5-9. 1-4, 10, and
> 11 were committed.
> 
> I'm having a difficult time remembering why I ultimately dropped the
> series, but Ian's comment on 05/13 indicates that the locking is
> broken for bufferobj.c, so it's not appropriate to use atomics there.
> I remember that texture locking is totally broken, but I cannot
> remember the details.

So... I was trying to trick you into fixing the locking problems.  I see
that I have failed. :)

I looked at this quite a bit more closely today.  Here's one timeline
that worries me:

    Thread 1                     Thread 2
    Call glDeleteBuffers
                                 Call glNamedBufferData
                                 Call _mesa_lookup_bufferobj_err
    Remove buffer ID from hash
    Decrement refcount
    Free BO backing store
    Delete buffer object
                                 Reference freed memory

The entire body of glDeleteBuffers holds a different lock (the hashtable
lock), so all of that happens atomically.  _mesa_lookup_bufferobj_err
does not update the refcount, so bad things can happen.

I think the right way to fix this is to have every _mesa_lookup_*
function increment the refcount of the object before returning it.  At
that point, every modification to the refcount should happen while the
hashtable mutex is held.  I think that means that neither atomics nor an
additional mutex are needed for reference counting.

The other timelime the concerns me is:

    Thread 1                     Thread 2
    Call glNamedBufferData
    Call _mesa_lookup_bufferobj_err
    Call ctx->Driver.BufferData
        (buffer_data_fallback as an example)
    Free old buffer
                                 Call glNamedBufferData
                                 Call _mesa_lookup_bufferobj_err
                                 Call ctx->Driver.BufferData
                                 Free old buffer
                                     (possibly double-free)
                                 Allocate new buffer
    Allocate new buffer
        (possibly leak the other new buffer)

To fix this, I think we do want a per-object mutex that is only used in
ctx->Driver.BufferData.  If there is only one context in the share group
(a very common case!), we could avoid even that locking.

Textures and renderbuffers have similar issues.  Textures have
additional problems with derived state.  Calling glTexParameter can
cause a lot of state inside the texture to get updated.  Multiple calls
to glTexParameter from multiple threads can result in inconsistent
state.  It also means that rendering from one thread while another
thread is updating state can lead to problems.

Other drivers solve this problem by having N+1 copies of each object's
state.  Each context gets its own copy, and the share group gets a copy.
 Writes to state go to both the local copy and the shared copy.  Reads
come only from the local copy.  glBindTexture updates the local copy
from the shared copy.  The local copies are basically a write-through
cache.  I'm not sure how this works with bindless.  DSA also causes
problems because every DSA command implicitly has a bind.

> The other thing (in 12/13 [2]) is that vertex array objects,
> framebuffer objects, and pipeline objects are "container objects" and
> are not shared between contexts, so they require no locking. It would
> be better to solve that and avoid locking/atomics all together. But
> then FBOs (and renderbuffers?) from EXT_framebuffer_object and VAOs
> from APPLE_vertex_array_object are actually shared between contexts,
> so those are out as well.

If my theories above are correct, we don't need atomics or mutexes for
any of it.

> That seems to leave sampler objects, shader objects, and program
> objects. I suspect those are relatively small compared with textures
> or buffers.
> 
> I'm not really sure what I'd suggest, to be honest. It seems like a
> lot of aggravation for a minimal reward if done right.
> 
> [1] https://lists.freedesktop.org/archives/mesa-dev/2015-August/090972.html
> [2] https://lists.freedesktop.org/archives/mesa-dev/2015-August/090984.html
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev