[Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache
Chad Versace
chadversary at chromium.org
Mon Apr 3 19:31:07 UTC 2017
On Fri 31 Mar 2017, Chad Versace wrote:
> On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > This cache allows us to easily ensure that we have a unique anv_bo for
> > each gem handle. We'll need this in order to support multiple-import of
> > memory objects and semaphores.
> >
> > v2 (Jason Ekstrand):
> > - Reject BO imports if the size doesn't match the prime fd size as
> > reported by lseek().
> >
> > v3 (Jason Ekstrand):
> > - Fix reference counting around cache_release (Chris Wilson)
> > - Move the mutex_unlock() later in cache_release
> > ---
> > src/intel/vulkan/anv_allocator.c | 261 +++++++++++++++++++++++++++++++++++++++
> > src/intel/vulkan/anv_private.h | 26 ++++
> > 2 files changed, 287 insertions(+)
>
>
> > +static uint32_t
> > +hash_uint32_t(const void *key)
> > +{
> > + return (uint32_t)(uintptr_t)key;
> > +}
>
> This hash function does not appear hashy.
>
> If I correctly understand the details of Mesa's struct hash_table,
> choosing the identity function as the hash function causes unwanted
> clustering when inserting consecutive gem handles. Since the kernel
> does allocate gem handles consecutively, the problem is real.
>
> For proof, consider the following:
>
> - Suppose a long-running process (maybe the compositor) has thrashed on the
> hash table long enough that its bucket count
> is ht->size = hash_sizes[7].size = 283. Suppose a spike of
> compositor activity raises the hash table's density to about 0.5.
> And suppose the hash table buckets are filled with the consecutive gem
> handles
>
> {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
>
> The exact density is (128 - 4 + 1) / 283 = 0.4417.
>
> - Next, some other in-process activity (maybe OpenGL) generated
> a lot of gem handles after Vulkan's most recently imported
> gem handle, 128.
This point in the example---the reason why the gem handles in the
anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem *is*
real for multiple in-process OpenGL contexts derived from the same
EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
shares the same intel_screen, and therefore the same drm device fd. But
in Vulkan, each VkDevice opens its own drm device fd. So, bogus example.
BUT, that leads to a new question...
Since each VkDevice has a unique drm device fd, and since the kernel
allocates gem handles consecutively on the fd, and since struct
hash_table only grows and never shrinks, and since patch 8/18 inserts
every VkDeviceMemory into the cache... I believe no collisions are
possible in anv_bo_cache.
If there are no collisions, then the hash table is only adding overhead,
and we should use a direct-addressing lookup table. The bo cache should
look like this:
    struct anv_bo_cache {
       /* The array indices are gem handles. Null entries are legal. */
       struct anv_bo **bos;

       /* Length of the array. Because the array can have holes, this
        * is *not* the number of gem handles in the array.
        */
       size_t len;

       pthread_mutex_t mutex;
    };

    struct anv_bo *
    anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
    {
       struct anv_bo *bo = NULL;

       pthread_mutex_lock(&cache->mutex);
       if (gem_handle < cache->len)
          bo = cache->bos[gem_handle];
       pthread_mutex_unlock(&cache->mutex);

       return bo;
    }
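For completeness, the matching insert path could look like the sketch
below. The function name, the doubling growth policy, and the use of
plain realloc() are my own guesses, not anything from the patch; the
point is just that growing the array and zeroing the new tail preserves
the "null entries are legal" invariant above.

    static VkResult
    anv_bo_cache_insert(struct anv_bo_cache *cache, struct anv_bo *bo)
    {
       pthread_mutex_lock(&cache->mutex);

       if (bo->gem_handle >= cache->len) {
          /* Grow by doubling until the gem handle is a valid index. */
          size_t new_len = cache->len ? cache->len * 2 : 64;
          while (new_len <= bo->gem_handle)
             new_len *= 2;

          struct anv_bo **new_bos =
             realloc(cache->bos, new_len * sizeof(*new_bos));
          if (new_bos == NULL) {
             pthread_mutex_unlock(&cache->mutex);
             return VK_ERROR_OUT_OF_HOST_MEMORY;
          }

          /* Zero the new tail so unused handles read back as NULL. */
          memset(new_bos + cache->len, 0,
                 (new_len - cache->len) * sizeof(*new_bos));

          cache->bos = new_bos;
          cache->len = new_len;
       }

       cache->bos[bo->gem_handle] = bo;
       pthread_mutex_unlock(&cache->mutex);

       return VK_SUCCESS;
    }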
BUT, that leads to yet another question...
Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
I understand things correctly, we only *need* to insert a VkDeviceMemory
into the cache if, in vkAllocateMemory, either (1)
VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2)
VkMemoryAllocateInfo's pNext chain contains an import structure.
If we insert into the cache only those VkDeviceMemories that are
imported or that will be exported, then the bo cache remains small, and
we *should* use a hash table.
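For what it's worth, here is a sketch of the check I have in mind for
vkAllocateMemory. Only VkExportMemoryAllocateInfoKHX comes from the
discussion above; the import structure name and the local pNext-walking
type are assumptions on my part.

    static bool
    anv_memory_needs_bo_cache(const VkMemoryAllocateInfo *info)
    {
       /* Minimal local view of a pNext chain entry, to avoid relying
        * on any particular helper.
        */
       struct header { VkStructureType sType; const void *pNext; };

       for (const struct header *s = info->pNext; s != NULL; s = s->pNext) {
          if (s->sType == VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHX) {
             const VkExportMemoryAllocateInfoKHX *export_info =
                (const VkExportMemoryAllocateInfoKHX *)s;
             if (export_info->handleTypes != 0)
                return true; /* (1) the memory may be exported */
          } else if (s->sType == VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHX) {
             /* (2) an import structure; the KHX name here is my guess */
             return true;
          }
       }

       return false;
    }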