[Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

Jason Ekstrand jason at jlekstrand.net
Mon Apr 3 21:05:28 UTC 2017


On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace <chadversary at chromium.org>
wrote:

> On Fri 31 Mar 2017, Chad Versace wrote:
> > On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > > This cache allows us to easily ensure that we have a unique anv_bo for
> > > each gem handle.  We'll need this in order to support multiple-import of
> > > memory objects and semaphores.
> > >
> > > v2 (Jason Ekstrand):
> > >  - Reject BO imports if the size doesn't match the prime fd size as
> > >    reported by lseek().
> > >
> > > v3 (Jason Ekstrand):
> > >  - Fix reference counting around cache_release (Chris Willson)
> > >  - Move the mutex_unlock() later in cache_release
> > > ---
> > >  src/intel/vulkan/anv_allocator.c | 261 +++++++++++++++++++++++++++++++++++++++
> > >  src/intel/vulkan/anv_private.h   |  26 ++++
> > >  2 files changed, 287 insertions(+)
> >
> >
> > > +static uint32_t
> > > +hash_uint32_t(const void *key)
> > > +{
> > > +   return (uint32_t)(uintptr_t)key;
> > > +}
> >
> > This hash function does not appear hashy.
> >
> > If I correctly understand the details of Mesa's struct hash_table,
> > choosing the identity function for the hash function causes unwanted
> > clustering when inserting consecutive gem handles.  Since the kernel does
> > allocate gem handles consecutively, the problem is real.
> >
> > For proof, consider the following:
> >
> >    - Suppose a long-running process (maybe the compositor) has thrashed on
> >      the hash table long enough that its bucket count
> >      is ht->size = hash_sizes[7].size = 283. Suppose a spike of
> >      compositor activity raises the hash table's density to about 0.5.
> >      And suppose the hash table buckets are filled with the consecutive
> >      gem handles
> >
> >      {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
> >
> >      The exact density is (128 - 4 + 1) / 283 = 0.4417.
> >
> >    - Next, some other in-process activity (maybe OpenGL) generated
> >      a lot of gem handles after Vulkan's most recently imported
> >      gem handle, 128.
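
For what it's worth, any mixing hash breaks up runs of consecutive handles.
A minimal sketch, keeping the existing key type and using Knuth's
multiplicative constant; this is illustrative only, not what the patch does:

   static uint32_t
   hash_gem_handle(const void *key)
   {
      /* Multiply by a large odd constant (Knuth's 2654435761) so that
       * consecutive gem handles land in widely separated buckets instead
       * of forming one long run.  Sketch only.
       */
      return (uint32_t)(uintptr_t)key * 2654435761u;
   }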
>
> This point in the example---the reason why the gem handles in the
> anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem *is*
> real for multiple in-process OpenGL contexts derived from the same
> EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
> shares the same intel_screen, and therefore the same drm device fd. But
> in Vulkan, each VkDevice opens its own drm device fd. So, bogus example.
>
> BUT, that leads to a new question...
>
> Since each VkDevice has a unique drm device fd, and since the kernel
> allocates gem handles consecutively on the fd, and since struct
> hash_table only grows and never shrinks, and since patch 8/18 inserts
> every VkDeviceMemory into the cache... I believe no collisions are
> possible in anv_bo_cache.
>

Does consecutive gem handle allocation fall under the category of unbreakable
kernel ABI, or is it just a side-effect of the current implementation?  If
it's not guaranteed, then I'm reluctant to rely on it.


> If there are no collisions, then the hash table is only adding overhead,
>

Sure, but a no-collision hash table is pretty cheap...

> and we should use a direct-addressing lookup table. The bo cache should
> look like this:
>
>     struct anv_bo_cache {
>        /* The array indices are gem handles. Null entries are legal. */
>        struct anv_bo **bos;
>
>        /* Length of the array. Because the array can have holes, this
>         * is *not* the number of gem handles in the array.
>         */
>        size_t len;
>
>        pthread_mutex_t mutex;
>     };
>
>     struct anv_bo *
>     anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
>     {
>        struct anv_bo *bo = NULL;
>
>        pthread_mutex_lock(&cache->mutex);
>
>        if (gem_handle < cache->len)
>           bo = cache->bos[gem_handle];
>
>        pthread_mutex_unlock(&cache->mutex);
>
>        return bo;
>
>     }
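
For completeness, the matching insert path would just grow the array on
demand.  A sketch along the same lines; the anv_bo_cache_insert name, the
growth policy, and the error handling are assumptions here, not taken from
the patch:

   static VkResult
   anv_bo_cache_insert(struct anv_bo_cache *cache, struct anv_bo *bo)
   {
      pthread_mutex_lock(&cache->mutex);

      if (bo->gem_handle >= cache->len) {
         /* Grow geometrically so the kernel's consecutive handle
          * allocation only triggers O(log n) reallocations.
          */
         size_t new_len = cache->len ? cache->len * 2 : 64;
         while (new_len <= bo->gem_handle)
            new_len *= 2;

         struct anv_bo **new_bos =
            realloc(cache->bos, new_len * sizeof(*new_bos));
         if (new_bos == NULL) {
            pthread_mutex_unlock(&cache->mutex);
            return VK_ERROR_OUT_OF_HOST_MEMORY;
         }
         memset(new_bos + cache->len, 0,
                (new_len - cache->len) * sizeof(*new_bos));

         cache->bos = new_bos;
         cache->len = new_len;
      }

      cache->bos[bo->gem_handle] = bo;
      pthread_mutex_unlock(&cache->mutex);
      return VK_SUCCESS;
   }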
>
> BUT, that leads to yet another question...
>
> Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
> I understand things correctly, we only *need* to insert a VkDeviceMemory
> into the cache if, in vkAllocateMemory, either (1)
> VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2)
> VkMemoryAllocateInfo's pNext chain contains an import structure.
>

Because I'm lazy.  In order to start using the bo cache,
anv_device_memory::bo needs to be a pointer (well, it makes the BO cache
API simpler and more efficient if it's a pointer).  This would mean that we
would have to allocate an additional chunk of memory or go through some
other hoops in order to make it work.  At the end of the day, just stuffing
everything in the cache was simpler and kept us to a single path.
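
Concretely, the difference is between a memory object that embeds its BO and
one that points at a cache-owned, refcounted BO.  The struct names below are
made up for illustration and are not the real anv definitions:

   /* Without the cache: the memory object embeds and owns its BO. */
   struct anv_device_memory_embedded {
      struct anv_bo bo;
      /* ...other members... */
   };

   /* With the cache: the memory object points at a refcounted BO owned by
    * the cache, so two imports of the same gem handle can resolve to the
    * same anv_bo.
    */
   struct anv_device_memory_shared {
      struct anv_bo *bo;
      /* ...other members... */
   };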


> If we insert into the cache only those VkDeviceMemories that are
> imported or that will be exported, then the bo cache remains small, and
> we *should* use a hash table.
>

Maybe.  But the client isn't supposed to be allocating hundreds of
VkDeviceMemory objects.  It's supposed to allocate a few and then
suballocate from those.  If the client allocates so many memory objects
that they start hitting hash table performance issues, that's their own
fault.
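
For reference, the intended pattern is one heavyweight vkAllocateMemory with
many resources bound at offsets.  A minimal sketch; device, mem_type_index,
buffer_a, and buffer_b are assumed to exist, and real code must honor each
buffer's VkMemoryRequirements size and alignment:

   /* One expensive allocation... */
   VkMemoryAllocateInfo alloc_info = {
      .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
      .allocationSize = 64 * 1024 * 1024,
      .memoryTypeIndex = mem_type_index,
   };
   VkDeviceMemory mem;
   vkAllocateMemory(device, &alloc_info, NULL, &mem);

   /* ...then many cheap suballocations by binding at offsets. */
   vkBindBufferMemory(device, buffer_a, mem, 0);
   vkBindBufferMemory(device, buffer_b, mem, 16 * 1024 * 1024);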

Also, please remember that vkAllocateMemory is considered to be a *very*
heavy-weight function in Vulkan.  Compared to an ioctl which allocates
memory, a hash table insert is trivial.  I'm reasonably happy to make a few
changes here or there to make it more efficient if any of this proves to be
a problem.  However, I think we're working way too hard to micro-optimize
something that takes < 0.001% of runtime.

--Jason