[Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

Mon Apr 3 23:26:29 UTC 2017

On Mon, Apr 3, 2017 at 3:45 PM, Chad Versace <chadversary at chromium.org>
wrote:

> On Mon 03 Apr 2017, Jason Ekstrand wrote:
> > On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace <chadversary at chromium.org>
> > wrote:
> >
> > > On Fri 31 Mar 2017, Chad Versace wrote:
> > > > On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > > > > This cache allows us to easily ensure that we have a unique anv_bo
> for
> > > > > each gem handle.  We'll need this in order to support
> multiple-import
> > > of
> > > > > memory objects and semaphores.
> > > > >
> > > > > v2 (Jason Ekstrand):
> > > > >  - Reject BO imports if the size doesn't match the prime fd size as
> > > > >    reported by lseek().
> > > > >
> > > > > v3 (Jason Ekstrand):
> > > > >  - Fix reference counting around cache_release (Chris Willson)
> > > > >  - Move the mutex_unlock() later in cache_release
> > > > > ---
> > > > >  src/intel/vulkan/anv_allocator.c | 261
> ++++++++++++++++++++++++++++++
> > > +++++++++
> > > > >  src/intel/vulkan/anv_private.h   |  26 ++++
> > > > >  2 files changed, 287 insertions(+)
> > > >
> > > >
> > > > > +static uint32_t
> > > > > +hash_uint32_t(const void *key)
> > > > > +{
> > > > > +   return (uint32_t)(uintptr_t)key;
> > > > > +}
> > > >
> > > > This hash function does not appear hashy.
> > > >
> > > > If I correctly understand the details of Mesa's struct hash_table,
> > > > choosing the identify function for the hash function causes unwanted
> > > > clustering when inserting consecutive gem handles.  Since the kernel
> does
> > > > allocate gem handles consecutively, the problem is real.
> > > >
> > > > For proof, consider the following:
> > > >
> > > >    - Suppose a long-running process (maybe the compositor) has
> thrashed
> > > on the
> > > >      hash table long enough that its bucket count
> > > >      is ht->size = hash_sizes[7].size = 283. Suppose a spike of
> > > >      compositor activity raises the hash table's density to about
> 0.5.
> > > >      And suppose the hash table buckets are filled with the
> consecutive
> > > gem
> > > >      handles
> > > >
> > > >      {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
> > > >
> > > >      The exact density is (128 - 4 + 1) / 283 = 0.4417.
> > > >
> > > >    - Next, some other in-process activity (maybe OpenGL) generated
> > > >      a lot of gem handles after Vulkan's most recently imported
> > > >      gem handle, 128.
> > >
> > > This point in the example---the reason why the gem handles in the
> > > anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem
> *is*
> > > real for multiple in-process OpenGL contexts derived from the same
> > > EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
> > > shares the same intel_screen, and therefore the same drm device fd. But
> > > in Vulkan, each VkDevice opens its own drm device id. So, bogus
> example.
> > >
> > > BUT, that leads to a new question...
> > >
> > > Since each VkDevice has a unique drm device fd, and since the kernel
> > > allocates gem handles consecutively on the fd, and since struct
> > > hash_table only grows and never shrinks, and since patch 8/18 inserts
> > > every VkDeviceMemory into the cache... I believe no collisions are
> > > possible in anv_bo_cache.
> > >
> >
> > Does this fall under the category of unbreakable kernel ABI or is it
> just a
> > side-effect of the implementation?  If not, then I'm reluctant to trust
> it.
>
> I'm not certain. But krh, sitting beside me, says "It's ABI at this point.
> The kernel uses a 'idr' structure which guarantees that behavior".
>

If we think we can rely on it, I'm happy to make it into a flat array but
it'll basically be a simplified hash table at that point.

> > > If there are no collisions, then the hash table is only adding
> overhead,
> >
> > Sure, but a no-collision hash table is pretty cheap...
> >
> > > and we should use a direct-addressing lookup table. The bo cache should
> > > look like this:
> > >
> > >     struct anv_bo_cache {
> > >        /* The array indices are gem handles. Null entries are legal. */
> > >        struct anv_bo **bos;
> > >
> > >        /* Length of the array. Because the array can have holes, this
> > >         * is *not* the number of gem handles in the array.
> > >         */
> > >        size_t len;
> > >
> > >        pthread_mutex_t mutex;
> > >     };
> > >
> > >     struct anv_bo *
> > >     anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t
> gem_handle)
> > >     {
> > >        struct anv_bo *bo = NULL;
> > >
> > >        pthread_mutex_lock(&cache->mutex);
> > >
> > >        if (gem_handle < cache->len)
> > >           bo = cache->entries[gem_handle];
> > >
> > >        pthread_mutex_unlock(&cache->mutex);
> > >
> > >        return bo;
> > >
> > >     }
> > >
> > > BUT, that leads to yet another question...
> > >
> > > Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
> > > I understand things correctly, we only *need* to insert a
> VkDeviceMemory
> > > into the cache if, in vkAllocateMemory, either (1)
> > > VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2)
> > > VkMemoryAllocateInfo's pNext chain contains an import structure.
> > >
> >
> > Because I'm lazy.  In order to start using the bo cache,
> > anv_device_memory::bo needs to be a pointer (well, it's makes the BO
> cache
> > API simpler and more efficient if it's a pointer).  This would mean that
> we
> > would have to allocate an additional chunk of memory or go through some
> > other hoops in order to make it work.  At the end of the day, just
> stuffing
> > everything in the cache was simpler and kept us to a single path.
>
>
> Huh? I don't understand how pointers-or-not-pointers affect the decision
> to use an array or a hash table.  In the lookup function I wrote above,
> the anv_bo_cache holds pointers to anv_bo's, just like the anv_bo_cache
> in the actual patch. (I kept it that way so the API would be the same as
> in your patch series).
>

That was in response to the question about inserting every VkDeviceMemory
into the cache.

> > > If we insert into the cache only those VkDeviceMemories that are
> > > imported or that will be exported, then the bo cache remains small, and
> > > we *should* use a hash table.
> > >
> >
> > Maybe.  But the client isn't supposed to be allocating hundreds of
> > VkDeviceMemory objects.  It's supposed to allocate a few and then
> > suballocate from those.  If the client allocates so many memory objects
> > that they start hitting hash table performance issues, that's their own
> > fault.
> >
> > Also, please remember that vkAllocateMemory is considered to be a *very*
> > heavy-weight function in Vulkan.  Compared to an ioctl which allocates
> > memory, a hash table insert is trivial.  I'm reasonably happy to make a
> few
> > changes here or there to make it more efficient if any of this proves to
> be
> > a problem.  However, I think we're working way too hard to micro-optimize
> > something that takes < 0.001% of runtime.
>
> True. The difference between time needed for an array lookup and a hash
> table lookup is tiny compared to the ioctl. I wasn't suggesting to use
> an array for that 0.001% though. I suggested an array because struct
> hash_table, as used in this patch, is just a resizable array with an
> unneeded vtable. It just felt weird.
>
> We've discussed all the things :) I'll drop the complaint about
> hash-vs-array.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170403/e7f566b7/attachment-0001.html>