[Intel-gfx] [RFC] libdrm_intel: Rework BO allocs to avoid rounding up to bucket size
Chris Wilson
chris at chris-wilson.co.uk
Fri Aug 29 12:16:03 CEST 2014
On Fri, Aug 29, 2014 at 11:02:01AM +0100, Arun Siluvery wrote:
> From: Garry Lancaster <garry.lancaster at intel.com>
>
> libdrm includes a scheme where freed buffer objects (BOs)
> are held in a cache. This allows incoming allocation requests to be
> serviced by re-using an old BO, instead of requiring a new
> object to be allocated. This is a performance enhancement.
> The cache is divided into "buckets". Each bucket holds unused
> BOs of a pre-determined size. When a BO allocation request is seen,
> the bucket for BOs of this size or larger is selected. Any BO
> currently in the bucket will be re-used for the allocation. If the
> bucket was empty, a new BO is created. However, the BO is created
> with the size determined by the selected bucket (i.e. the size is
> rounded up to the bucket size), rather than being created with the
> originally requested size. This is so that when the BO is freed,
> it can be released into the bucket and re-used by any other allocation
> which selects the same bucket.
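
For anyone without the bufmgr code to hand, the pre-patch behaviour is
roughly the following (a minimal sketch with hypothetical, simplified
types and malloc() standing in for the real GEM allocation, not the
actual intel_bufmgr_gem structures):

#include <stdlib.h>

/* Hypothetical, simplified types -- not the real libdrm structures. */
struct bo { unsigned long size; struct bo *next; };
struct bucket { unsigned long size; struct bo *free; }; /* nominal size + free list */
struct bufmgr { struct bucket *buckets; int num_buckets; };

/* First bucket whose nominal size covers the request (buckets are kept
 * sorted by increasing size), or NULL if the request is too large. */
static struct bucket *bucket_for_size(struct bufmgr *m, unsigned long req)
{
    for (int i = 0; i < m->num_buckets; i++)
        if (m->buckets[i].size >= req)
            return &m->buckets[i];
    return NULL;
}

/* Old behaviour: a cached BO is reused as-is; on a miss the new BO is
 * created at the bucket's nominal size so that, once freed, it can be
 * reused by any later request that maps to the same bucket. */
static struct bo *bo_alloc_old(struct bufmgr *m, unsigned long req)
{
    struct bucket *b = bucket_for_size(m, req);
    struct bo *bo;

    if (b && b->free) {                 /* cache hit: pop and reuse */
        bo = b->free;
        b->free = bo->next;
        return bo;
    }

    bo = malloc(sizeof(*bo));           /* stand-in for a real GEM create */
    if (bo) {
        bo->size = b ? b->size : req;   /* rounded up to the bucket size */
        bo->next = NULL;
    }
    return bo;
}
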
>
> Depending upon the size of the allocation, this rounding up can
> result in a significant wastage of memory when allocating a BO. For
> example, a BO request just over 132K allocated during GLES context
> creation was rounded up to the next bucket size of 160K. Such wastage
> can be critical on devices with low memory.
>
> This commit reworks the BO allocation code. On a BO allocation request,
> if the selected bucket contains any BOs, each is checked to see
> whether it is large enough to fulfill the allocation request. If none is,
> a new BO is created, but (due to the new check) it is no longer
> necessary to round up its size to match the size determined by the
> selected bucket.
>
> So, whereas previously all the BOs in a bucket were the same size, the BOs
> in a bucket can now vary in size, ranging from the next smaller bucket's
> nominal size up to the bucket's own nominal size.
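
Roughly, then, the reworked path looks like this (continuing the same
hypothetical sketch as above, again with malloc() standing in for the
real GEM allocation):

static struct bo *bo_alloc_new(struct bufmgr *m, unsigned long req)
{
    struct bucket *b = bucket_for_size(m, req);
    struct bo *bo, **prev;

    if (b) {
        /* New check: scan the bucket for any cached BO that is big
         * enough, since cached BOs may now be smaller than the
         * bucket's nominal size. */
        for (prev = &b->free, bo = b->free; bo;
             prev = &bo->next, bo = bo->next) {
            if (bo->size >= req) {
                *prev = bo->next;       /* unlink and reuse */
                return bo;
            }
        }
    }

    bo = malloc(sizeof(*bo));           /* stand-in for a real GEM create */
    if (bo) {
        bo->size = req;                 /* exact size, no rounding up */
        bo->next = NULL;
    }
    return bo;
}
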
>
> On a 1GB system, the following reductions in BO memory usage were seen:
>
> BaseMark X 1.0: 324.4MB -> 306.0MB (-18.4MB; 5.7% saving)
> BaseMark X 1.1 Medium Quality: 206.9MB -> 201.2MB (- 5.7MB; 2.8% saving)
> GFXBench 3.0 TRex: 216.6MB -> 200.0MB (-16.6MB; 7.7% saving)
> GFXBench 3.0 Manhattan: 281.4MB -> 246.8MB (-34.6MB; 12.3% saving)
>
> No performance change was seen on BaseMark X. GFXBench 3.0 showed small
> performance increases (~0.5fps on Manhattan, ~1-2fps on TRex) which may be
> due to reduced activity of the OOM killer.
The principle behind rounding up was to increase the cache hit rate and
thereby reduce allocations. It would be interesting to know whether the
number of BOs allocated also changes. If not, the argument is that the
working set is pretty stable and has a natural set of sizes which it
reuses. A counter-example might then be uxa, glamor or compositors, which
off the top of my head would have more variable object sizes.
Reducing the impact of thrashing should itself be measurable, and a
useful statistic to track.
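
Nothing like this exists in libdrm today, but the kind of counter meant
here is simple enough (hypothetical names throughout):

struct alloc_stats {
    unsigned long requests;     /* total allocation requests seen */
    unsigned long reused;       /* satisfied from a cache bucket */
    unsigned long created;      /* needed a fresh GEM object */
};

static void stats_note_alloc(struct alloc_stats *s, int from_cache)
{
    s->requests++;
    if (from_cache)
        s->reused++;
    else
        s->created++;
    /* hit rate = reused / requests; compare it before and after the patch */
}
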
As a corollary to exact allocations, you can then reduce the number of
buckets again (the number was increased to allow finer-grained
allocations). Again, it is hard to judge whether handing back larger
objects will lead to memory wastage. So yet another statistic to track
is requested versus allocated memory sizes.
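
Again purely as a sketch of that statistic (hypothetical, not existing
libdrm code): accumulate both totals so the wastage from any rounding up
is directly visible.

struct size_stats {
    unsigned long long requested_bytes;  /* what callers asked for */
    unsigned long long allocated_bytes;  /* what the BOs actually are */
};

static void stats_note_size(struct size_stats *s, unsigned long requested,
                            unsigned long allocated)
{
    s->requested_bytes += requested;
    s->allocated_bytes += allocated;
    /* requested vs. allocated over a run shows the cost of rounding up */
}
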
Also it is important to state what type of system you are measuring the
impact of allocations for -- the behaviour of a cache miss is
dramatically different between LLC and non-LLC systems.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre