[Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

Mon May 7 15:10:33 UTC 2018

Hi,

On 05.05.2018 03:56, James Xiong wrote:
> From: "Xiong, James" <james.xiong at intel.com>
> 
> With the current implementation, brw_bufmgr may round up a request
> size to the next bucket size, resulting in 25% more memory allocated in
> the worst scenario. For example:
> Request size    Actual size
> 32KB+1Byte      40KB
> .
> 8MB+1Byte       10MB
> .
> 96MB+1Byte      112MB
> This series aligns the buffer size up to a page instead of a bucket size
> to improve memory allocation efficiency.
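
To make the rounding above concrete, here is a minimal sketch of the two
policies. It is not the brw_bufmgr code: the helper names, the 4 KB page
size and the assumption that buckets sit at 1, 1.25, 1.5 and 1.75 times
each power of two from 16 KB upward are illustrative choices that happen
to reproduce the three example sizes quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KB pages */

/* Hypothetical helper: align a request up to a whole number of pages. */
uint64_t page_align(uint64_t size)
{
    return (size + PAGE_SIZE - 1) & ~(uint64_t)(PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: round a request up to the next bucket size.
 * Assumed layout: page multiples up to 16 KB, then four buckets per
 * power of two (base, base * 1.25, base * 1.5, base * 1.75).
 */
uint64_t bucket_round_up(uint64_t size)
{
    uint64_t base = 4 * PAGE_SIZE; /* 16 KB */

    if (size <= base)
        return page_align(size);

    /* Largest power of two strictly below the request. */
    while (base * 2 < size)
        base *= 2;

    uint64_t step = base / 4;
    uint64_t steps = (size - base + step - 1) / step; /* ceiling division */
    return base + steps * step;
}

/* Self-check against the three examples quoted above. */
void check_examples(void)
{
    const uint64_t KB = 1024, MB = 1024 * KB;

    assert(bucket_round_up(32 * KB + 1) == 40 * KB);
    assert(bucket_round_up(8 * MB + 1) == 10 * MB);
    assert(bucket_round_up(96 * MB + 1) == 112 * MB);

    /* Page alignment wastes at most one page minus one byte. */
    assert(page_align(32 * KB + 1) == 36 * KB);
}
```

Under these assumptions, the worst-case overhead of bucket rounding is
just under 25% of the request, while page alignment never wastes more
than PAGE_SIZE - 1 bytes.
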
> 
> Performance and memory usage were measured on a gen9 platform using
> Basemark ES3, GfxBench 4 and 5, each test case ran 6 times.
> 
> Basemark ES3
> score                            peak memory size(KB)
> before    after   diff           before  after   diff
> max avg   max avg max    avg
> 22  21    23  21  2.83%  1.21%   409928  395573  -14355
> 20  20    20  20  0.53%  0.41%

Thanks for the new data!

As the values below seem similar to what you sent earlier, I assume
the tests are listed here in the same order, i.e.:

> GfxBench 4.0
              > score                                   peak memory size(KB)
              > before      after       diff            before  after   diff
              > max   avg   max   avg   max     avg
gl_4          >  584   577   586   583  0.45%   1.02%   566489  539699  -26791
manhattan     > 1604  1144  1650  1202  2.81%   4.86%   439220  411596  -27624
gl_trex       > 2711  2222  2718  2152  0.25%  -3.25%   126065  121398   -4667
gl_alu2       > 1218  1213  1212  1154 -0.53%  -5.10%    54153   53868    -285
driver2       >  106   104   106   103  0.85%  -1.66%    12730   12666     -64
gl_4_off      >  728   727   727   726 -0.03%  -0.16%   614730  586794  -27936
manhattan_off > 1732  1709  1740  1728  0.49%   1.11%   475716  447726  -27990
gl_trex_off   > 3051  2969  3066  3047  0.50%   2.55%   154169  148962   -5207
gl_alu2_off   > 2626  2607  2626  2625  0.00%   0.70%    84119   83150    -969
driver2_off   >  211   208   208   205 -1.26%  -1.21%    39924   39667    -257

> GfxBench 5.0
              > score                               peak memory size(KB)
              > before    after     diff            before   after    diff
              > max  avg  max  avg  max     avg
gl_5          > 260  258  259  256 -0.39%  -0.85%   1111037  1013520   -97517
gl_5_off      > 298  295  298  297  0.00%   0.45%   1143593  1040844  -102749

As expected, max gives more stable results than average.

There could be a performance improvement in Manhattan v3.0. At least it
had the largest peak memory usage saving in GfxBench v4, both in absolute
and in relative terms (6%).
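
The relative figure is simply the saving divided by the "before" peak.
A minimal sketch of that arithmetic follows; the helper name is
hypothetical and not taken from any benchmark tooling:

```c
#include <assert.h>

/*
 * Hypothetical helper: relative peak-memory saving in percent,
 * computed from the "before" and "after" peak sizes in KB.
 */
double saving_percent(long before_kb, long after_kb)
{
    assert(before_kb > 0); /* avoid dividing by zero */
    return 100.0 * (double)(before_kb - after_kb) / (double)before_kb;
}
```

For the onscreen Manhattan row, saving_percent(439220, 411596) comes out
at roughly 6.3, and the offscreen row (475716, 447726) at roughly 5.9.
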

The gl_alu2 onscreen average drop also seems suspiciously large, but as
it's not visible in the max value, in alu2 offscreen, or in your previous
test, I think it's just random variation.

In light of what I know of these tests' variance on TDP-limited devices,
I think the rest of your GfxBench v4 & v5 performance changes also fall
within random variance.

	- Eero

> Xiong, James (4):
>    i965/drm: Reorganize code for the next patch
>    i965/drm: Round down buffer size and calculate the bucket index
>    i965/drm: Searching for a cached buffer for reuse
>    i965/drm: Purge the bucket when its cached buffer is evicted
> 
>   src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++++++++++++++++++---------------
>   src/util/list.h                        |   5 ++
>   2 files changed, 79 insertions(+), 65 deletions(-)