[Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

Wed May 2 18:19:19 UTC 2018

On Wed, 2 May 2018 14:18:21 +0300
Eero Tamminen <eero.t.tamminen at intel.com> wrote:

> Hi,
> 
> On 02.05.2018 02:25, James Xiong wrote:
> > From: "Xiong, James" <james.xiong at intel.com>
> > 
> > With the current implementation, brw_bufmgr may round up a request
> > size to the next bucket size, resulting in up to 25% more memory
> > being allocated in the worst case. For example:
> > Request size    Actual size
> > 32KB+1Byte      40KB
> > .
> > 8MB+1Byte       10MB
> > .
> > 96MB+1Byte      112MB
> > This series aligns the buffer size up to the page size instead of a
> > bucket size to improve memory allocation efficiency. Performance is
> > almost the same with Basemark ES3, GfxBench 4 and 5:
> > 
> > Basemark ES3
> >             score                    peak memory allocation
> >    before      after    diff        before    after      diff
> > 21.537462  21.888784  1.61%    419766272  408809472  -10956800
> > 19.566198  19.763429  1.00%			  
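
The rounding described in the cover letter can be illustrated with a
small sketch. It assumes a 4 KB page and a bucket layout inferred from
the example sizes above (1, 2 and 3 pages, then every power of two plus
three intermediate steps of a quarter of that power of two).
bucket_round() and page_align() are hypothetical helper names, not
functions from brw_bufmgr.c.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ull  /* assumed page size */

/* Round a request up to the next multiple of the page size. */
static uint64_t page_align(uint64_t size)
{
    return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * Round a request up to the next bucket size.  Assumed layout:
 * 1, 2 and 3 pages, then base, 1.25*base, 1.5*base and 1.75*base
 * for each power-of-two base starting at 4 pages.
 */
static uint64_t bucket_round(uint64_t size)
{
    assert(size > 0);

    for (uint64_t pages = 1; pages <= 3; pages++) {
        if (size <= pages * PAGE_SIZE)
            return pages * PAGE_SIZE;
    }

    for (uint64_t base = 4 * PAGE_SIZE; base <= (1ull << 40); base *= 2) {
        for (uint64_t step = 0; step < 4; step++) {
            uint64_t bucket = base + step * (base / 4);
            if (size <= bucket)
                return bucket;
        }
    }

    /* Larger than any bucket in this sketch: fall back to page alignment. */
    return page_align(size);
}
```

Under these assumptions a request of 32KB+1Byte lands in the 40KB
bucket, while page alignment alone would allocate 36KB.
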
> 
> What memory are you measuring:
> 
> * VmSize (not that relevant unless you're running out of address
> space)?
> 
> * PrivateDirty (listed in /proc/PID/smaps and e.g. by "smem" tool
> [1])?
> 
> * total of allocation sizes used by Mesa?
> 
> Or something else?
> 
> In general, unused memory isn't much of a problem, only dirty
> (written) memory.  The kernel maps all unused memory to a single zero
> page, so unused memory takes only a few bytes of RAM for the page table
> entries (required for tracking the allocation pages).
I did the measurements in brw_bufmgr, from user space. I kept track
of the allocated size for each brw_bufmgr context and printed out the
peak allocated size when the test completed and the context was
destroyed. Basically, I increased/decreased the size whenever
I915_GEM_CREATE or GEM_CLOSE was called, so cached, imported and
user_ptr buffers were excluded.

The brw_bufmgr context is created when the test starts and destroyed
after it completes, so the reported size, in bytes, covers exactly that
test case. This method measures the exact size allocated for a given
test case, so the result is precise.
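
A minimal sketch of this kind of accounting, assuming one counter pair
per context. The struct and function names are hypothetical and are not
the actual brw_bufmgr fields or the code used for the measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context allocation statistics, in bytes. */
struct alloc_stats {
    uint64_t current;  /* bytes currently allocated */
    uint64_t peak;     /* highest value 'current' has reached */
};

/* Call after a successful buffer creation of 'size' bytes. */
static void stats_on_create(struct alloc_stats *s, uint64_t size)
{
    s->current += size;
    if (s->current > s->peak)
        s->peak = s->current;
}

/* Call when a buffer of 'size' bytes is closed. */
static void stats_on_close(struct alloc_stats *s, uint64_t size)
{
    assert(s->current >= size);
    s->current -= size;
}
```

The peak value would then be printed once, when the context is
destroyed.
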
> 
> 
> > GfxBench 4.0
> >                                    score                          peak memory (bytes)
> >                            before            after    diff     before      after       diff
> > gl_4               564.6052246094   565.2348632813   0.11%  578490368  550199296  -28291072
> > gl_4_off           727.0440063477   703.5833129883  -3.33%  629501952  598216704  -31285248
> > gl_manhattan      1053.4223632813  1057.3690185547   0.37%  449568768  421134336  -28434432
> > gl_trex           2708.0656738281  2699.2646484375  -0.33%  130076672  125042688   -5033984
> > gl_alu2           1207.1490478516  1212.2220458984   0.42%   55496704   55029760    -466944
> > gl_driver2         103.0383071899   103.5478439331   0.49%   13107200   12980224    -126976
> > gl_manhattan_off  1703.4780273438  1736.9074707031   1.92%  490016768  456548352  -33468416
> > gl_trex_off       2951.6809082031  3058.5422363281   3.49%  157511680  152260608   -5251072
> > gl_alu2_off       2604.0903320313  2626.2524414063   0.84%   86130688   85483520    -647168
> > gl_driver2_off     204.0173187256   207.0510101318   1.47%   40869888   40615936    -253952
> 
> You're missing information on:
> * On which platform you did the testing (affects variance)
> * how many test rounds you ran, and
> * what is your variance
I ran these tests on a gen9 platform with Ubuntu 17.10 LTS. Most of the
tests are consistent, especially the memory usage. The only exception is
GfxBench 4.0 gl_manhattan: I had to run it 3 times and pick the highest
result. I will apply this method to all tests and re-send with updated
results.
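
One way to report run-to-run variation for repeated runs is the mean
and sample variance of the scores. The sketch below is only an
illustration; summarize_runs() and struct run_stats are hypothetical
names, and no measured data is implied.

```c
#include <assert.h>
#include <stddef.h>

/* Mean and sample variance of a set of benchmark scores. */
struct run_stats {
    double mean;
    double variance;
};

/* Summarize 'n' scores; at least two runs are needed for a variance. */
static struct run_stats summarize_runs(const double *scores, size_t n)
{
    struct run_stats r = {0.0, 0.0};
    double sum = 0.0;
    double sq_dev = 0.0;

    assert(n >= 2);

    for (size_t i = 0; i < n; i++)
        sum += scores[i];
    r.mean = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = scores[i] - r.mean;
        sq_dev += d * d;
    }
    /* Divide by n - 1 for the sample (not population) variance. */
    r.variance = sq_dev / (double)(n - 1);

    return r;
}
```
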
> 
> -> I don't know whether your numbers are just random noise.  
> 
> 
> Memory is allocated in pages from kernel, so there's no point in
> showing its usage as bytes.  Please use KBs, that's more readable.
> 
> (Because of randomness e.g. interactions with the windowing system, 
> there can be some variance also in process memory usage, which may
> also be useful to report.)
> 
> Because of variance, you don't need that many decimals for the scores.
> Removing the extra ones makes the data a bit more readable too.
> 
> 
> 	- Eero
> 
> [1] "smem" is a Python-based tool available at least in Debian.
> If you want something simpler, e.g. shell script working with
> minimal shells like Busybox, you can use this:
> https://github.com/maemo-tools-old/sp-memusage/blob/master/scripts/mem-smaps-private
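
For reference, private dirty memory can also be summed directly from
the smaps format. This is a minimal sketch, assuming the Linux smaps
layout where each mapping has a "Private_Dirty:" line with a value in
kB; sum_private_dirty_kb() is a hypothetical helper, not part of smem
or the script linked above.

```c
#include <stdio.h>

/*
 * Sum all "Private_Dirty:" entries (in kB) from an smaps-formatted
 * stream.  Lines that do not start with that field name are ignored.
 */
static long sum_private_dirty_kb(FILE *f)
{
    char line[512];
    long total = 0;
    long kb;

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}
```

To measure the current process, the stream would come from
fopen("/proc/self/smaps", "r"); for another process, substitute its PID.
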
> 
> 
> > GfxBench 5.0
> >              score             peak memory (bytes)
> >           before  after      before       after       diff
> > gl_5         259    259  1137549312  1038286848  -99262464
> > gl_5_off     297    297  1170853888  1071357952  -99495936
> > 
> > Xiong, James (4):
> >    i965/drm: Reorganize code for the next patch
> >    i965/drm: Round down buffer size and calculate the bucket index
> >    i965/drm: Searching for a cached buffer for reuse
> >    i965/drm: Purge the bucket when its cached buffer is evicted
> > 
> >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++++++++++++++++++---------------
> >  src/util/list.h                        |   5 ++
> >  2 files changed, 79 insertions(+), 65 deletions(-)
> 