[Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

Wed May 2 11:18:21 UTC 2018

Hi,

On 02.05.2018 02:25, James Xiong wrote:
> From: "Xiong, James" <james.xiong at intel.com>
> 
> With the current implementation, brw_bufmgr may round up a request
> size to the next bucket size, resulting in up to 25% more memory being
> allocated in the worst case. For example:
> Request size    Actual size
> 32KB+1Byte      40KB
> .
> 8MB+1Byte       10MB
> .
> 96MB+1Byte      112MB
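The example sizes above are consistent with cache buckets of 4, 8, 12 and 16 KB, followed by four steps per power of two P (P, 1.25P, 1.5P, 1.75P). That spacing is where the 25% worst case comes from: a request just over P lands in the 1.25P bucket. Below is a minimal sketch under that assumed spacing, contrasted with plain page alignment. `bucket_round_up()` and `page_align()` are hypothetical helpers for illustration, not brw_bufmgr functions.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not brw_bufmgr's real API. */
#define SKETCH_PAGE_SIZE 4096u

/* Page alignment: round up to the next multiple of 4 KiB. */
static uint64_t page_align(uint64_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Bucket rounding, ASSUMING buckets of 4, 8, 12 and 16 KiB and then
 * four steps per power of two P: P, 1.25P, 1.5P and 1.75P. */
static uint64_t bucket_round_up(uint64_t size)
{
    static const uint64_t small[] = { 4096, 8192, 12288 };

    for (int i = 0; i < 3; i++)
        if (size <= small[i])
            return small[i];

    for (uint64_t p = 16384; p <= (UINT64_C(1) << 40); p *= 2) {
        for (uint64_t q = 4; q < 8; q++) {      /* q / 4 = 1, 1.25, 1.5, 1.75 */
            uint64_t bucket = p * q / 4;
            if (size <= bucket)
                return bucket;
        }
    }
    /* Larger than every bucket in this sketch: fall back to page alignment. */
    return page_align(size);
}
```

With these helpers, a 32KB+1Byte request becomes 40 KB with bucket rounding but only 36 KB with page alignment.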
> This series aligns the buffer size up to a page instead of a bucket size
> to improve memory allocation efficiency. Performance is almost the
> same with Basemark ES3, GfxBench 4 and 5:
> 
> Basemark ES3
>             score                    peak memory allocation
>    before      after    diff        before    after      diff
> 21.537462  21.888784  1.61%    419766272  408809472  -10956800
> 19.566198  19.763429  1.00%

What memory are you measuring:

* VmSize (not that relevant unless you're running out of address space)?

* PrivateDirty (listed in /proc/PID/smaps and e.g. by "smem" tool [1])?

* total of allocation sizes used by Mesa?

Or something else?
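To get the PrivateDirty figure without extra tools, one option is to sum the per-mapping `Private_Dirty:` fields of `/proc/<pid>/smaps`. Below is a minimal, Linux-only sketch; `private_dirty_kb()` is a hypothetical helper name. Note that smaps shows the current value, so a peak has to be obtained by sampling during the run.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum the "Private_Dirty:" fields of all mappings in /proc/<pid>/smaps
 * (Linux only). Returns the total in kB, or -1 if smaps can't be read. */
static long private_dirty_kb(pid_t pid)
{
    char path[64];
    char line[512];
    long total = 0;
    long kb;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/smaps", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        /* Matches lines such as "Private_Dirty:        128 kB". */
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            total += kb;
    }
    fclose(f);
    return total;
}
```

For example, `private_dirty_kb(getpid())` reports the calling process itself.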

In general, unused memory isn't much of a problem; only dirty (written)
memory is.  The kernel maps all unused memory to a single zero page, so unused
memory takes only a few bytes of RAM for the page table entries (required
for tracking the allocation pages).
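The effect can be observed for ordinary anonymous memory with `mmap()` and `mincore()`. Pages that were never written do not occupy RAM, while written pages do. This is a minimal Linux-only sketch. i965 buffer objects are GEM (shmem-backed) objects, so it illustrates the general kernel behaviour rather than measuring brw_bufmgr itself. `resident_pages()` and `map_and_dirty()` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE   /* for mincore() and MAP_ANONYMOUS with glibc */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of pages in [addr, addr + len) currently resident in RAM,
 * or -1 on error. addr must be page-aligned (mmap() results are). */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (!vec || mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;            /* bit 0 set: page is resident */
    free(vec);
    return count;
}

/* Map `len` bytes of anonymous memory, write only the first `touch`
 * bytes, and report resident pages before and after the writes. */
static int map_and_dirty(size_t len, size_t touch, long *before, long *after)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, len);   /* nothing written yet */
    memset(p, 0xab, touch);             /* write faults allocate real pages */
    *after = resident_pages(p, len);

    munmap(p, len);
    return 0;
}
```

Mapping 16 MB and writing only the first 8 MB should leave roughly half of the mapping non-resident. With transparent huge pages enabled, the count can be somewhat higher than the written range.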

> GfxBench 4.0
>                  score                                   peak memory
>                  before          after           diff    before     after      diff
> gl_4             564.6052246094  565.2348632813  0.11%   578490368  550199296  -28291072
> gl_4_off         727.0440063477  703.5833129883  -3.33%  629501952  598216704  -31285248
> gl_manhattan     1053.4223632813 1057.3690185547 0.37%   449568768  421134336  -28434432
> gl_trex          2708.0656738281 2699.2646484375 -0.33%  130076672  125042688  -5033984
> gl_alu2          1207.1490478516 1212.2220458984 0.42%   55496704   55029760   -466944
> gl_driver2       103.0383071899  103.5478439331  0.49%   13107200   12980224   -126976
> gl_manhattan_off 1703.4780273438 1736.9074707031 1.92%   490016768  456548352  -33468416
> gl_trex_off      2951.6809082031 3058.5422363281 3.49%   157511680  152260608  -5251072
> gl_alu2_off      2604.0903320313 2626.2524414063 0.84%   86130688   85483520   -647168
> gl_driver2_off   204.0173187256  207.0510101318  1.47%   40869888   40615936   -253952

You're missing information on:
* which platform you did the testing on (it affects variance),
* how many test rounds you ran, and
* what your variance is.

-> I don't know whether your numbers are just random noise.
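Here is a minimal sketch of what reporting variance could look like, assuming each configuration is run for several rounds. Compute the mean and sample variance per configuration, and treat a before/after difference as noise unless it clearly exceeds the standard error of that difference. The "k standard errors" rule is a rough heuristic chosen for illustration, not a formal significance test. The function names are hypothetical.

```c
#include <stddef.h>

/* Mean and sample variance (n - 1 denominator) of n >= 2 benchmark rounds.
 * The standard deviation is sqrt(*var), using sqrt() from <math.h>. */
static void mean_var(const double *x, size_t n, double *mean, double *var)
{
    double sum = 0.0, sq = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    *mean = sum / (double)n;

    for (size_t i = 0; i < n; i++)
        sq += (x[i] - *mean) * (x[i] - *mean);
    *var = sq / (double)(n - 1);
}

/* Heuristic noise check: returns 1 if the difference between the means of
 * runs `a` (before) and `b` (after) exceeds k standard errors of that
 * difference. Squared quantities are compared to avoid needing sqrt(). */
static int exceeds_noise(const double *a, size_t na,
                         const double *b, size_t nb, double k)
{
    double ma, va, mb, vb;

    mean_var(a, na, &ma, &va);
    mean_var(b, nb, &mb, &vb);

    double d = mb - ma;
    double se2 = va / (double)na + vb / (double)nb;
    return d * d > k * k * se2;
}
```

The same helpers apply to peak memory figures collected over several rounds.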

Memory is allocated in pages from the kernel, so there's no point in showing
its usage in bytes.  Please use KB; that's more readable.

(Because of randomness, e.g. interactions with the windowing system,
there can also be some variance in process memory usage, which may
be useful to report as well.)

Because of the variance, you don't need that many decimals for the scores.
Removing the extra ones makes the data a bit more readable too.
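Both reporting suggestions are easy to apply when generating the tables. Below is a sketch in which `bytes_to_kib()` and `format_row()` are hypothetical helpers and the column widths are arbitrary. Because the allocations are page-granular, the byte counts divide evenly by 1024. For example, the Basemark ES3 peak of 419766272 bytes is 409928 KB, and its difference of -10956800 bytes is -10700 KB.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a byte count (or a signed byte difference) to KiB. Sizes that
 * are whole pages divide exactly; anything else is truncated. */
static int64_t bytes_to_kib(int64_t bytes)
{
    return bytes / 1024;
}

/* Format one result row: scores with one decimal, memory in KiB.
 * Returns the snprintf() result (characters that would be written). */
static int format_row(char *buf, size_t len, const char *test,
                      double score_before, double score_after,
                      int64_t mem_before, int64_t mem_after)
{
    double diff_pct = (score_after - score_before) / score_before * 100.0;

    return snprintf(buf, len, "%-16s %8.1f %8.1f %+6.1f%% %9lld %9lld %+8lld",
                    test, score_before, score_after, diff_pct,
                    (long long)bytes_to_kib(mem_before),
                    (long long)bytes_to_kib(mem_after),
                    (long long)bytes_to_kib(mem_after - mem_before));
}
```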

	- Eero

[1] "smem" is a Python-based tool available at least in Debian.
If you want something simpler, e.g. a shell script that works with
minimal shells like BusyBox, you can use this:
https://github.com/maemo-tools-old/sp-memusage/blob/master/scripts/mem-smaps-private

> GfxBench 5.0
>            score           peak memory
>            before  after   before      after       diff
> gl_5       259     259     1137549312  1038286848  -99262464
> gl_5_off   297     297     1170853888  1071357952  -99495936
> 
> Xiong, James (4):
>    i965/drm: Reorganize code for the next patch
>    i965/drm: Round down buffer size and calculate the bucket index
>    i965/drm: Searching for a cached buffer for reuse
>    i965/drm: Purge the bucket when its cached buffer is evicted
> 
>   src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++++++++++++++++++---------------
>   src/util/list.h                        |   5 ++
>   2 files changed, 79 insertions(+), 65 deletions(-)
>