[Mesa-dev] [PATCH 0/4] improve buffer cache and reuse
James Xiong
james.xiong at intel.com
Wed May 2 18:19:19 UTC 2018
On Wed, 2 May 2018 14:18:21 +0300
Eero Tamminen <eero.t.tamminen at intel.com> wrote:
> Hi,
>
> On 02.05.2018 02:25, James Xiong wrote:
> > From: "Xiong, James" <james.xiong at intel.com>
> >
> > With the current implementation, brw_bufmgr may round up a request
> > size to the next bucket size, resulting in 25% more memory allocated
> > in the worst scenario. For example:
> >   Request size    Actual size
> >   32KB+1Byte      40KB
> >   .
> >   8MB+1Byte       10MB
> >   .
> >   96MB+1Byte      112MB
> > This series aligns the buffer size up to the page size instead of a
> > bucket size to improve memory allocation efficiency. Performance is
> > almost the same with Basemark ES3, GfxBench4 and 5:
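
For reference, the rounding behaviour described above can be sketched as
follows. This is only an illustration, not the brw_bufmgr code: it assumes a
4KB page and a bucket layout of 1, 2 and 3 pages followed by four buckets per
power of two (p, 1.25p, 1.5p, 1.75p) up to a 64MB power of two, which is
consistent with the three examples quoted above. The helper names
page_round_up() and bucket_round_up() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull                   /* assumed page size */
#define SKETCH_MAX_POW2  (64ull * 1024 * 1024)     /* assumed largest power-of-two bucket */

/* Round a size up to the next multiple of the page size. */
static uint64_t page_round_up(uint64_t size)
{
   return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical helper: round a size up to the smallest bucket that fits it,
 * under the assumed bucket layout described in the text above. */
static uint64_t bucket_round_up(uint64_t size)
{
   assert(size > 0);

   /* Small buckets: 1, 2 and 3 pages. */
   for (uint64_t pages = 1; pages <= 3; pages++) {
      if (size <= pages * SKETCH_PAGE_SIZE)
         return pages * SKETCH_PAGE_SIZE;
   }

   /* Four buckets per power of two: p, 1.25p, 1.5p, 1.75p. */
   for (uint64_t p = 4 * SKETCH_PAGE_SIZE; p <= SKETCH_MAX_POW2; p *= 2) {
      for (uint64_t i = 0; i < 4; i++) {
         uint64_t bucket = p + p * i / 4;
         if (size <= bucket)
            return bucket;
      }
   }

   /* Larger than any bucket in this sketch: fall back to page alignment. */
   return page_round_up(size);
}
```

With these assumptions, bucket_round_up(32KB + 1) yields 40KB while
page_round_up(32KB + 1) yields 36KB, which is the kind of saving the series
is after.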
> >
> > Basemark ES3
> >   score                               peak memory allocation
> >   before      after       diff        before      after       diff
> >   21.537462   21.888784   1.61%       419766272   408809472   -10956800
> >   19.566198   19.763429   1.00%
>
> What memory are you measuring:
>
> * VmSize (not that relevant unless you're running out of address
> space)?
>
> * PrivateDirty (listed in /proc/PID/smaps and e.g. by "smem" tool
> [1])?
>
> * total of allocation sizes used by Mesa?
>
> Or something else?
>
> In general, unused memory isn't much of a problem, only dirty
> (written) memory is. The kernel maps all unused memory to a single
> zero page, so unused memory takes only a few bytes of RAM for the page
> table entries (required for tracking the allocation pages).
I did the measurements in brw_bufmgr from user space: I kept track
of the allocated size for each brw_bufmgr context and printed the
peak allocated size when the test completed and the context was
destroyed. Basically, I increased the size when I915_GEM_CREATE was
called and decreased it when GEM_CLOSE was called, so cached,
imported and user_ptr buffers were excluded.
The brw_bufmgr context is created when the test starts and destroyed
after it completes, so the size is for the test case, in bytes. This
method measures the exact size allocated for a given test case.
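
As a rough illustration of the bookkeeping described above (not the code I
actually used), the counters could look like the sketch below. The struct
and function names are hypothetical and are not brw_bufmgr fields; the two
hooks are assumed to be called next to the GEM create and GEM close calls,
with the buffer size in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context counters, one instance per bufmgr context. */
struct alloc_stats {
   uint64_t current;  /* bytes currently allocated through GEM create */
   uint64_t peak;     /* high-water mark of 'current' */
};

/* Assumed to be called right after a successful GEM create of 'size' bytes. */
static void stats_on_create(struct alloc_stats *stats, uint64_t size)
{
   stats->current += size;
   if (stats->current > stats->peak)
      stats->peak = stats->current;
}

/* Assumed to be called right before the GEM close of a buffer of 'size' bytes. */
static void stats_on_close(struct alloc_stats *stats, uint64_t size)
{
   assert(stats->current >= size);
   stats->current -= size;
}
```

Buffers reused from the cache never reach either hook, so they are not
counted twice; the peak value is what would be printed when the context is
destroyed.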
>
>
> > GfxBench 4.0
> >                   score                                      peak memory
> >                   before           after            diff     before      after       diff
> > gl_4              564.6052246094   565.2348632813   0.11%    578490368   550199296   -28291072
> > gl_4_off          727.0440063477   703.5833129883   -3.33%   629501952   598216704   -31285248
> > gl_manhattan      1053.4223632813  1057.3690185547  0.37%    449568768   421134336   -28434432
> > gl_trex           2708.0656738281  2699.2646484375  -0.33%   130076672   125042688   -5033984
> > gl_alu2           1207.1490478516  1212.2220458984  0.42%    55496704    55029760    -466944
> > gl_driver2        103.0383071899   103.5478439331   0.49%    13107200    12980224    -126976
> > gl_manhattan_off  1703.4780273438  1736.9074707031  1.92%    490016768   456548352   -33468416
> > gl_trex_off       2951.6809082031  3058.5422363281  3.49%    157511680   152260608   -5251072
> > gl_alu2_off       2604.0903320313  2626.2524414063  0.84%    86130688    85483520    -647168
> > gl_driver2_off    204.0173187256   207.0510101318   1.47%    40869888    40615936    -253952
>
> You're missing information on:
> * On which platform you did the testing (affects variance)
> * how many test rounds you ran, and
> * what is your variance
I ran these tests on a gen9 platform running Ubuntu 17.10 LTS. Most of
the tests are consistent, especially the memory usage. The only
exception is GfxBench 4.0 gl_manhattan: I had to run it 3 times and pick
the highest result. I will apply this method to all tests and re-send
with updated results.
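
For the re-run, a per-test summary along these lines would cover both the
"highest of N runs" method and the variance asked about above. This is a
minimal sketch with hypothetical names (run_summary, summarize_runs); it
reports the sample variance (n - 1 denominator), which is one reasonable
choice among several.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of repeated benchmark runs for one test. */
struct run_summary {
   double best;      /* highest score across the runs */
   double mean;      /* arithmetic mean of the scores */
   double variance;  /* sample variance; 0 when there is a single run */
};

static struct run_summary summarize_runs(const double *scores, size_t n)
{
   assert(n > 0);

   struct run_summary s = { scores[0], 0.0, 0.0 };
   double sum = 0.0;

   /* First pass: total and maximum. */
   for (size_t i = 0; i < n; i++) {
      sum += scores[i];
      if (scores[i] > s.best)
         s.best = scores[i];
   }
   s.mean = sum / (double)n;

   /* Second pass: squared deviations from the mean. */
   if (n > 1) {
      double squares = 0.0;
      for (size_t i = 0; i < n; i++) {
         double d = scores[i] - s.mean;
         squares += d * d;
      }
      s.variance = squares / (double)(n - 1);
   }
   return s;
}
```

Reporting the mean and variance next to the best score makes it easier to
tell whether a before/after difference is larger than the run-to-run noise.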
>
> -> I don't know whether your numbers are just random noise.
>
>
> Memory is allocated in pages from kernel, so there's no point in
> showing its usage as bytes. Please use KBs, that's more readable.
>
> (Because of randomness e.g. interactions with the windowing system,
> there can be some variance also in process memory usage, which may
> also be useful to report.)
>
> Because of variance, you don't need that many decimals for the scores.
> Removing the extra ones makes the data a bit more readable too.
>
>
> - Eero
>
> [1] "smem" is a Python-based tool available at least in Debian.
> If you want something simpler, e.g. shell script working with
> minimal shells like Busybox, you can use this:
> https://github.com/maemo-tools-old/sp-memusage/blob/master/scripts/mem-smaps-private
>
>
> > GfxBench 5.0
> >             score             peak memory
> >             before   after    before       after        diff
> > gl_5        259      259      1137549312   1038286848   -99262464
> > gl_5_off    297      297      1170853888   1071357952   -99495936
> >
> > Xiong, James (4):
> > i965/drm: Reorganize code for the next patch
> > i965/drm: Round down buffer size and calculate the bucket index
> > i965/drm: Searching for a cached buffer for reuse
> > i965/drm: Purge the bucket when its cached buffer is evicted
> >
> >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++++++++++++++++++---------------
> >  src/util/list.h                        |   5 ++
> >  2 files changed, 79 insertions(+), 65 deletions(-)
>