[Mesa-dev] Improving ralloc performance for the GLSL compiler
Timothy Arceri
timothy.arceri at collabora.com
Tue Aug 30 22:53:09 UTC 2016
On Tue, 2016-08-30 at 16:14 -0400, Connor Abbott wrote:
> On Tue, Aug 30, 2016 at 10:06 AM, Marek Olšák <maraeo at gmail.com> wrote:
> >
> > On Tue, Aug 30, 2016 at 3:21 PM, Eero Tamminen <eero.t.tamminen at intel.com> wrote:
> > >
> > > Hi,
> > >
> > >
> > > On 30.08.2016 12:51, Marek Olšák wrote:
> > > >
> > > >
> > > > Recently I discovered that our GLSL compiler spends a lot of time
> > > > in rzalloc_size, so I looked at possible options to optimize that.
> > > > It's worth noting that too many existing allocations slow down
> > > > subsequent malloc calls, which in turn slows down the GLSL
> > > > compiler. When I kept 5 instances of LLVMContext alive between
> > > > compilations (I wanted to reuse them), the GLSL compiler slowed
> > > > down. That shows that GLSL compiler performance is too dependent
> > > > on the size and complexity of the heap.
> > > >
> > > > So I decided to write my own linear allocator and then compared it
> > > > with jemalloc preloaded via LD_PRELOAD, and with jemalloc linked
> > > > statically and used by ralloc only.
> > > >
> > > > The test was shader-db using AMD's shader collection. The command
> > > > line was:
> > > > time GALLIUM_NOOP=1 shader-db/run shaders
> > > > The noop driver ensures the compilation process ends with TGSI.
> > > >
> > > >
> > > > Default Mesa:
> > > > real 0m58.343s
> > > > user 3m48.828s
> > > > sys 0m0.760s
> > > >
> > > > Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> > > > real 0m48.550s (17% less time)
> > > > user 3m9.544s
> > > > sys 0m1.700s
> > > >
> > > > Ralloc using _mesa_je_{calloc, realloc, free}, with Mesa linked
> > > > against my libmesa_jemalloc_pic.a:
> > > > real 0m49.580s (15% less time)
> > > > user 3m14.452s
> > > > sys 0m0.996s
> > > >
> > > > Ralloc using my own linear allocator, which allocates out of 32KB
> > > > buffers for allocations of 512 bytes or smaller:
> > > > real 0m46.521s (20% less time)
> > > > user 3m1.304s
> > > > sys 0m1.740s
> > > >
> > > >
> > > > Now let's test complete compilation down to GCN bytecode:
> > > >
> > > > Default Mesa:
> > > > real 1m57.634s
> > > > user 7m41.692s
> > > > sys 0m1.824s
> > > >
> > > > Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> > > > real 1m42.604s (13% less time)
> > > > user 6m39.776s
> > > > sys 0m3.828s
> > > >
> > > > Ralloc using _mesa_je_{calloc, realloc, free}, with Mesa linked
> > > > against my libmesa_jemalloc_pic.a:
> > > > real 1m44.413s (11% less time)
> > > > user 6m48.808s
> > > > sys 0m2.480s
> > > >
> > > > Ralloc using my own linear allocator:
> > > > real 1m40.486s (14.6% less time)
> > > > user 6m34.456s
> > > > sys 0m2.224s
> > > >
> > > >
> > > > The linear allocator that I wrote has very high memory usage,
> > > > because a 32KB block cannot be freed while it still holds at least
> > > > one live allocation. The workaround would be to do realloc() when
> > > > changing a ralloc parent in order to "defragment" the memory, but
> > > > that's more involved.
> > > >
> > > > I don't know much about glibc, but it's hard to believe that glibc
> > > > people have been purposely ignoring jemalloc for so long. There
> > > > must be some anti-performance politics going on, but enough
> > > > speculation.
> > >
> > >
> > > Different allocators have different trade-offs:
> > > * single-core speed
> > > * multi-core speed
> > > * memory usage
> > > * long time memory fragmentation
> > > * alloc debugging support & robustness
> > >
> > > They can also behave differently with different allocation patterns
> > > and sizes. Jemalloc being better than ptmalloc in one test doesn't
> > > necessarily mean that it's better in another.
> > >
> > > Here's some discussion on the subject:
> > > https://lwn.net/Articles/273084/
> > >
> > > The algorithms used, and some of the trade-offs, are described in
> > > the allocators' source code.
> > >
> > >
> > > >
> > > > If we don't care about memory usage, let's use my allocator.
> > >
> > >
> > > Modern games are the most demanding use case for the compiler and
> > > use the largest number of shaders, but almost all (>90%) Steam games
> > > are *still* 32-bit. Before the compiler memory usage optimizations
> > > by Ian & Co., several of them crashed because they ran out of 32-bit
> > > address space.
> >
> > Did the games crash because i965 was using GLSL IR as its main
> > compiler IR? Or was the problem that GLSL IR hadn't been released at
> > link time, because the driver had to keep all of it for compiling
> > shader variants? The memory usage issue might have been
> > i965-specific and not relevant right now.
> >
> > Note that Gallium releases GLSL IR in glLinkProgram and other drivers
> > should do that too. If some drivers don't, they are going to have
> > memory usage issues either way.
>
> I believe that at the time, i965 had to keep GLSL IR around after
> linking to handle shader variants. Nowadays, we release the GLSL IR at
> link time and only hang onto the NIR for variants.
Are you sure? As far as I am aware, no one ever finished this up for
i965.
> NIR is inherently a lot more compact than GLSL IR, since it uses far
> fewer variables and variable dereferences (they're mostly replaced by
> SSA values during optimization). It's not as compact as TGSI, since
> it's designed to be mutated/optimized, but it could be made a lot
> smaller with a little tuning. Also, Ian did a lot of work to make
> GLSL's memory footprint smaller, which still helps at link time.
>
> >
> >
> > Marek
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev