[Mesa-dev] [PATCH 0/4] RadeonSI: Multithreaded shader compilation
Timothy Arceri
timothy.arceri at collabora.com
Wed Jul 13 03:19:09 UTC 2016
On Sun, 2016-07-10 at 01:24 +0200, Marek Olšák wrote:
> On Sat, Jul 9, 2016 at 11:02 PM, Grazvydas Ignotas <notasas at gmail.com>
> wrote:
> > On Sat, Jul 9, 2016 at 6:49 PM, Marek Olšák <maraeo at gmail.com>
> > wrote:
> > > On Fri, Jul 8, 2016 at 3:20 AM, Timothy Arceri
> > > <timothy.arceri at collabora.com> wrote:
> > > > On Wed, 2016-06-29 at 18:32 +0200, Marek Olšák wrote:
> > > > > Hi,
> > > > >
> > > > > This series implements basic multithreaded LLVM shader
> > > > > compilation
> > > > > in a minimally invasive way. (+51 lines of code in the main
> > > > > patch)
> > > > >
> > > > > It doesn't help on-demand shader compilation, but it does
> > > > > improve
> > > > > loading and startup times by being able to saturate up to 4
> > > > > CPU cores
> > > > > if given enough shaders to compile. A proper shader cache
> > > > > might make
> > > > > this redundant, but we don't have that now.
> > > >
> > > > Have you had a chance to take a look at my recent shader cache
> > > > work
> > > > [1]? The glsl work is mostly done; I'm now just cleaning up and
> > > > fixing
> > > > up the fallback paths for when we have a cache miss. I'm not
> > > > sure if
> > > > you guys will need the fallback path or not since you have a
> > > > way
> > > > of dealing with variants; you might be OK, which would make things
> > > > much
> > > > simpler.
> > > >
> > > > Anyway I don't think that getting something up and running
> > > > would be a
> > > > large amount of work. Most of your time would likely be spent
> > > > tweaking
> > > > the glsl to tgsi path to have it skip over the IR conversion
> > > > and just
> > > > grab the required state.
> > > >
> > > > The two files you would find most interesting would be:
> > > >
> > > > src/compiler/glsl/shader_cache.cpp
> > > > src/mesa/drivers/dri/i965/brw_shader_cache.c
> > > >
> > > > Everything else (besides the cache code itself) is pretty much
> > > > just
> > > > wiring things up to be called at the right time, or skipped.
> > > >
> > > >
> > > > [1] https://github.com/tarceri/Mesa_arrays_of_arrays.git
> > > > shader-cache20
> > >
> > > No, I haven't looked at it.
> > >
> > > An on-disk shader cache would be nice to have, but it's not a
> > > pressing
> > > thing since we don't really get many apps compiling shaders
> > > before
> > > drawing. UE4 was doing that with GL 4.1 but not GL 4.3, so that's
> > > cool.
> > > The only app where a shader cache would help a lot is Borderlands
> > > 2.
> > > Sadly, the same game would benefit from OpenGL driver
> > > multithreading
> > > even more, because once all shaders are compiled, the game is
> > > CPU-bound most of the time with frame rates as low as 20 on Core
> > > i5
> > > 3570.
> > >
> > > All in all, I need more user feedback to be sure if the on-disk
> > > shader
> > > cache would make any difference outside of Borderlands 2.
> >
> > Having played with Timothy's i965 cache I can tell you there are
> > certainly more games affected. One of them is The Talos Principle
> > where the difference is quite large (Serious Sam 3 probably too, as the
> > engine is the same). Another one is Left 4 Dead 2 where it solves
> > "do
> > something the first time" stalls. There are some bugs about older
> > Unreal engine games:
> > https://bugs.freedesktop.org/show_bug.cgi?id=96790
> > https://bugs.freedesktop.org/show_bug.cgi?id=92806
> >
> > I do note that from a subjective user's point of view, i965's shader
> > compilation seems to be a lot slower than r600g's. I don't know how
> > that all compares to radeonsi though as I have no way to test that.
>
> Yeah, Talos looks like it compiles shaders on demand, but it's pretty
> fast and the majority of shaders are compiled during the first frame,
> which is like part of the loading. Any later compilations are barely
> noticeable even though our shader compiler is quite slow. I'd say the
> problem is that i965 and r600 have shader variants and do a lot of
> recompilations while radeonsi doesn't.
So I finally got around to setting up my new Polaris card on Fedora. I
was curious to see how Talos performed compared to i965, as I was pretty
sure the compiling/linking was not just to do with variants and should
be noticeable regardless of hardware.

The results look the same to me: there is really bad jerkiness for
around 5-10 seconds as you walk when the game first loads from a
continue point in the big hall area (the place where you often start a
continue from, as this is where it auto-saves). Obviously the shaders get
cached in memory, but you see the stalls every time you start the game,
and they are very noticeable on my machine at least.

Anyway, it's obviously up to you what you work on, but I don't think an
on-disk shader cache for radeonsi would be a wasted effort.

Tim
> Even though Talos apparently
> compiles shaders on demand, there are only 324 unique GLSL program
> objects during the first few puzzles, which is pretty much nothing
> compared to Borderlands 2 with its ~4000 unique program objects. The
> recent radeonsi shader improvements (no recompilations + in-memory
> shader cache) made shader compilation in Borderlands 2 seem so much
> faster that an on-disk shader cache is no longer an absolute
> necessity.
>
> I'm open to having the on-disk shader cache for radeonsi just because
> of Borderlands 2. I have yet to discover other games that would
> benefit from it.
>
> Marek