[Mesa-dev] [PATCH 0/5] nvc0: better instruction pipelining for Maxwell GPUs

Jan Vesely jan.vesely at rutgers.edu
Fri Jan 6 10:53:45 UTC 2017


On Fri, 2016-12-23 at 00:15 +0100, Samuel Pitoiset wrote:
> Hello,
> 
> This series makes use of the scheduling control code in order to improve the
> instruction pipelining on Maxwell GPUs.
> 
> Since the Kepler architecture, a control instruction has to be inserted every
> 7 instructions. Maxwell added additional control codes, and a control
> instruction now has to be inserted every 3 instructions. Maxwell control codes
> are really powerful and well documented [1]. By the way, I would like to thank
> Scott Gray, who did awesome reverse-engineering work, although I had to
> figure out the missing parts myself.
> 
> On Maxwell, control codes are mainly used for setting stall counts and for
> producing/consuming dependency barriers in order to avoid hazards. I'm not
> going to explain in detail how they work because the documentation is quite
> good and because I added explanations here and there in the source code. But
> the main thing to understand is that the control code previously used by
> default (i.e. st 0x0) means "wait for all dependencies and stall the pipeline
> for 15 cycles, which is the maximum".
> Which is quite bad...
> 
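
To make the control-code paragraph above more concrete, here is a minimal
Python sketch of how one control word could be packed and unpacked. The bit
layout (4-bit stall count, 1-bit yield, 3-bit write and read barrier indices,
6-bit wait mask, 4-bit reuse flags, three 21-bit codes per 64-bit word) is my
reading of the maxas wiki cited as [1]; it is an assumption, not code from this
series. The names ControlCode, pack_control_word and unpack_control_word are
hypothetical. The sketch only models the encoding, not how the hardware
interprets any particular value such as the old all-zero default.

```python
from dataclasses import dataclass

# Assumed field widths, lowest bits first (taken from my reading of [1]).
FIELD_WIDTHS = (4, 1, 3, 3, 6, 4)   # stall, yield, write, read, wait, reuse
CODE_BITS = sum(FIELD_WIDTHS)       # 21 bits per instruction
CODES_PER_WORD = 3                  # one control word per 3 instructions


@dataclass(frozen=True)
class ControlCode:
    stall: int = 0          # stall count, 0..15
    yield_flag: int = 0     # raw yield bit; its semantics are not modelled
    write_barrier: int = 7  # barrier index signalled on write (7 assumed "none")
    read_barrier: int = 7   # barrier index signalled on read (7 assumed "none")
    wait_mask: int = 0      # one bit per dependency barrier to wait on
    reuse: int = 0          # operand reuse flags

    def pack(self) -> int:
        values = (self.stall, self.yield_flag, self.write_barrier,
                  self.read_barrier, self.wait_mask, self.reuse)
        bits, shift = 0, 0
        for value, width in zip(values, FIELD_WIDTHS):
            if not 0 <= value < (1 << width):
                raise ValueError(f"{value} does not fit in {width} bits")
            bits |= value << shift
            shift += width
        return bits

    @classmethod
    def unpack(cls, bits: int) -> "ControlCode":
        values = []
        for width in FIELD_WIDTHS:
            values.append(bits & ((1 << width) - 1))
            bits >>= width
        return cls(*values)


def pack_control_word(codes):
    """Pack exactly three ControlCode objects into one integer control word."""
    if len(codes) != CODES_PER_WORD:
        raise ValueError("a control word covers exactly 3 instructions")
    word = 0
    for index, code in enumerate(codes):
        word |= code.pack() << (CODE_BITS * index)
    return word


def unpack_control_word(word):
    """Split a control word back into its three ControlCode objects."""
    mask = (1 << CODE_BITS) - 1
    return [ControlCode.unpack((word >> (CODE_BITS * i)) & mask)
            for i in range(CODES_PER_WORD)]
```

For example, ControlCode(stall=6, write_barrier=0).pack() yields the 21-bit
value for "stall 6 cycles and signal barrier 0 on write" under these
assumptions, and unpack_control_word() reverses pack_control_word().
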
> Now, let's have a look at the (impressive) performance improvements. :-)
> I measured on a GeForce GTX 750 Ti (GM107) reclocked to the highest perf level,
> with and without the control codes (NV50_PROG_SCHED=0/1).
> 
> app: number of FPS without -> number of FPS with (+gain%)
> 
> FurMark:                   13  ->  42  (+223%)
> Pixmark Piano:             2   ->  7   (+250%)
> Pixmark Volposion:         6   ->  20  (+233%)
> Julia F32:                 61  ->  219 (+259%)
> LightMarks:                352 ->  685 (+94%)
> Heaven (low):              51  ->  102 (+100%)
> Heaven (ultra):            14  ->  27  (+93%)
> Valley (low):              30  ->  68  (+126%)
> Valley (ultra):            18  ->  39  (+116%)
> Talos (low):               32  ->  50  (+56%)
> Talos (ultra):             7   ->  14  (+100%)
> Shadow of Mordor (lowest): 13  ->  20  (+53%)
> 
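
The gain column in the table above appears to be the relative FPS increase,
truncated to a whole percent. A small sketch of that arithmetic follows; the
function name gain_percent is hypothetical and the truncation rule is inferred
from the figures, not stated in the cover letter.

```python
def gain_percent(fps_before, fps_after):
    """Relative FPS increase in percent, e.g. 13 -> 42 FPS is about +223%."""
    return (fps_after - fps_before) / fps_before * 100.0


# A few rows from the table: (app, FPS without, FPS with control codes).
rows = [
    ("FurMark", 13, 42),
    ("LightMarks", 352, 685),
    ("Talos (low)", 32, 50),
]

for app, before, after in rows:
    # int() truncates toward zero, which matches the posted percentages.
    print(f"{app}: {before} -> {after} (+{int(gain_percent(before, after))}%)")
```
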
> That's it! I think it's enough to understand the power of Maxwell control
> codes. We may get additional numbers from Phoronix (wink, wink, Michael).
> As I said in the main patch, the control codes can be disabled with
> 'export NV50_PROG_SCHED=0'.
> 
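
For anyone wanting to reproduce the with/without comparison, a minimal sketch
of toggling the variable from a script is shown below. Only the variable name
NV50_PROG_SCHED and its 0/1 values come from the cover letter; the helper names
sched_env and run_both are hypothetical, and the command to run is whatever
benchmark you have at hand.

```python
import os
import subprocess

SCHED_VAR = "NV50_PROG_SCHED"  # name and 0/1 values taken from the cover letter


def sched_env(enabled, base=None):
    """Return a copy of the environment with control codes switched on or off."""
    env = dict(os.environ if base is None else base)
    env[SCHED_VAR] = "1" if enabled else "0"
    return env


def run_both(cmd):
    """Run cmd once without and once with control codes; return both results."""
    return {
        enabled: subprocess.run(cmd, env=sched_env(enabled),
                                capture_output=True, text=True)
        for enabled in (False, True)
    }
```

run_both() takes the benchmark command as a list of arguments and returns a
dict keyed by False (control codes off) and True (control codes on).
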
> Now, let's have a look at how nouveau (with this series) performs compared to
> NVIDIA's blob (FPS with nouveau -> FPS with the blob).
> 
> FurMark:                   42  ->  59   (+40%)
> Pixmark Piano:             7   ->  13   (+85%)
> Pixmark Volposion:         20  ->  42   (+110%)
> Julia F32:                 219 ->  351  (+60%)
> LightMarks:                685 ->  1192 (+74%)
> Heaven (low):              102 ->  144  (+41%)
> Heaven (ultra):            27  ->  46   (+70%)
> Valley (low):              68  ->  94   (+38%)
> Valley (ultra):            39  ->  60   (+53%)
> Talos (low):               50  ->  128  (+156%)
> Talos (ultra):             14  ->  30   (+114%)
> Shadow of Mordor (lowest): 20  ->  77   (+285%)

I see +45% and +33% on my gm107m (prime) for Valley and Heaven
(1024x768, medium), which pushes it above the integrated Skylake iGPU's
performance. There are visual artifacts in both demos, but they appear
the same with and without these patches.

Tested-by: Jan Vesely <jan.vesely at rutgers.edu>

regards,
Jan

> 
> Nouveau is still far away from the blob, but now I think Maxwell is actually 
> in roughly the same shape as Kepler in terms of performance and features.
> Speaking about this, I will enable OpenGL 4.3 on Maxwell in a separate patch,
> later on.
> 
> The overhead at compile time added by this series is rather small. For a full
> shader-db run with my private repository of shaders, it takes approximately
> 208s for compiling 25k shaders before the series and approximately 211s after.
> That is less than 2% overhead, and it's comparable to a full shader-db run on
> Kepler.
> 
> No regressions with either piglit or dEQP (tested multiple times), and all
> benchmarks/games I have tried render fine and seem to be quite stable.
> 
> Due to a lack of time, some parts are still left to do and some others could
> be improved. With the following ideas implemented I'm pretty sure we can
> improve performance significantly.
> 
> * Add support for the yield flag. This seems to be a hint to the hardware for
>   improving how the work is balanced between the warps. I didn't figure out
>   how and where to use it without breaking a bunch of things. Need time and
>   patience.
> 
> * Add support for dual-issue. The rules are pretty different from Kepler's,
>   especially because of the dependency barriers. Note that the yield flag has
>   to be set, otherwise the hardware won't dual-issue and in fact it will wait
>   for all dependencies (i.e. st 0x0), which is really different from what you
>   are looking for.
> 
> * Reduce stall counts. A bunch of instructions have a read latency which is the
>   number of cycles before they can actually read the sources. This should be
>   fairly easy to implement but will require some reverse engineering to
>   completely understand the idea.
> 
> This is my last contribution for the Nouveau driver for a while because I have
> been hired by Valve to work on radeonsi. Do not expect such perf improvements
> with radeonsi because it already performs really well, unlike Nouveau. But
> with time and patience we can do better. :-)
> 
> This series is also available from my fdo account:
> https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=gm107_scheduler
> 
> Please, review!
> Thanks.
> 
> [1] https://github.com/NervanaSystems/maxas/wiki/Control-Codes
> 
> Samuel Pitoiset (5):
>   nv50/ir: do not insert texture barriers on gm107
>   nv50/ir: improve instruction pipelining on gm107
>   nv50/ir: use sched control codes for gm107 builtins
>   nvc0: use sched control codes for gm107 blitter shader
>   nvc0: use sched control codes for gm107 MP counters code
> 
>  src/gallium/drivers/nouveau/codegen/lib/gm107.asm  |  40 +-
>  .../drivers/nouveau/codegen/lib/gm107.asm.h        |  40 +-
>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 771 ++++++++++++++++++++-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   3 +-
>  .../nouveau/codegen/nv50_ir_target_gm107.cpp       | 253 +++++++
>  .../drivers/nouveau/codegen/nv50_ir_target_gm107.h |   7 +
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c        |  88 +--
>  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c    |  20 +-
>  8 files changed, 1127 insertions(+), 95 deletions(-)
> 

-- 
Jan Vesely <jan.vesely at rutgers.edu>