[Mesa-dev] [RFC] ACO: A New Compiler Backend for RADV

Wed Jul 3 17:23:22 UTC 2019

Hello everyone,
as some of you already know, for a little over one year I have been working on
an alternative compiler backend for the RADV driver. At the beginning, Bas
Nieuwenhuizen helped out a lot, and since last December Rhys Perry has also
helped tremendously, working full-time on ACO.

In this RFC, I'd like to share with you our motivation for this work as well as
some implementation details and the current state.

The current development branch of ACO with full commit history can be found at
  https://github.com/daniel-schuermann/mesa/tree/backend
while a slightly more stable branch is maintained (until ACO is upstream) at
  https://github.com/daniel-schuermann/mesa/tree/master/

For initial results, I'd like to refer to this post:
  https://steamcommunity.com/games/221410/announcements/detail/1602634609636894200

Feel free to ask questions or just add your thoughts.

Motivation:
The RADV driver currently uses LLVM as its backend for shader compilation.
There are some shortcomings in LLVM's compilation of graphics shaders which
need to be addressed. The motivation for ACO is the expectation that it would
be less work in the long term to rewrite the backend than to fix LLVM.
Without going too much into detail here, the main shortcomings of LLVM are
compile times and the handling of control flow, which has led to some serious
bugs in the past.
Additionally, we were able to implement a more aggressive divergence analysis
and to gain more precise control over register pressure, which can ultimately
lead to more efficient binaries.
A welcome side effect is an integrated development process without having to
deal with LLVM's release cycles.

Implementation:
What started as a proof of concept and an interesting experimental platform
advanced quickly to a full-featured backend capable of replacing LLVM in the
RADV driver in the near future.
ACO is based on principles from recent compiler research and tries to avoid
the issues we are experiencing with LLVM. The IR is fully SSA-based, and
register allocation is also done on SSA, which allows us to precisely
pre-calculate the register demand of a shader.
We implemented the notion of a logical and a linear (or physical) control flow
graph, which let us quickly and easily add horizontal reductions (thx Connor
Abbott) - a problem which took almost two years and various attempts to solve
in LLVM, with the result still being far slower than our solution.
ACO is written with just-in-time compilation in mind and uses data structures
which are quick to iterate over. Avoiding pointer-based data structures like
linked lists and the common def-use chains leads to much faster compile times.
ACO is written entirely in C++.
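
To illustrate why SSA makes the register demand easy to pre-calculate, here is
a minimal sketch that counts the maximum number of simultaneously live
temporaries in a single straight-line block with one backward scan. This is
not code from ACO: the Instr struct, the register_demand() function and the
"one register per temporary" model are assumptions made up for this example,
and a real implementation also has to handle control flow, phis and the
different register classes and sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction for illustration only (not ACO's IR).
// Temporaries are plain indices, so liveness fits in a flat bit vector.
struct Instr {
    int32_t def;                 // id of the temporary defined here, or -1 for none
    std::vector<uint32_t> uses;  // ids of the temporaries read here
};

// Returns the maximum number of temporaries that are live at the same time in
// a straight-line SSA block. Assumes every temporary needs exactly one register.
// live_out lists the temporaries that are still needed after the block.
std::size_t register_demand(const std::vector<Instr>& block,
                            uint32_t num_temps,
                            const std::vector<uint32_t>& live_out)
{
    std::vector<bool> live(num_temps, false);
    std::size_t count = 0;

    auto make_live = [&](uint32_t t) {
        assert(t < num_temps);
        if (!live[t]) {
            live[t] = true;
            ++count;
        }
    };

    for (uint32_t t : live_out)
        make_live(t);
    std::size_t max_demand = count;

    // Walk the block backwards, from the last instruction to the first.
    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        if (it->def >= 0) {
            uint32_t d = static_cast<uint32_t>(it->def);
            // Right after the instruction the result occupies a register,
            // even if nothing reads it later.
            make_live(d);
            max_demand = std::max(max_demand, count);
            // In SSA there is exactly one definition, so above this point
            // the temporary does not exist yet.
            live[d] = false;
            --count;
        }
        // Operands must be live immediately before the instruction.
        for (uint32_t u : it->uses)
            make_live(u);
        max_demand = std::max(max_demand, count);
    }
    return max_demand;
}
```

For example, for a block with three loads t0, t1, t2 followed by
t3 = add t0, t1 and t4 = add t3, t2, with only t4 live afterwards, the scan
reports a demand of 3 (t0, t1 and t2 are all live just before the first add).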

Current State:
Currently, ACO only supports FS and CS, only on VI+, and only 32-bit and some
64-bit operations.
It lacks VGPR spilling (we didn't need it in any game tested so far) and has a
theoretical issue (in case a divergent/uniform memory write is followed by a
memory read of the other kind) which needs a proper alias analysis to resolve.
Nevertheless, ACO is able to correctly compile the shaders of (almost?) all
games, including complex ones like Shadow of the Tomb Raider and Wolfenstein II.
We'd like to upstream ACO as an experimental driver option to ease development
synchronization and get more feedback, but ultimately also to give access to
the performance enhancements we achieved.

To ease the upstreaming effort, we created MRs for all changes to the NIR
infrastructure.
After these MRs have gone through the review process and landed, we are going
to create a single MR for ACO. Meanwhile, we will refactor the coding style and
squash the commits.
Please review!

nir: lowering shared memory derefs with nir_lower_io_explicit():
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/622

radv/radeonsi: Use NIR barycentric interpolation intrinsics:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/906

WIP: nir: add divergence analysis pass:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/918

WIP: nir: lower int64 in a single pass:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1224

nir: A Couple of Comparison Optimizations:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1228

radv: disable lower_sub:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1236

nir: change nir_lower_io_to_vector() so that it can always vectorize FS outputs:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1238

nir/lower_idiv: add new urcp path:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1239

nir: add a memory load/store vectorization and combining pass:
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1240

nir: replace nir_move_load_const() with nir_opt_sink():
  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1241

We welcome any testing feedback and bug reports at
  https://github.com/daniel-schuermann/mesa/issues

Thanks,
Daniel Schürmann
Rhys Perry
Bas Nieuwenhuizen