[Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Mon Aug 18 08:47:45 PDT 2014

On 18/08/14 14:21, Marek Olšák wrote:
> On Mon, Aug 18, 2014 at 2:44 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 16.08.2014 02:12, schrieb Connor Abbott:
>>> I know what you might be thinking right now. "Wait, *another* IR? Don't
>>> we already have like 5 of those, not counting all the driver-specific
>>> ones? Isn't this stuff complicated enough already?" Well, there are some
>>> pretty good reasons to start afresh (again...). In the years we've been
>>> using GLSL IR, we've come to realize that, in fact, it's not what we
>>> want *at all* to do optimizations on. Ian has done a talk at FOSDEM that
>>> highlights some of the problems they've run into:
>>>
>>> https://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm
>>>
>>> But here's the summary:
>>>
>>> * GLSL IR is way too much of a memory hog, since it has to make a new
>>> variable for each temporary the compiler creates and then each time you
>>> want to dereference that temporary you need to create an
>>> ir_dereference_variable that points to it which is also very
>>> cache-unfriendly ("downright cache-mean!").
>>>
>>> * The expression trees were originally added so that we could do
>>> pattern matching to automatically optimize things, but this turned out
>>> to be both very difficult to do and not very helpful. Instead, all it
>>> does is add more complexity to the IR without much benefit - with SSA or
>>> having proper use-def chains, we could get back what the trees give us
>>> while also being able to do lots more optimizations.
>>>
>>> * We don't have the concept of basic blocks in GLSL IR, which makes a
>>> lot of optimizations harder because they were originally designed with
>>> basic blocks in mind - take, for example, my SSA series. I had to map a
>>> whole lot of concepts that were based on the control flow graph to this
>>> tree of statements that GLSL IR uses, and the end result wound up
>>> looking nothing at all like the original paper. This problem gets even
>>> worse for things like e.g. Global Code Motion that depend upon having
>>> the dominance tree.
>>>
>>> I originally wanted to modify GLSL IR to fix these problems by adding
>>> new instruction types that would address these issues and then
>>> converting back and forth between the old and the new form, but I
>>> realized that fixing all the problems would basically mean a complete
>>> rewrite - and if that's the case, then why don't we start from scratch?
>>> So I took Ken's suggestions and started designing, and then at Intel
>>> over the summer started implementing, a completely new IR which I call
>>> NIR that's at a lower level than GLSL IR, but still high-level enough to
>>> be mostly device-independent (different drivers may have different
>>> passes and different ways of lowering e.g. matrix multiplies) so that
>>> we can do generic optimizations on it. Having support for SSA from the
>>> beginning was also a must, because lots of optimisations that we really
>>> want for cleaning up DX9-translated games are either a lot easier in or
>>> made possible by SSA. I also made the decision for it to be typeless,
>>> because that's what the cool kids are all doing :) and for a
>>> lower-level, flat IR it seemed like the thing to do (it could have gone
>>> either way, though). So the key design points of NIR (pronounced either
>>> like "near" as in "NIR is near!" or to rhyme with "burr") are:
>>>
>>> * It's flat (no expression trees)
>>>
>>> * It's typeless
>>>
>>> * Modifiers (abs, negate, saturate), swizzles, and write masks are part
>>> of ALU instructions
>>>
>>> * It includes enough GLSL-like things (variables that you can load from
>>> or store to, function calls) to be hardware-agnostic (although we don't
>>> have a way to represent matrix multiplies right now, but that could
>>> easily be added) to be able to do optimizations at a high level, while
>>> having lowering passes that convert variables to registers and
>>> input/output/uniform loads/stores that will open up more opportunities
>>> for optimization and save memory while being more hardware-specific.
>>>
>>> * Control flow consists of a tree of if statements and loops, like in
>>> GLSL IR, except the leaves of the tree are now basic blocks instead of
>>> instructions. Also, each basic block keeps track of its successors and
>>> predecessors, so the control flow graph is explicit in the IR.
>>>
>>> * SSA is natively supported, and SSA uses point directly to the SSA
>>> definition, which means that the use-def chains are always there, and
>>> def-use chains are kept by tracking the set of all uses for each
>>> definition.
>>>
>>> * It's written in C.
>>>
>>> (see the README in patch 3 and nir.h in patch 4 for more details)
>>>
>>> Some things that are missing or could be improved:
>>>
>>> * There's currently no alias tracking for inputs, outputs, and uniforms.
>>> This is especially important for uniforms because we don't pack them
>>> like we pack inputs and outputs.
>>>
>>> * We need a way to represent matrix multiplies so that we can do
>>> matrix-flipping optimizations in NIR (currently GLSL IR does this for
>>> us).
>>>
>>> * I'm not entirely happy about how we represent loads and stores in the
>>> IR. Right now, they're intrinsics, but that means we need a different
>>> intrinsic for each size and combination of arguments (indirect vs. not
>>> indirect, etc.) and we might run into a combinatorial explosion problem
>>> in the future, so we might need to make separate load/store instructions
>>> like what I did for textures.
>>>
>>> * Right now, we only have a pass that lowers variables for scalar
>>> backends. We need to write a similar pass for vector backends that uses
>>> std140 packing or something similar, as well as porting
>>> lower_ubo_reference to NIR and changing it to output offsets in the
>>> hardware-native units instead of bytes.
>>>
>>> * We'll need to write a pass that splits up vector expressions for
>>> scalar backends.
>>
>
[...]

> However, let's face it, gallium is stuck with TGSI
> forever. Switching to another IR in Gallium is insane (unless you can
> rewrite all drivers and state trackers for it - let's be realistic, it
> just won't happen). The next GL NG IR, whatever it is going to be,
> will be just as important as the IR of ARB_vertex_program. TGSI will
> continue to be the major IR whether we like it or not.

No, switching to another IR in Gallium is not insane if approached the
right way.  We already allow multiple IRs in gallium, so all it takes to
move to another IR is to have helper modules that do the translation:

- a pipe driver helper module that would translate new IR into TGSI, for 
the sake of old pipe drivers

- a state tracker helper module that would translate TGSI into the new 
IR, for the sake of old state trackers.
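
To make the shape of those two helpers concrete, here is a minimal sketch
in C.  Every name in it (shader_ir, shader, new_ir_to_tgsi, tgsi_to_new_ir,
shader_for_driver) is hypothetical and made up for illustration; none of
them are existing Gallium symbols, and the "translation" only flips a tag
where a real helper would rewrite the token stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these are real Gallium symbols. */
enum shader_ir { SHADER_IR_TGSI, SHADER_IR_NEW };

struct shader {
   enum shader_ir ir;   /* which IR the payload is encoded in */
   const void *tokens;  /* opaque IR payload */
};

/* Stand-ins for the two helper modules; real ones would rewrite tokens. */
static struct shader
new_ir_to_tgsi(struct shader s)
{
   s.ir = SHADER_IR_TGSI;
   return s;
}

static struct shader
tgsi_to_new_ir(struct shader s)
{
   s.ir = SHADER_IR_NEW;
   return s;
}

/* Hand the driver a shader in the IR it asked for, translating only
 * when the incoming IR differs from the preferred one. */
static struct shader
shader_for_driver(struct shader s, enum shader_ir preferred)
{
   if (s.ir == preferred)
      return s;
   return preferred == SHADER_IR_TGSI ? new_ir_to_tgsi(s)
                                      : tgsi_to_new_ir(s);
}
```

With something of this shape, an old pipe driver would keep asking for
SHADER_IR_TGSI and a new one for SHADER_IR_NEW, and neither would need to
know which IR the state tracker produced.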

Once these are in place, all development effort can go into
improving/leveraging the new IR.  We could deprecate TGSI once it has
few users left.

I also want to highlight there are two kinds of "IR".

a) one is a shader IR that communicates a shader across an interface
(for example, an application interface):

        High-level lang.             IR               GPU code
   App -----------------> front-end ----> back-end ---------->  GPU

b) another is a shader IR that is meant to facilitate code
transformations (i.e. optimizations):

       opt. pass     opt. pass
    IR ---------> IR ---------> IR --> ....

Gallium needs a), but not necessarily b).  An optimizing compiler needs
b) internally but not necessarily a).

An IR that achieves both a) and b) is not impossible, but it is a more 
difficult trade-off.
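
As a rough illustration of the two roles, here is a toy sketch in C.  It
assumes a made-up flat IR (toy_ir) and made-up functions (toy_remove_nops,
toy_run_passes, toy_backend_compile); nothing here is Mesa or Gallium
code, and the "GPU code" the back-end returns is just an instruction
count used as a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: one flat instruction list, nothing Mesa-specific. */
enum toy_op { TOY_NOP, TOY_MOV, TOY_ADD };

struct toy_ir {
   enum toy_op ops[8];
   size_t count;
};

/* Kind b): a pass takes IR and returns IR, so passes compose freely. */
typedef struct toy_ir (*toy_pass)(struct toy_ir);

/* Example pass: drop no-ops, keeping the other instructions in order. */
static struct toy_ir
toy_remove_nops(struct toy_ir in)
{
   struct toy_ir out = { .count = 0 };
   for (size_t i = 0; i < in.count; i++) {
      if (in.ops[i] != TOY_NOP)
         out.ops[out.count++] = in.ops[i];
   }
   return out;
}

/* Run the passes in sequence: IR -> IR -> IR -> ... */
static struct toy_ir
toy_run_passes(struct toy_ir ir, const toy_pass *passes, size_t n)
{
   for (size_t i = 0; i < n; i++)
      ir = passes[i](ir);
   return ir;
}

/* Kind a): the interface only sees IR going in and "GPU code" coming
 * out; the placeholder result is the optimized instruction count. */
static size_t
toy_backend_compile(struct toy_ir from_frontend)
{
   const toy_pass passes[] = { toy_remove_nops };
   struct toy_ir optimized = toy_run_passes(from_frontend, passes, 1);
   return optimized.count;
}
```

The front-end only ever touches toy_backend_compile(), which is role a).
The pass list inside it is role b), and the back-end could convert to a
completely different internal representation there without the interface
noticing.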

My point is: it's OK to use a different IR in Gallium's interface,
provided that the IR used in Gallium's interface doesn't lose any
information.

On the other hand, there is a lot of momentum behind LLVM "inspired"
IRs, like SPIR.  So there would probably be a lot of synergy if LLVM
became Gallium's "standard" IR.

Jose