[Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Ian Romanick idr at freedesktop.org
Fri Aug 22 13:37:58 PDT 2014


On 08/18/2014 05:44 AM, Roland Scheidegger wrote:
> On 16.08.2014 02:12, Connor Abbott wrote:
>> I know what you might be thinking right now. "Wait, *another* IR? Don't
>> we already have like 5 of those, not counting all the driver-specific
>> ones? Isn't this stuff complicated enough already?" Well, there are some
>> pretty good reasons to start afresh (again...). In the years we've been
>> using GLSL IR, we've come to realize that, in fact, it's not what we
>> want *at all* to do optimizations on. Ian has done a talk at FOSDEM that
>> highlights some of the problems they've run into:
>>
>> https://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm
>>
>> But here's the summary:
>>
>> * GLSL IR is way too much of a memory hog, since it has to make a new
>> variable for each temporary the compiler creates, and then each time
>> you want to dereference that temporary you need to create an
>> ir_dereference_variable that points to it, which is also very
>> cache-unfriendly ("downright cache-mean!").
>>
>> * The expression trees were originally added so that we could do
>> pattern matching to automatically optimize things, but this turned out
>> to be both very difficult to do and not very helpful. Instead, all it
>> does is add more complexity to the IR without much benefit - with SSA or
>> having proper use-def chains, we could get back what the trees give us
>> while also being able to do lots more optimizations.
>>
>> * We don't have the concept of basic blocks in GLSL IR, which makes a
>> lot of optimizations harder because they were originally designed with
>> basic blocks in mind - take, for example, my SSA series. I had to map a
>> whole lot of concepts that were based on the control flow graph to this
>> tree of statements that GLSL IR uses, and the end result wound up
>> looking nothing at all like the original paper. This problem gets even
>> worse for things like Global Code Motion that depend upon having
>> the dominance tree.
>>
>> I originally wanted to modify GLSL IR to fix these problems by adding
>> new instruction types that would address these issues and then
>> converting back and forth between the old and the new form, but I
>> realized that fixing all the problems would basically mean a complete
>> rewrite - and if that's the case, then why don't we start from scratch?
>> So I took Ken's suggestions and started designing, and then at Intel
>> over the summer started implementing, a completely new IR which I call
>> NIR that's at a lower level than GLSL IR, but still high-level enough to
>> be mostly device-independent (different drivers may have different
>> passes and different ways of lowering e.g. matrix multiplies) so that
>> we can do generic optimizations on it. Having support for SSA from the
>> beginning was also a must, because lots of optimizations that we really
>> want for cleaning up DX9-translated games are either a lot easier in or
>> made possible by SSA. I also made the decision for it to be typeless,
>> because that's what the cool kids are all doing :) and for a
>> lower-level, flat IR it seemed like the thing to do (it could have gone
>> either way, though). So the key design points of NIR (pronounced either
>> like "near" as in "NIR is near!" or to rhyme with "burr") are:
>>
>> * It's flat (no expression trees)
>>
>> * It's typeless
>>
>> * Modifiers (abs, negate, saturate), swizzles, and write masks are part
>> of ALU instructions
>>
>> * It includes enough GLSL-like things (variables that you can load from
>> or store to, function calls) to be hardware-agnostic and to allow
>> optimizations at a high level (although we don't have a way to
>> represent matrix multiplies right now, that could easily be added).
>> Lowering passes then convert variables to registers and
>> input/output/uniform loads/stores, which opens up more opportunities
>> for optimization and saves memory while being more hardware-specific.
>>
>> * Control flow consists of a tree of if statements and loops, like in
>> GLSL IR, except the leaves of the tree are now basic blocks instead of
>> instructions. Also, each basic block keeps track of its successors and
>> predecessors, so the control flow graph is explicit in the IR.
>>
>> * SSA is natively supported, and SSA uses point directly to the SSA
>> definition, which means that the use-def chains are always there, and
>> def-use chains are kept by tracking the set of all uses for each
>> definition.
>>
>> * It's written in C.
>>
>> (see the README in patch 3 and nir.h in patch 4 for more details)
>>
>> Some things that are missing or could be improved:
>>
>> * There's currently no alias tracking for inputs, outputs, and uniforms.
>> This is especially important for uniforms because we don't pack them
>> like we pack inputs and outputs.
>>
>> * We need a way to represent matrix multiplies so that we can do
>> matrix-flipping optimizations in NIR (currently GLSL IR does this for
>> us).
>>
>> * I'm not entirely happy about how we represent loads and stores in the
>> IR. Right now, they're intrinsics, but that means we need a different
>> intrinsic for each size and combination of arguments (indirect vs. not
>> indirect, etc.) and we might run into a combinatorial explosion problem
>> in the future, so we might need to make separate load/store instructions
>> like what I did for textures.
>>
>> * Right now, we only have a pass that lowers variables for scalar
>> backends. We need to write a similar pass for vector backends that uses
>> std140 packing or something similar, as well as porting
>> lower_ubo_reference to NIR and changing it to output offsets in the
>> hardware-native units instead of bytes.
>>
>> * We'll need to write a pass that splits up vector expressions for
>> scalar backends.
> 
> Interesting. I think conceptually this makes sense (I'm far from an
> expert in that area though), though I wonder if we actually even should
> have our own IR? GL NG will specify a common shading language
> intermediate representation, and I suspect there'd be benefits if we'd
> just use that? Obviously I don't have any idea what that's going to look
> like but maybe it will be just like SPIR (which is LLVM IR essentially)?
> Granted a lot of stuff in your isa is conceptually similar (such as
> being based around basic blocks).

It is intended to be an interchange language, nothing more.  If I had to
guess, I'd say that every driver will read in the GL binary, convert it
to its own internal representation, and go about its business.
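
For readers skimming the thread, here is a rough sketch in C of two of the
design points quoted above: basic blocks that record their successors and
predecessors, and SSA sources that point directly at their definition
(use-def) while each definition tracks the set of its uses (def-use). Every
name, struct and fixed array size below is made up for illustration; this is
not the API from nir.h in patch 4, which should be consulted for the real
thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the nir.h API. */
#define SKETCH_MAX_SRCS  2
#define SKETCH_MAX_USES  8
#define SKETCH_MAX_SUCCS 2
#define SKETCH_MAX_PREDS 4

struct sketch_instr;

/* An SSA definition remembers every instruction that reads it (def-use). */
struct sketch_ssa_def {
    struct sketch_instr *parent;                 /* instruction producing it */
    struct sketch_instr *uses[SKETCH_MAX_USES];
    size_t num_uses;
};

/* A flat instruction: sources point straight at definitions (use-def). */
struct sketch_instr {
    const char *op;
    struct sketch_ssa_def *srcs[SKETCH_MAX_SRCS];
    size_t num_srcs;
    struct sketch_ssa_def def;
};

/* A basic block that knows its neighbours in the control flow graph. */
struct sketch_block {
    struct sketch_block *succs[SKETCH_MAX_SUCCS];
    size_t num_succs;
    struct sketch_block *preds[SKETCH_MAX_PREDS];
    size_t num_preds;
};

static void sketch_instr_init(struct sketch_instr *instr, const char *op)
{
    instr->op = op;
    instr->num_srcs = 0;
    instr->def.parent = instr;
    instr->def.num_uses = 0;
}

/* Adding a source updates both directions of the chain at once. */
static void sketch_add_src(struct sketch_instr *user,
                           struct sketch_ssa_def *def)
{
    assert(user->num_srcs < SKETCH_MAX_SRCS);
    assert(def->num_uses < SKETCH_MAX_USES);
    user->srcs[user->num_srcs++] = def;
    def->uses[def->num_uses++] = user;
}

static void sketch_block_init(struct sketch_block *block)
{
    block->num_succs = 0;
    block->num_preds = 0;
}

/* Adding a CFG edge records it on both endpoints. */
static void sketch_link_blocks(struct sketch_block *from,
                               struct sketch_block *to)
{
    assert(from->num_succs < SKETCH_MAX_SUCCS);
    assert(to->num_preds < SKETCH_MAX_PREDS);
    from->succs[from->num_succs++] = to;
    to->preds[to->num_preds++] = from;
}
```

With something like this, a pass that wants the producer of an operand just
follows instr->srcs[i]->parent, and a pass that wants every reader of a value
walks def.uses, with no separate analysis needed to rebuild either chain.
Whether the real IR stores uses in a set, a list or something else is a
detail I have not tried to reproduce here.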

> Roland
> 