[Mesa-dev] Mesa IR as a list of instructions

Wed May 28 20:45:19 PDT 2014

On Wed, May 28, 2014 at 8:50 PM, Eric Anholt <eric at anholt.net> wrote:
> Connor Abbott <cwabbott0 at gmail.com> writes:
>
>> On Wed, May 28, 2014 at 2:37 PM, Eric Anholt <eric at anholt.net> wrote:
>>> Here's a series I started back in January as a little experiment.
>>> Basically, I feel guilty for pushing GLSL IR into the driver, and wish I'd
>>> just fixed up Mesa IR back in the day.  But, given that we're still
>>> feeding Mesa IR into drivers as well (ARB programs and fixed function
>>> vertex programs), it made me think: What if I fixed it up now, and got
>>> Mesa IR to the point that we could just garbage collect the GLSL IR input
>>> paths to drivers?
>>>
>>> Mesa IR has a bunch of weaknesses that need to get sorted out if it's
>>> going to be useful:
>>>
>>> - It's a single giant array of instructions, making modifications of the
>>>   instruction stream (instruction lowering, optimization, etc.) more
>>>   expensive in code and CPU time than it should be.
>>> - It doesn't have any variable declarations, so if you have dynamic array
>>>   indexing, optimization just shuts down (plus, no annotation on the
>>>   temps, so debugging is irritating).
>>> - It doesn't have integer instructions or anything else post-GLSL-1.30.
>>> - The optimization passes for it are totally ad-hoc and fairly weak.
>>> - It's not SSA.
>>>
>>> I'm interested in fixing all of these.  How do people feel about this
>>> goal?
>>
>> I don't know much about Mesa IR, but this seems like a huge pile of
>> work... is this any easier than just making GLSL IR flatter, like with
>> my flatland idea that I posted a while ago? Also, iirc, there was an
>> effort a while ago to make ARB programs go through GLSL IR but it was
>> abandoned - why was it so difficult? I'm skeptical that basically
>> bringing an older IR from 1999 to 2014 is going to be less effort than
>> modifying a newer one that had some good ideas and some bad ones.
>
> The thing is that I want basically all the ways that GLSL IR is
> different from Mesa IR to go back to Mesa IR, except for size
> declarations on temps.  It was a mistake.  From there I want new stuff,
> but building new stuff doesn't feel that hard when it's incremental
> changes.

Well, I think that in my "perfect IR" we'd still have some parts of
GLSL IR. I'd have the if statements and loops in a tree structure
(more on that below), plus I think that the ir_variable stuff is
useful for a few things:

1. Global variables, including uniforms, ins and outs where we have to
match names and types
2. Function parameters where we have to do the same thing
3. Arrays and structures

Then we would have instructions for loading and storing ir_variables
(or for arrays and structures, a single vector in the ir_variable),
which later on would be lowered UBO-style. All instructions would be
in SSA form, with each use pointing directly to the corresponding
definition; we could introduce registers for backends that don't
support SSA but the majority of the compiler would ignore that. The
AST -> IR code would output SSA instructions where it would now output
expressions, and then after structure splitting we do the conversion
to SSA, which also acts as a pass that eliminates ir_variables. Now
*this* is something that can be easily done incrementally, since the
new stuff would very easily interoperate with the old stuff; it would
be easy to write a pass that converts between the old & new form.

>
> i965 faces a non-incremental change if it was to choose to follow this
> path: flipping everything over to brw_*_[vf]p.cpp.  That doesn't look
> pleasant, and it has some challenges (like its MOV-happy swizzle
> handling in brw_fs_fp.cpp).

But, as Ken pointed out, it would probably be a lot more work to bring
the Mesa IR path up to par with the current GLSL IR path...

>
>> By the way, correct me if I'm wrong, but I think you're still missing
>> a bullet point there. You've changed Mesa IR to be a list of
>> instructions, but it's still just one list. Ideally, you'd have both
>> the graph structure (the basic blocks) and the tree structure (if
>> statements and loops) available in the IR, since both of them are
>> going to be needed for different reasons, especially with converting
>> to SSA (or doing basically any non-trivial optimization). As of now,
>> GLSL IR has the former (I think Ian had some patches to introduce the
>> latter through an analysis pass as part of an experiment with
>> introducing def-use chains to GLSL IR...) but Mesa IR has neither - in
>> the end, I think it's much more practical to have a tree of if
>> statements and loops like in GLSL IR than a simple list of
>> instructions. Again, I don't think it's less effort than just fixing
>> GLSL IR.
>
> We absolutely need a CFG handy to do optimization well, but I disagree
> on having the tree structure in the IR for flow-control constructs.  I
> recently wrote a CFG constructor for GLSL IR, and found that trying to
> correctly walk the tree (particularly for dataflow analysis, but also to
> an extent for just building a CFG) was way harder than it was in i965's
> IR and I definitely don't trust the resulting code anything like I do
> for the i965 stuff (you can find that work on the deadcode branch of my
> tree).

I don't see how it was that much harder. You guys basically already do
the exact same thing you did (walking the tree to build a CFG) for
i965 in brw_cfg.cpp; in fact, you can basically copy-n-paste the logic
there to GLSL IR to do what your live variables code did. In the end,
a flat list of instructions has no advantage over a tree because
everything you can do with the list, you can automatically do with the
tree in an equivalent manner - if you wanted to walk the list forwards
or backwards, just visit the tree in a depth-first manner, something
that the visitor classes let you do very easily already. With a tree,
however, oftentimes you can write things in a more natural recursive
way. Take, for example, your example with figuring out the CFG. In
GLSL IR, you could do something like:

visit(ir_if *)
{
   struct basic_block *then_bb = ...
   then_bb->first_instr = first instr of then (unless it's a
control-flow statemtent)
   link this->cur_bb to then_bb

   struct basic_block *else_bb = NULL;
   if (else isn't empty) {
      else_bb = ...
      else_bb->first_instr = first instr of else (unless it's a
control-flow statement)
       link this->cur_bb to else_bb
   }

   struct basic_block *exit_bb = ...
   exit_bb->first_instr = first instr after if

   this->cur_bb = then_bb
   visit then statements...
   link this->cur_bb and exit_bb

   if (else isn't empty) {
      this->cur_bb = else_bb
      visit else statements...
      link this->cur_bb and exit_bb
   }

   this->cur_bb = exit_bb
}

visit(ir_loop *)
{
   struct basic_block *old_entry_bb = this->entry_bb
   struct basic_block *old_exit_bb = this->exit_bb

   this->entry_bb = ...
   this->entry_bb->first_instr = first instr of loop
   link this->cur_bb to this->entry_bb

   this->exit_bb = ...
   this->exit_bb->first_instr = first instr after loop

   this->cur_bb = this->entry_bb
   visit loop statements...

   this->cur_bb = this->exit_bb
   this->entry_bb = old_entry_bb
   this->exit_bb = old_exit_bb
}

visit(ir_loop_jump *)
{
   this->cur_bb->last_instr = this instr
   if (is break)
      link this->cur_bb and this->exit_bb
   else
      link this->cur_bb and this->entry_bb
}

visit(ir_assignment or ir_call)
{
   this->cur_bb->last_instr = this instr
}

This is pseudocode, but it's pretty close to what I would actually
write. It does what the big switch in brw_cfg.cpp does in about half
the code (150 lines vs. ~70-80 lines), plus it doesn't require the
makeshift stacks to store basic blocks. Also, it's probably more
efficient since it uses the system stack which doesn't have to call
malloc() every time something gets pushed on it. And beauty is in the
eye of the beholder, but I think that I can trust this code more than
the i965 code (after it's been tested, of course), since rather than
hiding the recursive nature of what you're doing, it puts it front and
center. Another way to think about it is that even with a flat list of
instructions, you always have a tree structure - the IF, ELSE, ENDIF
and WHILE and DO match up to form a tree - but every time you want to
recurse through the tree, you wind up having to manage state like the
entry and exit blocks of the loop by hand instead of simply letting
the compiler do it, sort of like rolling your own state machine vs.
using coroutines. So thanks for helping me with an example to prove my
point :)