[Mesa-dev] Flatland

Wed Feb 12 03:35:24 PST 2014

On 02/06/2014 09:34 PM, Connor Abbott wrote:
> Hi,
> 
> So I believe that we can all agree that the tree-based representation
> that GLSL IR currently uses for shaders needs to go.
[snip]

Hi Connor!

I agree 100%.  The current tree IR is nice as a simplified AST of sorts,
but it's really not at all suitable for writing a robust optimizing
compiler.  It's served us well, but we definitely need something else
for the future.

We've actually implemented a number of optimizations in the i965
backend, rather than the main compiler, simply because working with the
trees was too awkward.  Moving to a flat IR will make implementing a lot
of the optimization techniques much easier.

The other thing that's hurt us a lot is the lack of use-def (UD) chains,
basic block boundaries, and other "analysis" data.  We can't ask "what
generated this value?" without walking through instruction lists and
possibly trees.

We all agree that moving to SSA makes a ton of sense, as it either
implicitly provides these things, or makes them much easier to do.

I apologize for not having read through your patches yet - I've been
pretty swamped as of late, but I'm definitely interested.

> - It turns out that the original advantage of a tree-based IR, to be
> able to automatically generate pattern-matching code for optimizing
> certain code patterns, only really matters for CPU's with weird
> instruction sets with lots of exotic instructions; GPU's tend to be
> pretty regular and consistent in their ISA's, so being able to
> pattern-match with trees doesn't help us much here.

Well, the cool thing is that a flat IR in SSA form doesn't actually lose
this advantage.  It's actually really easy to get a tree-view when you
need one...

I spent a while dreaming up a new flat SSA IR about a week ago.  Here
are a few notes and thoughts I came up with...

Instructions can be broken down into a few categories:

* Flow Control
* "Memory" writes
  - Write to shader output (variable's location, value)
  - Write to image         (image location, coordinates, ..., value)
  - Write to array element (array location, subscript, value)
* Value generators
  - "Memory" reads
    : Load shader input
    : Load from uniform
    : Load from UBO
    : Load from texture
    : Load from image
    : Load from array element
  - Expressions (transform one or more existing values into a new value)
    : Usual unops/binops/etc.
    : Vector insert
    : Vector extract
  - Phi nodes (possibly considered expressions?)
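
As a rough sketch, the categories above could be written down as a pair
of enums plus a classifier.  All names here are made up for illustration
(nothing like this exists in Mesa), and `jump` is just a placeholder for
whatever flow-control instructions the IR would end up with:

```cpp
#include <cassert>

// Hypothetical names; `jump` stands in for whatever flow-control
// instructions the real IR would have.
enum class flat_kind {
    jump,
    store_output, store_image, store_array_elem,
    load_input, load_uniform, load_ubo,
    load_texture, load_image, load_array_elem,
    expression, vec_insert, vec_extract, phi
};

enum class flat_category { flow_control, memory_write, value_generator };

inline flat_category categorize(flat_kind k)
{
    switch (k) {
    case flat_kind::jump:
        return flat_category::flow_control;
    case flat_kind::store_output:
    case flat_kind::store_image:
    case flat_kind::store_array_elem:
        return flat_category::memory_write;
    default:
        // Memory reads, expressions, and phi nodes all produce a value.
        return flat_category::value_generator;
    }
}

// Only value generators define something other instructions can use.
inline bool defines_value(flat_kind k)
{
    return categorize(k) == flat_category::value_generator;
}
```

Phi nodes are filed under value generators here; whether they should
count as expressions is left open, as in the list.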

For temporaries, we don't have traditional "variables" that are declared
and assigned.  Instead, we have "values", which come into being when
generated by an instruction.  There's a one-to-one correspondence
between a value and the generating instruction; if the instruction is
eliminated, so is the value.  We could represent values as pointers to
their generating instruction, which would probably include a
human-readable value name ("n_dot_l") for debugging.
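
A minimal sketch of "a value is a pointer to its generating
instruction", assuming a made-up `flat_instr` struct (the field and
function names are hypothetical, not existing Mesa code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct flat_instr;

// A value *is* a pointer to the instruction that generates it.
typedef const flat_instr *flat_value;

struct flat_instr {
    std::string op;               // e.g. "dot", "load_input"
    std::string debug_name;       // human-readable name, e.g. "n_dot_l"
    std::vector<flat_value> srcs; // operands, each one a value
};

// "What generated this value?" is answered without any searching.
inline const flat_instr *generator_of(flat_value v)
{
    assert(v != nullptr);
    return v;
}

// Render one instruction using the debug names of its operands.
inline std::string describe(flat_value v)
{
    std::string s = v->debug_name + " = " + v->op + "(";
    for (std::size_t i = 0; i < v->srcs.size(); i++) {
        if (i > 0)
            s += ", ";
        s += v->srcs[i]->debug_name;
    }
    return s + ")";
}
```

With a "dot" instruction named "n_dot_l" whose two sources are loads
named "normal" and "light_dir", describe() gives
"n_dot_l = dot(normal, light_dir)".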

In a tree-like view, the leaves of the tree would be "Memory Reads"
(which create a new value out of nothing), and the middle nodes would be
Expressions.  To obtain a tree, simply look at a value's generating
instruction.  If it's an expression, its operands are values too, so you
can inspect their generating instructions in turn, recursing as needed.

Being able to quickly associate the use of a value with the generating
instruction makes it trivial to walk part of the IR as a tree, even if
it's actually stored as a flat list of instructions.  Which is pretty
awesome, and wouldn't rely on tree grafting working or the original
shader containing expression trees.
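
Here is a small sketch of that tree walk, assuming a toy instruction
struct with only three opcodes (all names are hypothetical).  It prints
the tree rooted at a value and does a simple multiply-add pattern match,
both purely by following source pointers:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy opcode set, just enough to show the idea.
enum fl_op { FL_LOAD, FL_ADD, FL_MUL };

struct fl_instr {
    fl_op op;
    std::string name;                   // debug name; printed for loads
    std::vector<const fl_instr *> srcs; // operands = generating instructions
};

// Walk the flat IR as a tree: loads are leaves, expressions recurse.
inline std::string tree_view(const fl_instr *v)
{
    if (v->op == FL_LOAD)
        return v->name;
    std::string s = (v->op == FL_ADD) ? "(add" : "(mul";
    for (const fl_instr *src : v->srcs)
        s += " " + tree_view(src);
    return s + ")";
}

// Tree-style pattern match: add(mul(a, b), c) could become a MAD.
inline bool is_mad_candidate(const fl_instr *v)
{
    return v->op == FL_ADD && v->srcs.size() == 2 &&
           v->srcs[0]->op == FL_MUL;
}
```

The matcher only checks the first operand for brevity; a real pass would
also try the commuted form.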

> Finally, it seems like a lot of important SSA-based passes assume that
> we have a flat IR, and so moving to SSA won't be nearly as beneficial
> as we would like it to be; we could flatten the IR before doing these
> passes, but that would make the first problem even worse. So we can't
> really take advantage of SSA too much either until we have a flat IR.

Agreed.  We should create a new flat IR and do SSA in that.

> The real issue is, how do we let this transition occur gradually, in
> pieces, without breaking existing code? Ian proposed one solution at
> FOSDEM, but here's my idea of another.
> 
> So, my idea is that rather than slowly introducing changes across the
> board, we create the IR in its final form in the beginning,

I agree.  While I wouldn't say "final form", I'm a strong believer in
spending some time and thinking really hard about what your IR or
language should look like before doing it.  Good design will pay off.

> write passes to flatten and unflatten the IR, and then piece-by-piece
> rewrite the rest of the compiler.

This, exactly.  The front-end should still generate trees, but then we
can flatten them, run new flatland passes, and then convert the result
back to the tree IR.
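
To illustrate the round trip, here is a toy flatten/unflatten pair for
pure expression trees.  It is only a sketch under heavy assumptions: the
types and function names are invented, and it ignores control flow,
assignments, and SSA entirely:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct tree_node {
    std::string op;                               // operator or leaf name
    std::vector<std::unique_ptr<tree_node>> kids; // empty for leaves
};

struct flat_op {
    std::string op;
    std::vector<int> srcs; // indices of earlier entries in the list
};

std::unique_ptr<tree_node> make_node(const std::string &op,
                                     std::unique_ptr<tree_node> l = nullptr,
                                     std::unique_ptr<tree_node> r = nullptr)
{
    std::unique_ptr<tree_node> n(new tree_node);
    n->op = op;
    if (l) n->kids.push_back(std::move(l));
    if (r) n->kids.push_back(std::move(r));
    return n;
}

// Post-order walk: operands are emitted before the instruction using them.
int flatten(const tree_node &n, std::vector<flat_op> &out)
{
    flat_op f;
    f.op = n.op;
    for (const auto &k : n.kids)
        f.srcs.push_back(flatten(*k, out));
    out.push_back(f);
    return static_cast<int>(out.size()) - 1;
}

// Rebuild a tree from the flat list, starting at instruction `idx`.
std::unique_ptr<tree_node> unflatten(const std::vector<flat_op> &list, int idx)
{
    std::unique_ptr<tree_node> n(new tree_node);
    n->op = list[idx].op;
    for (int s : list[idx].srcs)
        n->kids.push_back(unflatten(list, s));
    return n;
}

std::string to_sexpr(const tree_node &n)
{
    if (n.kids.empty())
        return n.op;
    std::string s = "(" + n.op;
    for (const auto &k : n.kids)
        s += " " + to_sexpr(*k);
    return s + ")";
}
```

Note that this unflatten duplicates any value that is used more than
once; a real pass would need to emit a temporary for shared values
instead.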

As a next step, we could convert the linker to work on flat IR, then
convert the linked IR back to trees for the driver backend hand-off.

Then we could start writing direct flatland -> TGSI/i965/etc. converters.

Converting back to trees allows us to land things in stages, making the
whole thing a lot more manageable.  We just have to commit to actually
finishing the job and eventually ditching the unflattening pass, so it
doesn't live forever :)

> We're going to have to rewrite a lot
> of the passes to support SSA in the first place, so why not convert
> them to a flat IR while we're at it? The benefit of this is that it's
> much easier to do asynchronously and in parallel; rather than
> introducing changes to the entire thing at once, several people can
> convert this and that pass, the frontend, the linker, etc.
> independently. It would entail some extra overhead during the
> transition in the form of the flattening and unflattening passes, but
> I think it would be worth it for the immediate benefits (optimizations
> like GVN-GCM and CSE made possible, etc.).
> 
> The first part to be converted would be my passes to convert to and
> from SSA, so that the compiler optimization part would look like this:
> 
> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
> unflatten -> (the old stuff)
> 
> Then we gradually convert ast_to_hir, various passes, the linker,
> backends, etc. to this form while now actually having the
> infrastructure to implement any advanced compiler optimization
> designed in the last ~15 years or so by more-or-less copying down the
> pseudocode. Hopefully, then, we can reach a point where we can rip out
> the old IR and the converters.
> 
> So what would this new IR look like? Well, here's my 2 cents (in the
> form of some abridged class definitions, you should get the point...)
> 
> struct ir_calc_source
> {
>     mode; /** < SSA or non-SSA */
>     union {
>         ir_calculation *def; /** < for SSA sources */
>         unsigned int reg; /** < for non-SSA sources */
>     } src;
>     unsigned swizzle : 8;
> };

Seems about right to me.  I'm not really sure about the best way to
represent swizzles and writemasks.  If you've found any decent
literature about SSA with vectors, I'd be really interested...
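
For reference, one way the 8-bit swizzle field above could be packed is
two bits per component.  This layout is an assumption for illustration
(the quoted struct doesn't specify it), and the helper names are made
up:

```cpp
#include <cassert>
#include <cstdint>

// Component selectors: 0 = x, 1 = y, 2 = z, 3 = w.
// Assumed layout: component i lives in bits 2i and 2i+1.
inline uint8_t pack_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
    assert(x < 4 && y < 4 && z < 4 && w < 4);
    return static_cast<uint8_t>(x | (y << 2) | (z << 4) | (w << 6));
}

inline unsigned swizzle_component(uint8_t swz, unsigned i)
{
    assert(i < 4);
    return (swz >> (2 * i)) & 3u;
}

// Fold v.inner.outer into one swizzle: result[i] = inner[outer[i]].
inline uint8_t compose_swizzle(uint8_t inner, uint8_t outer)
{
    unsigned r = 0;
    for (unsigned i = 0; i < 4; i++)
        r |= swizzle_component(inner, swizzle_component(outer, i)) << (2 * i);
    return static_cast<uint8_t>(r);
}
```

Composing swizzles like this is what lets a copy-propagation style pass
fold a chain of swizzled moves into a single source swizzle.  It doesn't
settle how write masks should interact with SSA values.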

> struct ir_calc_dest
> {
>     mode; /** < SSA or non-SSA */
>     union {
>         unsigned int reg; /** < for non-SSA destinations */
> 
>         /**
>          * For SSA destinations. Types are needed here because normally
>          * they're part of the register, but SSA doesn't have registers.
>          */
>         glsl_type *type;
>     } reg_or_type; /* this name is kinda ugly but couldn't think of
>                     * anything better. */
> };
>
> /*
>  * This is Ian's name for it, personally I would vote for
>  * s/ir_instruction/ir_node/ and call this ir_instruction
>  */
> 
> class ir_calculation
> {
>     ir_calc_dest dest;
>     ir_expression_operation op;
>     unsigned write_mask : 4;
>     ir_calc_source srcs[4];
> };
> 
> class ir_load_var
> {
>     ir_calc_dest dest;
>     ir_variable *src;

IMHO, we should keep the existing tree IR completely separate from the
new flatland IR.  Separate files and separate classes...maybe use a
different prefix.  The benefit would be that both IRs would be entirely
self-contained, and clearly defined, with explicit
flattening/unflattening modules that define the translation between them.

>     /**
>      * For array and record loads, whether we're loading a specific
>      * member or the whole thing.
>      */
>     bool deref_member;
>     ir_calc_source array_index; /** < for array loads if
>                                  * deref_array_index is true */
>     char *record_index; /** < for structure loads */
> };
> 
> class ir_store_var
> {
>     ir_variable *dest;
>     ir_calc_source src;
>     bool deref_member;
>     ir_calc_source array_index; /** < for array loads */
>     char *record_index; /** < for structure loads */
>     unsigned write_mask : 4;
> };

These classes fit my categorization of things pretty well.

> So ir_variable still exists, but it will only be used for function
> parameters, shader in/outs and uniforms, and arrays and structures.
> Registers will be much more lightweight, only requiring a table with
> each register's type and perhaps uses and definitions. The flattening
> pass, and later ast_to_hir, will emit loads and stores wherever there
> is an ir_dereference now, but there will be an ir_variable -> register
> pass that converts these to moves that will later be eliminated by
> copy propagation (in SSA form, after converting the registers to SSA
> writes). This is similar to how LLVM works, with everything starting
> out allocated on the stack using alloca (equivalent to ir_variables
> here) and accessed explicitly using loads and stores, but then some of
> these loads/stores are optimized out.
> 
> What do you guys think about this? If everybody thinks this is a good
> idea, I can write up a patch series that implements the basic concept
> as well as the flatten and unflatten passes.
> 
> Connor

I'm really excited that you're looking into this.  Creating a new flat
IR was one of the main projects Matt and I had hoped to do this year (we
really need it for long-term performance gains), but you've beaten us to
it, and seem to be right on the mark.

I probably won't be able to spend a lot of time on this in the next few
weeks, but hopefully after that I'll have more time to start helping out.

--Ken
