[Mesa-dev] Flatland

Christian König deathsimple at vodafone.de
Fri Feb 7 08:20:12 PST 2014


On 07.02.2014 16:49, Alex Deucher wrote:
> On Fri, Feb 7, 2014 at 12:34 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> Hi,
>>
>> So I believe that we can all agree that the tree-based representation
>> that GLSL IR currently uses for shaders needs to go. For the benefit
>> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
>> reiterate some of the problems with it as of now:
>>
>> - All the ir_dereference chains blow up the memory usage, and the
>> constant pointer chasing in the recursive algorithms needed to handle
>> them is not just cache-unfriendly but "cache-mean."
>>
>> - The ir_hierarchical_visitor pattern that we currently use for
>> optimization/analysis passes has to examine every piece of IR, even
>> the irrelevant stuff, making the above problems even worse.
>>
>> - Nobody else does it this way, meaning that the existing well-known
>> optimizations don't apply as much here, and oftentimes we have to
>> write some pretty nasty code in order to make necessary optimizations
>> (like tree grafting).
>>
>> - It turns out that the original advantage of a tree-based IR, to be
>> able to automatically generate pattern-matching code for optimizing
>> certain code patterns, only really matters for CPUs with weird
>> instruction sets with lots of exotic instructions; GPUs tend to be
>> pretty regular and consistent in their ISAs, so being able to
>> pattern-match with trees doesn't help us much here.
>>
>> Finally, it seems like a lot of important SSA-based passes assume that
>> we have a flat IR, and so moving to SSA won't be nearly as beneficial
>> as we would like it to be; we could flatten the IR before doing these
>> passes, but that would make the first problem even worse. So we can't
>> really take advantage of SSA too much either until we have a flat IR.
>>
>> The real issue is, how do we let this transition occur gradually, in
>> pieces, without breaking existing code? Ian proposed one solution at
>> FOSDEM, but here's my idea of another.
>>
>> So, my idea is that rather than slowly introducing changes across the
>> board, we create the IR in its final form in the beginning, write
>> passes to flatten and unflatten the IR, and then piece-by-piece
>> rewrite the rest of the compiler. We're going to have to rewrite a lot
>> of the passes to support SSA in the first place, so why not convert
>> them to a flat IR while we're at it? The benefit of this is that it's
>> much easier to do asynchronously and in parallel; rather than
>> introducing changes to the entire thing at once, several people can
>> convert this and that pass, the frontend, the linker, etc.
>> independently. It would entail some extra overhead during the
>> transition in the form of the flattening and unflattening passes, but
>> I think it would be worth it for the immediate benefits (optimizations
>> like GVN-GCM and CSE made possible, etc.).
>>
>> The first part to be converted would be my passes to convert to and
>> from SSA, so that the compiler optimization part would look like this:
>>
>> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
>> unflatten -> (the old stuff)
>>
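
To make the "flatten" step concrete, here is a minimal sketch of turning an expression tree into a flat instruction list with a post-order walk. The types tree_expr and flat_instr and the helpers leaf(), binop() and flatten() are hypothetical names made up for illustration; they are not the proposed IR or anything that exists in Mesa, and the sketch ignores swizzles, write masks and control flow.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node: a leaf naming a register, or a binary operation.
struct tree_expr {
    char op = 0;        // 0 for a leaf, otherwise '+', '*', ...
    unsigned reg = 0;   // leaf only: register that already holds the value
    std::unique_ptr<tree_expr> lhs, rhs;
};

// Hypothetical flat instruction: dest = srcs[0] op srcs[1].
struct flat_instr {
    char op;
    unsigned dest;
    unsigned srcs[2];
};

std::unique_ptr<tree_expr> leaf(unsigned reg)
{
    auto e = std::make_unique<tree_expr>();
    e->reg = reg;
    return e;
}

std::unique_ptr<tree_expr> binop(char op, std::unique_ptr<tree_expr> l,
                                 std::unique_ptr<tree_expr> r)
{
    auto e = std::make_unique<tree_expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order walk: emit the operands first, then one instruction per
// operator. Returns the register that holds the value of `e`.
unsigned flatten(const tree_expr &e, std::vector<flat_instr> &out,
                 unsigned &next_reg)
{
    if (e.op == 0)
        return e.reg;
    assert(e.lhs && e.rhs);
    unsigned a = flatten(*e.lhs, out, next_reg);
    unsigned b = flatten(*e.rhs, out, next_reg);
    unsigned d = next_reg++;   // fresh temporary for this result
    out.push_back({e.op, d, {a, b}});
    return d;
}
```

Flattening (r0 + r1) * r2 with the first free register set to 3 emits "r3 = r0 + r1" followed by "r4 = r3 * r2"; every intermediate value gets its own register, which is the shape that SSA-based passes expect.
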
>> Then we gradually convert ast_to_hir, various passes, the linker,
>> backends, etc. to this form while now actually having the
>> infrastructure to implement any advanced compiler optimization
>> designed in the last ~15 years or so by more-or-less copying down the
>> pseudocode. Hopefully, then, we can reach a point where we can rip out
>> the old IR and the converters.
>>
>> So what would this new IR look like? Well, here's my 2 cents (in the
>> form of some abridged class definitions, you should get the point...)
>>
>> struct ir_calc_source
>> {
>>      mode; /** < SSA or non-SSA */
>>      union {
>>          ir_calculation *def; /** < for SSA sources */
>>          unsigned int reg; /** < for non-SSA sources */
>>      } src;
>>      unsigned swizzle : 8;
>> };
>>
>> struct ir_calc_dest
>> {
>>      mode; /** < SSA or non-SSA */
>>      union {
>>          unsigned int reg; /** < for non-SSA destinations */
>>
>>          /**
>>           * For SSA destinations. Types are needed here because normally
>>           * they're part of the register, but SSA doesn't have registers.
>>           */
>>          glsl_type *type;
>>      } reg_or_type; /* this name is kinda ugly but couldn't think of anything better. */
>> };
>>
>> /*
>>   * This is Ian's name for it, personally I would vote for
>>   * s/ir_instruction/ir_node/ and call this ir_instruction
>>   */
>>
>> class ir_calculation
>> {
>>      ir_calc_dest dest;
>>      ir_expression_operation op;
>>      unsigned write_mask : 4;
>>      ir_calc_source srcs[4];
>> };
>>
>> class ir_load_var
>> {
>>      ir_calc_dest dest;
>>      ir_variable *src;
>>
>>      /**
>>       * For array and record loads, whether we're loading a specific
>>       * member or the whole thing.
>>       */
>>      bool deref_member;
>>      ir_calc_source array_index; /** < for array loads if deref_member is true */
>>      char *record_index; /** < for structure loads */
>> };
>>
>> class ir_store_var
>> {
>>      ir_variable *dest;
>>      ir_calc_source src;
>>      bool deref_member;
>>      ir_calc_source array_index; /** < for array stores */
>>      char *record_index; /** < for structure stores */
>>      unsigned write_mask : 4;
>> };
>>
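
For readers unfamiliar with the "swizzle : 8" and "write_mask : 4" fields, the following sketch shows one way such fields could be interpreted on a four-component vector. The encoding is an assumption (two bits per lane for the swizzle, one bit per lane for the write mask); the definitions above do not spell it out, and vec4, make_swizzle(), apply_swizzle() and masked_write() are hypothetical helpers.

```cpp
#include <array>
#include <cassert>

using vec4 = std::array<float, 4>;

// Assumed encoding: bits [2i+1:2i] of the swizzle pick the source
// component that is read into lane i (0 = x, 1 = y, 2 = z, 3 = w).
constexpr unsigned make_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
    return (x & 3u) | ((y & 3u) << 2) | ((z & 3u) << 4) | ((w & 3u) << 6);
}

// Reorder the components of `src` according to the 8-bit swizzle.
vec4 apply_swizzle(const vec4 &src, unsigned swizzle)
{
    vec4 out{};
    for (unsigned i = 0; i < 4; i++)
        out[i] = src[(swizzle >> (2 * i)) & 3u];
    return out;
}

// Assumed encoding: bit i of write_mask set means lane i of `dest`
// is overwritten; the other lanes keep their old value.
void masked_write(vec4 &dest, const vec4 &value, unsigned write_mask)
{
    for (unsigned i = 0; i < 4; i++)
        if (write_mask & (1u << i))
            dest[i] = value[i];
}
```

Under these assumptions, the identity swizzle .xyzw is make_swizzle(0, 1, 2, 3), and writing a .wzyx-swizzled source with write mask 0x5 updates only the x and z lanes of the destination.
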
>> So ir_variable still exists, but it will only be used for function
>> parameters, shader in/outs and uniforms, and arrays and structures.
>> Registers will be much more lightweight, only requiring a table with
>> each register's type and perhaps uses and definitions. The flattening
>> pass, and later ast_to_hir, will emit loads and stores wherever there
>> is an ir_dereference now, but there will be an ir_variable -> register
>> pass that converts these to moves that will later be eliminated by
>> copy propagation (in SSA form, after converting the registers to SSA
>> writes). This is similar to how LLVM works, with everything starting
>> out allocated on the stack using alloca (equivalent to ir_variables
>> here) and accessed explicitly using loads and stores, but then some of
>> these loads/stores are optimized out.
>>
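
The load/store-to-register idea described above can be sketched roughly as follows. This is a simplified illustration under loud assumptions: straight-line code only (no control flow, so no phi nodes), every register written at most once, and every variable stored before it is loaded. The names opcode, instr, lower_vars() and copy_propagate() are hypothetical and do not correspond to the proposed classes or to existing Mesa passes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class opcode { load_var, store_var, mov, add };

struct instr {
    opcode op;
    unsigned dest;        // register written (unused for store_var)
    unsigned src0, src1;  // registers read (src1 is only used by add)
    std::string var;      // variable name for load_var / store_var
};

// Step 1: turn variable accesses into register moves. Each store gets a
// fresh register, so every register is still written exactly once.
std::vector<instr> lower_vars(const std::vector<instr> &in, unsigned &next_reg)
{
    std::map<std::string, unsigned> current;  // latest register per variable
    std::vector<instr> out;
    for (const instr &i : in) {
        if (i.op == opcode::store_var) {
            unsigned fresh = next_reg++;
            out.push_back({opcode::mov, fresh, i.src0, 0, ""});
            current[i.var] = fresh;
        } else if (i.op == opcode::load_var) {
            auto it = current.find(i.var);
            assert(it != current.end() && "sketch assumes store before load");
            out.push_back({opcode::mov, i.dest, it->second, 0, ""});
        } else {
            out.push_back(i);
        }
    }
    return out;
}

// Step 2: copy propagation. Uses of a move's destination are rewritten to
// the move's source and the move itself is dropped. Registers that are
// live at the end of the block are not handled in this sketch.
std::vector<instr> copy_propagate(const std::vector<instr> &in)
{
    std::map<unsigned, unsigned> copy_of;
    auto resolve = [&](unsigned r) {
        auto it = copy_of.find(r);
        return it == copy_of.end() ? r : it->second;
    };
    std::vector<instr> out;
    for (instr i : in) {
        i.src0 = resolve(i.src0);
        if (i.op == opcode::add)
            i.src1 = resolve(i.src1);
        if (i.op == opcode::mov) {
            copy_of[i.dest] = i.src0;  // src0 is already fully resolved
            continue;
        }
        out.push_back(i);
    }
    return out;
}
```

For a block that stores r0 to x, loads x into r2, adds r2 and r1 into r3, stores r3 to y, loads y into r4 and adds r4 to itself, the two steps leave just the two adds, now reading r0, r1 and r3 directly. This is the same effect LLVM gets when it promotes alloca'd values to registers.
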
> What about just moving to llvm directly?  We already use it for
> compute/OpenCL on gallium and as the shader compiler for radeon
> hardware and llvmpipe.

That was discussed in the talk as well. LLVM would be a good choice for
this; the only problem is that it has no stable API.

I'm currently wondering whether it would be possible to make llvm-c stable
and reliable enough to be used for this, but that is something we would
need to discuss with the LLVM folks as well.

Christian.

>
> Alex