[Mesa-dev] Flatland

Fri Feb 7 12:18:37 PST 2014

On Fri, Feb 07, 2014 at 02:52:15PM -0500, Rob Clark wrote:
> On Fri, Feb 7, 2014 at 11:20 AM, Christian König
> <deathsimple at vodafone.de> wrote:
> > On 07.02.2014 16:49, Alex Deucher wrote:
> >
> >> On Fri, Feb 7, 2014 at 12:34 AM, Connor Abbott <cwabbott0 at gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> So I believe that we can all agree that the tree-based representation
> >>> that GLSL IR currently uses for shaders needs to go. For the benefit
> >>> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
> >>> reiterate some of the problems with it as of now:
> >>>
> >>> - All the ir_dereference chains blow up the memory usage, and the
> >>> constant pointer chasing in the recursive algorithms needed to handle
> >>> them is not just cache-unfriendly but "cache-mean."
> >>>
> >>> - The ir_hierarchical_visitor pattern that we currently use for
> >>> optimization/analysis passes has to examine every piece of IR, even
> >>> the irrelevant stuff, making the above problems even worse.
> >>>
> >>> - Nobody else does it this way, meaning that the existing well-known
> >>> optimizations don't apply as much here, and oftentimes we have to
> >>> write some pretty nasty code in order to make necessary optimizations
> >>> (like tree grafting).
> >>>
> >>> - It turns out that the original advantage of a tree-based IR, to be
> >>> able to automatically generate pattern-matching code for optimizing
> >>> certain code patterns, only really matters for CPUs with weird
> >>> instruction sets with lots of exotic instructions; GPUs tend to be
> >>> pretty regular and consistent in their ISAs, so being able to
> >>> pattern-match with trees doesn't help us much here.
> >>>
> >>> Finally, it seems like a lot of important SSA-based passes assume that
> >>> we have a flat IR, and so moving to SSA won't be nearly as beneficial
> >>> as we would like it to be; we could flatten the IR before doing these
> >>> passes, but that would make the first problem even worse. So we can't
> >>> really take advantage of SSA too much either until we have a flat IR.
> >>>
> >>> The real issue is, how do we let this transition occur gradually, in
> >>> pieces, without breaking existing code? Ian proposed one solution at
> >>> FOSDEM, but here's my idea of another.
> >>>
> >>> So, my idea is that rather than slowly introducing changes across the
> >>> board, we create the IR in its final form in the beginning, write
> >>> passes to flatten and unflatten the IR, and then piece-by-piece
> >>> rewrite the rest of the compiler. We're going to have to rewrite a lot
> >>> of the passes to support SSA in the first place, so why not convert
> >>> them to a flat IR while we're at it? The benefit of this is that it's
> >>> much easier to do asynchronously and in parallel; rather than
> >>> introducing changes to the entire thing at once, several people can
> >>> convert this and that pass, the frontend, the linker, etc.
> >>> independently. It would entail some extra overhead during the
> >>> transition in the form of the flattening and unflattening passes, but
> >>> I think it would be worth it for the immediate benefits (optimizations
> >>> like GVN-GCM and CSE made possible, etc.).
> >>>
> >>> The first part to be converted would be my passes to convert to and
> >>> from SSA, so that the compiler optimization part would look like this:
> >>>
> >>> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
> >>> unflatten -> (the old stuff)
> >>>
> >>> Then we gradually convert ast_to_hir, various passes, the linker,
> >>> backends, etc. to this form while now actually having the
> >>> infrastructure to implement any advanced compiler optimization
> >>> designed in the last ~15 years or so by more-or-less copying down the
> >>> pseudocode. Hopefully, then, we can reach a point where we can rip out
> >>> the old IR and the converters.
> >>>
> >>> So what would this new IR look like? Well, here's my 2 cents (in the
> >>> form of some abridged class definitions, you should get the point...)
> >>>
> >>> struct ir_calc_source
> >>> {
> >>>      mode; /** < SSA or non-SSA */
> >>>      union {
> >>>          ir_calculation *def; /** < for SSA sources */
> >>>          unsigned int reg; /** < for non-SSA sources */
> >>>      } src;
> >>>      unsigned swizzle : 8;
> >>> };
> >>>
> >>> struct ir_calc_dest
> >>> {
> >>>      mode; /** < SSA or non-SSA */
> >>>      union {
> >>>          unsigned int reg; /** < for non-SSA destinations */
> >>>
> >>>          /**
> >>>           * For SSA destinations. Types are needed here because normally
> >>>           * they're part of the register, but SSA doesn't have registers.
> >>>           */
> >>>          glsl_type *type;
> >>>      } reg_or_type; /* this name is kinda ugly but couldn't think of anything better. */
> >>> };
> >>>
> >>> /*
> >>>   * This is Ian's name for it, personally I would vote for
> >>>   * s/ir_instruction/ir_node/ and call this ir_instruction
> >>>   */
> >>>
> >>> class ir_calculation
> >>> {
> >>>      ir_calc_dest dest;
> >>>      ir_expression_operation op;
> >>>      unsigned write_mask : 4;
> >>>      ir_calc_source srcs[4];
> >>> };
> >>>
> >>> class ir_load_var
> >>> {
> >>>      ir_calc_dest dest;
> >>>      ir_variable *src;
> >>>
> >>>      /**
> >>>       * For array and record loads, whether we're loading a specific
> >>>       * member or the whole thing.
> >>>       */
> >>>      bool deref_member;
> >>>      ir_calc_source array_index; /** < for array loads if deref_member is true */
> >>>      char *record_index; /** < for structure loads */
> >>> };
> >>>
> >>> class ir_store_var
> >>> {
> >>>      ir_variable *dest;
> >>>      ir_calc_source src;
> >>>      bool deref_member;
> >>>      ir_calc_source array_index; /** < for array stores */
> >>>      char *record_index; /** < for structure stores */
> >>>      unsigned write_mask : 4;
> >>> };
> >>>
> >>> So ir_variable still exists, but it will only be used for function
> >>> parameters, shader in/outs and uniforms, and arrays and structures.
> >>> Registers will be much more lightweight, only requiring a table with
> >>> each register's type and perhaps uses and definitions. The flattening
> >>> pass, and later ast_to_hir, will emit loads and stores wherever there
> >>> is an ir_dereference now, but there will be an ir_variable -> register
> >>> pass that converts these to moves that will later be eliminated by
> >>> copy propagation (in SSA form, after converting the registers to SSA
> >>> writes). This is similar to how LLVM works, with everything starting
> >>> out allocated on the stack using alloca (equivalent to ir_variables
> >>> here) and accessed explicitly using loads and stores, but then some of
> >>> these loads/stores are optimized out.
> >>>
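A minimal sketch of the flattening step from the proposal quoted above,
assuming a toy expression tree rather than real GLSL IR. Every name in it
(Expr, FlatInstr, var, binop, flatten) is hypothetical and not an existing
Mesa class; swizzles, write masks and types are left out. It only shows the
shape of the idea: a post-order walk that turns each tree node into one flat
instruction defining a fresh register, with an explicit load for every
variable reference.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree-shaped expression: a variable leaf or a binary operation.
struct Expr {
    std::string op;                  // "var", "add", "mul", ...
    std::string name;                // variable name, used only when op == "var"
    std::unique_ptr<Expr> lhs, rhs;  // children, used only by binary operations
};

// Build a variable leaf.
std::unique_ptr<Expr> var(std::string name)
{
    std::unique_ptr<Expr> e(new Expr());
    e->op = "var";
    e->name = std::move(name);
    return e;
}

// Build a binary operation node that owns both operands.
std::unique_ptr<Expr> binop(std::string op, std::unique_ptr<Expr> lhs,
                            std::unique_ptr<Expr> rhs)
{
    std::unique_ptr<Expr> e(new Expr());
    e->op = std::move(op);
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Hypothetical flat instruction. Each instruction defines exactly one new
// register, so the resulting list is trivially in single-assignment form.
struct FlatInstr {
    std::string op;   // "load_var" or the operation name taken from the tree
    unsigned dest;    // register defined by this instruction
    std::string var;  // variable name, used only by "load_var"
    int src0;         // source registers, -1 when unused
    int src1;
};

// Post-order walk: emit the operands first, then the operation using them.
// Returns the register that holds the value of `e`.
unsigned flatten(const Expr &e, std::vector<FlatInstr> &out)
{
    FlatInstr instr;
    instr.src0 = -1;
    instr.src1 = -1;
    if (e.op == "var") {
        // Every variable reference becomes an explicit load.
        instr.op = "load_var";
        instr.var = e.name;
    } else {
        assert(e.lhs && e.rhs);
        instr.op = e.op;
        instr.src0 = static_cast<int>(flatten(*e.lhs, out));
        instr.src1 = static_cast<int>(flatten(*e.rhs, out));
    }
    // The destination is numbered only after the operands have been emitted.
    instr.dest = static_cast<unsigned>(out.size());
    out.push_back(instr);
    return instr.dest;
}
```

Flattening binop("add", binop("mul", var("a"), var("b")), var("c")) with this
sketch yields five instructions: three loads, the mul, and finally the add.
Redundant loads are deliberately not merged here; in the proposal that job
belongs to later passes such as copy propagation.
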
> >> What about just moving to llvm directly?  We already use it for
> >> compute/OpenCL on gallium and as the shader compiler for radeon
> >> hardware and llvmpipe.
> >
> >
> > That was discussed in the talk as well. LLVM would be a good choice for
> > this, the only problem is that they have no stable API.
> >
> > I'm currently wondering whether llvm-c could be made stable and
> > reliable enough to be used for this, but that is something we would
> > need to discuss with the LLVM folks as well.
> 
> Would the C API be sufficient for a driver that had its own special
> scheduling or register assignment constraints?  Or would it just be
> something we continue to turn into our own driver-private IR like we
> currently do with TGSI?
>

You can't really use the C API or even the C++ API for scheduling or
register assignment.  You would need to write a full-blown LLVM backend
in order to do that; otherwise you would still have to lower it into your
own driver-specific IR.

The C API gives you the ability to manipulate the IR and also run
generic transforms and optimization passes on your code.  If LLVM IR
were used as a target-independent shader IR in Mesa, I think this would
be most of the functionality that was needed.

> Just curious.  Something more suitable than TGSI would be nice, but
> dealing with an unstable C++ ABI seems like a real pain, especially on
> slower ARM devices if I end up having to recompile LLVM all the time.
>

It may be possible to build only a subset of the LLVM libraries to use
with Mesa.  If you didn't have to build any of the CodeGen libraries, then
recompiling wouldn't be so bad.

-Tom

> BR,
> -R
> 
> > Christian.
> >
> >
> >>
> >> Alex
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
> >