[Mesa-dev] Flatland

Rob Clark robdclark at gmail.com
Fri Feb 7 11:52:15 PST 2014


On Fri, Feb 7, 2014 at 11:20 AM, Christian König
<deathsimple at vodafone.de> wrote:
> On 07.02.2014 16:49, Alex Deucher wrote:
>
>> On Fri, Feb 7, 2014 at 12:34 AM, Connor Abbott <cwabbott0 at gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> So I believe that we can all agree that the tree-based representation
>>> that GLSL IR currently uses for shaders needs to go. For the benefit
>>> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
>>> reiterate some of the problems with it as of now:
>>>
>>> - All the ir_dereference chains blow up the memory usage, and the
>>> constant pointer chasing in the recursive algorithms needed to handle
>>> them is not just cache-unfriendly but "cache-mean."
>>>
>>> - The ir_hierarchical_visitor pattern that we currently use for
>>> optimization/analysis passes has to examine every piece of IR, even
>>> the irrelevant stuff, making the above problems even worse.
>>>
>>> - Nobody else does it this way, meaning that the existing well-known
>>> optimizations don't apply as much here, and oftentimes we have to
>>> write some pretty nasty code in order to make necessary optimizations
>>> (like tree grafting).
>>>
>>> - It turns out that the original advantage of a tree-based IR, being
>>> able to automatically generate pattern-matching code for optimizing
>>> certain code patterns, only really matters for CPUs with weird
>>> instruction sets with lots of exotic instructions; GPUs tend to be
>>> pretty regular and consistent in their ISAs, so being able to
>>> pattern-match with trees doesn't help us much here.
>>>
>>> Finally, it seems like a lot of important SSA-based passes assume that
>>> we have a flat IR, and so moving to SSA won't be nearly as beneficial
>>> as we would like it to be; we could flatten the IR before doing these
>>> passes, but that would make the first problem even worse. So we can't
>>> really take advantage of SSA too much either until we have a flat IR.
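To illustrate why a flat IR makes SSA construction straightforward, here is a minimal sketch for straight-line code only (no branches, so no phi nodes are needed). It assumes a hypothetical `FlatInstr` type rather than anything in Mesa, and uses plain renaming instead of a full SSA-construction algorithm such as Cytron et al.'s dominance-frontier method:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical flat, non-SSA instruction: dest = op(srcs...), where a
// register number may be written more than once.
struct FlatInstr {
    char op;               // e.g. '+' or '*'
    int dest;              // destination register (SSA value after renaming)
    std::vector<int> srcs; // source registers (SSA values after renaming)
};

// Renames straight-line code into SSA form: each definition gets a fresh
// value number and each use is rewritten to the latest definition of its
// register. A register read before any write becomes -(reg + 1), marking
// it as a live-in value.
std::vector<FlatInstr> to_ssa_straight_line(const std::vector<FlatInstr> &code) {
    std::map<int, int> latest;  // register -> SSA value of its latest definition
    std::vector<FlatInstr> out;
    int next_value = 0;
    for (const FlatInstr &in : code) {
        FlatInstr renamed{in.op, next_value++, {}};
        // Sources are resolved before the new definition is recorded, so
        // "r0 = r0 * r0" still reads the previous value of r0.
        for (int reg : in.srcs) {
            assert(reg >= 0);  // input registers are non-negative
            auto it = latest.find(reg);
            renamed.srcs.push_back(it != latest.end() ? it->second : -(reg + 1));
        }
        latest[in.dest] = renamed.dest;
        out.push_back(renamed);
    }
    return out;
}
```

Because every instruction is a flat record, the whole pass is one linear loop over the list. Renaming `r0 = r1 + r2; r0 = r0 * r0; r3 = r0 + r1` yields values 0, 1 and 2, with the second instruction reading value 0 and the third reading value 1.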
>>>
>>> The real issue is, how do we let this transition occur gradually, in
>>> pieces, without breaking existing code? Ian proposed one solution at
>>> FOSDEM, but here's my idea of another.
>>>
>>> So, my idea is that rather than slowly introducing changes across the
>>> board, we create the IR in its final form in the beginning, write
>>> passes to flatten and unflatten the IR, and then piece-by-piece
>>> rewrite the rest of the compiler. We're going to have to rewrite a lot
>>> of the passes to support SSA in the first place, so why not convert
>>> them to a flat IR while we're at it? The benefit of this is that it's
>>> much easier to do asynchronously and in parallel; rather than
>>> introducing changes to the entire thing at once, several people can
>>> convert this and that pass, the frontend, the linker, etc.
>>> independently. It would entail some extra overhead during the
>>> transition in the form of the flattening and unflattening passes, but
>>> I think it would be worth it for the immediate benefits (optimizations
>>> like GVN-GCM and CSE made possible, etc.).
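CSE is a good example of a pass that becomes simple once instructions are flat and in SSA form. The sketch below is local value numbering within a single basic block, a much simpler relative of GVN-GCM, written against a hypothetical `SsaInstr` type (not Mesa code):

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical SSA instruction: value `id` = a op b, where a and b are ids
// of earlier instructions (negative ids stand for shader inputs).
struct SsaInstr {
    int id;
    char op;
    int a, b;
};

// Local CSE by value numbering over one flat, straight-line SSA block.
// Returns, for each instruction id, the id of the first instruction that
// computes the same value (the instruction itself if it is unique).
std::map<int, int> local_cse(const std::vector<SsaInstr> &code) {
    std::map<std::tuple<char, int, int>, int> seen;  // (op, a, b) -> first id
    std::map<int, int> canonical;                    // id -> canonical id
    auto canon = [&](int v) {
        auto it = canonical.find(v);
        return it != canonical.end() ? it->second : v;
    };
    for (const SsaInstr &in : code) {
        assert(in.a < in.id && in.b < in.id);  // SSA: operands defined earlier
        int a = canon(in.a), b = canon(in.b);
        // '+' and '*' are commutative, so put their operands in a fixed order.
        if ((in.op == '+' || in.op == '*') && b < a)
            std::swap(a, b);
        auto key = std::make_tuple(in.op, a, b);
        auto found = seen.find(key);
        canonical[in.id] = found != seen.end() ? found->second : in.id;
        if (found == seen.end())
            seen.emplace(key, in.id);
    }
    return canonical;
}
```

For `v0 = in1 + in2; v1 = in2 + in1; v2 = v0 * v1; v3 = v0 * v0`, `v1` maps to `v0` and `v3` maps to `v2`; a later dead-code pass could then drop `v1` and `v3`.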
>>>
>>> The first part to be converted would be my passes to convert to and
>>> from SSA, so that the compiler optimization part would look like this:
>>>
>>> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
>>> unflatten -> (the old stuff)
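The flatten step at the head of that pipeline is essentially a post-order walk that gives every interior tree node its own temporary. A minimal sketch, assuming hypothetical `Expr` and `Flat` types that stand in for the `ir_expression` tree and the proposed flat instructions (the unflatten direction is not shown):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node standing in for a nested ir_expression.
struct Expr {
    char op = 'v';                  // '+', '*', or 'v' for a variable leaf
    std::string var;                // variable name when op == 'v'
    std::unique_ptr<Expr> lhs, rhs; // operands when op != 'v'
};

// Hypothetical flat instruction: dest = lhs op rhs, where each operand is
// a variable name or an earlier temporary ("t0", "t1", ...).
struct Flat {
    std::string dest;
    char op;
    std::string lhs, rhs;
};

// Post-order walk: operands are flattened before the node that uses them,
// so every temporary is defined before its first use.
std::string flatten(const Expr &e, std::vector<Flat> &out) {
    if (e.op == 'v')
        return e.var;
    assert(e.lhs && e.rhs);  // interior nodes are binary in this sketch
    std::string l = flatten(*e.lhs, out);
    std::string r = flatten(*e.rhs, out);
    std::string t = "t" + std::to_string(out.size());
    out.push_back({t, e.op, l, r});
    return t;
}

// Small helpers for building example trees.
std::unique_ptr<Expr> leaf(const std::string &name) {
    std::unique_ptr<Expr> e(new Expr());
    e->var = name;
    return e;
}

std::unique_ptr<Expr> node(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}
```

Flattening `(a + b) * c` gives `t0 = a + b` followed by `t1 = t0 * c`, with `t1` holding the result.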
>>>
>>> Then we gradually convert ast_to_hir, various passes, the linker,
>>> backends, etc. to this form while now actually having the
>>> infrastructure to implement any advanced compiler optimization
>>> designed in the last ~15 years or so by more-or-less copying down the
>>> pseudocode. Hopefully, then, we can reach a point where we can rip out
>>> the old IR and the converters.
>>>
>>> So what would this new IR look like? Well, here's my 2 cents (in the
>>> form of some abridged class definitions; you should get the point...)
>>>
>>> class ir_calculation; /* forward declaration, defined below */
>>>
>>> struct ir_calc_source
>>> {
>>>      enum { SSA, NON_SSA } mode; /**< SSA or non-SSA */
>>>      union {
>>>          ir_calculation *def; /**< for SSA sources */
>>>          unsigned int reg; /**< for non-SSA sources */
>>>      } src;
>>>      unsigned swizzle : 8;
>>> };
>>>
>>> struct ir_calc_dest
>>> {
>>>      enum { SSA, NON_SSA } mode; /**< SSA or non-SSA */
>>>      union {
>>>          unsigned int reg; /**< for non-SSA destinations */
>>>
>>>          /**
>>>           * For SSA destinations. Types are needed here because normally
>>>           * they're part of the register, but SSA doesn't have registers.
>>>           */
>>>          glsl_type *type;
>>>      } reg_or_type; /* this name is kinda ugly, but I couldn't think of
>>>                        anything better */
>>> };
>>>
>>> /*
>>>  * This is Ian's name for it; personally I would vote for
>>>  * s/ir_instruction/ir_node/ and call this ir_instruction.
>>>  */
>>>
>>> class ir_calculation
>>> {
>>>      ir_calc_dest dest;
>>>      ir_expression_operation op;
>>>      unsigned write_mask : 4;
>>>      ir_calc_source srcs[4];
>>> };
>>>
>>> class ir_load_var
>>> {
>>>      ir_calc_dest dest;
>>>      ir_variable *src;
>>>
>>>      /**
>>>       * For array and record loads, whether we're loading a specific
>>>       * member or the whole thing.
>>>       */
>>>      bool deref_member;
>>>      ir_calc_source array_index; /**< for array loads if deref_member is true */
>>>      char *record_index; /**< for structure loads */
>>> };
>>>
>>> class ir_store_var
>>> {
>>>      ir_variable *dest;
>>>      ir_calc_source src;
>>>      bool deref_member;
>>>      ir_calc_source array_index; /**< for array stores */
>>>      char *record_index; /**< for structure stores */
>>>      unsigned write_mask : 4;
>>> };
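The SSA/non-SSA split in `ir_calc_source` means an SSA use points straight at its defining calculation. Below is a heavily trimmed, compilable stand-in (the `SrcMode`, `Src` and `Calc` names are hypothetical; two integer operands, no swizzles, types or write masks) that encodes `r0 * r1 + r2` and evaluates it by following those pointers:

```cpp
#include <cassert>

// Trimmed stand-ins for ir_calc_source / ir_calculation: two operands,
// integer arithmetic, no swizzles, types or write masks.
enum class SrcMode { SSA, Reg };

struct Calc;

struct Src {
    SrcMode mode;
    const Calc *def;  // used when mode == SSA: points at the defining calculation
    unsigned reg;     // used when mode == Reg: index into the register file
};

struct Calc {
    char op;          // only '+' and '*' in this sketch
    Src srcs[2];
};

Src ssa_src(const Calc *def) { return {SrcMode::SSA, def, 0}; }
Src reg_src(unsigned reg) { return {SrcMode::Reg, nullptr, reg}; }

// Evaluates a calculation against register contents. SSA operands are
// reached by following the def pointer directly; there is no dereference
// chain to walk and no lookup table to consult.
int eval(const Calc &c, const int *regs) {
    assert(c.op == '+' || c.op == '*');
    int v[2];
    for (int i = 0; i < 2; ++i) {
        const Src &s = c.srcs[i];
        v[i] = s.mode == SrcMode::SSA ? eval(*s.def, regs) : regs[s.reg];
    }
    return c.op == '+' ? v[0] + v[1] : v[0] * v[1];
}
```

Building `Calc mul{'*', {reg_src(0), reg_src(1)}}` and `Calc add{'+', {ssa_src(&mul), reg_src(2)}}`, then evaluating with `r0 = 2, r1 = 3, r2 = 4`, gives 6 for `mul` and 10 for `add`. The recursive evaluator is only there to show the use-to-def link; a real pass would walk the flat instruction list instead.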
>>>
>>> So ir_variable still exists, but it will only be used for function
>>> parameters, shader in/outs and uniforms, and arrays and structures.
>>> Registers will be much more lightweight, only requiring a table with
>>> each register's type and perhaps uses and definitions. The flattening
>>> pass, and later ast_to_hir, will emit loads and stores wherever there
>>> is an ir_dereference now, but there will be an ir_variable -> register
>>> pass that converts these to moves that will later be eliminated by
>>> copy propagation (in SSA form, after converting the registers to SSA
>>> writes). This is similar to how LLVM works, with everything starting
>>> out allocated on the stack using alloca (equivalent to ir_variables
>>> here) and accessed explicitly using loads and stores, but then some of
>>> these loads/stores are optimized out.
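The variable-to-register step can be sketched on straight-line code as two small passes over a hypothetical `Ins` type: one rewrites loads and stores of promoted variables into moves, and one propagates copies so those moves become dead. Note the assumption: the proposal runs copy propagation in SSA form, whereas this sketch handles redefinitions explicitly instead of converting to SSA first, and dead-move removal is left out:

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

enum class Op { Load, Store, Add, Mov };

// Hypothetical flat instruction, loosely modelled on ir_load_var,
// ir_store_var and ir_calculation (scalars only: no swizzles or masks).
struct Ins {
    Op op;
    int dest;         // register written by Load, Add and Mov
    int a, b;         // sources: Add reads a and b; Mov and Store read a
    std::string var;  // variable named by Load and Store
};

// Step 1: give each promotable variable a register and turn its loads
// and stores into plain moves.
std::vector<Ins> promote(std::vector<Ins> code,
                         const std::map<std::string, int> &var_reg) {
    for (Ins &in : code) {
        auto it = var_reg.find(in.var);
        if (it == var_reg.end())
            continue;  // not promoted: leave any memory access alone
        assert(in.op == Op::Load || in.op == Op::Store);
        if (in.op == Op::Store)
            in = Ins{Op::Mov, it->second, in.a, 0, ""};   // var_reg = src
        else
            in = Ins{Op::Mov, in.dest, it->second, 0, ""}; // dest = var_reg
    }
    return code;
}

// Step 2: copy propagation over straight-line code. Uses are rewritten to
// the register a copy came from, which leaves the moves dead.
std::vector<Ins> copy_propagate(std::vector<Ins> code) {
    std::map<int, int> copy_of;  // reg -> reg whose value it currently holds
    auto resolve = [&](int r) {
        auto it = copy_of.find(r);
        return it != copy_of.end() ? it->second : r;
    };
    for (Ins &in : code) {
        if (in.op == Op::Add) {
            in.a = resolve(in.a);
            in.b = resolve(in.b);
        } else if (in.op == Op::Mov || in.op == Op::Store) {
            in.a = resolve(in.a);
        }
        if (in.op == Op::Store)
            continue;  // writes memory, not a register
        // in.dest is redefined: drop every copy fact that mentions it.
        copy_of.erase(in.dest);
        for (auto it = copy_of.begin(); it != copy_of.end();)
            it = (it->second == in.dest) ? copy_of.erase(it) : std::next(it);
        if (in.op == Op::Mov && in.a != in.dest)
            copy_of[in.dest] = in.a;
    }
    return code;
}
```

With local variable `t` promoted to `r10`, the sequence `r2 = r0 + r1; store t, r2; r3 = load t; r4 = load t; r5 = r3 + r4` ends with `r5 = r2 + r2`, leaving the three moves with no remaining uses in this block.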
>>>
>> What about just moving to llvm directly?  We already use it for
>> compute/OpenCL on gallium and as the shader compiler for radeon
>> hardware and llvmpipe.
>
>
> That was discussed in the talk as well. LLVM would be a good choice for
> this, the only problem is that they have no stable API.
>
> I'm currently wondering whether it would be possible to make llvm-c stable
> and reliable enough to be used for this, but that is something we would
> need to discuss with the LLVM folks as well.

Would the C API be sufficient for a driver that had its own special
scheduling or register assignment constraints?  Or would it just be
something we continue to turn into our own driver-private IR, like we
currently do with tgsi?

Just curious; something more suitable than tgsi would be nice, but
dealing with an unstable C++ ABI seems like a real pain, especially on
slower ARM devices if I end up having to recompile LLVM all the time.

BR,
-R

> Christian.
>
>
>>
>> Alex
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

