[Mesa-dev] Proposal for a long-term shader compiler (and IR) architecture

Corbin Simpson mostawesomedude at gmail.com
Mon Oct 18 12:43:33 PDT 2010


On Mon, Oct 18, 2010 at 12:27 PM, José Fonseca <jfonseca at vmware.com> wrote:
> On Mon, 2010-10-18 at 10:52 -0700, Keith Whitwell wrote:
>> On Mon, Oct 18, 2010 at 9:18 AM, Jerome Glisse <j.glisse at gmail.com> wrote:
>> > On Fri, Oct 15, 2010 at 7:44 PM, John Kessenich <johnk at lunarg.com> wrote:
>> >> Hi,
>> >> LunarG has decided to work on an open source, long-term, highly-functional,
>> >> and modular shader and kernel compiler stack. Attached is our high-level
>> >> proposal for this compiler architecture (LunarGLASS).  We would like to
>> >> solicit feedback from the open source community on doing this.
>> >> I have read several posts here where it seems the time has come for
>> >> something like this, and in that spirit, I hope this is consistent with the
>> >> desire and direction many contributors to this list have already alluded to.
>> >> Perhaps the biggest point of the proposal is to standardize on LLVM as an
>> >> intermediate representation.  This is actually done at two levels within the
>> >> proposal; one at a high-level IR close to the source language and one at a
>> >> low-level IR close to the target architecture.  The full picture is in the
>> >> attached document.
>> >> Based on feedback to this proposal, our next step is to more precisely
>> >> define the two forms of LLVM IR.
>> >> Please let me know if you have any trouble reading the attached, or any
>> >> questions, or any feedback regarding the proposal.
>> >> Thanks,
>> >> JohnK
>
>
>> > Just a quick reply (I won't have a chance to read through this
>> > proposal carefully for a couple of weeks): the last time I checked,
>> > LLVM didn't seem to fit the bill for GPUs. Newer GPUs can be seen as
>> > close to scalar, but not completely: there are restrictions on
>> > instruction packing and on the amount of data a GPU's computation
>> > units can access per cycle. Register allocation is also different
>> > from a normal CPU's; you don't want to do register spilling on a
>> > GPU. So from my POV, instruction scheduling & packing and register
>> > allocation are interlaced processes (where you store a variable
>> > impacts instruction packing).
>> > Also, on newer GPUs it makes sense to use a mixed scalar/vector
>> > representation to preserve things like dot products.
>
> LLVM has always been able to represent both scalars and vectors.
> Although the dot product is not natively represented in the IR, one
> can perfectly well define a dot-product intrinsic which takes two
> vectors and returns a scalar. I haven't looked at the backends, but I
> believe the same applies there.
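
For illustration, here is a minimal sketch of such an intrinsic,
written against an LLVM 2.8-era C++ API (the "dot4" name and the
emitDot4 helper are hypothetical; LLVM ships no dot-product intrinsic,
so it is declared here as a plain external function for a backend to
lower):

#include "llvm/LLVMContext.h"
#include "llvm/Module.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Support/IRBuilder.h"
using namespace llvm;

// Declare float dot4(<4 x float>, <4 x float>) and emit a call to it.
// A GPU backend would lower the call to its native DP4 instruction.
Value *emitDot4(Module &M, IRBuilder<> &B, Value *LHS, Value *RHS) {
  const Type *F32 = Type::getFloatTy(M.getContext());
  const Type *V4F32 = VectorType::get(F32, 4); // hardware vec4 register
  Constant *Dot4 =
      M.getOrInsertFunction("dot4", F32, V4F32, V4F32, (Type *)0);
  return B.CreateCall2(Dot4, LHS, RHS, "dp");
}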
>
>> > Lastly, loops, jumps, and functions have unusual restrictions
>> > unlike any CPU's (though I don't have broad CPU knowledge).
>> >
>> > Bottom line is, I don't think LLVM is anywhere near what would help us.
>>
>> I think this is the big question mark with this proposal -- basically
>> can it be done?
>
> I also think there are indeed challenges in translating LLVM IR to
> something like TGSI or Mesa IR, and I was skeptical about
> standardizing on LLVM IR for quite some time, but lately I've been
> reaching the conclusion that there's so much momentum behind LLVM that
> the benefits/synergy one gets by leveraging it will most likely exceed
> the pitfalls.
>
> But I never felt much skepticism about GPU code generation. There is,
> e.g., an LLVM PTX backend already out there. And if it's not easy to
> make an LLVM backend for a particular GPU, then it should be at the
> very least possible to implement an LLVM backend that generates code
> in a representation very close to the GPU code, and to do the final
> steps (e.g., register allocation, scheduling, etc.) in a custom pass,
> thereby benefiting from all the high-level optimizations that happened
> before.
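
To make that split concrete, here is a minimal sketch of such a
finalizing pass (assuming an LLVM 2.8-era tree; GpuShader, lowerToGpu,
and the other helpers below are hypothetical placeholders, not existing
Mesa or LLVM API, and only mark where the driver-specific work would
go):

#include "llvm/Pass.h"
#include "llvm/Function.h"
using namespace llvm;

// Hypothetical near-hardware representation plus the final codegen
// steps that LLVM's generic machinery handles poorly on GPUs.
struct GpuShader { /* packed GPU instructions */ };
static GpuShader lowerToGpu(Function &F) { return GpuShader(); }
static void packAndSchedule(GpuShader &S) {}
static void allocateRegisters(GpuShader &S) {}
static void emitGpuCode(GpuShader &S) {}

namespace {
// Runs after LLVM's generic optimizations, so it sees fully optimized
// IR but can still apply the GPU's packing/scheduling/allocation rules.
struct GpuFinalize : public FunctionPass {
  static char ID;
  GpuFinalize() : FunctionPass(ID) {}
  virtual bool runOnFunction(Function &F) {
    GpuShader S = lowerToGpu(F);
    packAndSchedule(S);     // honor instruction-packing restrictions
    allocateRegisters(S);   // interlaced with packing, per Jerome
    emitGpuCode(S);
    return false;           // the LLVM IR itself is left untouched
  }
};
char GpuFinalize::ID = 0;
}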
>
>> If it can't be done, we'll find out quickly; if it can, then we can
>> stop debating whether or not it's possible.

The biggest problems I had when trying to write an r300 backend for
LLVM stemmed largely from the massively specialized nature of pre-DX10
GPUs, which are closer to DSPs than to anything LLVM normally targets.
In particular, v4f32 is the only kind of register available, there's
not really any way to spill registers, and so on. I suspect nvfx and
i915 have similar issues, although I'll readily admit to not knowing
that hardware very well.

If we transparently LLVMize all shaders before handing them to pipe
drivers, and use a low-level IR that retains the benefits of LLVM's
optimizations, then I am okay with that. If LLVM can understand enough
of the various scheduling problems to be worthwhile for the entire
shader path, then I'm okay with that too. I just don't want yet another
intermediate layer that doesn't actually improve anything.

~ C.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<MostAwesomeDude at gmail.com>

