[Nouveau] Some llvm questions (for tgsi backend)

Tom Stellard tom at stellard.net
Mon Jan 11 10:17:47 PST 2016


On Mon, Jan 11, 2016 at 12:04:40PM -0500, Ilia Mirkin wrote:
> On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> > Hi,
> >
> > After a few distractions I'm back to work on the llvm tgsi backend. I've
> > added clang integration and I can now compile a simple opencl program
> > to something which sort of looks like tgsi.
> >
> > You can find my latest work on this here:
> > http://cgit.freedesktop.org/~jwrdegoede/llvm
> > http://cgit.freedesktop.org/~jwrdegoede/clang
> > (the latter may still need to sync)
> >
> > I've a little test program of which I have 3 versions now,
> > 1 raw gallium calls + a tgsi kernel
> > 2 opencl calls to clover + a tgsi kernel
> > 3 opencl calls to clover + an opencl kernel
> >
> > 1 and 2 have been tested on a kepler card, 3 has been
> > tested with pocl. My goal for this week is to get
> > the tgsi backend to produce code which I can copy
> > and paste into 2 and then have it working on a kepler card.
> >
> > The test program looks like this:
> >
> > __kernel void test_kern(__global uint *vals, __global uint *buf)
> > {
> >   uint id = get_global_id(0);
> >
> >   buf[32 * id] -= vals[id];
> > }
> >
> > The llvm ir looks like this:
> >
> > bin/clang -x cl -c -emit-llvm -target tgsi-- -include
> > /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
> >
> > ; ModuleID = '/home/hans/foo.cl'
> > target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> > target triple = "tgsi--"
> >
> > ; Function Attrs: nounwind
> > define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32
> > addrspace(1)* nocapture %buf) #0 {
> > entry:
> >   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
> >   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
> >   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
> >   %mul = shl i32 %call, 5
> >   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
> >   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >   %sub = sub i32 %1, %0
> >   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >   ret void
> > }
> >
> > declare i32 @_Z13get_global_idj(i32) #1
> >
> > attributes #0 = { nounwind "disable-tail-calls"="false"
> > "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
> > "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
> > "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
> > "unsafe-fp-math"="false" "use-soft-float"="false" }
> > attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false"
> > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
> > "no-infs-fp-math"="false" "no-nans-fp-math"="false"
> > "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
> > "use-soft-float"="false" }
> > attributes #2 = { nounwind }
> >
> > !opencl.kernels = !{!0}
> > !llvm.ident = !{!6}
> >
> > !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3,
> > !4, !5}
> > !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> > !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> > !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> > !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> > !5 = !{!"kernel_arg_type_qual", !"", !""}
> > !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git
> > 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git
> > 7a311143550c6fc01aa5000049825ecc09787440)"}
> > !7 = !{!8, !8, i64 0}
> > !8 = !{!"int", !9, i64 0}
> > !9 = !{!"omnipotent char", !10, i64 0}
> > !10 = !{!"Simple C/C++ TBAA"}
> >
> > And the "tgsi" looks like this:
> >
> >         .text
> >         .file   "/home/hans/foo.cl"
> >         .globl  test_kern
> > test_kern:
> >         BGNSUB
> >         MOVis TEMP1x, 0
> >         CAL _Z13get_global_idj
> >         SHLs TEMP1y, TEMP1x, 7
> >         LOADiis TEMP1z, [4]
> >         UADDs TEMP1y, TEMP1z, TEMP1y
> >         SHLs TEMP1x, TEMP1x, 2
> >         LOADiis TEMP1z, [0]
> >         UADDs TEMP1x, TEMP1z, TEMP1x
> >         LOADgis TEMP1x, [TEMP1x]
> >         INEGs TEMP1x, TEMP1x
> >         LOADgis TEMP1z, [TEMP1y]
> >         UADDs TEMP1x, TEMP1x, TEMP1z
> >         STOREgis [TEMP1y], TEMP1x
> >         RET
> >         ENDSUB
> >
> > Working tgsi for this would look like this:
> >
> > COMP
> > DCL SV[0], THREAD_ID[0]
> > DCL TEMP[0], LOCAL
> > DCL TEMP[1], LOCAL
> > IMM UINT32 { 0, 0, 0, 0 }
> > IMM UINT32 { 4, 0, 0, 0 }
> > IMM UINT32 { 128, 0, 0, 0 }
> >
> > BGNSUB
> >        LOAD TEMP[0].xy, RINPUT, IMM[0]
> >        UMUL TEMP[1].x, SV[0], IMM[1]
> >        UADD TEMP[0].x, TEMP[0], TEMP[1]
> >        UMUL TEMP[1].x, SV[0], IMM[2]
> >        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
> >        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
> >        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
> >        UADD TEMP[1].x, TEMP[0], -TEMP[1]
> >        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
> >        RET
> > ENDSUB;
> >
> > So my questions (I'm still quite green when it comes to llvm):
> >
> > 1) As you can see a proper tgsi program needs a header
> > to declare which registers (etc) it is using, in which
> > class-method should I implement this ?
> >
> > 2) Immediates need to be declared with a specific
> > value and then addressed as IMM[x], how would I go about
> > this ?
> >
> > 3) The get_global_id call needs to be translated into
> > simply using the SV[0] "register", how would I go about
> > this ?
> >
> > 4) The global and input load / stores are not handled
> > correctly, I see that the LOAD instructions get postfixed
> > with a i reps. g for input / global how would I go about
> > modifying the code emitter (AsmPrinter?) to change "LOADi"
> > into "LOAD <dest> RINPUT <offset>"?
> >
> > 5) Talking about the lowecase suffixes to the instructions,
> > these should not be part of the output, how do I filter these?
> >
> > 6) And finally, the current llvm-tgsi output uses e.g.
> > TEMP1y where as for the destination it should use TEMP[1].y
> > and for the sources it should use TEMP[1].xxxx (so include
> > proper swizzling info).
> >
> > Lots of questions, sorry about that. Feel free to point me
> > to some relvant parts of the docs, I've tried to find answers
> > myself but I've gotten a bit lost in the docs.
> 
> You may consider emitting binary TGSI. It's a semi-fluid format, but
> doesn't change too often (and usually does so in backwards-compatible
> ways).
> 

Yes, if you aren't already emitting binary, you should start doing this.  It will
make things much easier.

-Tom

> BTW, note that I recently got rid of TGSI_FILE_RESOURCE in favor of
> BUFFER and IMAGE register files (which in turn correlate to
> ->set_shader_buffers and ->set_shader_images). We need this for the
> various GL extensions (ssbo, atomic, images). Not sure how that
> integrates with what OpenCL needs.
> 
>   -ilia


More information about the Nouveau mailing list