[Nouveau] Some llvm questions (for tgsi backend)

Tom Stellard tom at stellard.net
Mon Jan 11 09:10:10 PST 2016


On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
> Hi,
> 
> After a few distractions I'm back to work on the llvm tgsi backend. I've
> added clang integration and I can now compile a simple opencl program
> to something which sort of looks like tgsi.
> 
> You can find my latest work on this here:
> http://cgit.freedesktop.org/~jwrdegoede/llvm
> http://cgit.freedesktop.org/~jwrdegoede/clang
> (the latter may still need to sync)
> 
> I've a little test program of which I have 3 versions now,
> 1 raw gallium calls + a tgsi kernel
> 2 opencl calls to clover + a tgsi kernel
> 3 opencl calls to clover + an opencl kernel
> 
> 1 and 2 have been tested on a kepler card, 3 has been
> tested with pocl. My goal for this week is to get
> the tgsi backend to produce code which I can copy
> and paste into 2 and then have it working on a kepler card.
> 
> The test program looks like this:
> 
> __kernel void test_kern(__global uint *vals, __global uint *buf)
> {
>   uint id = get_global_id(0);
> 
>   buf[32 * id] -= vals[id];
> }
> 
> The llvm ir looks like this:
> 
> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
> 
> ; ModuleID = '/home/hans/foo.cl'
> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> target triple = "tgsi--"
> 
> ; Function Attrs: nounwind
> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
> entry:
>   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
>   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
>   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
>   %mul = shl i32 %call, 5
>   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
>   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   %sub = sub i32 %1, %0
>   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   ret void
> }
> 
> declare i32 @_Z13get_global_idj(i32) #1
> 
> attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #2 = { nounwind }
> 
> !opencl.kernels = !{!0}
> !llvm.ident = !{!6}
> 
> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> !5 = !{!"kernel_arg_type_qual", !"", !""}
> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
> !7 = !{!8, !8, i64 0}
> !8 = !{!"int", !9, i64 0}
> !9 = !{!"omnipotent char", !10, i64 0}
> !10 = !{!"Simple C/C++ TBAA"}
> 
> And the "tgsi" looks like this:
> 
>         .text
>         .file   "/home/hans/foo.cl"
>         .globl  test_kern
> test_kern:
>         BGNSUB
>         MOVis TEMP1x, 0
>         CAL _Z13get_global_idj
>         SHLs TEMP1y, TEMP1x, 7
>         LOADiis TEMP1z, [4]
>         UADDs TEMP1y, TEMP1z, TEMP1y
>         SHLs TEMP1x, TEMP1x, 2
>         LOADiis TEMP1z, [0]
>         UADDs TEMP1x, TEMP1z, TEMP1x
>         LOADgis TEMP1x, [TEMP1x]
>         INEGs TEMP1x, TEMP1x
>         LOADgis TEMP1z, [TEMP1y]
>         UADDs TEMP1x, TEMP1x, TEMP1z
>         STOREgis [TEMP1y], TEMP1x
>         RET
>         ENDSUB
> 
> Working tgsi for this would look like this:
> 
> COMP
> DCL SV[0], THREAD_ID[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> IMM UINT32 { 0, 0, 0, 0 }
> IMM UINT32 { 4, 0, 0, 0 }
> IMM UINT32 { 128, 0, 0, 0 }
> 
> BGNSUB
>        LOAD TEMP[0].xy, RINPUT, IMM[0]
>        UMUL TEMP[1].x, SV[0], IMM[1]
>        UADD TEMP[0].x, TEMP[0], TEMP[1]
>        UMUL TEMP[1].x, SV[0], IMM[2]
>        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
>        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
>        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
>        UADD TEMP[1].x, TEMP[0], -TEMP[1]
>        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
>        RET
> ENDSUB;
> 
> So my questions (I'm still quite green when it comes to llvm):
> 
> 1) As you can see a proper tgsi program needs a header
> to declare which registers (etc) it is using, in which
> class-method should I implement this ?
> 

These kinds of things should be emitted by the TGSI implementation of the
AsmPrinter class: http://llvm.org/docs/doxygen/html/classllvm_1_1AsmPrinter.html
You probably want to use EmitStartOfAsmFile() for the headers.

> 2) Immediates need to be declared with a specific
> value and then addressed as IMM[x], how would I go about
> this ?
> 

I would recommend adding a pass that replaced immediates in the code with
IMM file regitser accesses.

> 3) The get_global_id call needs to be translated into
> simply using the SV[0] "register", how would I go about
> this ?
> 

get_global_id should be implemented in libclc with tgsi specific intrinsics
that read from the system value registers.

> 4) The global and input load / stores are not handled
> correctly, I see that the LOAD instructions get postfixed
> with a i reps. g for input / global how would I go about
> modifying the code emitter (AsmPrinter?) to change "LOADi"
> into "LOAD <dest> RINPUT <offset>"?
> 

I don't understand exactly why you want to change this.  Are you trying to
make the assembly string for input loads and global loads look the same?

> 5) Talking about the lowecase suffixes to the instructions,
> these should not be part of the output, how do I filter these?
> 
> 6) And finally, the current llvm-tgsi output uses e.g.
> TEMP1y where as for the destination it should use TEMP[1].y
> and for the sources it should use TEMP[1].xxxx (so include
> proper swizzling info).
> 

You just need to change how the registers are printed in the
TGSIRegisterInfo.td file which should fix this.

> Lots of questions, sorry about that. Feel free to point me
> to some relvant parts of the docs, I've tried to find answers
> myself but I've gotten a bit lost in the docs.
> 

Ask more if you have them.

-Tom

> Regards,
> 
> Hans


More information about the Nouveau mailing list