[Nouveau] Some llvm questions (for tgsi backend)
Hans de Goede
hdegoede at redhat.com
Tue Jan 12 06:53:10 PST 2016
Hi Tom,
Thanks for taking the time to answer this.
On 11-01-16 18:10, Tom Stellard wrote:
> On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> After a few distractions I'm back to work on the llvm tgsi backend. I've
>> added clang integration and I can now compile a simple opencl program
>> to something which sort of looks like tgsi.
>>
>> You can find my latest work on this here:
>> http://cgit.freedesktop.org/~jwrdegoede/llvm
>> http://cgit.freedesktop.org/~jwrdegoede/clang
>> (the latter may still need to sync)
>>
>> I've a little test program of which I have 3 versions now,
>> 1 raw gallium calls + a tgsi kernel
>> 2 opencl calls to clover + a tgsi kernel
>> 3 opencl calls to clover + an opencl kernel
>>
>> 1 and 2 have been tested on a kepler card, 3 has been
>> tested with pocl. My goal for this week is to get
>> the tgsi backend to produce code which I can copy
>> and paste into 2 and then have it working on a kepler card.
>>
>> The test program looks like this:
>>
>> __kernel void test_kern(__global uint *vals, __global uint *buf)
>> {
>> uint id = get_global_id(0);
>>
>> buf[32 * id] -= vals[id];
>> }
>>
>> The llvm ir looks like this:
>>
>> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
>>
>> ; ModuleID = '/home/hans/foo.cl'
>> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
>> target triple = "tgsi--"
>>
>> ; Function Attrs: nounwind
>> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
>> entry:
>> %call = tail call i32 @_Z13get_global_idj(i32 0) #2
>> %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
>> %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
>> %mul = shl i32 %call, 5
>> %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
>> %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>> %sub = sub i32 %1, %0
>> store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>> ret void
>> }
>>
>> declare i32 @_Z13get_global_idj(i32) #1
>>
>> attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>> attributes #2 = { nounwind }
>>
>> !opencl.kernels = !{!0}
>> !llvm.ident = !{!6}
>>
>> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
>> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
>> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
>> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
>> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
>> !5 = !{!"kernel_arg_type_qual", !"", !""}
>> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
>> !7 = !{!8, !8, i64 0}
>> !8 = !{!"int", !9, i64 0}
>> !9 = !{!"omnipotent char", !10, i64 0}
>> !10 = !{!"Simple C/C++ TBAA"}
>>
>> And the "tgsi" looks like this:
>>
>> .text
>> .file "/home/hans/foo.cl"
>> .globl test_kern
>> test_kern:
>> BGNSUB
>> MOVis TEMP1x, 0
>> CAL _Z13get_global_idj
>> SHLs TEMP1y, TEMP1x, 7
>> LOADiis TEMP1z, [4]
>> UADDs TEMP1y, TEMP1z, TEMP1y
>> SHLs TEMP1x, TEMP1x, 2
>> LOADiis TEMP1z, [0]
>> UADDs TEMP1x, TEMP1z, TEMP1x
>> LOADgis TEMP1x, [TEMP1x]
>> INEGs TEMP1x, TEMP1x
>> LOADgis TEMP1z, [TEMP1y]
>> UADDs TEMP1x, TEMP1x, TEMP1z
>> STOREgis [TEMP1y], TEMP1x
>> RET
>> ENDSUB
>>
>> Working tgsi for this would look like this:
>>
>> COMP
>> DCL SV[0], THREAD_ID[0]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> IMM UINT32 { 0, 0, 0, 0 }
>> IMM UINT32 { 4, 0, 0, 0 }
>> IMM UINT32 { 128, 0, 0, 0 }
>>
>> BGNSUB
>> LOAD TEMP[0].xy, RINPUT, IMM[0]
>> UMUL TEMP[1].x, SV[0], IMM[1]
>> UADD TEMP[0].x, TEMP[0], TEMP[1]
>> UMUL TEMP[1].x, SV[0], IMM[2]
>> UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
>> LOAD TEMP[1].x, RGLOBAL, TEMP[0]
>> LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
>> UADD TEMP[1].x, TEMP[0], -TEMP[1]
>> STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
>> RET
>> ENDSUB;
>>
>> So my questions (I'm still quite green when it comes to llvm):
>>
>> 1) As you can see a proper tgsi program needs a header
>> to declare which registers (etc) it is using, in which
>> class-method should I implement this ?
>>
>
> These kinds of things should be emitted by the TGSI implementation of the
> AsmPrinter class: http://llvm.org/docs/doxygen/html/classllvm_1_1AsmPrinter.html
> You probably want to use EmitStartOfAsmFile() for the headers.
Ok, is there an easy way to suppress the generation of:
.text
.file "/home/hans/foo.cl"
.globl test_kern
?
It seems I need to tackle all 3 of those separately ?
>> 2) Immediates need to be declared with a specific
>> value and then addressed as IMM[x], how would I go about
>> this ?
>>
>
> I would recommend adding a pass that replaced immediates in the code with
> IMM file regitser accesses.
Thinking more about this, I believe I can use the existing const
mechanism for this (and then add a const generation pass to AsmPrinter).
How do I tell llvm to never generate instructions using immediates
and to instead always use the const address space for this ?
>> 3) The get_global_id call needs to be translated into
>> simply using the SV[0] "register", how would I go about
>> this ?
>>
>
> get_global_id should be implemented in libclc with tgsi specific intrinsics
> that read from the system value registers.
I see I will investigate this further.
>> 4) The global and input load / stores are not handled
>> correctly, I see that the LOAD instructions get postfixed
>> with a i reps. g for input / global how would I go about
>> modifying the code emitter (AsmPrinter?) to change "LOADi"
>> into "LOAD <dest> RINPUT <offset>"?
>>
>
> I don't understand exactly why you want to change this. Are you trying to
> make the assembly string for input loads and global loads look the same?
Currently the following code is generated:
LOADiis TEMP[1].z, [0]
This should be:
LOAD TEMP[1].z, RINPUT, 0
So I need to add handling to generate the RINPUT here. I'm wondering
what the best place is for this. Do I simply intercept the LOAD in
LowerMachineInstrToMCInst and add an extra operand there ?
>> 5) Talking about the lowecase suffixes to the instructions,
>> these should not be part of the output, how do I filter these?
>>
>> 6) And finally, the current llvm-tgsi output uses e.g.
>> TEMP1y where as for the destination it should use TEMP[1].y
>> and for the sources it should use TEMP[1].xxxx (so include
>> proper swizzling info).
>>
>
> You just need to change how the registers are printed in the
> TGSIRegisterInfo.td file which should fix this.
OK, so I've managed (easy) to change things so that now I get
(i.e.) :
UADDs TEMP[1].x, TEMP[1].x, TEMP[1].z
This is however not correct TGSI, correct would be:
UADD TEMP[1].x, TEMP[1].xxxx, TEMP[1].zzzz
Which may be shortend to:
UADD TEMP[1].x, TEMP[1], TEMP[1].zzzz
TGSI will automatically use the dest vector component from
the sources, and if you want to use another component you
need to specify a swizzle order, which is the 4 components
in an alternate order.
I'm not quite sure yet how to deal with this.
Regards,
Hans
More information about the Nouveau
mailing list