[Mesa-dev] A question about TGSI->LLVM IR conversion
Jerome Glisse
j.glisse at gmail.com
Sun Feb 13 14:05:26 PST 2011
On Sun, Feb 13, 2011 at 2:45 PM, Christian König
<deathsimple at vodafone.de> wrote:
> Hi,
>
> I'm currently playing around a bit with the TGSI->LLVM IR conversion
> found in the llvmpipe/gallivm driver. While doing so I managed to hook
> into the r600g shader generation a call to lp_build_tgsi_soa and dumping
> the resulting LLVM IR additionally to the TGSI and R600 assembler
> representation. The goal is to compare TGSI with R600 assembler and LLVM
> IR for possible new ideas how to make better optimization or even
> simplifying code generation.
>
> The problem is that the LLVM IR doesn't look like what I expected:
>
> TGSI input:
> FRAG
> DCL IN[0], COLOR, PERSPECTIVE
> DCL OUT[0], COLOR
> 0: MOV OUT[0], IN[0]
> 1: END
>
> generated LLVM IR:
> define void @test_shader(<4 x float>) {
> entry:
> %output3 = alloca <4 x float>
> %output2 = alloca <4 x float>
> %output1 = alloca <4 x float>
> %output = alloca <4 x float>
> store <4 x float> zeroinitializer, <4 x float>* %output
> store <4 x float> zeroinitializer, <4 x float>* %output1
> store <4 x float> zeroinitializer, <4 x float>* %output2
> store <4 x float> zeroinitializer, <4 x float>* %output3
> store <4 x float> %0, <4 x float>* %output
> store <4 x float> %0, <4 x float>* %output1
> store <4 x float> %0, <4 x float>* %output2
> store <4 x float> %0, <4 x float>* %output3
> }
>
> It looks like a vector type is used for each TGSI channel instead
> of a scalar. So is there something wrong with my setup of gallivm, my
> call to lp_build_tgsi_soa, or some other prerequisite not met? Or is
> lp_build_tgsi_soa just supposed to work like this because llvmpipe
> always works on a 2x2 pixel block?
>
> I also expected a pure SSA form, but the generated code uses
> stack-allocated memory for storing the temporary and output registers;
> this clearly seems to be a current limitation of lp_build_tgsi_soa.
>
> Here is the code I'm using to call lp_build_tgsi_soa:
>
> static void llvm_test(const struct pipe_shader_state *state)
> {
>     int i, j;
>
>     struct gallivm_state *gallivm = gallivm_create();
>     struct lp_type type = lp_type_float_vec(32);
>     struct tgsi_shader_info info;
>
>     LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
>     LLVMTypeRef vec_type = LLVMVectorType(float_type, 4);
>     LLVMTypeRef args[PIPE_MAX_SHADER_INPUTS] = { };
>     LLVMValueRef inputs[PIPE_MAX_SHADER_INPUTS][4] = { };
>     LLVMValueRef outputs[PIPE_MAX_SHADER_OUTPUTS][4] = { };
>     LLVMValueRef consts_ptr;
>     LLVMTypeRef func_type;
>     LLVMValueRef function;
>     LLVMBasicBlockRef block;
>     LLVMBuilderRef builder;
>
>     tgsi_scan_shader(state->tokens, &info);
>
>     for (i = 0; i < info.file_count[TGSI_FILE_INPUT]; ++i)
>         args[i] = vec_type;
>
>     func_type = LLVMFunctionType(LLVMVoidTypeInContext(gallivm->context),
>                                  (LLVMTypeRef*)args,
>                                  info.file_count[TGSI_FILE_INPUT], 0);
>     function = LLVMAddFunction(gallivm->module, "test_shader", func_type);
>     LLVMSetFunctionCallConv(function, LLVMCCallConv);
>
>     for (i = 0; i < info.file_count[TGSI_FILE_INPUT]; ++i)
>         for (j = 0; j < 4; ++j)
>             inputs[i][j] = LLVMGetParam(function, i);
>
>     consts_ptr = LLVMAddGlobal(gallivm->module, float_type, "consts_ptr");
>
>     block = LLVMAppendBasicBlockInContext(gallivm->context, function, "entry");
>     builder = gallivm->builder;
>     assert(builder);
>     LLVMPositionBuilderAtEnd(builder, block);
>
>     lp_build_tgsi_soa(gallivm, state->tokens, type, NULL,
>                       consts_ptr, NULL,
>                       NULL, inputs,
>                       outputs, NULL, &info);
>
>     LLVMDeleteFunction(function);
>     LLVMDeleteGlobal(consts_ptr);
>     gallivm_destroy(gallivm);
> }
>
> Could somebody please explain this behaviour: is it a bug in my test
> code, or does it just work as designed?
>
> Thanks for your time,
> Christian.
IIRC it's due to the SoA (structure of arrays) approach: each output
component is an array, and it deals with a 2x2 block. That being said, I
don't think LLVM is well suited to GPUs. Instruction scheduling and
register allocation are very different from LLVM's other target
architectures.
Also, I am not sure how well LLVM's SSA would cope with masked vector
use (when only a subset of the vector components is updated); last time
I looked at this, LLVM would simply miss a lot of optimization
opportunities.
I have been working on reusing the nvc0/nv50 shader compiler. I still
don't know how good the end result might be, as scheduling + GPR
allocation from scalar form is not a simple problem. I have a couple of
different algorithms in mind and have been working on implementing
them; I hope to have something working early next week.
Cheers,
Jerome Glisse