[Mesa-dev] A question about TGSI->LLVM IR conversion
Jerome Glisse
j.glisse at gmail.com
Sun Feb 13 14:05:26 PST 2011
On Sun, Feb 13, 2011 at 2:45 PM, Christian König
<deathsimple at vodafone.de> wrote:
> Hi,
>
> I'm currently playing around a bit with the TGSI->LLVM IR conversion
> found in the llvmpipe/gallivm driver. While doing so I managed to hook
> into the r600g shader generation a call to lp_build_tgsi_soa and dumping
> the resulting LLVM IR additionally to the TGSI and R600 assembler
> representation. The goal is to compare TGSI with R600 assembler and LLVM
> IR for possible new ideas how to make better optimization or even
> simplifying code generation.
>
> The problem is that the LLVM IR doesn't look like what I expected:
>
> TGSI input:
> FRAG
> DCL IN[0], COLOR, PERSPECTIVE
> DCL OUT[0], COLOR
> 0: MOV OUT[0], IN[0]
> 1: END
>
> generated LLVM IR:
> define void @test_shader(<4 x float>) {
> entry:
> %output3 = alloca <4 x float>
> %output2 = alloca <4 x float>
> %output1 = alloca <4 x float>
> %output = alloca <4 x float>
> store <4 x float> zeroinitializer, <4 x float>* %output
> store <4 x float> zeroinitializer, <4 x float>* %output1
> store <4 x float> zeroinitializer, <4 x float>* %output2
> store <4 x float> zeroinitializer, <4 x float>* %output3
> store <4 x float> %0, <4 x float>* %output
> store <4 x float> %0, <4 x float>* %output1
> store <4 x float> %0, <4 x float>* %output2
> store <4 x float> %0, <4 x float>* %output3
> }
>
> It looks like a vector type is used for each TGSI channel instead
> of a scalar. So is there something wrong with my setup of gallivm, my
> call to lp_build_tgsi_soa, or some other prerequisite not met? Or is
> lp_build_tgsi_soa just supposed to work like this because llvmpipe
> always works on a 2x2 pixel block?
>
> I also expected a pure SSA form, but the generated code uses
> stack-allocated memory for storing the temporary and output registers;
> this clearly seems to be a current limitation of lp_build_tgsi_soa.
>
> Here is the code I'm using to call lp_build_tgsi_soa:
>
> static void llvm_test(const struct pipe_shader_state *state)
> {
>     int i, j;
>
>     struct gallivm_state *gallivm = gallivm_create();
>     struct lp_type type = lp_type_float_vec(32);
>     struct tgsi_shader_info info;
>
>     LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
>     LLVMTypeRef vec_type = LLVMVectorType(float_type, 4);
>     LLVMTypeRef args[PIPE_MAX_SHADER_INPUTS] = { };
>     LLVMValueRef inputs[PIPE_MAX_SHADER_INPUTS][4] = { };
>     LLVMValueRef outputs[PIPE_MAX_SHADER_OUTPUTS][4] = { };
>     LLVMValueRef consts_ptr;
>     LLVMTypeRef func_type;
>     LLVMValueRef function;
>     LLVMBasicBlockRef block;
>     LLVMBuilderRef builder;
>
>     tgsi_scan_shader(state->tokens, &info);
>
>     for (i = 0; i < info.file_count[TGSI_FILE_INPUT]; ++i)
>         args[i] = vec_type;
>
>     func_type = LLVMFunctionType(LLVMVoidTypeInContext(gallivm->context),
>                                  (LLVMTypeRef*)args,
>                                  info.file_count[TGSI_FILE_INPUT], 0);
>     function = LLVMAddFunction(gallivm->module, "test_shader", func_type);
>     LLVMSetFunctionCallConv(function, LLVMCCallConv);
>
>     for (i = 0; i < info.file_count[TGSI_FILE_INPUT]; ++i)
>         for (j = 0; j < 4; ++j)
>             inputs[i][j] = LLVMGetParam(function, i);
>
>     consts_ptr = LLVMAddGlobal(gallivm->module, float_type, "consts_ptr");
>
>     block = LLVMAppendBasicBlockInContext(gallivm->context, function, "entry");
>     builder = gallivm->builder;
>     assert(builder);
>     LLVMPositionBuilderAtEnd(builder, block);
>
>     lp_build_tgsi_soa(gallivm, state->tokens, type, NULL,
>                       consts_ptr, NULL,
>                       NULL, inputs,
>                       outputs, NULL, &info);
>
>     LLVMDeleteFunction(function);
>     LLVMDeleteGlobal(consts_ptr);
>     gallivm_destroy(gallivm);
> }
>
> Could somebody please explain this behaviour: is it a bug in my test
> code, or does it just work as designed?
>
> Thanks for your time,
> Christian.
IIRC it's due to the SoA (structure of arrays) approach: each output
component is an array, and it deals with a 2x2 block. That being said, I
don't think LLVM is well suited to GPUs. Instruction scheduling and
register allocation are very different from LLVM's other target
architectures.
Also, I am not sure how well LLVM's SSA would cope with masked vector
use (when only a subset of the vector components is updated); last time
I looked at this, LLVM would simply miss a lot of optimization
opportunities.
I have been working on reusing the nvc0/nv50 shader compiler. I still
don't know how good the end result might be, as scheduling + GPR
allocation from scalar form is not a simple problem. I have a couple of
different algorithms in mind and have been working on implementing
them; I hope to have something working early next week.
Cheers,
Jerome Glisse