[Mesa-dev] A question about TGSI->LLVM IR conversion

Sun Feb 13 14:33:00 PST 2011

You're seeing the expected result.  gallivm's current implementation is geared towards TGSI -> CPU SIMD processing for both the draw and llvmpipe modules.  SSE registers are 128 bits wide, so in SOA mode each vector holds one channel for four instances of a TGSI shader, which is why you get a <4 x float> per channel. The width is set by the LP_NATIVE_VECTOR_WIDTH define in src/gallium/auxiliary/gallivm/lp_bld_type.h.
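As a rough illustration of the SOA layout (a minimal sketch in plain C, not gallivm code; the names soa_reg, soa_mov, VECTOR_WIDTH_BITS and LANES are made up for this example), a MOV on one TGSI register becomes four vector copies, one per channel, with each lane belonging to a different shader instance:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: a 128-bit SIMD register of 32-bit floats. */
#define VECTOR_WIDTH_BITS 128
#define LANES (VECTOR_WIDTH_BITS / 32)   /* 4 shader instances per vector */

/* One TGSI register in SOA form: one vector per channel (x, y, z, w).
 * Each lane of a vector belongs to a different shader instance. */
struct soa_reg {
    float chan[4][LANES];
};

/* MOV dst, src in SOA form: one vector copy per channel. */
static void soa_mov(struct soa_reg *dst, const struct soa_reg *src)
{
    assert(dst && src);
    for (size_t c = 0; c < 4; ++c)                   /* channel x, y, z, w */
        for (size_t lane = 0; lane < LANES; ++lane)  /* shader instance */
            dst->chan[c][lane] = src->chan[c][lane];
}
```

This mirrors the shape of the dumped IR: four <4 x float> stores, one for each output channel.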

Part of the work to make gallivm more useful to generic GPUs would be to replace the hardcoded LP_NATIVE_VECTOR_WIDTH define with a parameter determined at runtime.
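A minimal sketch of what such a runtime parameter could look like, in plain C. The struct target_desc and the function soa_instances are hypothetical names for this example and do not exist in gallivm; the idea that a target wanting scalar per-channel code would report a width of 32 bits is also an assumption:

```c
#include <assert.h>

/* Hypothetical per-target description; not part of gallivm. */
struct target_desc {
    unsigned vector_width_bits;  /* e.g. 128 for SSE */
};

/* Number of 32-bit float shader instances handled per vector in SOA mode,
 * computed from a runtime value instead of a compile-time define. */
static unsigned soa_instances(const struct target_desc *t)
{
    assert(t->vector_width_bits >= 32 && t->vector_width_bits % 32 == 0);
    return t->vector_width_bits / 32;
}
```

With a width of 128 this gives the four instances seen today; with a width of 32 it would give one instance, i.e. a scalar per channel.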

Alternatively, you could try playing with lp_build_tgsi_aos: in AOS mode, we run one instance of a TGSI shader at a time on a 128-bit SIMD ISA.
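For contrast, a sketch of the AOS layout in plain C (aos_reg and aos_mov are made-up names for this example, not gallivm code). Here one 128-bit vector holds all four channels of a register for a single shader instance, so a MOV is a single vector copy:

```c
#include <assert.h>

/* One TGSI register in AOS form: a single 4-wide vector {x, y, z, w}
 * that belongs to exactly one shader instance. */
struct aos_reg {
    float xyzw[4];
};

/* MOV dst, src in AOS form: one vector copy covering all four channels. */
static void aos_mov(struct aos_reg *dst, const struct aos_reg *src)
{
    assert(dst && src);
    for (int c = 0; c < 4; ++c)
        dst->xyzw[c] = src->xyzw[c];
}
```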

About SSA and memory: emitting allocas with loads and stores is the recommended way to generate LLVM IR. It's very hard and pointless to generate phi nodes manually. I tried that initially but then Zack brought me back to sanity. Please read
http://llvm.org/docs/tutorial/OCamlLangImpl7.html#memory for more details.  In your example, you'll need to run the mem2reg pass to get the allocas promoted to SSA registers.

Another detail: the current gallivm TGSI -> LLVM code does the translation in a single pass, because it is based on the tgsi_sse2.c TGSI -> SSE2 code. In retrospect it was a mistake not to use Zack's original gallivm design of doing the translation in multiple stages; i.e., translate each TGSI instruction to custom LLVM intrinsics (e.g., "gallivm.tgsi.exp2"), do a bunch of high-level optimizations, and only at the final stage lower everything to elementary LLVM operations. You should take a look at Zack's original gallivm code too. I don't have a commit hash handy, but it is in git.
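To make the staged idea concrete, here is a toy sketch in plain C. Everything in it is hypothetical and only for illustration: the enum op, struct inst and both functions are not gallivm data structures, and the exp2(log2(x)) fold ignores precision, NaN and x <= 0. It shows a rewrite that is easy while operations are still high level and would be much harder to spot after lowering. The first stage (TGSI -> high-level ops) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy high-level ops, standing in for intrinsics such as "gallivm.tgsi.exp2". */
enum op { OP_LOG2, OP_EXP2, OP_MOV };

struct inst {
    enum op op;
    int dst;      /* destination register index */
    int src;      /* source register index */
    int lowered;  /* set once the final lowering stage has visited it */
};

/* High-level optimization stage: turn "t = log2(x); d = exp2(t)" into
 * "t = log2(x); d = mov(x)". The log2 is left in place; removing it if it
 * becomes dead would be a separate pass. Returns the number of folds. */
static size_t fold_exp2_of_log2(struct inst *code, size_t n)
{
    size_t folded = 0;
    assert(code || n == 0);
    for (size_t i = 1; i < n; ++i) {
        const struct inst *prev = &code[i - 1];
        if (code[i].op == OP_EXP2 && prev->op == OP_LOG2 &&
            code[i].src == prev->dst &&
            prev->dst != prev->src) {   /* x must not have been overwritten */
            code[i].op = OP_MOV;
            code[i].src = prev->src;
            ++folded;
        }
    }
    return folded;
}

/* Final stage: mark every instruction as lowered and return how many
 * high-level ops would still need expanding into elementary operations. */
static size_t lower_all(struct inst *code, size_t n)
{
    size_t expanded = 0;
    assert(code || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (code[i].op != OP_MOV)
            ++expanded;
        code[i].lowered = 1;
    }
    return expanded;
}
```

Running the fold before the lowering means only the log2 is left to expand; in a single-pass translator both operations would already have been expanded by the time anything could look at them together.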

Jose

________________________________________
From: Christian König [deathsimple at vodafone.de]
Sent: Sunday, February 13, 2011 19:45
To: Jose Fonseca; Brian Paul; Keith Whitwell
Cc: mesa-dev at lists.freedesktop.org
Subject: A question about TGSI->LLVM IR conversion

Hi,

I'm currently playing around a bit with the TGSI->LLVM IR conversion
found in the llvmpipe/gallivm driver. While doing so I managed to hook
a call to lp_build_tgsi_soa into the r600g shader generation and dump
the resulting LLVM IR in addition to the TGSI and R600 assembler
representations. The goal is to compare TGSI with R600 assembler and
LLVM IR to look for new ideas on how to optimize better or even
simplify code generation.

The problem is that the LLVM IR doesn't look like what I expected:

TGSI input:
FRAG
DCL IN[0], COLOR, PERSPECTIVE
DCL OUT[0], COLOR
  0: MOV OUT[0], IN[0]
  1: END

generated LLVM IR:
define void @test_shader(<4 x float>) {
entry:
  %output3 = alloca <4 x float>
  %output2 = alloca <4 x float>
  %output1 = alloca <4 x float>
  %output = alloca <4 x float>
  store <4 x float> zeroinitializer, <4 x float>* %output
  store <4 x float> zeroinitializer, <4 x float>* %output1
  store <4 x float> zeroinitializer, <4 x float>* %output2
  store <4 x float> zeroinitializer, <4 x float>* %output3
  store <4 x float> %0, <4 x float>* %output
  store <4 x float> %0, <4 x float>* %output1
  store <4 x float> %0, <4 x float>* %output2
  store <4 x float> %0, <4 x float>* %output3
}

It looks like a vector type is used for each TGSI channel instead
of a scalar. So is there something wrong with my setup of gallivm, my
call to lp_build_tgsi_soa, or some other prerequisite that is not met?
Or is lp_build_tgsi_soa just supposed to work like this because
llvmpipe always works with a 2x2 pixel block?

I also expected a pure SSA form, but the generated code uses
stack-allocated memory for storing the temporary and output registers.
This seems to be a current limitation of lp_build_tgsi_soa.

Here is the code I'm using to call lp_build_tgsi_soa:

static void llvm_test(const struct pipe_shader_state *state)
{
        int i, j;

        struct gallivm_state *gallivm = gallivm_create();
        struct lp_type type = lp_type_float_vec(32);
        struct tgsi_shader_info info;

        LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
        LLVMTypeRef vec_type = LLVMVectorType(float_type, 4);
        LLVMTypeRef args[PIPE_MAX_SHADER_INPUTS] = { };
        LLVMValueRef inputs[PIPE_MAX_SHADER_INPUTS][4] = { };
        LLVMValueRef outputs[PIPE_MAX_SHADER_OUTPUTS][4] = { };
        LLVMValueRef consts_ptr;
        LLVMTypeRef func_type;
        LLVMValueRef function;
        LLVMBasicBlockRef block;
        LLVMBuilderRef builder;

        tgsi_scan_shader(state->tokens, &info);

        for (i=0; i<info.file_count[TGSI_FILE_INPUT]; ++i)
                args[i] = vec_type;

        func_type = LLVMFunctionType(LLVMVoidTypeInContext(gallivm->context),
                                        (LLVMTypeRef*)args, info.file_count[TGSI_FILE_INPUT], 0);
        function = LLVMAddFunction(gallivm->module, "test_shader", func_type);
        LLVMSetFunctionCallConv(function, LLVMCCallConv);

        for (i=0; i<info.file_count[TGSI_FILE_INPUT]; ++i)
                for (j=0; j<4; ++j)
                        inputs[i][j] = LLVMGetParam(function, i);

        consts_ptr = LLVMAddGlobal(gallivm->module, float_type, "consts_ptr");

        block = LLVMAppendBasicBlockInContext(gallivm->context, function, "entry");
        builder = gallivm->builder;
        assert(builder);
        LLVMPositionBuilderAtEnd(builder, block);

        lp_build_tgsi_soa(gallivm, state->tokens, type, NULL,
                          consts_ptr, NULL,
                          NULL, inputs,
                          outputs, NULL, &info);

        LLVMDeleteFunction(function);
        LLVMDeleteGlobal(consts_ptr);
        gallivm_destroy(gallivm);
}

Could somebody please explain this behaviour? Is it a bug in my test
code, or does it just work as designed?

Thanks for your time,
Christian.