[Mesa-dev] [PATCH] gallivm: use llvm function calls for texturing instead of inlining
Jose Fonseca
jfonseca at vmware.com
Thu Mar 26 12:31:33 PDT 2015
On 25/03/15 19:03, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
>
> There are issues with inlining everything, most notably llvm will use much
> more memory (and be slower) when compiling. Ideally we'd probably use
> functions for shader functions too but texture sampling usually is responsible
> for quite some IR (it can easily reach 80% of total IR instructions) so this
> seems like a good start.
> This still generates a different function for all different combinations just
> like before, however it is possible llvm is missing some optimization
> opportunities - it is believed though such opportunities should be somewhat
> rare, but at least for now it can still be switched off (at compile time only).
> It should probably make compiled code also smaller because the same function
> should be used for different variants in the same module (so for the
> opaque/partial or linear/elts variants).
> No piglit change (though it does indeed speed up unrealistic tests like
> fp-indirections2 by a factor of 30 or so).
> Has a small negative performance impact in openarena - I suspect this could
> be fixed by running some IPO passes (despite the private linkage, llvm right
> now does NO optimization at all wrt anything going past the call, even if
> there's just one caller - so things like values stored before the call and then
> always written by the function etc. will not be optimized away, nor will dead
> arguments (which we mostly shouldn't have) be eliminated, always constant
> arguments promoted etc.).
>
> v2: use proper return values instead of pointer function arguments.
> llvm supports aggregate return values, which do wonders here eliminating
> unnecessary stack variables - everything in fact will be returned in registers
> even without any IPO optimizations. It makes the code simpler too.
> With this I could not measure a peformance impact in openarena any longer
> (though since there's still no constant value propagation etc. into the tex
> functions this does not mean it couldn't have a negative impact elsewhere).
> ---
> src/gallium/auxiliary/gallivm/lp_bld_init.c | 23 ++
> src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 418 +++++++++++++++++++++-
> 2 files changed, 423 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> index 6133883..ee23ea0 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> @@ -113,6 +113,10 @@ create_pass_manager(struct gallivm_state *gallivm)
> gallivm->passmgr = LLVMCreateFunctionPassManagerForModule(gallivm->module);
> if (!gallivm->passmgr)
> return FALSE;
> + /*
> + * FIXME: probably would need a per module pass manager (with some IPO
> + * passes) to optimize the quite bad looking texture function calls.
> + */
I think we can tone this down a bit after v2. That is, FIXME->TODO, "to
optimize the quite bad..." -> "to attempt to inline small texture
function calls."
>
> // Old versions of LLVM get the DataLayout from the pass manager.
> LLVMAddTargetData(gallivm->target, gallivm->passmgr);
> @@ -575,6 +579,25 @@ gallivm_jit_function(struct gallivm_state *gallivm,
> jit_func = pointer_to_func(code);
>
> if (gallivm_debug & GALLIVM_DEBUG_ASM) {
> + /*
> + * XXX hack: we can only disassemble functions after compiling the
> + * module, however we've got no idea what texture functions we generated.
> + * Hence, get all functions in the module and print all matching some
> + * pattern. (Because this is triggered per function and not per module,
> + * this will of course print the texture functions each time
> + * gallivm_jit_function is invoked, not just once per module.)
> + */
Instead of being a hack, we should consider moving this code to when
compiling, to gallivm_compile_module.
In short, at the end of gallivm_compile_module() we should call
LLVMGetPointerToGlobal/lp_disassemble for every single function in there
as appropriate (regardless of their name).
> + LLVMValueRef llvm_func = LLVMGetFirstFunction(gallivm->module);
> +
> + while (llvm_func) {
> + if (!util_strncmp("texfunc", LLVMGetValueName(llvm_func), 7)) {
> + void *texfunc_code = LLVMGetPointerToGlobal(gallivm->engine, llvm_func);
> + lp_disassemble(llvm_func, texfunc_code);
> + }
> + llvm_func = LLVMGetNextFunction(llvm_func);
> + }
> + }
> + if (gallivm_debug & GALLIVM_DEBUG_ASM) {
> lp_disassemble(func, code);
> }
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index a90278e..c91ae59 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -2357,30 +2357,30 @@ lp_build_sample_nop(struct gallivm_state *gallivm,
>
>
> /**
> - * Build texture sampling code.
> + * Build the actual texture sampling code.
> * 'texel' will return a vector of four LLVMValueRefs corresponding to
> * R, G, B, A.
> * \param type vector float type to use for coords, etc.
> * \param is_fetch if this is a texel fetch instruction.
> * \param derivs partial derivatives of (s,t,r,q) with respect to x and y
> */
> -void
> -lp_build_sample_soa(struct gallivm_state *gallivm,
> - const struct lp_static_texture_state *static_texture_state,
> - const struct lp_static_sampler_state *static_sampler_state,
> - struct lp_sampler_dynamic_state *dynamic_state,
> - struct lp_type type,
> - boolean is_fetch,
> - unsigned texture_index,
> - unsigned sampler_index,
> - LLVMValueRef context_ptr,
> - const LLVMValueRef *coords,
> - const LLVMValueRef *offsets,
> - const struct lp_derivatives *derivs, /* optional */
> - LLVMValueRef lod_bias, /* optional */
> - LLVMValueRef explicit_lod, /* optional */
> - enum lp_sampler_lod_property lod_property,
> - LLVMValueRef texel_out[4])
> +static void
> +lp_build_sample_soa_code(struct gallivm_state *gallivm,
> + const struct lp_static_texture_state *static_texture_state,
> + const struct lp_static_sampler_state *static_sampler_state,
> + struct lp_sampler_dynamic_state *dynamic_state,
> + struct lp_type type,
> + boolean is_fetch,
> + unsigned texture_index,
> + unsigned sampler_index,
> + LLVMValueRef context_ptr,
> + const LLVMValueRef *coords,
> + const LLVMValueRef *offsets,
> + const struct lp_derivatives *derivs, /* optional */
> + LLVMValueRef lod_bias, /* optional */
> + LLVMValueRef explicit_lod, /* optional */
> + enum lp_sampler_lod_property lod_property,
> + LLVMValueRef texel_out[4])
> {
> unsigned target = static_texture_state->target;
> unsigned dims = texture_dims(target);
> @@ -2891,6 +2891,388 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
> }
>
>
> +#define USE_TEX_FUNC_CALL 1
> +
> +#define LP_MAX_TEX_FUNC_ARGS 32
> +
> +#define LP_SAMPLER_FUNC_LOD_BIAS (1 << 0)
> +#define LP_SAMPLER_FUNC_LOD_EXPLICIT (1 << 1)
> +#define LP_SAMPLER_FUNC_EXPLICITDERIVS (1 << 2)
> +#define LP_SAMPLER_FUNC_SHADOW (1 << 3)
> +#define LP_SAMPLER_FUNC_OFFSETS (1 << 4)
> +#define LP_SAMPLER_FUNC_LOD_PROPERTY_SHIFT 5
> +
> +
> +static inline void
> +get_target_info(enum pipe_texture_target target,
> + unsigned *num_coords, unsigned *num_derivs,
> + unsigned *num_offsets, unsigned *layer)
> +{
> + unsigned dims = texture_dims(target);
> + *num_coords = dims;
> + *num_offsets = dims;
> + *num_derivs = (target == PIPE_TEXTURE_CUBE ||
> + target == PIPE_TEXTURE_CUBE_ARRAY) ? 3 : dims;
> + *layer = has_layer_coord(target) ? 2: 0;
> + if (target == PIPE_TEXTURE_CUBE_ARRAY) {
> + /*
> + * dims doesn't include r coord for cubes - this is handled
> + * by layer instead, but need to fix up for cube arrays...
> + */
> + *layer = 3;
> + *num_coords = 3;
> + }
> +}
> +
> +
> +/**
> + * Generate the function body for a texture sampling function.
> + */
> +static void
> +lp_build_sample_gen_func(struct gallivm_state *gallivm,
> + const struct lp_static_texture_state *static_texture_state,
> + const struct lp_static_sampler_state *static_sampler_state,
> + struct lp_sampler_dynamic_state *dynamic_state,
> + struct lp_type type,
> + boolean is_fetch,
> + unsigned texture_index,
> + unsigned sampler_index,
> + LLVMValueRef function,
> + unsigned num_args,
> + unsigned sampler_bits,
> + enum lp_sampler_lod_property lod_property)
> +{
> +
> + LLVMBuilderRef old_builder;
> + LLVMBasicBlockRef block;
> + LLVMValueRef coords[5];
> + LLVMValueRef offsets[3] = { NULL };
> + LLVMValueRef lod_bias = NULL;
> + LLVMValueRef explicit_lod = NULL;
> + LLVMValueRef context_ptr;
> + LLVMValueRef texel_out[4];
> + struct lp_derivatives derivs;
> + struct lp_derivatives *deriv_ptr = NULL;
> + unsigned num_param = 0;
> + unsigned i, num_coords, num_derivs, num_offsets, layer;
> +
> + get_target_info(static_texture_state->target,
> + &num_coords, &num_derivs, &num_offsets, &layer);
> +
> + /* "unpack" arguments */
> + context_ptr = LLVMGetParam(function, num_param++);
> + for (i = 0; i < num_coords; i++) {
> + coords[i] = LLVMGetParam(function, num_param++);
> + }
> + for (i = num_coords; i < 5; i++) {
> + /* This is rather unfortunate... */
> + coords[i] = lp_build_undef(gallivm, type);
> + }
> + if (layer) {
> + coords[layer] = LLVMGetParam(function, num_param++);
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_SHADOW) {
> + coords[4] = LLVMGetParam(function, num_param++);
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_OFFSETS) {
> + for (i = 0; i < num_offsets; i++) {
> + offsets[i] = LLVMGetParam(function, num_param++);
> + }
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_LOD_BIAS) {
> + lod_bias = LLVMGetParam(function, num_param++);
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_LOD_EXPLICIT) {
> + explicit_lod = LLVMGetParam(function, num_param++);
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_EXPLICITDERIVS) {
> + for (i = 0; i < num_derivs; i++) {
> + derivs.ddx[i] = LLVMGetParam(function, num_param++);
> + derivs.ddy[i] = LLVMGetParam(function, num_param++);
> + }
> + deriv_ptr = &derivs;
> + }
> +
> + assert(num_args == num_param);
> +
> + /*
> + * Function body
> + */
> +
> + old_builder = gallivm->builder;
> + block = LLVMAppendBasicBlockInContext(gallivm->context, function, "entry");
> + gallivm->builder = LLVMCreateBuilderInContext(gallivm->context);
> + LLVMPositionBuilderAtEnd(gallivm->builder, block);
> +
> + lp_build_sample_soa_code(gallivm,
> + static_texture_state,
> + static_sampler_state,
> + dynamic_state,
> + type,
> + is_fetch,
> + texture_index,
> + sampler_index,
> + context_ptr,
> + coords,
> + offsets,
> + deriv_ptr,
> + lod_bias,
> + explicit_lod,
> + lod_property,
> + texel_out);
> +
> + LLVMBuildAggregateRet(gallivm->builder, texel_out, 4);
> +
> + LLVMDisposeBuilder(gallivm->builder);
> + gallivm->builder = old_builder;
> +
> + gallivm_verify_function(gallivm, function);
> +}
> +
> +
> +/**
> + * Call the matching function for texture sampling.
> + * If there's no match, generate a new one.
> + */
> +static void
> +lp_build_sample_soa_func(struct gallivm_state *gallivm,
> + const struct lp_static_texture_state *static_texture_state,
> + const struct lp_static_sampler_state *static_sampler_state,
> + struct lp_sampler_dynamic_state *dynamic_state,
> + struct lp_type type,
> + boolean is_fetch,
> + unsigned texture_index,
> + unsigned sampler_index,
> + LLVMValueRef context_ptr,
> + const LLVMValueRef *coords,
> + const LLVMValueRef *offsets,
> + const struct lp_derivatives *derivs, /* optional */
> + LLVMValueRef lod_bias, /* optional */
> + LLVMValueRef explicit_lod, /* optional */
> + enum lp_sampler_lod_property lod_property,
> + LLVMValueRef texel_out[4])
> +{
> + LLVMBuilderRef builder = gallivm->builder;
> + LLVMModuleRef module = LLVMGetGlobalParent(LLVMGetBasicBlockParent(
> + LLVMGetInsertBlock(builder)));
> + LLVMValueRef function, inst;
> + LLVMValueRef args[LP_MAX_TEX_FUNC_ARGS];
> + LLVMBasicBlockRef bb;
> + LLVMValueRef tex_ret;
> + unsigned num_args = 0;
> + unsigned sampler_bits = 0;
> + char func_name[64];
> + unsigned i, num_coords, num_derivs, num_offsets, layer;
> +
> + get_target_info(static_texture_state->target,
> + &num_coords, &num_derivs, &num_offsets, &layer);
> +
> + /*
> + * texture function matches are found by name.
> + * Thus the name has to include both the texture and sampler unit
> + * (which covers all static state) plus the actual texture functions
> + * (which is determined here somewhat awkwardly by presence of the
> + * corresponding LLVMValueRefs). Additionally, lod_property also
> + * has to be included (it could change if the lod for instance comes
> + * from a shader uniform or a temp reg).
> + */
> + if (static_sampler_state->compare_mode != PIPE_TEX_COMPARE_NONE) {
> + sampler_bits |= LP_SAMPLER_FUNC_SHADOW;
> + }
> + if (offsets[0]) {
> + sampler_bits |= LP_SAMPLER_FUNC_OFFSETS;
> + }
> + if (lod_bias) {
> + sampler_bits |= LP_SAMPLER_FUNC_LOD_BIAS;
> + }
> + else if (explicit_lod) {
> + sampler_bits |= LP_SAMPLER_FUNC_LOD_EXPLICIT;
> + }
> + else if (derivs) {
> + sampler_bits |= LP_SAMPLER_FUNC_EXPLICITDERIVS;
> + }
> + sampler_bits |= lod_property << LP_SAMPLER_FUNC_LOD_PROPERTY_SHIFT;
> +
> + util_snprintf(func_name, sizeof(func_name), "texfunc_res_%d_sam_%d_%x",
> + texture_index, sampler_index, sampler_bits);
> +
> + function = LLVMGetNamedFunction(module, func_name);
> +
> + if(!function) {
> + LLVMTypeRef arg_types[LP_MAX_TEX_FUNC_ARGS];
> + LLVMTypeRef ret_type;
> + LLVMTypeRef function_type;
> + LLVMTypeRef val_type[4];
> + unsigned num_param = 0;
> +
> + /*
> + * Generate the function prototype.
> + */
> +
> + arg_types[num_param++] = LLVMTypeOf(context_ptr);
> + for (i = 0; i < num_coords; i++) {
> + arg_types[num_param++] = LLVMTypeOf(coords[0]);
> + assert(LLVMTypeOf(coords[0]) == LLVMTypeOf(coords[i]));
> + }
> + if (layer) {
> + arg_types[num_param++] = LLVMTypeOf(coords[layer]);
> + assert(LLVMTypeOf(coords[0]) == LLVMTypeOf(coords[layer]));
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_SHADOW) {
> + arg_types[num_param++] = LLVMTypeOf(coords[0]);
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_OFFSETS) {
> + for (i = 0; i < num_offsets; i++) {
> + arg_types[num_param++] = LLVMTypeOf(offsets[0]);
> + assert(LLVMTypeOf(offsets[0]) == LLVMTypeOf(offsets[i]));
> + }
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_LOD_BIAS) {
> + arg_types[num_param++] = LLVMTypeOf(lod_bias);
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_LOD_EXPLICIT) {
> + arg_types[num_param++] = LLVMTypeOf(explicit_lod);
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_EXPLICITDERIVS) {
> + for (i = 0; i < num_derivs; i++) {
> + arg_types[num_param++] = LLVMTypeOf(derivs->ddx[i]);
> + arg_types[num_param++] = LLVMTypeOf(derivs->ddy[i]);
> + assert(LLVMTypeOf(derivs->ddx[0]) == LLVMTypeOf(derivs->ddx[i]));
> + assert(LLVMTypeOf(derivs->ddy[0]) == LLVMTypeOf(derivs->ddy[i]));
> + }
> + }
> +
> + ret_type = LLVMVoidTypeInContext(gallivm->context);
> + val_type[0] = val_type[1] = val_type[2] = val_type[3] =
> + lp_build_vec_type(gallivm, type);
> + ret_type = LLVMStructTypeInContext(gallivm->context, val_type, 4, 0);
> + function_type = LLVMFunctionType(ret_type, arg_types, num_param, 0);
> + function = LLVMAddFunction(module, func_name, function_type);
> +
> + for (i = 0; i < num_param; ++i) {
> + if(LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind) {
> + LLVMAddAttribute(LLVMGetParam(function, i), LLVMNoAliasAttribute);
> + }
> + }
> +
> + LLVMSetFunctionCallConv(function, LLVMFastCallConv);
> + LLVMSetLinkage(function, LLVMPrivateLinkage);
> +
> + lp_build_sample_gen_func(gallivm,
> + static_texture_state,
> + static_sampler_state,
> + dynamic_state,
> + type,
> + is_fetch,
> + texture_index,
> + sampler_index,
> + function,
> + num_param,
> + sampler_bits,
> + lod_property);
> + }
> +
> + num_args = 0;
> + args[num_args++] = context_ptr;
> + for (i = 0; i < num_coords; i++) {
> + args[num_args++] = coords[i];
> + }
> + if (layer) {
> + args[num_args++] = coords[layer];
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_SHADOW) {
> + args[num_args++] = coords[4];
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_OFFSETS) {
> + for (i = 0; i < num_offsets; i++) {
> + args[num_args++] = offsets[i];
> + }
> + }
> + if (sampler_bits & LP_SAMPLER_FUNC_LOD_BIAS) {
> + args[num_args++] = lod_bias;
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_LOD_EXPLICIT) {
> + args[num_args++] = explicit_lod;
> + }
> + else if (sampler_bits & LP_SAMPLER_FUNC_EXPLICITDERIVS) {
> + for (i = 0; i < num_derivs; i++) {
> + args[num_args++] = derivs->ddx[i];
> + args[num_args++] = derivs->ddy[i];
> + }
> + }
assert(num_args <= LP_MAX_TEX_FUNC_ARGS);
> +
> + tex_ret = LLVMBuildCall(builder, function, args, num_args, "");
> + bb = LLVMGetInsertBlock(builder);
> + inst = LLVMGetLastInstruction(bb);
> + LLVMSetInstructionCallConv(inst, LLVMFastCallConv);
> +
> + for (i = 0; i < 4; i++) {
> + texel_out[i] = LLVMBuildExtractValue(gallivm->builder, tex_ret, i, "");
> + }
> +}
> +
> +
> +/**
> + * Build texture sampling code.
> + * Either via a function call or inline it directly.
> + */
> +void
> +lp_build_sample_soa(struct gallivm_state *gallivm,
> + const struct lp_static_texture_state *static_texture_state,
> + const struct lp_static_sampler_state *static_sampler_state,
> + struct lp_sampler_dynamic_state *dynamic_state,
> + struct lp_type type,
> + boolean is_fetch,
> + unsigned texture_index,
> + unsigned sampler_index,
> + LLVMValueRef context_ptr,
> + const LLVMValueRef *coords,
> + const LLVMValueRef *offsets,
> + const struct lp_derivatives *derivs, /* optional */
> + LLVMValueRef lod_bias, /* optional */
> + LLVMValueRef explicit_lod, /* optional */
> + enum lp_sampler_lod_property lod_property,
> + LLVMValueRef texel_out[4])
We should consider passing all these parameters in a single structure.
> +{
> + if (USE_TEX_FUNC_CALL) {
> + lp_build_sample_soa_func(gallivm,
> + static_texture_state,
> + static_sampler_state,
> + dynamic_state,
> + type,
> + is_fetch,
> + texture_index,
> + sampler_index,
> + context_ptr,
> + coords,
> + offsets,
> + derivs,
> + lod_bias,
> + explicit_lod,
> + lod_property,
> + texel_out);
> + }
> + else {
> + lp_build_sample_soa_code(gallivm,
> + static_texture_state,
> + static_sampler_state,
> + dynamic_state,
> + type,
> + is_fetch,
> + texture_index,
> + sampler_index,
> + context_ptr,
> + coords,
> + offsets,
> + derivs,
> + lod_bias,
> + explicit_lod,
> + lod_property,
> + texel_out);
> + }
> +}
> +
> +
> void
> lp_build_size_query_soa(struct gallivm_state *gallivm,
> const struct lp_static_texture_state *static_state,
>
Otherwise looks good.
Jose
More information about the mesa-dev
mailing list