[Mesa-dev] [PATCH] Bounding box avx2 intrinsic algorithm for openGL/GLES
Roland Scheidegger
sroland at vmware.com
Fri Aug 31 14:40:30 UTC 2018
Can't comment much on what the patch is doing, but the message and build
changes make several references to avx2 whereas I only really see avx code?
Roland
Am 31.08.2018 um 07:25 schrieb kedar.j.karanje at intel.com:
> From: "J Karanje, Kedar" <kedar.j.karanje at intel.com>
>
> The feature is enabled by default during make however we need to
> add the following to drirc to enable the feature at runtime.
> <option name="enable_bounding_box_culling" value="true"/>
>
> vbo: Main algorithm & code to check for MVP & vertex position location
> Build Files: Flags to enable BBOX Code and check AVX version
> compiler: Code to recognize simple shader
> (gl_position is a simple function of mvp and vertex)
> i965 & util: dri query to check if feature is enabled
>
> vbo: Implements a bounding box algorithm for mesa,we hook into the default
> drawelements and drawrangelements and the MVP & vertex positions location
> and the corresponding program is got,we re-create the frustum planes
> using this data and also create a box around the object and use the 8
> vertices (box vertices) and check if the box is within the frustum or not,
> we drop the draw calls that are completely outside the view frustum and
> go for sub-boxes for objects that are intersecting with the frustum planes.
>
> The current patch has been verified on KBL+Ubuntu 16.04, we noticed
> 8~10% improvements in GFxBench TREX offscreen and ~2% for Manhattan offscreen,
> Platforms where avx2 is not supported shall still see ~6-8% improvement, the
> other KPIs were not impacted.
>
> Based on empirical data we have set minimum vertex count as 999 and the
> sub-box size as 198, this provides the best results, we have also implemented
> some level of caching for the box co-od and frustum plane co-od.
> we have also optimized some algorithms to use avx2 when a target supports it.
>
> Shader classification code is currently in hir and we have got review comments
> to move the same to NIR.
>
> Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar at intel.com>
> Signed-off-by: Yogesh Marathe <yogesh.marathe at intel.com>
> ---
> Android.common.mk | 19 +
> configure.ac | 34 +-
> src/compiler/glsl/ast_to_hir.cpp | 168 +++-
> src/compiler/glsl/glsl_parser_extras.cpp | 10 +
> src/compiler/glsl/glsl_parser_extras.h | 7 +
> src/compiler/glsl/linker.cpp | 18 +
> src/intel/common/gen_debug.c | 7 +
> src/mesa/Makefile.sources | 11 +
> src/mesa/drivers/dri/i965/brw_context.c | 17 +
> src/mesa/drivers/dri/i965/intel_screen.c | 4 +
> src/mesa/main/bufferobj.c | 19 +
> src/mesa/main/mtypes.h | 51 +
> src/mesa/program/Android.mk | 1 +
> src/mesa/program/program.c | 3 +
> src/mesa/vbo/vbo_bbox.c | 1538 ++++++++++++++++++++++++++++++
> src/mesa/vbo/vbo_bbox.h | 383 ++++++++
> src/mesa/vbo/vbo_bbox_cache.c | 195 ++++
> src/mesa/vbo/vbo_context.c | 11 +-
> src/mesa/vbo/vbo_exec_array.c | 37 +-
> src/util/00-mesa-defaults.conf | 4 +
> src/util/xmlpool/t_options.h | 5 +
> 21 files changed, 2535 insertions(+), 7 deletions(-)
> mode change 100644 => 100755 src/compiler/glsl/ast_to_hir.cpp
> create mode 100644 src/mesa/vbo/vbo_bbox.c
> create mode 100644 src/mesa/vbo/vbo_bbox.h
> create mode 100644 src/mesa/vbo/vbo_bbox_cache.c
>
> diff --git a/Android.common.mk b/Android.common.mk
> index aa1b266..efd6792 100644
> --- a/Android.common.mk
> +++ b/Android.common.mk
> @@ -21,6 +21,8 @@
> # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> # DEALINGS IN THE SOFTWARE.
>
> +MESA_BBOX_ENABLE=true
> +
> ifeq ($(LOCAL_IS_HOST_MODULE),true)
> LOCAL_CFLAGS += -D_GNU_SOURCE
> endif
> @@ -80,6 +82,10 @@ LOCAL_CFLAGS += \
> -fno-trapping-math \
> -Wno-sign-compare
>
> +ifeq ($(MESA_BBOX_ENABLE),true)
> +LOCAL_CFLAGS += -DMESA_BBOX_OPT
> +endif
> +
> LOCAL_CPPFLAGS += \
> -D__STDC_CONSTANT_MACROS \
> -D__STDC_FORMAT_MACROS \
> @@ -87,6 +93,10 @@ LOCAL_CPPFLAGS += \
> -Wno-error=non-virtual-dtor \
> -Wno-non-virtual-dtor
>
> +ifeq ($(MESA_BBOX_ENABLE),true)
> +LOCAL_CPPFLAGS += -DMESA_BBOX_OPT
> +endif
> +
> # mesa requires at least c99 compiler
> LOCAL_CONLYFLAGS += \
> -std=c99
> @@ -98,6 +108,15 @@ ifeq ($(filter 5 6 7 8 9, $(MESA_ANDROID_MAJOR_VERSION)),)
> LOCAL_CFLAGS += -DHAVE_TIMESPEC_GET
> endif
>
> +ifeq ($(MESA_BBOX_ENABLE),true)
> +#if defined(CONFIG_AS_AVX)
> +LOCAL_CONLYFLAGS += -mavx
> +#elif
> +LOCAL_CONLYFLAGS += -msse4.1
> +#endif
> +endif
> +
> +
> ifeq ($(strip $(MESA_ENABLE_ASM)),true)
> ifeq ($(TARGET_ARCH),x86)
> LOCAL_CFLAGS += \
> diff --git a/configure.ac b/configure.ac
> index 4d9d9e5..dcdbcf3 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -278,7 +278,8 @@ _SAVE_LDFLAGS="$LDFLAGS"
> _SAVE_CPPFLAGS="$CPPFLAGS"
>
> dnl Compiler macros
> -DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS"
> +DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DMESA_BBOX_OPT"
> +dnl DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS"
> AC_SUBST([DEFINES])
> android=no
> case "$host_os" in
> @@ -295,10 +296,38 @@ esac
>
> AM_CONDITIONAL(HAVE_ANDROID, test "x$android" = xyes)
>
> +
> +dnl Conditional parameters for enabling BBOX file compilation in Makefile
> +dnl bbox=yes
> +dnl MESA_BBOX_ENABLE=true
> +dnl AM_CONDITIONAL([MESA_BBOX_ENABLE], [test x$bbox= xyes])
> +
> dnl
> dnl Check compiler flags
> dnl
> -AX_CHECK_COMPILE_FLAG([-Wall], [CFLAGS="$CFLAGS -Wall"])
> +
> +AC_ARG_WITH([swr-archs],
> + [AS_HELP_STRING([--with-swr-archs@<:@=DIRS...@:>@],
> + [comma delimited swr architectures list, e.g.
> + "avx,avx2,sse4.1,sse4.2" @<:@default="sse4.1,sse4.2"@:>@])],
> + [with_swr_archs="$withval"],
> + [with_swr_archs="sse4.1,sse4.2"])
> +
> +swr_archs=`IFS=', '; echo $with_swr_archs`
> +if test "$swr_archs" = "avx"; then
> +AX_CHECK_COMPILE_FLAG([-Wall], [CFLAGS="$CFLAGS -Wall -msse4.1 -mavx"])
> +AX_CHECK_COMPILE_FLAG([-Wall], [CXXFLAGS="$CXXFLAGS -Wall -msse4.1 -mavx"])
> +elif test "$swr_archs" = "avx2"; then
> +AX_CHECK_COMPILE_FLAG([-Wall], [CFLAGS="$CFLAGS -Wall -msse4.1 -mavx -mavx2"])
> +AX_CHECK_COMPILE_FLAG([-Wall], [CXXFLAGS="$CXXFLAGS -Wall -msse4.1 -mavx -mavx2"])
> +elif test "$swr_archs" = "sse4.1"; then
> +AX_CHECK_COMPILE_FLAG([-Wall], [CFLAGS="$CFLAGS -Wall -msse4.1"])
> +AX_CHECK_COMPILE_FLAG([-Wall], [CXXFLAGS="$CXXFLAGS -Wall -msse4.1"])
> +elif test "$swr_archs" = "sse4.2"; then
> +AX_CHECK_COMPILE_FLAG([-Wall], [CFLAGS="$CFLAGS -Wall -msse4.1 -msse4.2"])
> +AX_CHECK_COMPILE_FLAG([-Wall], [CXXFLAGS="$CXXFLAGS -Wall -msse4.1 -msse4.2"])
> +fi
> +
> AX_CHECK_COMPILE_FLAG([-Werror=implicit-function-declaration], [CFLAGS="$CFLAGS -Werror=implicit-function-declaration"])
> AX_CHECK_COMPILE_FLAG([-Werror=missing-prototypes], [CFLAGS="$CFLAGS -Werror=missing-prototypes"])
> AX_CHECK_COMPILE_FLAG([-Wmissing-prototypes], [CFLAGS="$CFLAGS -Wmissing-prototypes"])
> @@ -313,7 +342,6 @@ dnl
> dnl Check C++ compiler flags
> dnl
> AC_LANG_PUSH([C++])
> -AX_CHECK_COMPILE_FLAG([-Wall], [CXXFLAGS="$CXXFLAGS -Wall"])
> AX_CHECK_COMPILE_FLAG([-fno-math-errno], [CXXFLAGS="$CXXFLAGS -fno-math-errno"])
> AX_CHECK_COMPILE_FLAG([-fno-trapping-math], [CXXFLAGS="$CXXFLAGS -fno-trapping-math"])
> AX_CHECK_COMPILE_FLAG([-fvisibility=hidden], [VISIBILITY_CXXFLAGS="-fvisibility=hidden"])
> diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
> old mode 100644
> new mode 100755
> index 5d3f10b..f4e8dea
> --- a/src/compiler/glsl/ast_to_hir.cpp
> +++ b/src/compiler/glsl/ast_to_hir.cpp
> @@ -62,7 +62,6 @@
> #include "builtin_functions.h"
>
> using namespace ir_builder;
> -
> static void
> detect_conflicting_assignments(struct _mesa_glsl_parse_state *state,
> exec_list *instructions);
> @@ -1325,6 +1324,124 @@ ast_expression::set_is_lhs(bool new_value)
> this->subexpressions[0]->set_is_lhs(new_value);
> }
>
> +#ifdef MESA_BBOX_OPT
> +static bool
> +is_simple_shader(exec_list *instructions, ast_expression *simple_ast_root,
> + struct _mesa_glsl_parse_state *state)
> +{
> + ast_expression * subex0 = simple_ast_root->subexpressions[0];
> + ast_expression * subex1 = simple_ast_root->subexpressions[1];
> +
> + char temp_identifier[100];
> +
> + subex0->set_is_lhs(true);
> +
> + if (subex1->oper == ast_mul)
> + {
> + ir_rvalue *rhsParts[3];
> + rhsParts[0] = subex1->subexpressions[0]->hir(instructions, state);
> + rhsParts[1] = subex1->subexpressions[1]->hir(instructions, state);
> +
> + if (rhsParts[0]->type->gl_type == GL_FLOAT_MAT4 &&
> + rhsParts[1]->type->gl_type == GL_FLOAT_VEC4) {
> + foreach_list_typed (ast_node, ast, link,
> + &subex1->subexpressions[1]->expressions) {
> + if (((ast_expression *)ast)->oper != ast_identifier &&
> + (((ast_expression *)ast)->oper != ast_int_constant) &&
> + (((ast_expression *)ast)->oper != ast_float_constant)) {
> + return false;
> + }
> + if (((ast_expression *)ast)->oper == ast_identifier &&
> + ((ast_expression *)ast)->primary_expression.identifier) {
> + strncpy((char *)state->stateVertPosition,
> + ((ast_expression *)ast)->primary_expression.identifier,
> + strlen(((ast_expression *)ast)->primary_expression.identifier));
> + }
> + }
> + }
> + if (subex0->oper == ast_identifier) {
> + if (!strcmp(subex0->primary_expression.identifier,"gl_Position")) {
> + if(subex0 && subex0->primary_expression.identifier) {
> + strncpy((char *)temp_identifier,
> + subex0->primary_expression.identifier,
> + strlen(subex0->primary_expression.identifier));
> + } else {
> + return false;
> + }
> + if(subex1 && subex1->subexpressions[0] &&
> + subex1->subexpressions[0]->primary_expression.identifier) {
> + strncpy((char *)state->stateMVP,
> + subex1->subexpressions[0]->primary_expression.identifier,
> + strlen(subex1->subexpressions[0]->primary_expression.identifier));
> + } else {
> + return false;
> + }
> +
> + return true;
> + } else {
> + if(subex0 && subex0->primary_expression.identifier) {
> + strncpy((char *)temp_identifier,
> + subex0->primary_expression.identifier,
> + strlen(subex0->primary_expression.identifier));
> + } else {
> + return true;
> + }
> +
> + if (subex1->subexpressions[0]->oper == ast_identifier) {
> + if(subex1 && subex1->subexpressions[0] &&
> + subex1->subexpressions[0]->primary_expression.identifier) {
> + strncpy((char *)state->stateMVP,
> + subex1->subexpressions[0]->primary_expression.identifier,
> + strlen(subex1->subexpressions[0]->primary_expression.identifier));
> + } else {
> + return true;
> + }
> + }
> + return false; //Return false to trigger further parsing.
> + }
> + } else {
> + return false;
> + }
> + } else {
> + if (subex0->primary_expression.identifier != NULL)
> + if (!strcmp(subex0->primary_expression.identifier,"gl_Position") &&
> + (strlen(temp_identifier) > 0)) { //gl_position = temp;
> + if (subex1->oper == ast_identifier) {
> + if (!strcmp((char *)temp_identifier,
> + subex1->primary_expression.identifier)) {
> + return true;
> + }
> + } else {
> + return false;
> + }
> + }
> + }
> + return false;
> +}
> +/*
> + * Function to check if LHS of assign is an input variable Eg: in_position
> + */
> +
> +static bool
> +is_attribute(struct _mesa_glsl_parse_state *state)
> +{
> + ir_variable *const var = state->symbols->get_variable(state->stateVertPosition);
> + ir_rvalue *result = NULL;
> + void *ctx = state;
> + if (var != NULL)
> + {
> + result = new(ctx) ir_dereference_variable(var);
> + (void)result;
> + if (var->data.mode == ir_var_shader_in)
> + return true;
> +
> + }
> +
> + return false;
> +
> +}
> +#endif //MESA_BBOX_OPT
> +
> ir_rvalue *
> ast_expression::do_hir(exec_list *instructions,
> struct _mesa_glsl_parse_state *state,
> @@ -1400,6 +1517,55 @@ ast_expression::do_hir(exec_list *instructions,
> unreachable("ast_aggregate: Should never get here.");
>
> case ast_assign: {
> +#ifdef MESA_BBOX_OPT
> + if (state->stage == MESA_SHADER_VERTEX &&
> + !state->state_bbox_simple_shader &&
> + is_simple_shader(instructions,this,state) &&
> + !state->state_shader_analysis_complete)
> + {
> + state->state_bbox_simple_shader = true;
> + }
> +
> + if (state->state_bbox_simple_shader &&
> + !state->state_shader_analysis_complete) {
> + if (!is_attribute(state)) {
> + if ((ir_dereference_variable *)op[0] != NULL &&
> + ((ir_dereference_variable *)op[0])->var->name != NULL &&
> + strlen(state->stateVertPosition) != 0)
> + if (!strcmp(((ir_dereference_variable *)op[0])->var->name,
> + state->stateVertPosition) ) {
> + if (((ir_instruction *)op[0])->ir_type == ir_type_variable ||
> + ((ir_instruction *)op[0])->ir_type ==
> + ir_type_dereference_variable &&
> + (ir_dereference_variable *)op[1] != NULL &&
> + ((ir_dereference_variable *)op[1])->ir_type ==
> + ir_type_dereference_variable) {
> + if ((((ir_instruction *)op[1])->ir_type == ir_type_variable ||
> + ((ir_instruction *)op[1])->ir_type ==
> + ir_type_dereference_variable) &&
> + ((ir_dereference_variable *)op[1])->var->name) {
> + strncpy((char *)state->stateVertPosition,
> + (char *)((ir_dereference_variable *)op[1])->var->name,
> + strlen(((ir_dereference_variable *)op[1])->var->name));
> + state->state_shader_analysis_complete = true;
> + }
> + }
> + else {
> + state->state_bbox_simple_shader = false;
> + state->state_shader_analysis_complete = true;
> + }
> + }
> + else {
> + state->state_bbox_simple_shader = false;
> + state->state_shader_analysis_complete = true;
> +
> + }
> + }
> + else {
> + state->state_shader_analysis_complete = true;
> + }
> + }
> +#endif //MESA_BBOX_OPT
> this->subexpressions[0]->set_is_lhs(true);
> op[0] = this->subexpressions[0]->hir(instructions, state);
> op[1] = this->subexpressions[1]->hir(instructions, state);
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
> index 42ba88f..c540b3a 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -2121,6 +2121,16 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
> if (!state->error && !state->translation_unit.is_empty())
> _mesa_ast_to_hir(shader->ir, state);
>
> +#ifdef MESA_BBOX_OPT
> + shader->shader_bbox_simple_shader = state->state_bbox_simple_shader;
> + if (shader->shader_bbox_simple_shader)
> + {
> + strncpy((char *)shader->shaderMVP,state->stateMVP,
> + strlen(state->stateMVP));
> + strncpy((char *)shader->shaderVertPosition,state->stateVertPosition,
> + strlen(state->stateVertPosition));
> + }
> +#endif //MESA_BBOX_OPT
> if (!state->error) {
> validate_ir_tree(shader->ir);
>
> diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h
> index da44d37..b951a66 100644
> --- a/src/compiler/glsl/glsl_parser_extras.h
> +++ b/src/compiler/glsl/glsl_parser_extras.h
> @@ -887,6 +887,13 @@ struct _mesa_glsl_parse_state {
> * so we can check totals aren't too large.
> */
> unsigned clip_dist_size, cull_dist_size;
> +
> +#ifdef MESA_BBOX_OPT
> + bool state_shader_analysis_complete;
> + bool state_bbox_simple_shader;
> + char stateMVP[20] = {'\0'};
> + char stateVertPosition[100] = {'\0'};
> +#endif
> };
>
> # define YYLLOC_DEFAULT(Current, Rhs, N) \
> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
> index 3ce78fe..210f37f 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -4876,6 +4876,24 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
>
> switch (stage) {
> case MESA_SHADER_VERTEX:
> +#ifdef MESA_BBOX_OPT
> + if (shader_list[MESA_SHADER_VERTEX][0] != NULL) {
> + sh->linked_bbox_simple_shader =
> + shader_list[MESA_SHADER_VERTEX][0]->shader_bbox_simple_shader;
> + /*TBD: How do we handle multiple Vertex Shaders
> + * being linked ??
> + * MVP Name copied to get MVP in VBO
> + */
> + strncpy((char *)sh->linkedshaderMVP,
> + shader_list[MESA_SHADER_VERTEX][0]->shaderMVP,
> + strlen(shader_list[MESA_SHADER_VERTEX][0]->shaderMVP));
> + /* Vertex Position attribute name */
> + strncpy((char *)sh->linkedshaderVertPosition,
> + shader_list[MESA_SHADER_VERTEX][0]->shaderVertPosition,
> + strlen(
> + shader_list[MESA_SHADER_VERTEX][0]->shaderVertPosition));
> + }
> +#endif
> validate_vertex_shader_executable(prog, sh, ctx);
> break;
> case MESA_SHADER_TESS_CTRL:
> diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c
> index a978f2f..c677208 100644
> --- a/src/intel/common/gen_debug.c
> +++ b/src/intel/common/gen_debug.c
> @@ -106,6 +106,13 @@ intel_debug_flag_for_shader_stage(gl_shader_stage stage)
> static void
> brw_process_intel_debug_variable_once(void)
> {
> +#if defined(__ANDROID__) || defined(ANDROID)
> + setenv("MESA_GLSL_CACHE_DISABLE","true",1);
> + setenv("MESA_BBOX_MIN_VERTEX_CNT", "999", 1);
> + setenv("MESA_BBOX_OPT_ENABLE", "3", 1);
> + setenv("MESA_OPT_SPLIT_SIZE", "198", 1);
> +#endif
> +
> INTEL_DEBUG = parse_debug_string(getenv("INTEL_DEBUG"), debug_control);
> }
>
> diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
> index 0d3c277..8148622 100644
> --- a/src/mesa/Makefile.sources
> +++ b/src/mesa/Makefile.sources
> @@ -1,4 +1,5 @@
> ### Lists of source files, included by Makefiles
> +MESA_BBOX_ENABLE = false
>
> # this is part of MAIN_FILES
> MAIN_ES_FILES = \
> @@ -400,6 +401,7 @@ TNL_FILES = \
> tnl/t_vp_build.c \
> tnl/t_vp_build.h
>
> +
> VBO_FILES = \
> vbo/vbo_attrib.h \
> vbo/vbo_attrib_tmp.h \
> @@ -422,6 +424,15 @@ VBO_FILES = \
> vbo/vbo_save.h \
> vbo/vbo_save_loopback.c
>
> +#ifeq($(MESA_BBOX_ENABLE), true)
> +
> +VBO_FILES += \
> + vbo/vbo_bbox_cache.c \
> + vbo/vbo_bbox.c \
> + vbo/vbo_bbox.h
> +
> +#endif
> +
> STATETRACKER_FILES = \
> state_tracker/st_atifs_to_tgsi.c \
> state_tracker/st_atifs_to_tgsi.h \
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index 6ba64e4..03cf861 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -48,6 +48,10 @@
>
> #include "vbo/vbo.h"
>
> +#ifdef MESA_BBOX_OPT
> +#include "vbo/vbo_bbox.h"
> +#endif
> +
> #include "drivers/common/driverfuncs.h"
> #include "drivers/common/meta.h"
> #include "utils.h"
> @@ -890,6 +894,11 @@ brw_process_driconf_options(struct brw_context *brw)
> ctx->Const.AllowGLSLCrossStageInterpolationMismatch =
> driQueryOptionb(options, "allow_glsl_cross_stage_interpolation_mismatch");
>
> +#ifdef MESA_BBOX_OPT
> + ctx->Const.EnableBoundingBoxCulling =
> + driQueryOptionb(options, "enable_bounding_box_culling");
> +#endif //MESA_BBOX_OPT
> +
> ctx->Const.dri_config_options_sha1 = ralloc_array(brw, unsigned char, 20);
> driComputeOptionsSha1(&brw->screen->optionCache,
> ctx->Const.dri_config_options_sha1);
> @@ -1001,6 +1010,14 @@ brwCreateContext(gl_api api,
>
> brw_process_driconf_options(brw);
>
> +#ifdef MESA_BBOX_OPT
> + if (ctx->Const.EnableBoundingBoxCulling) {
> + MESA_BBOX("EnableBoundingBoxCulling True\n");
> + vbo_bbox_init(ctx);
> + } else
> + MESA_BBOX("EnableBoundingBoxCulling False\n");
> +#endif //MESA_BBOX_OPT
> +
> if (INTEL_DEBUG & DEBUG_PERF)
> brw->perf_debug = true;
>
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
> index eaf5a3b..35c7624 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -88,6 +88,10 @@ DRI_CONF_BEGIN
> DRI_CONF_ALLOW_HIGHER_COMPAT_VERSION("false")
> DRI_CONF_FORCE_GLSL_ABS_SQRT("false")
>
> +#ifdef MESA_BBOX_OPT
> + DRI_CONF_ENABLE_BOUNDING_BOX_CULLING("true")
> +#endif
> +
> DRI_CONF_OPT_BEGIN_B(shader_precompile, "true")
> DRI_CONF_DESC(en, "Perform code generation at shader link time.")
> DRI_CONF_OPT_END
> diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
> index 1d1e51b..67a369b 100644
> --- a/src/mesa/main/bufferobj.c
> +++ b/src/mesa/main/bufferobj.c
> @@ -48,6 +48,9 @@
> #include "varray.h"
> #include "util/u_atomic.h"
>
> +#ifdef MESA_BBOX_OPT
> +#include "vbo/vbo_bbox.h"
> +#endif
>
> /* Debug flags */
> /*#define VBO_DEBUG*/
> @@ -2254,6 +2257,10 @@ buffer_sub_data(GLenum target, GLuint buffer, GLintptr offset,
>
> if (no_error || validate_buffer_sub_data(ctx, bufObj, offset, size, func))
> _mesa_buffer_sub_data(ctx, bufObj, offset, size, data);
> +
> +#ifdef MESA_BBOX_OPT
> + vbo_bbox_element_buffer_update(ctx,bufObj,data,offset,size);
> +#endif
> }
>
>
> @@ -2589,9 +2596,17 @@ validate_and_unmap_buffer(struct gl_context *ctx,
> #endif
>
> #ifdef VBO_DEBUG
> +#ifdef MESA_BBOX_OPT
> + if (bufObj->StorageFlags & GL_MAP_WRITE_BIT) {
> +#else
> if (bufObj->AccessFlags & GL_MAP_WRITE_BIT) {
> +#endif
> GLuint i, unchanged = 0;
> +#ifdef MESA_BBOX_OPT
> + GLubyte *b = (GLubyte *) bufObj->Data;
> +#else
> GLubyte *b = (GLubyte *) bufObj->Pointer;
> +#endif
> GLint pos = -1;
> /* check which bytes changed */
> for (i = 0; i < bufObj->Size - 1; i++) {
> @@ -3154,7 +3169,11 @@ map_buffer_range(struct gl_context *ctx, struct gl_buffer_object *bufObj,
> /* Access must be write only */
> if ((access & GL_MAP_WRITE_BIT) && (!(access & ~GL_MAP_WRITE_BIT))) {
> GLuint i;
> +#ifdef MESA_BBOX_OPT
> + GLubyte *b = (GLubyte *) bufObj->Data;
> +#else
> GLubyte *b = (GLubyte *) bufObj->Pointer;
> +#endif
> for (i = 0; i < bufObj->Size; i++)
> b[i] = i & 0xff;
> }
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 9fd577d..c262173 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -1359,6 +1359,9 @@ struct gl_buffer_object
> bool MinMaxCacheDirty;
>
> bool HandleAllocated; /**< GL_ARB_bindless_texture */
> +#ifdef MESA_BBOX_OPT
> + int data_change_counter; //TBD: Same as RefCount to check
> +#endif
> };
>
>
> @@ -2535,6 +2538,12 @@ struct gl_linked_shader
> * stores information that is also needed during linking.
> */
> struct gl_shader_spirv_data *spirv_data;
> +
> +#ifdef MESA_BBOX_OPT
> + bool linked_bbox_simple_shader;
> + char linkedshaderMVP[20];
> + char linkedshaderVertPosition[100];
> +#endif
> };
>
>
> @@ -2631,6 +2640,11 @@ struct gl_shader
>
> /* ARB_gl_spirv related data */
> struct gl_shader_spirv_data *spirv_data;
> +#ifdef MESA_BBOX_OPT
> + bool shader_bbox_simple_shader;
> + char shaderMVP[20];
> + char shaderVertPosition[100];
> +#endif
> };
>
>
> @@ -2902,6 +2916,9 @@ struct gl_shader_program_data
> * ARB_gl_spirv extension.
> */
> bool spirv;
> +#ifdef MESA_BBOX_OPT
> + GLuint vbo_bbox_mvp_location;
> +#endif
> };
>
> /**
> @@ -3108,6 +3125,10 @@ struct gl_pipeline_shader_state
>
> /** Pipeline objects */
> struct _mesa_HashTable *Objects;
> +#ifdef MESA_BBOX_OPT
> + /* Bounding box draw optimization control structure */
> + struct mesa_bbox_opt *BboxOpt;
> +#endif
> };
>
> /**
> @@ -4076,6 +4097,12 @@ struct gl_constants
>
> /** GL_ARB_gl_spirv */
> struct spirv_supported_capabilities SpirVCapabilities;
> +
> +#ifdef MESA_BBOX_OPT
> + /** MESA_BBOX_OPT Runtime enable_bounding_box_culling*/
> + bool EnableBoundingBoxCulling;
> +#endif
> +
> };
>
>
> @@ -4720,6 +4747,21 @@ struct gl_semaphore_object
> GLuint Name; /**< hash table ID/name */
> };
>
> +#ifdef MESA_BBOX_OPT
> +/**
> + * Bounding volume classification types
> + */
> +typedef enum
> +{
> + BOUNDING_VOLUME_AABB = 0,
> + BOUNDING_VOLUME_OBB = 1,
> + BOUNDING_VOLUME_SPHERE = 2,
> + BOUNDING_VOLUME_DOP = 3,
> + BOUNDING_VOULME_MIXED = 4,
> + BOUNDING_VOLUME_MAX = 5,
> +} bounding_volume_type;
> +#endif //MESA_BBOX_OPT
> +
> /**
> * Mesa rendering context.
> *
> @@ -5096,6 +5138,15 @@ struct gl_context
> struct hash_table_u64 *ResidentTextureHandles;
> struct hash_table_u64 *ResidentImageHandles;
> /*@}*/
> +
> +#ifdef MESA_BBOX_OPT
> + /**
> + * Bounding volume type
> + *
> + */
> + bounding_volume_type volume_type;
> +#endif //MESA_BBOX_OPT
> +
> };
>
> /**
> diff --git a/src/mesa/program/Android.mk b/src/mesa/program/Android.mk
> index c6470e6..6489d3e 100644
> --- a/src/mesa/program/Android.mk
> +++ b/src/mesa/program/Android.mk
> @@ -75,6 +75,7 @@ $(intermediates)/program/lex.yy.c: $(LOCAL_PATH)/program_lexer.l
> LOCAL_C_INCLUDES := \
> $(MESA_TOP)/src/mapi \
> $(MESA_TOP)/src/mesa \
> + $(MESA_TOP)/src/mesa/vbo \
> $(MESA_TOP)/src/compiler/nir \
> $(MESA_TOP)/src/gallium/auxiliary \
> $(MESA_TOP)/src/gallium/include
> diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
> index 6ab1bf5..e0b3563 100644
> --- a/src/mesa/program/program.c
> +++ b/src/mesa/program/program.c
> @@ -43,6 +43,9 @@
> #include "util/ralloc.h"
> #include "util/u_atomic.h"
>
> +#ifdef MESA_BBOX_OPT
> +#include "vbo/vbo_bbox.h"
> +#endif
>
> /**
> * A pointer to this dummy program is put into the hash table when
> diff --git a/src/mesa/vbo/vbo_bbox.c b/src/mesa/vbo/vbo_bbox.c
> new file mode 100644
> index 0000000..f1e153d
> --- /dev/null
> +++ b/src/mesa/vbo/vbo_bbox.c
> @@ -0,0 +1,1538 @@
> +/*
> + * Copyright � 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/**
> + * \brief VBO BBOX module implementation
> + * \author Kedar Karanje
> + */
> +
> +#pragma GCC optimize (0)
> +#include "vbo_bbox.h"
> +#include <fcntl.h>
> +#define BBOX_MAX_FRAMES_TO_DYNAMICALLY_UPDATE (5)
> +#define BBOX_MIN_EFFECTIVE_DRAWS_TO_DYNAMICALLY_UPDATE (5)
> +#define BBOX_MIN_VAL_FOR_EFFECTIVE_DRAWS_COUNTER (-100)
> +#define BBOX_MAX_VAL_FOR_EFFECTIVE_DRAWS_COUNTER (100)
> +#define BBOX_MIN_GPU_HEAD_ROOM_TO_PROCESS_ELEMENT_BUFFER (100)
> +
> +/**
> + * Min part of split draw that we want to drop
> + */
> +#define BBOX_MIN_SPLITTED_DRAW_TO_DROP (1)
> +
> +#ifdef __AVX__
> +static __m256 fullMin;
> +static __m256 fullMax;
> +#else
> +static __m128 fullMin;
> +static __m128 fullMax;
> +#endif
> +
> +/* Segment functions */
> +
> +static inline GLboolean
> +intersect(const struct gl_segment *s_src,const struct gl_segment *s_tar)
> +{
> + return (s_src->Left < s_tar->Right && s_src->Right > s_tar->Left);
> +}
> +
> +
> +static inline GLboolean
> +subsegment(const struct gl_segment *s_src,const struct gl_segment *s_tar)
> +{
> + return (s_src->Left <= s_tar->Left && s_src->Right >= s_tar->Right);
> +}
> +
> +
> +static inline GLboolean
> +superset(const struct gl_segment *s_src,const struct gl_segment *s_tar)
> +{
> + return subsegment(s_tar,s_src);
> +}
> +
> +
> +static void
> +normalize(vbo_bbox_frustum_plane *fr_plane)
> +{
> + GLfloat a,b,c;
> + a = fr_plane->a;
> + b = fr_plane->b;
> + c = fr_plane->c;
> +
> + GLfloat norm = 1.0f/sqrt(a*a + b*b + c*c);
> +
> + fr_plane->a *= norm;
> + fr_plane->b *= norm;
> + fr_plane->c *= norm;
> + fr_plane->d *= norm;
> +};
> +
> +static inline
> +int vbo_bbox_get_delay(struct mesa_bbox_opt *opt)
> +{
> + if (mesa_bbox_env_variables.bbox_enable < MESA_BBOX_ENABLE_FORCE_RECALC) {
> + if (opt->calc_delay > BBOX_CALC_MAX_DELAY) {
> + opt->calc_delay = BBOX_CALC_MIN_DELAY;
> + }
> + return opt->calc_delay++;
> + }
> + else {
> + return 0;
> + }
> +}
> +
> +static inline
> +bool vbo_init_sub_bbox_array(int bbox_count, struct vbo_bbox_cache_data* data)
> +{
> + bool allocate_bbox = true;
> + if(data->sub_bbox_array != NULL) {
> + if(bbox_count == data->sub_bbox_cnt) {
> + allocate_bbox = false;
> + }
> + else {
> + free(data->sub_bbox_array);
> + data->sub_bbox_array = NULL;
> + }
> + }
> + if (allocate_bbox)
> + data->sub_bbox_array = (struct bounding_info*) malloc(
> + bbox_count * sizeof (struct bounding_info));
> +
> + if (!data->sub_bbox_array)
> + return false;
> +
> + data->sub_bbox_cnt = bbox_count;
> + return true;
> +}
> +
> +static inline GLint
> +vbo_bbox_get_mvp(struct gl_context *ctx)
> +{
> + struct gl_linked_shader * linked_shader =
> + ctx->Shader.ActiveProgram->_LinkedShaders[MESA_SHADER_VERTEX];
> +
> + return _mesa_GetUniformLocation(ctx->Shader.ActiveProgram->Name,
> + linked_shader->linkedshaderMVP);
> +}
> +
> +/*
> + * Gets the currently linked shaders flag for simple shader
> + *
> + */
> +static inline int
> +vbo_is_simple_shader(struct gl_context *ctx)
> +{
> + struct gl_linked_shader * linked_shader =
> + ctx->Shader.ActiveProgram->_LinkedShaders[MESA_SHADER_VERTEX];
> +
> + return linked_shader->linked_bbox_simple_shader;
> +}
> +
> +/**
> + * Get current VAO
> + */
> +static inline struct gl_vertex_array_object*
> +vbo_get_current_vao(struct gl_context *const gc)
> +{
> + assert(gc);
> + struct gl_vertex_array_object* vao = gc->Array.VAO;
> + assert(vao);
> +
> + return vao;
> +}
> +
> +/**
> + * Returns the location of the "position" in the attributes of the
> + * currently active program
> + */
> +static inline
> +int vbo_get_simple_vs_position_attr_location(
> + struct gl_context *const gc)
> +{
> + assert(gc);
> + struct gl_linked_shader * linked_shader =
> + gc->Shader.ActiveProgram->_LinkedShaders[MESA_SHADER_VERTEX];
> + GLint vertexPosLocation = _mesa_GetAttribLocation(
> + gc->Shader.ActiveProgram->Name,
> + linked_shader->linkedshaderVertPosition);
> + if (vertexPosLocation >= 0)
> + return vertexPosLocation;
> + else
> + return -1;
> +}
> +
> +/**
> + * Get element-buffer handle of the current VAO
> + */
> +static inline struct gl_buffer_object*
> +vbo_get_current_element_buffer(struct gl_context *const gc)
> +{
> + assert(gc);
> + struct gl_buffer_object* element_buffer =
> + vbo_get_current_vao(gc)->IndexBufferObj;
> + assert(element_buffer);
> + return element_buffer;
> +}
> +
> +/**
> + * Get vertex-binding of position from the current VAO
> + */
> +static inline struct gl_vertex_buffer_binding*
> +vbo_get_current_vertex_buffer_binding_of_position(struct gl_context *const gc)
> +{
> + assert(gc);
> + struct gl_vertex_array_object* vao = vbo_get_current_vao(gc);
> +
> + GLbitfield mask = vao->_Enabled & vao->VertexAttribBufferMask;
> + const struct gl_array_attributes *attrib_array =
> + &vao->VertexAttrib[ffs(mask) - 1];
> + struct gl_vertex_buffer_binding *buffer_binding =
> + &vao->BufferBinding[attrib_array->BufferBindingIndex];
> +
> + return buffer_binding;
> +}
> +
> +/**
> + * Get vertex-buffer handle of the current VAO
> + */
> +static inline struct gl_buffer_object*
> +vbo_get_current_vertex_buffer(struct gl_context *const gc)
> +{
> + assert(gc);
> + struct gl_buffer_object* pVertexBuffer =
> + vbo_get_current_vertex_buffer_binding_of_position(gc)->BufferObj;
> + assert(pVertexBuffer);
> + return pVertexBuffer;
> +}
> +
> +
> +/**
> + * Condition to enter bounding box optimization
> + */
> +static inline bool
> +vbo_bbox_check_supported_draw_call(struct gl_context *const gc,
> + GLenum mode, GLsizei count, GLenum type,
> + const GLvoid *indices, GLint basevertex)
> +{
> +
> + assert(gc);
> + int shader_scenario;
> + struct gl_linked_shader *_LinkedShaders;
> +
> + /* Check if the minimum vertex count is met. */
> + if (count < (GLsizei) mesa_bbox_env_variables.bbox_min_vrtx_count) {
> + /* Count is most common cause to bail out form optimization
> + * so should be first.
> + */
> + MESA_BBOX("Aborting MESA_BBOX :%d: Vertex count too small, minimum count = %d\n",
> + count,mesa_bbox_env_variables.bbox_min_vrtx_count);
> + return false;
> + }
> +
> + if (mode != GL_TRIANGLES) {
> + MESA_BBOX("Aborting MESA_BBOX :%d: Primitive mode is not GL_TRIANGLES, \
> + mode = %d\n", count, mode);
> + return false;
> + }
> +
> + /* Examine current shader */
> + if (!gc->_Shader->ActiveProgram) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: No active GLSL program.\n", count);
> + return false;
> + }
> +
> + /* BASIC Shader scenario is when we have just VS & FS */
> + if (gc->_Shader->CurrentProgram[MESA_SHADER_VERTEX] != NULL &&
> + gc->_Shader->CurrentProgram[MESA_SHADER_FRAGMENT] != NULL &&
> + gc->_Shader->CurrentProgram[MESA_SHADER_TESS_CTRL] == NULL &&
> + gc->_Shader->CurrentProgram[MESA_SHADER_TESS_EVAL] == NULL &&
> + gc->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] == NULL) {
> + shader_scenario = 0;
> + }
> + else
> + shader_scenario = 1;
> +
> + if (shader_scenario) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: GLSL program must contain only vertex and \
> + fragment shaders, shader scenario = \n", count );
> + return false;
> + }
> +
> + if (!vbo_is_simple_shader(gc)) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: GLSL vertex shader does not have simple \
> + position calculation \n", count);
> + return false;
> + }
> + if (gc->Shader.ActiveProgram->_LinkedShaders[MESA_SHADER_VERTEX]) {
> + _LinkedShaders =
> + gc->Shader.ActiveProgram->_LinkedShaders[MESA_SHADER_VERTEX];
> +
> + MESA_BBOX("MVP:%s, VertPos:%s\n",_LinkedShaders->linkedshaderMVP,
> + _LinkedShaders->linkedshaderVertPosition);
> + }
> +
> + /* Examine element buffer */
> + struct gl_buffer_object* element_buffer = vbo_get_current_element_buffer(gc);
> + if ((!element_buffer) || element_buffer->Name == 0) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: Element buffer name is 0\n", count);
> + return false;
> + }
> +
> + if (!(element_buffer->StorageFlags &
> + (GL_CLIENT_STORAGE_BIT | GL_DYNAMIC_STORAGE_BIT))){
> + MESA_BBOX("Aborting MESA_BBOX:%d: Element buffer not resident: %#x\n", count,
> + element_buffer->StorageFlags);
> + return false;
> + }
> +
> + /* Get VertexPosLocation */
> + int vertexPosLocation = 0;
> + if (gc->Shader.ActiveProgram)
> + vertexPosLocation = vbo_get_simple_vs_position_attr_location(gc);
> + if (vertexPosLocation < 0)
> + {
> + MESA_BBOX("Aborting MESA_BBOX:%d: VertexPosition Location is inValid:\n", count);
> + return false;
> + }
> +
> + struct gl_vertex_array_object* vao = vbo_get_current_vao(gc);
> + int posAttribMapMode =
> + _mesa_vao_attribute_map[vao->_AttributeMapMode][vertexPosLocation];
> +
> + if (((vao->_Enabled >> posAttribMapMode) & 0x1) != 1)
> + {
> + MESA_BBOX("Aborting MESA_BBOX:%d: Vertex data does not come from VBO , GL-API:%d\n", count,gc->API);
> +//#if !defined(__ANDROID__) || !defined(ANDROID)
> +//This is not specific to Android but to the GLES API
> + if (gc->API != API_OPENGLES && gc->API != API_OPENGLES2)
> + return false;
> +//#endif
> + }
> +
> + struct gl_buffer_object* vertexattrib_buffer =
> + vbo_get_current_vertex_buffer(gc);
> + if ((!vertexattrib_buffer) || vertexattrib_buffer->Name == 0) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: Vertex buffer %p name is %d\n", count,
> + vertexattrib_buffer,vertexattrib_buffer->Name);
> +#if !defined(__ANDROID__) || !defined(ANDROID)
> + return false;
> +#endif
> + }
> +
> + if (!(vertexattrib_buffer->StorageFlags &
> + (GL_CLIENT_STORAGE_BIT | GL_DYNAMIC_STORAGE_BIT))){
> + MESA_BBOX("Aborting MESA_BBOX:%d:Vertex buffer not resident %#x \n", count,
> + vertexattrib_buffer->StorageFlags);
> + MESA_BBOX("Aborting MESA_BBOX:VAO AttributeMode is %d\n", vao->_AttributeMapMode);
> +#if !defined(__ANDROID__) || !defined(ANDROID)
> + return false;
> +#endif
> + }
> +
> + /* Examine vertex position attribute configuration */
> + if (vao->VertexAttrib[posAttribMapMode].Enabled) {
> + if (vao->VertexAttrib[posAttribMapMode].Size != 3)
> + {
> + MESA_BBOX("Aborting MESA_BBOX:%d: Vertex attrib size :%d, only 3 supported\n",
> + count, vao->VertexAttrib[VERT_ATTRIB_POS].Size);
> +#if !defined(__ANDROID__) || !defined(ANDROID)
> + return false;
> +#endif
> + }
> + if (vao->VertexAttrib[posAttribMapMode].Type != GL_FLOAT)
> + {
> + MESA_BBOX("Aborting MESA_BBOX:%d: Vertex attrib type is %d, only GL_FLOAT \
> + supported\n", count, vao->VertexAttrib[VERT_ATTRIB_POS].Type);
> + return false;
> + }
> + }
> +
> + if (type != GL_UNSIGNED_SHORT) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: type is %d, only GL_UNSIGNED_SHORT \
> + supported\n", count, type);
> + return false;
> + }
> + if (basevertex != 0) {
> + MESA_BBOX("Aborting MESA_BBOX:%d: basevertex is 0 \n", count);
> + return false;
> + }
> +
> + /* If size ==3 and type == GL_FLOAT, then element stride must be 12. */
> + assert(vao->VertexAttrib[VERT_ATTRIB_POS].StrideB == 12);
> +
> + /* When transform feedback is capturing we cannot do early clipping since
> + * xfb must write unclipped vertices
> + * Note - we could check for IsCapturing() but that would require
> + * more elaborate checking for VBO modifications.
> + */
> + if (gc->TransformFeedback.CurrentObject->Active) {
> + MESA_BBOX("MESA_BBOX:%d: Transform feedback is active, \
> + cannot clip\n", count);
> + return false;
> + }
> + return true;
> +}
> +
> +
> +/**
> + * Check condition to enter bounding box optimization and if draw call
> + * is suitable prepare key describing given geometry.
> + */
> +static inline
> +void vbo_bbox_prepare_key(struct gl_context *const gc, GLenum mode,
> + GLsizei count, GLenum type, GLuint type_size,
> + const GLvoid *indices, GLint basevertex,
> + vbo_bbox_cache_key *key)
> +{
> + assert(gc);
> +
> + /* Examine element buffer */
> + struct gl_buffer_object* element_buffer = vbo_get_current_element_buffer(gc);
> + struct gl_buffer_object* vertexattrib_buffer =
> + vbo_get_current_vertex_buffer(gc);
> + struct gl_vertex_buffer_binding* vbinding =
> + vbo_get_current_vertex_buffer_binding_of_position(gc);
> +
> + memset(key,0,sizeof(vbo_bbox_cache_key));
> +
> + key->mode = mode;
> + key->count = count;
> + key->indices_type = type;
> + key->type_size = type_size;
> + key->indices = (GLint) (uintptr_t) indices;
> + key->basevertex = basevertex;
> +
> + key->element_buf_name = element_buffer->Name;
> + key->vertex_buf_name = vertexattrib_buffer->Name;
> +
> + key->offset = (GLint)vbinding->Offset;
> + key->stride = vbinding->Stride;
> +}
> +
> +
> +/**
> + * Create a bounding box descriptor in a form of 8 correctly
> + * ordered vertex coordinates. The order of coordinates is significant.
> + */
> +static
> +void vbo_bbox_create_bounding_box(float* const minVec3f, float* const maxVec3F,
> + vbo_vec4f* vertices4)
> +{
> + assert(minVec3f);
> + assert(maxVec3F);
> + assert(vertices4);
> +
> + float Xmin = minVec3f[0];
> + float Ymin = minVec3f[1];
> + float Zmin = minVec3f[2];
> + float Wmin = 1.0f;
> +
> + float Xmax = maxVec3F[0];
> + float Ymax = maxVec3F[1];
> + float Zmax = maxVec3F[2];
> + float Wmax = 1.0f;
> +
> + float* v = (float*)vertices4;
> + int i = 0;
> + v[i+0] = Xmin; v[i+1] = Ymin; v[i+2] = Zmin;v[i+3]=Wmin; i+=4;
> + v[i+0] = Xmax; v[i+1] = Ymin; v[i+2] = Zmin;v[i+3]=Wmin; i+=4;
> + v[i+0] = Xmin; v[i+1] = Ymax; v[i+2] = Zmin;v[i+3]=Wmin; i+=4;
> + v[i+0] = Xmax; v[i+1] = Ymax; v[i+2] = Zmin;v[i+3]=Wmin; i+=4;
> +
> + v[i+0] = Xmin; v[i+1] = Ymin; v[i+2] = Zmax;v[i+3]=Wmax; i+=4;
> + v[i+0] = Xmax; v[i+1] = Ymin; v[i+2] = Zmax;v[i+3]=Wmax; i+=4;
> + v[i+0] = Xmin; v[i+1] = Ymax; v[i+2] = Zmax;v[i+3]=Wmax; i+=4;
> + v[i+0] = Xmax; v[i+1] = Ymax; v[i+2] = Zmax;v[i+3]=Wmax; i+=4;
> +}
> +
> +#ifdef __AVX__
> +/* Calculate bbox Subbox Coordinates */
> +static void
> +vbo_bbox_calc_subbox_coordinates(unsigned int vertSubBox,unsigned int vertCount,
> + unsigned int first_idx,unsigned int second_idx,
> + unsigned short* indices,float* vertices,
> + unsigned int stride,
> + struct vbo_bbox_cache_data *data)
> +{
> +
> + /* Retrieving the starting offset of the first and second subbox */
> + unsigned int first = first_idx * vertSubBox;
> + unsigned int second = second_idx * vertSubBox;
> +
> + float tmpVertexBuf[8] = {0.0};
> + float tmp_buf_min[8] = {0.0};
> + float tmp_buf_max[8] = {0.0};
> +
> + __m256 subMin = _mm256_set1_ps(FLT_MAX);
> + __m256 subMax = _mm256_set1_ps(-FLT_MAX);
> +
> + /* Run both the subboxes for vertex count */
> + for(unsigned int iter = 0; iter < vertCount; iter++){
> + /* Calculate the vertex offset of first subbox */
> + unsigned short index1 = indices[first+iter];
> + /* Fetching the vertices for the first subbox */
> + float* vertex1 = (float *)((char *)(vertices) + stride*index1);
> + memcpy(tmpVertexBuf,vertex1, 3 * sizeof(float));
> +
> + /* Calculate the vertex offset of second subbox */
> + unsigned short index2 = indices[second+iter];
> +
> + /* Fetching the vertices for the second subbox */
> + float* vertex2 = (float *)((char *)(vertices) + stride*index2);
> + memcpy(tmpVertexBuf+4,vertex2, 3 * sizeof(float));
> +
> + __m256 tmp = _mm256_loadu_ps(tmpVertexBuf);
> + subMin = _mm256_min_ps(subMin, tmp);
> + subMax = _mm256_max_ps(subMax, tmp);
> + }
> +
> + /* compare full box values */
> + fullMin = _mm256_min_ps(fullMin, subMin);
> + fullMax = _mm256_max_ps(fullMax, subMax);
> +
> + /* store results */
> + _mm256_storeu_ps(tmp_buf_min, subMin);
> + _mm256_storeu_ps(tmp_buf_max, subMax);
> +
> +
> + /* Update the min and max values in sub box coordinates for both
> + * the subboxes
> + */
> + vbo_bbox_create_bounding_box(tmp_buf_min, tmp_buf_max,
> + &(data->sub_bbox_array[first_idx].bounding_volume.vert_vec4[0]));
> + vbo_bbox_create_bounding_box(tmp_buf_min+4, tmp_buf_max+4,
> + &(data->sub_bbox_array[second_idx].bounding_volume.vert_vec4[0]));
> +}
> +#else
> +static void
> +vbo_bbox_calc_subbox_coordinates(
> + unsigned int vertSubBox,
> + unsigned int vertCount,
> + unsigned int idx,
> + unsigned short *indices,
> + float *vertices,
> + unsigned int stride,
> + struct vbo_bbox_cache_data *data)
> +{
> + /* Retrieving the starting offset of the first and second subbox */
> + unsigned int first = idx * vertSubBox;
> +
> + float tmpVertexBuf[4] = {0.0};
> + float tmp_buf_min[4] = {0.0};
> + float tmp_buf_max[4] = {0.0};
> +
> + __m128 subMin = _mm_set1_ps(FLT_MAX);
> + __m128 subMax = _mm_set1_ps(-FLT_MAX);
> +
> + /* Run both the subboxes for vertex count */
> + for(unsigned int iter = 0; iter < vertCount; iter++){
> +
> + /* Calculate the vertex offset of first subbox */
> + unsigned short index = indices[first+iter];
> + /* Fetching the vertices for the first subbox */
> + float* vertex = (float *)((char *)(vertices) + stride*index);
> + memcpy(tmpVertexBuf,vertex, 3 * sizeof(float));
> +
> + __m128 tmp = _mm_loadu_ps(tmpVertexBuf);
> + subMin = _mm_min_ps(subMin, tmp);
> + subMax = _mm_max_ps(subMax, tmp);
> + }
> +
> + /* compare full box values */
> + fullMin = _mm_min_ps(fullMin, subMin);
> + fullMax = _mm_max_ps(fullMax, subMax);
> +
> + /* store results */
> + _mm_storeu_ps(tmp_buf_min, subMin);
> + _mm_storeu_ps(tmp_buf_max, subMax);
> +
> + /* Update the min and max values in sub box coordinates for both
> + * the subboxes
> + */
> + vbo_bbox_create_bounding_box(tmp_buf_min, tmp_buf_max,
> + &(data->sub_bbox_array[idx].bounding_volume.vert_vec4[0]));
> +}
> +#endif
> +
> +/**
> + * Get pointer to VBO data.
> + * Pointer should be suitable for fast data reading, not data change.
> + */
> +static
> +bool vbo_bbox_get_vbo_ptr(struct gl_context* gc, struct gl_buffer_object* vbo,
> + int offset, void** data, int* dataSize)
> +{
> + assert(gc);
> + assert(vbo);
> + GLubyte* vboDataPtr = NULL;
> +
> + if (offset >= vbo->Size) {
> + return false;
> + }
> + vboDataPtr = _mesa_MapNamedBuffer(vbo->Name,GL_WRITE_ONLY_ARB);
> + if (vboDataPtr == NULL) {
> + return false;
> + }
> + *data = vboDataPtr + offset;
> + *dataSize = vbo->Size - offset;
> +
> + return true;
> +}
> +
> +/**
> + * Unlock VBO
> + */
> +static inline
> +void vbo_bbox_release_vbo_ptr(struct gl_context* gc,
> + struct gl_buffer_object* vbo)
> +{
> + assert(gc);
> + assert(vbo);
> + _mesa_UnmapNamedBuffer_no_error(vbo->Name);
> +}
> +
> +/**
> + * Check if given range of indices contains only degenerate triangles.
> + */
> +static
> +bool vbo_bbox_is_degenerate(GLvoid *indices, GLuint count_in)
> +{
> + assert(indices);
> + assert(count_in % 3 == 0);
> +
> + GLuint triangle_count = count_in / 3;
> + GLuint input_idx = 0;
> +
> + GLushort* ptr = (GLushort*)indices;
> + for (GLuint i = 0; i < triangle_count; i++) {
> + GLushort a = ptr[input_idx++];
> + GLushort b = ptr[input_idx++];
> + GLushort c = ptr[input_idx++];
> + if (!(a == b || a == c || b == c)) {
> + return false;
> + }
> + }
> + return true;
> +}
> +
> +
> +/**
> + * Calculate bounding boxes for given geometry.
> + */
> +static
> +bool vbo_bbox_calculate_bounding_boxes_with_indices(struct gl_context *const gc,
> + const vbo_bbox_cache_key *key,
> + struct vbo_bbox_cache_data *data,
> + void* indexData,
> + int indexDataSize)
> +{
> + assert(gc);
> +
> + void* vertex_data = NULL;
> + int vertex_datasize = 0;
> + int vert_per_subBbox = mesa_bbox_env_variables.bbox_split_size;
> + int sub_bbox_cnt = (key->count + vert_per_subBbox -1)/vert_per_subBbox;
> + int *subbox_array;
> + int idx =0;
> + int non_degen_count = 0;
> +
> + assert(sub_bbox_cnt);
> +
> + struct gl_buffer_object* element_buffer = _mesa_lookup_bufferobj(gc,
> + key->element_buf_name);
> +
> + struct gl_buffer_object * vertexattrib_buffer = _mesa_lookup_bufferobj(gc,
> + key->vertex_buf_name);
> +
> + if (element_buffer == NULL || vertexattrib_buffer == NULL) {
> + return false;
> + }
> +
> + if (!vbo_bbox_get_vbo_ptr(gc, vertexattrib_buffer,(int) key->offset,
> + &vertex_data, &vertex_datasize)) {
> + return false;
> + }
> +
> + assert(vertex_data);
> + assert(vertex_datasize > 0);
> + assert(indexData);
> + assert(indexDataSize > 0);
> + assert(key->indices_type == GL_UNSIGNED_SHORT);
> + assert(indexDataSize > key->count * (int)key->type_size);
> +
> + /* Allocate memory for bounding boxes */
> + if (!vbo_init_sub_bbox_array(sub_bbox_cnt,data)) {
> + vbo_bbox_release_vbo_ptr(gc, vertexattrib_buffer);
> + return false;
> + }
> + /* Initialize size of bounding boxes */
> + for (int i = 0; i < sub_bbox_cnt; i++) {
> + data->sub_bbox_array[i].vert_count = (i==sub_bbox_cnt-1)?
> + (key->count - i*vert_per_subBbox):vert_per_subBbox;
> + data->sub_bbox_array[i].start_offset = i*vert_per_subBbox * key->type_size;
> + }
> +
> + subbox_array = malloc(sub_bbox_cnt * sizeof(int));
> +
> + /* Check if all triangles withing bbox are degenerate (i.e triangles with
> + zero area) */
> + for (int i = 0; i < sub_bbox_cnt; i++) {
> + GLubyte* ptr = (GLubyte *)indexData;
> + data->sub_bbox_array[i].is_degenerate =
> + vbo_bbox_is_degenerate((ptr + data->sub_bbox_array[i].start_offset),
> + data->sub_bbox_array[i].vert_count);
> +
> + if(!data->sub_bbox_array[i].is_degenerate)
> + {
> + subbox_array[idx++] = i;
> + non_degen_count++;
> + }
> + }
> +
> + float tmp_buf_min[8] = {0.0};
> + float tmp_buf_max[8] = {0.0};
> +
> +#ifdef __AVX__
> + int odd = non_degen_count % 2;
> + int num_iter = non_degen_count/2;
> + int iter;
> +
> + fullMin = _mm256_set1_ps(FLT_MAX);
> + fullMax = _mm256_set1_ps(-FLT_MAX);
> + idx = 0;
> + for(iter = 0; iter < num_iter;iter++){
> + idx = 2*iter;
> + if(data->sub_bbox_array[subbox_array[idx]].vert_count ==
> + data->sub_bbox_array[subbox_array[idx+1]].vert_count)
> + {
> + /* call the algorithm with the count */
> + vbo_bbox_calc_subbox_coordinates(
> + vert_per_subBbox,
> + data->sub_bbox_array[subbox_array[idx]].vert_count,
> + subbox_array[idx],
> + subbox_array[idx+1],
> + (GLushort*)indexData,
> + (GLfloat*)vertex_data,
> + key->stride,
> + data
> + );
> + }
> + else
> + {
> + /* call the first one separately */
> + vbo_bbox_calc_subbox_coordinates(vert_per_subBbox,
> + data->sub_bbox_array[subbox_array[idx]].vert_count,
> + subbox_array[idx],
> + subbox_array[idx],
> + (GLushort*)indexData,
> + (GLfloat*)vertex_data,
> + key->stride,
> + data);
> +
> + /* call the second one separately */
> + vbo_bbox_calc_subbox_coordinates(vert_per_subBbox,
> + data->sub_bbox_array[subbox_array[idx+1]].vert_count,
> + subbox_array[idx+1],
> + subbox_array[idx+1],
> + (GLushort*)indexData,
> + (GLfloat*)vertex_data,
> + key->stride,
> + data);
> +
> + }
> +
> + }
> +
> + if(odd)
> + {
> + idx = 2*iter;
> + /* call the last one separately */
> + vbo_bbox_calc_subbox_coordinates(vert_per_subBbox,
> + data->sub_bbox_array[subbox_array[idx]].vert_count,
> + subbox_array[idx],
> + subbox_array[idx],
> + (GLushort*)indexData,
> + (GLfloat*)vertex_data,
> + key->stride,
> + data);
> + }
> +
> + /* Finding the minimum from the full box 256 */
> + __m128 firstlane = _mm256_extractf128_ps(fullMin,0);
> + __m128 secondlane = _mm256_extractf128_ps(fullMin,1);
> + firstlane = _mm_min_ps(firstlane,secondlane);
> + _mm_storeu_ps(tmp_buf_min,firstlane);
> +
> + /* Finding the maximum from the full box 256 */
> + firstlane = _mm256_extractf128_ps(fullMax,0);
> + secondlane = _mm256_extractf128_ps(fullMax,1);
> + firstlane = _mm_max_ps(firstlane,secondlane);
> + _mm_storeu_ps(tmp_buf_max,firstlane);
> +
> +#else
> + fullMin = _mm_set1_ps(FLT_MAX);
> + fullMax = _mm_set1_ps(-FLT_MAX);
> +
> + for(unsigned int i=0; i< non_degen_count; i++){
> + //call the algorithm with the count
> + vbo_bbox_calc_subbox_coordinates(vert_per_subBbox,
> + data->sub_bbox_array[subbox_array[i]].vert_count,
> + subbox_array[i],
> + (GLushort*)indexData,
> + (GLfloat*)vertex_data,
> + key->stride,
> + data);
> + }
> + _mm_storeu_ps(tmp_buf_min, fullMin);
> + _mm_storeu_ps(tmp_buf_max, fullMax);
> +#endif
> +
> + /* Set up bounding box as 8 vertices and store in bbox data */
> + vbo_bbox_create_bounding_box(tmp_buf_min, tmp_buf_max,
> + &(data->full_box.bounding_volume.vert_vec4[0]));
> +
> + if(subbox_array)
> + free(subbox_array);
> + vbo_bbox_release_vbo_ptr(gc, vertexattrib_buffer);
> + data->valid = true;
> +
> + return true;
> +}
> +
> +
> +/**
> + * Calculate bounding boxes for given geometry.
> + */
> +static inline
> +bool vbo_bbox_calculate_bounding_boxes(struct gl_context *const gc,
> + const vbo_bbox_cache_key *key,
> + struct vbo_bbox_cache_data* data)
> +{
> + assert(gc);
> + assert(key->indices_type == GL_UNSIGNED_SHORT);
> +
> + void* pIndexData = NULL;
> + int indexDataSize = 0;
> +
> + struct gl_buffer_object* element_buffer = _mesa_lookup_bufferobj(gc,
> + key->element_buf_name);
> + if (element_buffer == NULL) {
> + return false;
> + }
> + if (!vbo_bbox_get_vbo_ptr(gc, element_buffer, (int) key->indices,
> + &pIndexData, &indexDataSize)) {
> + return false;
> + }
> +
> + bool ret = vbo_bbox_calculate_bounding_boxes_with_indices(gc, key, data,
> + pIndexData, indexDataSize);
> +
> + vbo_bbox_release_vbo_ptr(gc, element_buffer);
> +
> + return ret;
> +}
> +
> +
> +/**
> + * Create new bounding box cache entry
> + */
> +static
> +struct vbo_bbox_cache_data* vbo_bbox_create_data(struct gl_context *const gc,
> + const vbo_bbox_cache_key *key)
> +{
> + assert(gc);
> +
> + struct gl_buffer_object* element_buffer = vbo_get_current_element_buffer(gc);
> + struct gl_buffer_object* vertexattrib_buffer =
> + vbo_get_current_vertex_buffer(gc);
> +
> + if ((vertexattrib_buffer == NULL) ||
> + (element_buffer == NULL)){
> + return NULL;
> + }
> +
> + mesa_bbox_opt * BboxOpt = gc->Pipeline.BboxOpt;
> + assert(BboxOpt);
> +
> + struct vbo_bbox_cache_data* data = (struct vbo_bbox_cache_data *) malloc(
> + sizeof (struct vbo_bbox_cache_data));
> + data->full_box.is_degenerate = false;
> +
> + if (data == NULL){
> + return NULL;
> + }
> + /* Initialize the cache data and variables in cache data */
> + data->valid = false;
> + data->need_new_calculation = true;
> + data->init_delaycnt = 0;
> + data->init_delaylimit = 0;
> + data->vertpos_vbo_changecnt = 0;
> + data->indices_vbo_changecnt = 0;
> + data->sub_bbox_cnt = 0;
> + data->sub_bbox_array = NULL;
> + data->drawcnt_bbox_helped = 0;
> + data->last_use_frame = 0;
> + data->vertpos_vbo_changecnt = vertexattrib_buffer->data_change_counter;
> + data->indices_vbo_changecnt = element_buffer->data_change_counter;
> +
> + /* This defines for how many cache hits we wait before actually creating the
> + * data */
> + data->init_delaylimit = vbo_bbox_get_delay(BboxOpt);
> +
> + /* At this point data is not valid yet */
> + _mesa_bbox_cache_insert(gc,gc->Pipeline.BboxOpt->cache,
> + key,sizeof(vbo_bbox_cache_key),data);
> +
> + return data;
> +}
> +
> +
> +/**
> + * Check if contents of the VBO buffers have changed since data entry
> + * was created.
> + */
> +static inline
> +bool vbo_bbox_validate_data(
> + struct gl_context *const gc,
> + const vbo_bbox_cache_key *key,
> + struct vbo_bbox_cache_data* data)
> +{
> + assert(gc);
> +
> + struct gl_buffer_object* element_buffer = _mesa_lookup_bufferobj(gc,
> + key->element_buf_name);
> +
> + struct gl_buffer_object * vertexattrib_buffer = _mesa_lookup_bufferobj(gc,
> + key->vertex_buf_name);
> +
> + if (element_buffer == NULL || vertexattrib_buffer == NULL) {
> + return false;
> + }
> +
> + if ((element_buffer->data_change_counter != data->indices_vbo_changecnt) ||
> + (vertexattrib_buffer->data_change_counter != data->vertpos_vbo_changecnt)
> + ) {
> + return false;
> + }
> + return true;
> +}
> +
> +/**
> + * Retrieve bounding box data from cache.
> + */
> +static inline
> +struct vbo_bbox_cache_data* vbo_bbox_get_bounding_boxes(
> + struct gl_context *const gc,
> + const vbo_bbox_cache_key *key)
> + {
> +
> + assert(gc);
> + mesa_bbox_opt * BboxOpt = gc->Pipeline.BboxOpt;
> + assert(BboxOpt);
> + struct vbo_bbox_cache_data* data = _mesa_search_bbox_cache(BboxOpt->cache,
> + key, sizeof(vbo_bbox_cache_key));
> + if (data) {
> + if (data->need_new_calculation == false)
> + {
> + /* Data is initialized and valid */
> + if (data->valid) {
> + if (vbo_bbox_validate_data(gc, key, data)) {
> + data->mvp_valid = true;
> + return data;
> + }
> + }
> + else {
> + data->valid = false;
> + return NULL;
> + }
> + }
> + }
> + else {
> + /* Data does not exist, create it */
> + data = vbo_bbox_create_data(gc, key);
> + if (data == NULL)
> + {
> + return NULL;
> + }
> + }
> + if ((data->need_new_calculation) &&
> + (data->init_delaycnt++ >= data->init_delaylimit)) {
> + data->valid = false;
> + data->need_new_calculation = false;
> + data->mvp_valid = false;
> +
> + if (!vbo_bbox_validate_data(gc, key, data)) {
> + struct gl_buffer_object * element_buffer =
> + _mesa_lookup_bufferobj(gc,key->element_buf_name);
> +
> + struct gl_buffer_object * vertexattrib_buffer =
> + _mesa_lookup_bufferobj(gc,key->vertex_buf_name);
> +
> + if ((vertexattrib_buffer == NULL) ||
> + (element_buffer == NULL)) {
> + return NULL;
> + }
> + data->vertpos_vbo_changecnt = vertexattrib_buffer->data_change_counter;
> + data->indices_vbo_changecnt = element_buffer->data_change_counter;
> + }
> + if (gc->volume_type == BOUNDING_VOLUME_AABB) {
> + /* Calculate bounding boxes */
> + if (vbo_bbox_calculate_bounding_boxes(gc, key, data)) {
> + return data;
> + }
> + }
> + }
> + return NULL;
> +}
> +
> +/**
> + * This function is called when we updating the element buffer. because the
> + * element-buffer has changed we have to update the relevant bbox data:
> + */
> +void vbo_bbox_element_buffer_update(struct gl_context *const gc,
> + struct gl_buffer_object *buffer,
> + const void* data,
> + int offset,
> + int size)
> +{
> + mesa_bbox_opt * BboxOpt = gc->Pipeline.BboxOpt;
> +
> + if (BboxOpt) {
> + struct gl_segment updateSegment;
> + updateSegment.Left = offset;
> + updateSegment.Right = offset+size;
> +
> + mesa_bbox_cache * buffer_map = BboxOpt->cache;
> + if (buffer_map) {
> + struct vbo_bbox_cache_data *c;
> + GLuint i = 0;
> + for (i = 0; i < buffer_map->size; i++)
> + for (c = buffer_map->items[i]; c != NULL ; c = c->next) {
> + struct vbo_bbox_cache_data * bbox_data = buffer_map->items[i];
> + struct gl_buffer_object *buffObj = _mesa_lookup_bufferobj(gc,
> + bbox_data->key->element_buf_name);
> +
> + if(buffObj != buffer)
> + continue;
> +
> + if (!bbox_data) {
> +
> + assert(bbox_data);
> + return;
> + }
> +
> + bbox_data->indices_vbo_changecnt = buffer->data_change_counter;
> +
> + struct gl_segment element_bufferSegment;
> + element_bufferSegment.Left = bbox_data->key->indices;
> + element_bufferSegment.Right = bbox_data->key->indices +
> + (bbox_data->key->count * bbox_data->key->type_size);
> +
> + if (bbox_data->valid &&
> + intersect(&element_bufferSegment, &updateSegment)) {
> + bbox_data->valid = false;
> + bbox_data->need_new_calculation = true;
> +
> + if (superset(&element_bufferSegment,&updateSegment))
> + {
> + int offset_in_newdata = bbox_data->key->indices - offset;
> + int newsize = size - offset_in_newdata;
> +
> + assert(offset_in_newdata >= 0);
> + assert(newsize > 0);
> +
> + GLchar * start_data = (GLchar *)data + offset_in_newdata;
> +
> + if (vbo_bbox_calculate_bounding_boxes_with_indices(gc,
> + bbox_data->key, bbox_data, start_data, newsize)) {
> + bbox_data->need_new_calculation = false;
> + bbox_data->mvp_valid = false;
> + }
> + }
> + }
> + }
> + }
> + }
> +}
> +
> +/**
> + * Generate 6 clip planes from MVP.
> + */
> +static
> +void vbo_bbox_get_frustum_from_mvp(vbo_bbox_frustum *frustum, GLmatrix* mvpin)
> +{
> +
> + GLfloat in_mat[4][4] = {0};
> + {
> + #define M(row,col) m[col*4+row]
> + in_mat[0][0] = mvpin->M(0,0);
> + in_mat[0][1] = mvpin->M(0,1);
> + in_mat[0][2] = mvpin->M(0,2);
> + in_mat[0][3] = mvpin->M(0,3);
> +
> + in_mat[1][0] = mvpin->M(1,0);
> + in_mat[1][1] = mvpin->M(1,1);
> + in_mat[1][2] = mvpin->M(1,2);
> + in_mat[1][3] = mvpin->M(1,3);
> +
> + in_mat[2][0] = mvpin->M(2,0);
> + in_mat[2][1] = mvpin->M(2,1);
> + in_mat[2][2] = mvpin->M(2,2);
> + in_mat[2][3] = mvpin->M(2,3);
> +
> + in_mat[3][0] = mvpin->M(3,0);
> + in_mat[3][1] = mvpin->M(3,1);
> + in_mat[3][2] = mvpin->M(3,2);
> + in_mat[3][3] = mvpin->M(3,3);
> + #undef M
> + }
> +
> + /* Frustum plane calculation */
> +
> + /* Left plane */
> + frustum->plane[0].a = in_mat[3][0] + in_mat[0][0];
> + frustum->plane[0].b = in_mat[3][1] + in_mat[0][1];
> + frustum->plane[0].c = in_mat[3][2] + in_mat[0][2];
> + frustum->plane[0].d = in_mat[3][3] + in_mat[0][3];
> +
> + /* Right plane */
> + frustum->plane[1].a = in_mat[3][0] - in_mat[0][0];
> + frustum->plane[1].b = in_mat[3][1] - in_mat[0][1];
> + frustum->plane[1].c = in_mat[3][2] - in_mat[0][2];
> + frustum->plane[1].d = in_mat[3][3] - in_mat[0][3];
> +
> + /* Top plane */
> + frustum->plane[2].a = in_mat[3][0] - in_mat[1][0];
> + frustum->plane[2].b = in_mat[3][1] - in_mat[1][1];
> + frustum->plane[2].c = in_mat[3][2] - in_mat[1][2];
> + frustum->plane[2].d = in_mat[3][3] - in_mat[1][3];
> +
> + /* Bottom plane */
> + frustum->plane[3].a = in_mat[3][0] + in_mat[1][0];
> + frustum->plane[3].b = in_mat[3][1] + in_mat[1][1];
> + frustum->plane[3].c = in_mat[3][2] + in_mat[1][2];
> + frustum->plane[3].d = in_mat[3][3] + in_mat[1][3];
> +
> + /* Far plane */
> + frustum->plane[4].a = in_mat[3][0] - in_mat[2][0];
> + frustum->plane[4].b = in_mat[3][1] - in_mat[2][1];
> + frustum->plane[4].c = in_mat[3][2] - in_mat[2][2];
> + frustum->plane[4].d = in_mat[3][3] - in_mat[2][3];
> +
> + /* Near plane */
> + frustum->plane[5].a = in_mat[3][0] + in_mat[2][0];
> + frustum->plane[5].b = in_mat[3][1] + in_mat[2][1];
> + frustum->plane[5].c = in_mat[3][2] + in_mat[2][2];
> + frustum->plane[5].d = in_mat[3][3] + in_mat[2][3];
> +
> + /* Calculate octants */
> + for(int n = 0; n < 6; ++n) {
> + frustum->octant[n] = (frustum->plane[n].a >=0 ? 1 : 0) |
> + (frustum->plane[n].b >=0 ? 2 : 0) |
> + (frustum->plane[n].c >=0 ? 4 : 0);
> + normalize(&(frustum->plane[n]));
> + }
> +}
> +
> +
> +/**
> + * Calculate distance form a point to place
> + *
> + */
> +static inline
> +float vbo_bbox_dist_from_point_to_plane(
> + const vbo_bbox_frustum_plane *plane,
> + const vbo_vec4f *point)
> +{
> + return (plane->a * point->x + plane->b * point->y + plane->c *
> + point->z + plane->d);
> +}
> +
> +
> +/**
> + * Description:
> + * Bounding box clipping algorthm
> + * BBOX_CLIP_INSIDE - bounding box is fully inside frustum
> + * BBOX_CLIP_OUTSIDE - bounding box is fully outside frustum
> + * BBOX_CLIP_INTERSECT - bounding box intersects with frustum
> + */
> +static
> +enum vbo_bbox_clip_result vbo_bbox_fast_clipping_test(bounding_info* bbox,
> + const vbo_bbox_frustum *frustum)
> +{
> + assert(bbox);
> + vbo_vec4f *aabb = bbox->bounding_volume.vert_vec4;
> +
> + enum vbo_bbox_clip_result result = BBOX_CLIP_INSIDE;
> + for (int i = 0; i < 6; ++i)
> + {
> + unsigned char normalOctant = frustum->octant[i];
> +
> + /* Test near and far vertices of AABB according to plane normal.
> + * Plane equation can be normalized to save some divisions.
> + */
> + float farDistance = vbo_bbox_dist_from_point_to_plane(
> + &(frustum->plane[i]),
> + &(aabb[normalOctant]));
> + if (farDistance < 0.0f) {
> + return BBOX_CLIP_OUTSIDE;
> + }
> +
> + float nearDistance = vbo_bbox_dist_from_point_to_plane(
> + &(frustum->plane[i]),
> + &(aabb[normalOctant ^ 7]));
> + if (nearDistance < 0.0f) {
> + result = BBOX_CLIP_INTERSECT;
> + }
> + }
> +
> + return result;
> +}
> +
> +/**
> + * Wrapper for clip algorithm.
> + */
> +static inline
> +enum vbo_bbox_clip_result vbo_bbox_clip(bounding_info* bbox,
> + const vbo_bbox_frustum *frustum)
> +{
> + assert(bbox);
> + if (bbox->is_degenerate) {
> + return BBOX_CLIP_DEGEN;
> + }
> + return vbo_bbox_fast_clipping_test(bbox, frustum);
> +}
> +
> +/**
> + * Bounding box drawelements implementation
> + */
> +void
> +vbo_bbox_drawelements(struct gl_context *ctx, GLenum mode,
> + GLboolean index_bounds_valid, GLuint start, GLuint end,
> + GLsizei count, GLenum type, const GLvoid * indices,
> + GLint basevertex, GLuint numInstances,
> + GLuint baseInstance)
> +{
> + assert(ctx);
> + GLuint type_size;
> + /* BOUNDING VOLUME: Checks would remain same I guess */
> + bool draw_call_supported = vbo_bbox_check_supported_draw_call(ctx,
> + mode,
> + count,
> + type,
> + indices,
> + basevertex);
> + if (!draw_call_supported ||
> + (mesa_bbox_env_variables.bbox_enable < MESA_BBOX_ENABLE_FORCE_CLIPPING)) {
> + MESA_BBOX("Aborting MESA_BBOX : BBOX Is not ENABLED !!! \n");
> + vbo_validated_drawrangeelements(ctx, mode, index_bounds_valid, start,
> + end, count, type, indices, basevertex, numInstances, baseInstance);
> + return;
> + }
> +
> + type_size = sizeof(GLushort);
> + vbo_bbox_cache_key key;/* Need not initialize key its done in prepare key */
> + vbo_bbox_prepare_key(ctx, mode, count, type, type_size, indices, basevertex,
> + &key);
> + /*BOUNDING VOLUME: Call bounding volume creation based on bounding volume
> + * type */
> +
> + struct vbo_bbox_cache_data* cached_bbox = vbo_bbox_get_bounding_boxes(ctx,
> + &key);
> +
> + if (cached_bbox == NULL) {
> + MESA_BBOX("Aborting MESA_BBOX : New Object not in Cache!!! \n");
> + vbo_validated_drawrangeelements(ctx, mode, index_bounds_valid, start, end,
> + count, type, indices, basevertex,
> + numInstances, baseInstance);
> + return;
> + }
> +
> + GLint loc = vbo_bbox_get_mvp(ctx);
> + if (loc < 0) {
> + MESA_BBOX("MVP Location Error\n");
> + /* TBD: Free cache here */
> + vbo_validated_drawrangeelements(ctx, mode, index_bounds_valid, start,
> + end, count, type, indices, basevertex,
> + numInstances, baseInstance);
> + return;
> + }
> +
> + GLfloat *mvp_ptr = (GLfloat *)
> + ctx->_Shader->ActiveProgram->data->UniformStorage[loc].storage;
> + vbo_bbox_frustum frustum;
> + bool recalculate_subbox_clip = false;
> +
> + if(cached_bbox->mvp_valid == false ||
> + memcmp(cached_bbox->mvp, mvp_ptr,16*sizeof(GLfloat))) {
> + memcpy(&(cached_bbox->mvp), mvp_ptr, 16*sizeof(GLfloat));
> + cached_bbox->mvpin.m = cached_bbox->mvp;
> + vbo_bbox_get_frustum_from_mvp(&frustum,&(cached_bbox->mvpin));
> +
> + /* BOUNDING VOLUME: Call specific function to calculate clip results */
> +
> + if (ctx->volume_type == BOUNDING_VOLUME_AABB) {
> + cached_bbox->full_box.clip_result =
> + vbo_bbox_clip(&(cached_bbox->full_box), &frustum);
> + }
> + recalculate_subbox_clip = true;
> + }
> +
> + /* Calculate frustum planes */
> + MESA_BBOX("MESA_BBOX: Full Box ClipResult:%d \n",
> + cached_bbox->full_box.clip_result);
> + switch (cached_bbox->full_box.clip_result) {
> + case BBOX_CLIP_OUTSIDE:
> + /* Geometry outside view frustum, dont draw it */
> + return;
> + case BBOX_CLIP_INSIDE:
> + vbo_validated_drawrangeelements(ctx, mode, index_bounds_valid, start,
> + end, count, type, indices, basevertex,
> + numInstances, baseInstance);
> + return;
> + /* case BBOX_CLIP_INTERSECT: */
> + default:
> + MESA_BBOX("MESA_BBOX: Vertices INTERSECTING with the frustum, going"
> + " for Sub Bboxes: \n");
> + break;
> + }
> +
> + GLsizei count_to_send = 0;
> + GLsizei count_to_drop = 0;
> + GLvoid* offset_to_send = NULL;
> + bool clipped = false;
> + unsigned potential_clipped = 0;
> +
> + for (int i = 0; i < cached_bbox->sub_bbox_cnt; i++) {
> + int new_count = cached_bbox->sub_bbox_array[i].vert_count;
> +
> + if(recalculate_subbox_clip) {
> + if (ctx->volume_type == BOUNDING_VOLUME_AABB) {
> + cached_bbox->sub_bbox_array[i].clip_result =
> + vbo_bbox_clip(&(cached_bbox->sub_bbox_array[i]), &frustum);
> + }
> + }
> +
> + switch (cached_bbox->sub_bbox_array[i].clip_result)
> + {
> + case BBOX_CLIP_OUTSIDE:
> + count_to_drop += new_count;
> + potential_clipped += new_count;
> +
> + break;
> + case BBOX_CLIP_DEGEN:
> + count_to_drop += new_count;
> + potential_clipped += new_count;
> + break;
> + default:
> + /* Sub bounding box intersects with view, draw/save it
> + * for later draw
> + */
> + if (count_to_send == 0) {
> + /* Starting new batch */
> + count_to_send = new_count;
> + offset_to_send = (char*)indices +
> + cached_bbox->sub_bbox_array[i].start_offset;
> + }
> + else {
> + if (count_to_drop >=
> + (int)(mesa_bbox_env_variables.bbox_split_size *
> + BBOX_MIN_SPLITTED_DRAW_TO_DROP)) {
> +
> + /* Draw accumulated geometry */
> + vbo_validated_drawrangeelements(ctx, mode,
> + index_bounds_valid, start, end, count_to_send, type,
> + offset_to_send, basevertex, numInstances, baseInstance);
> +
> + /* Reset accumulated draws */
> + count_to_send = 0;
> + offset_to_send = (char*)indices +
> + cached_bbox->sub_bbox_array[i].start_offset;
> +
> + clipped = true;
> + }
> + else
> + {
> + count_to_send += count_to_drop;
> + }
> +
> + /* append to current batch of sent primitives */
> + count_to_send += new_count;
> + }
> +
> + count_to_drop = 0;
> + break;
> + }
> + }
> +
> + if (count_to_send > 0)
> + {
> + vbo_validated_drawrangeelements(ctx, mode, index_bounds_valid, start,
> + end, count_to_send, type,
> + offset_to_send, basevertex,
> + numInstances, baseInstance);
> + }
> +
> +
> + clipped |= (count_to_drop >= (int)(mesa_bbox_env_variables.bbox_split_size *
> + BBOX_MIN_SPLITTED_DRAW_TO_DROP));
> +
> + if (clipped)
> + {
> + cached_bbox->drawcnt_bbox_helped =
> + MIN(cached_bbox->drawcnt_bbox_helped + 1,
> + BBOX_MAX_VAL_FOR_EFFECTIVE_DRAWS_COUNTER);
> + }
> + else
> + {
> + if (potential_clipped == 0)
> + {
> + cached_bbox->drawcnt_bbox_helped =
> + MAX(cached_bbox->drawcnt_bbox_helped - 1,
> + BBOX_MIN_VAL_FOR_EFFECTIVE_DRAWS_COUNTER);
> + }
> + }
> +}
> +
> +/**
> + * Initialization for bounding box optimization
> + */
> +void vbo_bbox_init(struct gl_context* const gc)
> +{
> + assert(gc);
> +
> + const char * mesa_bbox_opt = getenv("MESA_BBOX_OPT_ENABLE");
> + if (mesa_bbox_opt !=NULL)
> + mesa_bbox_env_variables.bbox_enable = atoi(mesa_bbox_opt);
> +
> + mesa_bbox_opt = getenv("MESA_OPT_SPLIT_SIZE");
> + if (mesa_bbox_opt !=NULL)
> + mesa_bbox_env_variables.bbox_split_size = atoi(mesa_bbox_opt);
> +
> + mesa_bbox_opt = getenv("MESA_BBOX_MIN_VERTEX_CNT");
> + if (mesa_bbox_opt != NULL)
> + mesa_bbox_env_variables.bbox_min_vrtx_count = atoi(mesa_bbox_opt);
> +
> + if (!gc->Const.EnableBoundingBoxCulling) {
> + mesa_bbox_env_variables.bbox_enable = MESA_BBOX_ENABLE_OFF;
> + mesa_bbox_env_variables.bbox_split_size = 0x7fffffff;
> + mesa_bbox_env_variables.bbox_min_vrtx_count = 0;
> + }
> +
> + mesa_bbox_opt = getenv("MESA_OPT_TRACE_LEVEL");
> + if (mesa_bbox_opt !=NULL)
> + mesa_bbox_env_variables.bbox_trace_level = atoi(mesa_bbox_opt);
> +
> + if (mesa_bbox_env_variables.bbox_enable == MESA_BBOX_ENABLE_AUTO) {
> + /* Android/Linux: enable of Gen7.5, Gen8 and Gen9 */
> + #if (IGFX_GEN == IGFX_GEN9)
> + mesa_bbox_env_variables.bbox_enable = MESA_BBOX_ENABLE_SMART;
> + #else
> + mesa_bbox_env_variables.bbox_enable = MESA_BBOX_ENABLE_OFF;
> + #endif
> + if (mesa_bbox_env_variables.bbox_enable == MESA_BBOX_ENABLE_SMART && 1)
> + {
> + mesa_bbox_env_variables.bbox_enable =
> + MESA_BBOX_ENABLE_FORCE_CLIPPING;
> + }
> + }
> + /* BOUNDING VOLUME: Add initializations based on bounding volume here */
> + switch (gc->volume_type) {
> + case BOUNDING_VOLUME_AABB:
> + if (mesa_bbox_env_variables.bbox_enable) {
> + mesa_bbox_env_variables.bbox_split_size =
> + (mesa_bbox_env_variables.bbox_split_size / 3) * 3;
> + mesa_bbox_env_variables.bbox_min_vrtx_count =
> + (mesa_bbox_env_variables.bbox_min_vrtx_count / 3) * 3;
> +
> + assert(gc->Pipeline.BboxOpt == NULL);
> +
> + gc->Pipeline.BboxOpt = malloc(sizeof(mesa_bbox_opt));
> + if (!gc->Pipeline.BboxOpt) {
> + /* No memory, disable bbox optimization */
> + mesa_bbox_env_variables.bbox_enable =
> + MESA_BBOX_ENABLE_OFF;
> + mesa_bbox_env_variables.bbox_split_size = 0x7fffffff;
> + mesa_bbox_env_variables.bbox_min_vrtx_count = 0;
> + return;
> + }
> + else {
> + gc->Pipeline.BboxOpt->calc_delay = BBOX_CALC_MIN_DELAY;
> + gc->Pipeline.BboxOpt->cache = _mesa_new_bbox_cache();
> +
> + if (!gc->Pipeline.BboxOpt->cache) {
> + free(gc->Pipeline.BboxOpt);
> + MESA_BBOX("MESA_BBOX: Cache creation failed\n");
> + return;
> + }
> + }
> + if (mesa_bbox_env_variables.bbox_trace_level > 0) {
> + MESA_BBOX("\nMESA BBOX OPT config: \
> + bboxOptEnable = %d, bboxOptMinVertexCount = %d,\
> + bboxOptSplitSize = %d\n",
> + mesa_bbox_env_variables.bbox_enable,
> + mesa_bbox_env_variables.bbox_min_vrtx_count,
> + mesa_bbox_env_variables.bbox_split_size);
> + }
> + }
> + /* Initialize some function pointers so that we dont have to
> + * check bounding volume type for every draw call */
> + break;
> + case BOUNDING_VOLUME_OBB:
> + /* Init for OBB bounding volume */
> + break;
> + case BOUNDING_VOLUME_SPHERE:
> + /* Init for SPHERE bounding volume */
> + break;
> + case BOUNDING_VOLUME_DOP:
> + /* Init for DOP bounding volume */
> + break;
> + default:
> + MESA_BBOX("BOUNDING VOLUME TYPE IS INCORRECT\n");
> + break;
> + }
> +}
> +
> +
> +/**
> + * Free resources associated with bounding box optimization.
> + * To be called when context is destroyed
> + */
> +void vbo_bbox_free(struct gl_context* const gc)
> +{
> + assert(gc);
> +
> + if (gc->Pipeline.BboxOpt) {
> +
> + if(gc->Pipeline.BboxOpt->cache)
> + {
> + _mesa_delete_bbox_cache(gc,gc->Pipeline.BboxOpt->cache);
> + gc->Pipeline.BboxOpt->cache = NULL;
> + }
> +
> + if(gc->Pipeline.BboxOpt) {
> + free(gc->Pipeline.BboxOpt);
> + gc->Pipeline.BboxOpt = NULL;
> + }
> + }
> +}
> diff --git a/src/mesa/vbo/vbo_bbox.h b/src/mesa/vbo/vbo_bbox.h
> new file mode 100644
> index 0000000..1dd98f2
> --- /dev/null
> +++ b/src/mesa/vbo/vbo_bbox.h
> @@ -0,0 +1,383 @@
> +/*
> + * Copyright � 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/**
> + * \brief VBO BBOX module implementation
> + * \author Kedar Karanje
> + */
> +
> +#ifndef _VBO_BBOX_H_
> +#define _VBO_BBOX_H_
> +
> +#include <stdio.h>
> +#include "main/arrayobj.h"
> +#include "main/glheader.h"
> +#include "main/context.h"
> +#include "main/state.h"
> +#include "main/varray.h"
> +#include "main/bufferobj.h"
> +#include "main/arrayobj.h"
> +#include "main/enums.h"
> +#include "main/macros.h"
> +#include "main/transformfeedback.h"
> +#include "main/mtypes.h"
> +#include "compiler/glsl/ir_uniform.h"
> +#include "main/shaderapi.h"
> +#include "main/uniforms.h"
> +#include "sys/param.h"
> +#include "program/prog_cache.h"
> +/* For Intrinsic functions */
> +#include <smmintrin.h>
> +#include <tmmintrin.h>
> +#include <mmintrin.h>
> +#include <immintrin.h>
> +
> +#if defined(__ANDROID__) || defined(ANDROID)
> +#include <cutils/log.h>
> +#endif
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define FRUSTUM_PLANE_COUNT 6
> +#define BBOX_CALC_MIN_DELAY 2
> +#define BBOX_CALC_MAX_DELAY 5
> +
> +#undef LOG_TAG
> +#define LOG_TAG "MESA_BBOX_LOG"
> +
> +/* Default, driver will select mode for given GPU and OS */
> +#define MESA_BBOX_ENABLE_AUTO -1
> +/* Disable BBOX */
> +#define MESA_BBOX_ENABLE_OFF 0
> +/* Enable BBOX, bot clipping */
> +#define MESA_BBOX_ENABLE_SMART 1
> +/* Enable BBOX, clipping will be done regardless of GPU utilization */
> +#define MESA_BBOX_ENABLE_FORCE_CLIPPING 2
> +/* Enable BBOX, force immediate bbox recalculation and clipping */
> +#define MESA_BBOX_ENABLE_FORCE_RECALC 3
> +
> +/**
> + * MESA BBOX PRINTS
> + * Uncomment below line to enable debugging logs
> + */
> +#define MESA_BBOX_DEBUG 0
> +
> +#if defined(__ANDROID__) || defined (ANDROID)
> +
> +#if MESA_BBOX_DEBUG == 2
> +#define MESA_BBOX_PRINT(...) ALOGE(__VA_ARGS__)
> +#define MESA_BBOX(...) ALOGE(__VA_ARGS__)
> +
> +#elif MESA_BBOX_DEBUG == 1
> +#define MESA_BBOX_PRINT(...) ALOGE(__VA_ARGS__)
> +#define MESA_BBOX(...)
> +
> +#else
> +#define MESA_BBOX_PRINT(...)
> +#define MESA_BBOX(...)
> +#endif //MESA_BBOX_DEBUG
> +
> +#else //ANDROID
> +
> +#if MESA_BBOX_DEBUG == 2
> +#define MESA_BBOX_PRINT(...) printf(__VA_ARGS__)
> +#define MESA_BBOX(...) printf(__VA_ARGS__)
> +
> +#elif MESA_BBOX_DEBUG == 1
> +#define MESA_BBOX_PRINT(...) printf(__VA_ARGS__)
> +#define MESA_BBOX(...)
> +
> +#else
> +#define MESA_BBOX_PRINT(...)
> +#define MESA_BBOX(...)
> +#endif //MESA_BBOX_DEBUG
> +
> +#endif //ANDROID
> +
> +/**
> + * MESA Bbox options environment variables
> + */
> +int env_opt_val;
> +const char *env_opt;
> +
> +typedef struct vbo_bbox_env_variable {
> + GLuint bbox_min_vrtx_count;
> + GLuint bbox_enable;
> + GLuint bbox_split_size;
> + GLuint bbox_trace_level;
> +} bbox_env;
> +bbox_env mesa_bbox_env_variables;
> +
> +void
> +vbo_bbox_init(struct gl_context *const gc);
> +
> +void
> +vbo_bbox_free(struct gl_context *const gc);
> +
> +
> +void
> +vbo_bbox_element_buffer_update(struct gl_context *const gc,
> + struct gl_buffer_object *buffer,
> + const void* data,
> + int offset,
> + int size);
> +
> +void
> +vbo_validated_drawrangeelements(struct gl_context *ctx,
> + GLenum mode,
> + GLboolean index_bounds_valid,
> + GLuint start,
> + GLuint end,
> + GLsizei count,
> + GLenum type,
> + const GLvoid * indices,
> + GLint basevertex,
> + GLuint numInstances,
> + GLuint baseInstance);
> +
> +void
> +vbo_bbox_drawelements(struct gl_context *ctx,
> + GLenum mode,
> + GLboolean index_bounds_valid,
> + GLuint start,
> + GLuint end,
> + GLsizei count,
> + GLenum type,
> + const GLvoid * indices,
> + GLint basevertex,
> + GLuint numInstances,
> + GLuint baseInstance);
> +
> +/**
> + * Segment Functions
> + */
> +typedef struct gl_segment
> +{
> + GLint Left;
> + GLint Right;
> +
> +}segment;
> +
> +/**
> + * Clip algorithm result
> + */
> +enum vbo_bbox_clip_result
> +{
> + BBOX_CLIP_OUTSIDE=0,
> + BBOX_CLIP_INTERSECT=1,
> + BBOX_CLIP_INSIDE=2,
> + BBOX_CLIP_DEGEN = 3,
> + BBOX_CLIP_ERROR = 4,
> +};
> +
> +typedef struct gl_matrixRec
> +{
> + GLfloat melem[4][4];
> + GLenum matrixType;
> +} gl_matrix;
> +
> +/**
> + * Structure to describe plane
> + */
> +typedef struct vbo_bbox_frustum_plane
> +{
> + GLfloat a, b, c, d;
> +} vbo_bbox_frustum_plane;
> +
> +
> +/**
> + * Planes and octants with their normals
> + */
> +typedef struct vbo_bbox_frustum
> +{
> + const unsigned char planeCount;
> + struct vbo_bbox_frustum_plane plane[FRUSTUM_PLANE_COUNT];
> + unsigned char octant[FRUSTUM_PLANE_COUNT];
> +} vbo_bbox_frustum;
> +
> +/*
> + * Axis Aligned Bounding Box
> + */
> +typedef union vbo_vec4f {
> + GLfloat data[4];
> + struct {
> + GLfloat x, y, z, w;
> + };
> +} vbo_vec4f;
> +
> +/*
> + * Oriented Bounding Box
> + */
> +typedef struct oriented_bounding_box {
> + GLfloat x,y,z,w;
> +} oriented_box;
> +
> +/*
> + * Spherical Bounding volume
> + */
> +typedef struct spherical_bounding_volume {
> + GLfloat x,y,z,r; // x^2+y^2+z^2 = r^2
> +} spherical_volume;
> +
> +/*
> + * 8-Discrete oriented polytopes
> + */
> +typedef struct dop_bounding_volume {
> + GLfloat x,y,z,r,a,b;//TBD Not sure of the representation for 8-DOP yet!
> +} dop_volume;
> +
> +/*
> + * Bounding volumes for AABB, OBB, SPHERE, DOP etc
> + */
> +typedef union bounding_volume_info {
> + vbo_vec4f vert_vec4[8]; /* Bbox mix man coordinates */
> + oriented_box obb[8];
> + spherical_volume sphere;
> + dop_volume dop[8];
> +} bounding_volume_info;
> +
> +
> +typedef struct vbo_bbox_cache_key
> +{
> + /* From API call */
> + GLenum mode; /* GL_TRAINGLES are only mode supported currently */
> + GLsizei count; /* Number if indices in draw call */
> + GLenum indices_type; /* must be GL_UNSIGNED_SHORT for now */
> + GLuint type_size;
> + GLuint type_sizeShift;
> + GLint indices; /* Offset to index VBO */
> + GLint basevertex; /* Only 0 supported for now. */
> +
> + /* VBO objects names */
> + GLuint element_buf_name;
> + GLuint vertex_buf_name;
> +
> + /* Vertex position attribute configuration */
> + GLint offset;
> + GLint size; /* Size of attribute, must be 3 for now */
> + GLenum vertDataType; /* Must be GL_FLOAT */
> + GLsizei stride; /* Any */
> + GLuint elementStride;
> +} vbo_bbox_cache_key;
> +
> +typedef struct bounding_info
> +{
> + int vert_count; /* Number of vertices this bbox covers */
> + int start_offset; /* Start offset for this bbox */
> + bool is_degenerate; /* Triangle can not be formed */
> + enum vbo_bbox_clip_result clip_result;
> +
> + bounding_volume_info bounding_volume; /* Bbox mix man coordinates */
> +
> +} bounding_info;
> +
> +/**
> + * Cached information about (multiple) bounding boxes
> + */
> +struct vbo_bbox_cache_data
> +{
> + /* Valid data indicator, will be set to false if VBO have been
> + * modified by application
> + */
> + bool valid;
> +
> + /* Indicates if bounding boxes were calculated for this geometry */
> + bool need_new_calculation;
> +
> + /* Controls delay with which bounding boxes are calculated */
> + int init_delaycnt;
> + int init_delaylimit;
> +
> + /* Data change indicator for VBOs */
> + int vertpos_vbo_changecnt;
> + int indices_vbo_changecnt;
> +
> + /* How many bounding boxes are stored for this geometry */
> + int sub_bbox_cnt;
> +
> + /* How many draws call the bbox was effective */
> + int drawcnt_bbox_helped;
> +
> + int last_use_frame;
> +
> + GLuint hash;
> +
> + unsigned keysize;
> +
> + vbo_bbox_cache_key *key;
> +
> + bool mvp_valid;
> + GLmatrix mvpin;
> + GLfloat mvp[16];
> + vbo_bbox_frustum frustum;
> +
> + /* Pointer to array of bboxes */
> + bounding_info* sub_bbox_array;
> +
> + /* Bounding box that covers whole geometry */
> + bounding_info full_box;
> +
> + struct vbo_bbox_cache_data *next;
> +};
> +
> +typedef struct mesa_bbox_cache {
> + struct vbo_bbox_cache_data **items;
> + struct vbo_bbox_cache_data *last;
> + GLuint size, n_items;
> +} mesa_bbox_cache;
> +
> +typedef struct mesa_bbox_opt
> +{
> + mesa_bbox_cache * cache;
> + GLuint calc_delay;
> +} mesa_bbox_opt;
> +
> +/**
> + * VBO BBox Cache functions, replica of program cache functions
> + *
> + */
> +struct mesa_bbox_cache *
> +_mesa_new_bbox_cache(void);
> +
> +void
> +_mesa_delete_bbox_cache(struct gl_context *ctx,
> + struct mesa_bbox_cache *cache);
> +
> +struct vbo_bbox_cache_data *
> +_mesa_search_bbox_cache(struct mesa_bbox_cache *cache,
> + const void *key, GLuint keysize);
> +
> +void
> +_mesa_bbox_cache_insert(struct gl_context *ctx,
> + struct mesa_bbox_cache *cache,
> + const void *key,
> + GLuint keysize,
> + struct vbo_bbox_cache_data *CachedData);
> +
> +#ifdef __cplusplus
> +} // extern "C"
> +#endif
> +
> +#endif //_VBO_BBOX_H_
> diff --git a/src/mesa/vbo/vbo_bbox_cache.c b/src/mesa/vbo/vbo_bbox_cache.c
> new file mode 100644
> index 0000000..09bd1dd
> --- /dev/null
> +++ b/src/mesa/vbo/vbo_bbox_cache.c
> @@ -0,0 +1,195 @@
> +
> +/*
> + * Copyright � 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/**
> + * \brief VBO BBOX module implementation
> + * \author Kedar Karanje
> + */
> +
> +#include "main/imports.h"
> +#include "main/mtypes.h"
> +#include "vbo/vbo_bbox.h"
> +
> +#define CACHE_SIZE 17
> +
> +/**
> + * Compute hash index from state key.
> + */
> +static GLuint
> +bbox_hash_key(const void *key, GLuint key_size)
> +{
> + const GLuint *ikey = (const GLuint *) key;
> + GLuint hash = 0, i;
> +
> + assert(key_size >= 4);
> +
> + /* Make a slightly better attempt at a hash function:
> + */
> + for (i = 0; i < key_size / sizeof(*ikey); i++)
> + {
> + hash += ikey[i];
> + hash += (hash << 10);
> + hash ^= (hash >> 6);
> + }
> +
> + return hash;
> +}
> +
> +
> +/**
> + * Rebuild/expand the hash table to accommodate more entries
> + */
> +static void
> +bbox_rehash(struct mesa_bbox_cache *cache)
> +{
> + struct vbo_bbox_cache_data **items;
> + struct vbo_bbox_cache_data *c, *next;
> + GLuint size, i;
> +
> + cache->last = NULL;
> +
> + size = cache->size * 3;
> + items = calloc(size, sizeof(*items));
> +
> + for (i = 0; i < cache->size; i++)
> + for (c = cache->items[i]; c; c = next) {
> + next = c->next;
> + c->next = items[c->hash % size];
> + items[c->hash % size] = c;
> + }
> +
> + free(cache->items);
> + cache->items = items;
> + cache->size = size;
> +}
> +
> +
> +static void
> +bbox_clear_cache(struct gl_context *ctx, mesa_bbox_cache *cache)
> +{
> + struct vbo_bbox_cache_data *c, *next;
> + GLuint i;
> +
> + cache->last = NULL;
> +
> + for (i = 0; i < cache->size; i++) {
> + for (c = cache->items[i]; c; c = next) {
> + next = c->next;
> + free(c->key);
> + free(c->sub_bbox_array);
> + free(c);
> + }
> + cache->items[i] = NULL;
> + }
> + cache->n_items = 0;
> +}
> +
> +
> +
> +mesa_bbox_cache *
> +_mesa_new_bbox_cache(void)
> +{
> + mesa_bbox_cache *cache = CALLOC_STRUCT(mesa_bbox_cache);
> + if (cache) {
> + cache->size = CACHE_SIZE;
> + cache->items = calloc(cache->size, sizeof(struct vbo_bbox_cache_data));
> + if (!cache->items) {
> + MESA_BBOX("Func:%s cache-size=%d "
> + "Cannot allocate items freeing cache\n",
> + __func__,cache->size);
> + free(cache);
> + return NULL;
> + }
> + MESA_BBOX("Func:%s cache:%#x cache->size=%d \n",
> + __func__,cache,cache->size);
> + return cache;
> + }
> + else {
> + MESA_BBOX("cache is Null in Func:%s\n",__func__);
> + return cache;
> + }
> +}
> +
> +
> +void
> +_mesa_delete_bbox_cache(struct gl_context *ctx, mesa_bbox_cache *cache)
> +{
> + bbox_clear_cache(ctx, cache);
> + free(cache->items);
> + free(cache);
> +}
> +
> +struct vbo_bbox_cache_data *
> +_mesa_search_bbox_cache(mesa_bbox_cache *cache,
> + const void *key, GLuint keysize)
> +{
> + MESA_BBOX("Func:%s cache:%#x \n",__func__,cache);
> + if (cache->last &&
> + cache->last->key->mode == ((vbo_bbox_cache_key *)key)->mode &&
> + cache->last->key->count == ((vbo_bbox_cache_key *)key)->count &&
> + cache->last->key->indices == ((vbo_bbox_cache_key *)key)->indices) {
> + return cache->last;
> + }
> + else {
> + const GLuint hash = bbox_hash_key(key, keysize);
> + struct vbo_bbox_cache_data *c;
> + MESA_BBOX("cache:%#x,hash:%d,cache->size:%d\n",cache,hash,cache->size);
> + for (c = cache->items[hash % cache->size]; c; c = c->next) {
> + if (c->hash == hash &&
> + c->key->mode == ((vbo_bbox_cache_key *)key)->mode &&
> + c->key->count == ((vbo_bbox_cache_key *)key)->count &&
> + c->key->indices == ((vbo_bbox_cache_key *)key)->indices) {
> + cache->last = c;
> + return c;
> + }
> + }
> + return NULL;
> + }
> +}
> +
> +
> +void
> +_mesa_bbox_cache_insert(struct gl_context *ctx,struct mesa_bbox_cache *cache,
> + const void *key, GLuint keysize,
> + struct vbo_bbox_cache_data *CachedData)
> +{
> + const GLuint hash = bbox_hash_key(key, keysize);
> +
> + CachedData->hash = hash;
> +
> + CachedData->key = calloc(1, keysize);
> + memcpy(CachedData->key, key, keysize);
> + CachedData->keysize = keysize;
> +
> + if (cache->n_items > cache->size * 1.5) {
> + if (cache->size < 1000)
> + bbox_rehash(cache);
> + else
> + bbox_clear_cache(ctx, cache);
> + }
> +
> + cache->n_items++;
> + CachedData->next = cache->items[hash % cache->size];
> + cache->items[hash % cache->size] = CachedData;
> +}
> diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
> index cf9405d..8c608c2 100644
> --- a/src/mesa/vbo/vbo_context.c
> +++ b/src/mesa/vbo/vbo_context.c
> @@ -36,6 +36,9 @@
> #include "vbo.h"
> #include "vbo_private.h"
>
> +#ifdef MESA_BBOX_OPT
> +#include "vbo_bbox.h"
> +#endif
>
> static GLuint
> check_size(const GLfloat *attr)
> @@ -199,6 +202,10 @@ _vbo_CreateContext(struct gl_context *ctx)
> if (ctx->API == API_OPENGL_COMPAT)
> vbo_save_init(ctx);
>
> +#ifdef MESA_BBOX_OPT //Hard coded to AABB for now
> + ctx->volume_type = BOUNDING_VOLUME_AABB;
> +#endif
> +
> vbo->VAO = _mesa_new_vao(ctx, ~((GLuint)0));
> /* The exec VAO assumes to have all arributes bound to binding 0 */
> for (unsigned i = 0; i < VERT_ATTRIB_MAX; ++i)
> @@ -219,7 +226,9 @@ _vbo_DestroyContext(struct gl_context *ctx)
> _ae_destroy_context(ctx);
> ctx->aelt_context = NULL;
> }
> -
> +#ifdef MESA_BBOX_OPT
> + vbo_bbox_free(ctx);
> +#endif
> if (vbo) {
>
> _mesa_reference_buffer_object(ctx, &vbo->binding.BufferObj, NULL);
> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
> index 51c000e..3321a21 100644
> --- a/src/mesa/vbo/vbo_exec_array.c
> +++ b/src/mesa/vbo/vbo_exec_array.c
> @@ -39,6 +39,10 @@
> #include "main/macros.h"
> #include "main/transformfeedback.h"
>
> +#ifdef MESA_BBOX_OPT
> +#include "vbo_bbox.h"
> +#endif
> +
> typedef struct {
> GLuint count;
> GLuint primCount;
> @@ -784,6 +788,16 @@ skip_draw_elements(struct gl_context *ctx, GLsizei count,
> * Do the rendering for a glDrawElements or glDrawRangeElements call after
> * we've validated buffer bounds, etc.
> */
> +#ifdef MESA_BBOX_OPT
> +void
> +vbo_validated_drawrangeelements(struct gl_context *ctx, GLenum mode,
> + GLboolean index_bounds_valid,
> + GLuint start, GLuint end,
> + GLsizei count, GLenum type,
> + const GLvoid * indices,
> + GLint basevertex, GLuint numInstances,
> + GLuint baseInstance)
> +#else
> static void
> vbo_validated_drawrangeelements(struct gl_context *ctx, GLenum mode,
> GLboolean index_bounds_valid,
> @@ -792,6 +806,7 @@ vbo_validated_drawrangeelements(struct gl_context *ctx, GLenum mode,
> const GLvoid * indices,
> GLint basevertex, GLuint numInstances,
> GLuint baseInstance)
> +#endif
> {
> struct _mesa_index_buffer ib;
> struct _mesa_prim prim;
> @@ -997,6 +1012,11 @@ vbo_exec_DrawElements(GLenum mode, GLsizei count, GLenum type,
> _mesa_debug(ctx, "glDrawElements(%s, %u, %s, %p)\n",
> _mesa_enum_to_string(mode), count,
> _mesa_enum_to_string(type), indices);
> +#ifdef MESA_BBOX_OPT
> + MESA_BBOX_PRINT("glDrawElements(%s, %u, %s, %p)\n",
> + _mesa_enum_to_string(mode), count,
> + _mesa_enum_to_string(type), indices);
> +#endif
>
> FLUSH_FOR_DRAW(ctx);
>
> @@ -1011,9 +1031,13 @@ vbo_exec_DrawElements(GLenum mode, GLsizei count, GLenum type,
> if (!_mesa_validate_DrawElements(ctx, mode, count, type, indices))
> return;
> }
> -
> +#ifdef MESA_BBOX_OPT
> + vbo_bbox_drawelements(ctx, mode, GL_FALSE, 0, ~0,
> + count, type, indices, 0, 1, 0);
> +#else
> vbo_validated_drawrangeelements(ctx, mode, GL_FALSE, 0, ~0,
> count, type, indices, 0, 1, 0);
> +#endif
> }
>
>
> @@ -1045,8 +1069,13 @@ vbo_exec_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type,
> return;
> }
>
> +#ifdef MESA_BBOX_OPT
> + vbo_bbox_drawelements(ctx, mode, GL_FALSE, 0, ~0,
> + count, type, indices, basevertex, 1, 0);
> +#else
> vbo_validated_drawrangeelements(ctx, mode, GL_FALSE, 0, ~0,
> count, type, indices, basevertex, 1, 0);
> +#endif
> }
>
>
> @@ -1078,9 +1107,13 @@ vbo_exec_DrawElementsInstanced(GLenum mode, GLsizei count, GLenum type,
> indices, numInstances))
> return;
> }
> -
> +#ifdef MESA_BBOX_OPT
> + vbo_bbox_drawelements(ctx, mode, GL_FALSE, 0, ~0,
> + count, type, indices, 0, numInstances, 0);
> +#else
> vbo_validated_drawrangeelements(ctx, mode, GL_FALSE, 0, ~0,
> count, type, indices, 0, numInstances, 0);
> +#endif
> }
>
>
> diff --git a/src/util/00-mesa-defaults.conf b/src/util/00-mesa-defaults.conf
> index ad59efb..15f064b 100644
> --- a/src/util/00-mesa-defaults.conf
> +++ b/src/util/00-mesa-defaults.conf
> @@ -197,6 +197,10 @@ TODO: document the other workarounds.
> <option name="force_glsl_extensions_warn" value="true" />
> </application>
>
> + <application name="GLBenchmark" executable="GLBenchmark">
> + <option name="enable_bounding_box_culling" value="true"/>
> + </application>
> +
> <!-- The GL thread whitelist is below, workarounds are above.
> Keep it that way. -->
>
> diff --git a/src/util/xmlpool/t_options.h b/src/util/xmlpool/t_options.h
> index ecf495a..c3f8928 100644
> --- a/src/util/xmlpool/t_options.h
> +++ b/src/util/xmlpool/t_options.h
> @@ -135,6 +135,11 @@ DRI_CONF_OPT_BEGIN_B(allow_glsl_cross_stage_interpolation_mismatch, def) \
> DRI_CONF_DESC(en,gettext("Allow interpolation qualifier mismatch across shader stages")) \
> DRI_CONF_OPT_END
>
> +#define DRI_CONF_ENABLE_BOUNDING_BOX_CULLING(def) \
> +DRI_CONF_OPT_BEGIN_B(enable_bounding_box_culling, def) \
> + DRI_CONF_DESC(en,gettext("Enable bounding box culling in CPU")) \
> +DRI_CONF_OPT_END
> +
> /**
> * \brief Image quality-related options
> */
>
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fmesa-dev&data=02%7C01%7Csroland%40vmware.com%7C88128f1bdb6a4e2412a408d60f0833db%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636712925260322656&sdata=UJpBrJBpPuQdXprpIU5YmI3c8GwQiI3B0bYDdH5zNY8%3D&reserved=0
>
More information about the mesa-dev
mailing list