[Mesa-dev] [PATCH v2 12/16] glsl: Add a lowering pass to handle advanced blending modes.
Francisco Jerez
currojerez at riseup.net
Tue Aug 23 17:50:53 UTC 2016
Kenneth Graunke <kenneth at whitecape.org> writes:
> On Monday, August 22, 2016 5:50:49 PM PDT Francisco Jerez wrote:
>> Kenneth Graunke <kenneth at whitecape.org> writes:
>>
>> > Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
>> > need to emulate it in the pixel shader. This lowering pass implements
>> > all the necessary math for advanced blending. It fetches the existing
>> > framebuffer value using the MESA_shader_framebuffer_fetch built-in
>> > variables, and the previous commit's state var uniform to select
>> > which equation to use.
>> >
>> > This is done at the GLSL IR level to make it easy for all drivers to
>> > implement the GL_KHR_blend_equation_advanced extension and share code.
>> >
>> > Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
>> > 1. Hook up the fb_fetch_output variable
>> > 2. Implement BlendBarrier()
>> >
>> > Then to get KHR_blend_equation_advanced, they simply need to:
>> > 3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
>> > 4. Call this lowering pass.
>> >
>> > Very little driver specific code should be required.
>> >
>> > v2: Handle multiple output variables per render target (which may exist
>> > due to ARB_enhanced_layouts), and array variables (even with one
>> > render target, we might have out vec4 color[1]), and non-vec4
>> > variables (it's easier than finding spec text to justify not
>> > handling it). Thanks to Francisco Jerez for the feedback.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> > src/compiler/Makefile.sources | 1 +
>> > src/compiler/glsl/ir_optimization.h | 1 +
>> > .../glsl/lower_blend_equation_advanced.cpp | 557 +++++++++++++++++++++
>> > 3 files changed, 559 insertions(+)
>> > create mode 100644 src/compiler/glsl/lower_blend_equation_advanced.cpp
>> >
>> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> > index 0ff9b23..fe26132 100644
>> > --- a/src/compiler/Makefile.sources
>> > +++ b/src/compiler/Makefile.sources
>> > @@ -77,6 +77,7 @@ LIBGLSL_FILES = \
>> > glsl/loop_analysis.h \
>> > glsl/loop_controls.cpp \
>> > glsl/loop_unroll.cpp \
>> > + glsl/lower_blend_equation_advanced.cpp \
>> > glsl/lower_buffer_access.cpp \
>> > glsl/lower_buffer_access.h \
>> > glsl/lower_const_arrays_to_uniforms.cpp \
>> > diff --git a/src/compiler/glsl/ir_optimization.h b/src/compiler/glsl/ir_optimization.h
>> > index c29260a..3bd6928 100644
>> > --- a/src/compiler/glsl/ir_optimization.h
>> > +++ b/src/compiler/glsl/ir_optimization.h
>> > @@ -151,6 +151,7 @@ void optimize_dead_builtin_variables(exec_list *instructions,
>> > bool lower_tess_level(gl_linked_shader *shader);
>> >
>> > bool lower_vertex_id(gl_linked_shader *shader);
>> > +bool lower_blend_equation_advanced(gl_linked_shader *shader);
>> >
>> > bool lower_subroutine(exec_list *instructions, struct _mesa_glsl_parse_state *state);
>> > void propagate_invariance(exec_list *instructions);
>> > diff --git a/src/compiler/glsl/lower_blend_equation_advanced.cpp b/src/compiler/glsl/lower_blend_equation_advanced.cpp
>> > new file mode 100644
>> > index 0000000..5632865
>> > --- /dev/null
>> > +++ b/src/compiler/glsl/lower_blend_equation_advanced.cpp
>> > @@ -0,0 +1,557 @@
>> > +/*
>> > + * Copyright © 2016 Intel Corporation
>> > + *
>> > + * Permission is hereby granted, free of charge, to any person obtaining a
>> > + * copy of this software and associated documentation files (the "Software"),
>> > + * to deal in the Software without restriction, including without limitation
>> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> > + * and/or sell copies of the Software, and to permit persons to whom the
>> > + * Software is furnished to do so, subject to the following conditions:
>> > + *
>> > + * The above copyright notice and this permission notice (including the next
>> > + * paragraph) shall be included in all copies or substantial portions of the
>> > + * Software.
>> > + *
>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> > + * DEALINGS IN THE SOFTWARE.
>> > + */
>> > +
>> > +#include "ir.h"
>> > +#include "ir_builder.h"
>> > +#include "ir_optimization.h"
>> > +#include "ir_hierarchical_visitor.h"
>> > +#include "program/prog_instruction.h"
>> > +#include "program/prog_statevars.h"
>> > +#include "util/bitscan.h"
>> > +
>> > +using namespace ir_builder;
>> > +
>> > +#define imm1(x) new(mem_ctx) ir_constant((float) x, 1)
>> > +#define imm3(x) new(mem_ctx) ir_constant((float) x, 3)
>> > +
>>
>> The x argument should probably be between parentheses to avoid surprises
>> in the future -- Or even better, make them static inline functions
>> instead of macros.
>
> Good call, I'll add parenthesis. I made them macros rather than static
> inlines so you can do imm3(0.5) rather than imm3(mem_ctx, 0.5)...it's
> shorter (which is the main point).
>
> [snip]
>
>> > + blend_comps[0], blend_comps[1],
>> > + blend_comps[2], blend_comps[3]);
>> > + }
>> > +
>> > + ir_function_signature *main = get_main(sh);
>> > + ir_factory f(&main->body, mem_ctx);
>> > +
>> > + ir_variable *result_dest =
>> > + calc_blend_result(f, mode, fb, blend_source, sh->info.BlendSupport);
>> > +
>> > + /* Copy the result back to the original values. It would be simpler
>> > + * to demote the program's output variables, and create a new vec4
>> > + * output for our result, but this pass runs before we create the
>> > + * ARB_program_interface_query resource list. So we have to leave
>> > + * the original outputs in place and use them.
>> > + */
>> > + for (int i = 0; i < 4; i++) {
>> > + if (!outputs[i])
>> > + continue;
>> > +
>> > + f.emit(assign(deref_output(outputs[i]), swizzle(result_dest, i, 1),
>> > + 1 << i));
>> > + }
>> > +
>>
>> It looks like this relies on the main function having a single exit
>> point, which AFAICT may not be the case if there are multiple return
>> statements and the lowering pass is called before do_lower_jumps() (as
>> is the case in your i965 code). Several options occur to me:
>>
>> - Rename the shader's main function and generate a new main function
>> that calls into the original and implements the advanced blending
>> epilogue. This has the advantage that it wouldn't force drivers to
>> lower returns if they can support them natively, and is likely to
>> result in better code generation than the next option.
>>
>> - Make the lowering pass re-emit the blending epilogue for each exit
>> point of the main function.
>>
>> - Call do_lower_jumps(lower_main_return=true) manually at the top of
>> lower_blend_equation_advanced() if support for any advanced blending
>> modes was requested in order to make sure the function has the right
>> form. This seems a bit of a hack but I guess it would be the easiest
>> solution.
>
> Ugh. Good point.
>
> Option #2 seems unfortunate - I'd hate to emit this multiple times.
>
> In a world where all functions are inlined, options #1 and #3 seem
> basically equivalent - function inlining lowers early returns. So
> the renamed main would have early returns lowered anyway.
>
> I think I'll take option #3 and change the function to:
>
> bool
> lower_blend_equation_advanced(struct gl_linked_shader *sh)
> {
> if (sh->info.BlendSupport == 0)
> return false;
>
> /* Lower early returns in main() so there's a single exit point
> * where we can insert our lowering code.
> */
> do_lower_jumps(sh->ir, false, false, true, false, false);
I guess option #1 has some slight advantage over option #3 because it
doesn't impose return lowering on the driver and it doesn't cause the
jump lowering pass to be run multiple times, but #3 is so much simpler
that it's hard to justify the additional effort for option #1, so this
sounds like a good plan to me. With the comments above taken into
account patch is:
Reviewed-by: Francisco Jerez <currojerez at riseup.net>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160823/3b3ddb7e/attachment-0001.sig>
More information about the mesa-dev
mailing list