[Mesa-dev] [PATCH v2 12/16] glsl: Add a lowering pass to handle advanced blending modes.

Tue Aug 23 17:50:53 UTC 2016

Kenneth Graunke <kenneth at whitecape.org> writes:

> On Monday, August 22, 2016 5:50:49 PM PDT Francisco Jerez wrote:
>> Kenneth Graunke <kenneth at whitecape.org> writes:
>> 
>> > Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
>> > need to emulate it in the pixel shader.  This lowering pass implements
>> > all the necessary math for advanced blending.  It fetches the existing
>> > framebuffer value using the MESA_shader_framebuffer_fetch built-in
>> > variables, and the previous commit's state var uniform to select
>> > which equation to use.
>> >
>> > This is done at the GLSL IR level to make it easy for all drivers to
>> > implement the GL_KHR_blend_equation_advanced extension and share code.
>> >
>> > Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
>> > 1. Hook up the fb_fetch_output variable
>> > 2. Implement BlendBarrier()
>> >
>> > Then to get KHR_blend_equation_advanced, they simply need to:
>> > 3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
>> > 4. Call this lowering pass.
>> >
>> > Very little driver specific code should be required.
>> >
>> > v2: Handle multiple output variables per render target (which may exist
>> >     due to ARB_enhanced_layouts), and array variables (even with one
>> >     render target, we might have out vec4 color[1]), and non-vec4
>> >     variables (it's easier than finding spec text to justify not
>> >     handling it).  Thanks to Francisco Jerez for the feedback.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> >  src/compiler/Makefile.sources                      |   1 +
>> >  src/compiler/glsl/ir_optimization.h                |   1 +
>> >  .../glsl/lower_blend_equation_advanced.cpp         | 557 +++++++++++++++++++++
>> >  3 files changed, 559 insertions(+)
>> >  create mode 100644 src/compiler/glsl/lower_blend_equation_advanced.cpp
>> >
>> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> > index 0ff9b23..fe26132 100644
>> > --- a/src/compiler/Makefile.sources
>> > +++ b/src/compiler/Makefile.sources
>> > @@ -77,6 +77,7 @@ LIBGLSL_FILES = \
>> >  	glsl/loop_analysis.h \
>> >  	glsl/loop_controls.cpp \
>> >  	glsl/loop_unroll.cpp \
>> > +	glsl/lower_blend_equation_advanced.cpp \
>> >  	glsl/lower_buffer_access.cpp \
>> >  	glsl/lower_buffer_access.h \
>> >  	glsl/lower_const_arrays_to_uniforms.cpp \
>> > diff --git a/src/compiler/glsl/ir_optimization.h b/src/compiler/glsl/ir_optimization.h
>> > index c29260a..3bd6928 100644
>> > --- a/src/compiler/glsl/ir_optimization.h
>> > +++ b/src/compiler/glsl/ir_optimization.h
>> > @@ -151,6 +151,7 @@ void optimize_dead_builtin_variables(exec_list *instructions,
>> >  bool lower_tess_level(gl_linked_shader *shader);
>> >  
>> >  bool lower_vertex_id(gl_linked_shader *shader);
>> > +bool lower_blend_equation_advanced(gl_linked_shader *shader);
>> >  
>> >  bool lower_subroutine(exec_list *instructions, struct _mesa_glsl_parse_state *state);
>> >  void propagate_invariance(exec_list *instructions);
>> > diff --git a/src/compiler/glsl/lower_blend_equation_advanced.cpp b/src/compiler/glsl/lower_blend_equation_advanced.cpp
>> > new file mode 100644
>> > index 0000000..5632865
>> > --- /dev/null
>> > +++ b/src/compiler/glsl/lower_blend_equation_advanced.cpp
>> > @@ -0,0 +1,557 @@
>> > +/*
>> > + * Copyright © 2016 Intel Corporation
>> > + *
>> > + * Permission is hereby granted, free of charge, to any person obtaining a
>> > + * copy of this software and associated documentation files (the "Software"),
>> > + * to deal in the Software without restriction, including without limitation
>> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> > + * and/or sell copies of the Software, and to permit persons to whom the
>> > + * Software is furnished to do so, subject to the following conditions:
>> > + *
>> > + * The above copyright notice and this permission notice (including the next
>> > + * paragraph) shall be included in all copies or substantial portions of the
>> > + * Software.
>> > + *
>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> > + * DEALINGS IN THE SOFTWARE.
>> > + */
>> > +
>> > +#include "ir.h"
>> > +#include "ir_builder.h"
>> > +#include "ir_optimization.h"
>> > +#include "ir_hierarchical_visitor.h"
>> > +#include "program/prog_instruction.h"
>> > +#include "program/prog_statevars.h"
>> > +#include "util/bitscan.h"
>> > +
>> > +using namespace ir_builder;
>> > +
>> > +#define imm1(x) new(mem_ctx) ir_constant((float) x, 1)
>> > +#define imm3(x) new(mem_ctx) ir_constant((float) x, 3)
>> > +
>> 
>> The x argument should probably be between parentheses to avoid surprises
>> in the future -- Or even better, make them static inline functions
>> instead of macros.
>
> Good call, I'll add parenthesis.  I made them macros rather than static
> inlines so you can do imm3(0.5) rather than imm3(mem_ctx, 0.5)...it's
> shorter (which is the main point).
>
> [snip]
>
>> > +                                    blend_comps[0], blend_comps[1],
>> > +                                    blend_comps[2], blend_comps[3]);
>> > +   }
>> > +
>> > +   ir_function_signature *main = get_main(sh);
>> > +   ir_factory f(&main->body, mem_ctx);
>> > +
>> > +   ir_variable *result_dest =
>> > +      calc_blend_result(f, mode, fb, blend_source, sh->info.BlendSupport);
>> > +
>> > +   /* Copy the result back to the original values.  It would be simpler
>> > +    * to demote the program's output variables, and create a new vec4
>> > +    * output for our result, but this pass runs before we create the
>> > +    * ARB_program_interface_query resource list.  So we have to leave
>> > +    * the original outputs in place and use them.
>> > +    */
>> > +   for (int i = 0; i < 4; i++) {
>> > +      if (!outputs[i])
>> > +         continue;
>> > +
>> > +      f.emit(assign(deref_output(outputs[i]), swizzle(result_dest, i, 1),
>> > +                    1 << i));
>> > +   }
>> > +
>> 
>> It looks like this relies on the main function having a single exit
>> point, which AFAICT may not be the case if there are multiple return
>> statements and the lowering pass is called before do_lower_jumps() (as
>> is the case in your i965 code).  Several options occur to me:
>> 
>>  - Rename the shader's main function and generate a new main function
>>    that calls into the original and implements the advanced blending
>>    epilogue.  This has the advantage that it wouldn't force drivers to
>>    lower returns if they can support them natively, and is likely to
>>    result in better code generation than the next option.
>> 
>>  - Make the lowering pass re-emit the blending epilogue for each exit
>>    point of the main function.
>> 
>>  - Call do_lower_jumps(lower_main_return=true) manually at the top of
>>    lower_blend_equation_advanced() if support for any advanced blending
>>    modes was requested in order to make sure the function has the right
>>    form.  This seems a bit of a hack but I guess it would be the easiest
>>    solution.
>
> Ugh.  Good point.
>
> Option #2 seems unfortunate - I'd hate to emit this multiple times.
>
> In a world where all functions are inlined, options #1 and #3 seem
> basically equivalent - function inlining lowers early returns.  So
> the renamed main would have early returns lowered anyway.
>
> I think I'll take option #3 and change the function to:
>
> bool
> lower_blend_equation_advanced(struct gl_linked_shader *sh)
> {
>    if (sh->info.BlendSupport == 0)
>       return false;
>
>    /* Lower early returns in main() so there's a single exit point
>     * where we can insert our lowering code.
>     */
>    do_lower_jumps(sh->ir, false, false, true, false, false);

I guess option #1 has some slight advantage over option #3 because it
doesn't impose return lowering on the driver and it doesn't cause the
jump lowering pass to be run multiple times, but #3 is so much simpler
that it's hard to justify the additional effort for option #1, so this
sounds like a good plan to me.  With the comments above taken into
account patch is:

Reviewed-by: Francisco Jerez <currojerez at riseup.net>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160823/3b3ddb7e/attachment-0001.sig>