<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Nov 14, 2016 at 2:35 AM, Timothy Arceri <span dir="ltr"><<a href="mailto:timothy.arceri@collabora.com" target="_blank">timothy.arceri@collabora.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Sat, 2016-11-12 at 13:34 -0800, Jason Ekstrand wrote:<br>
<br>
In this patch we no longer do:<br>
<br>
ps.RenderTargetResolveEnable = false;<br>
<br>
It would be nice to be consistent and either initialise everything or<br>
just drop all the lines setting things to false/0 and let the memset do<br>
its job.<br></blockquote><div><br></div><div>Yeah... This gets interesting. I don't really like deleting all zero-initialized things because it makes it harder to read if zero is an enum value. If you want to know "what do we set this to?" then you look and don't see it, you know it's zero and you don't know what that means. For instance, the floating point mode, you want to see that it's IEEE but, unless you know that IEEE is the zero default, you don't know which. Also, setting more fields explicitly means it's easier to not forget something; this was a problem on the past.<br><br></div><div>On the other hand... The case you mentioned here is one where there is a field (RenderTargetResolveEnable) that we really don't care about because we don't support fast-clearing through the regular VkPipeline object. In most of these cases, a default value of 0 does exactly what we want so we don't technically need to set it. We could set it, but the fields change from one generation to another so doing so would mean lots of #ifing so that we can override the default value of 0 with different forms of 0 on different gens. It's simpler (and easier to read) to just omit the fields.<br><br></div><div>In the end, I've taken a "set the important/interesting ones" approach which I'll be the first to admit is incredibly fuzzy. However, I think I've been fairly successful at getting pretty good readability. Unfortunately, for people who aren't me, it doesn't provide a particularly good rule-of-thumb. I'm open to suggestions about how to resolve things in a non-nasty way.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> ---<br>
> src/intel/vulkan/gen7_<wbr>pipeline.c | 57 +----------------------<br>
<div><div class="h5">> -----<br>
> src/intel/vulkan/gen8_<wbr>pipeline.c | 38 +------------------<br>
> src/intel/vulkan/genX_<wbr>pipeline_util.h | 70<br>
> ++++++++++++++++++++++++++++++<wbr>+++++<br>
> 3 files changed, 72 insertions(+), 93 deletions(-)<br>
><br>
> diff --git a/src/intel/vulkan/gen7_<wbr>pipeline.c<br>
> b/src/intel/vulkan/gen7_<wbr>pipeline.c<br>
> index 40e1d81..dbec828 100644<br>
> --- a/src/intel/vulkan/gen7_<wbr>pipeline.c<br>
> +++ b/src/intel/vulkan/gen7_<wbr>pipeline.c<br>
> @@ -107,6 +107,7 @@ genX(graphics_pipeline_create)<wbr>(<br>
> emit_3dstate_vs(pipeline);<br>
> emit_3dstate_gs(pipeline);<br>
> emit_3dstate_sbe(pipeline)<wbr>;<br>
> + emit_3dstate_ps(pipeline);<br>
> <br>
> if (!anv_pipeline_has_stage(<wbr>pipeline, MESA_SHADER_FRAGMENT)) {<br>
> anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_WM), wm) {<br>
> @@ -117,16 +118,7 @@ genX(graphics_pipeline_create)<wbr>(<br>
> wm.<wbr>EarlyDepthStencilControl <wbr> = EDSC_NORMAL;<br>
> wm.<wbr>PointRasterizationRule <wbr> =<br>
> RASTRULE_UPPER_RIGHT;<br>
> }<br>
> -<br>
> - /* Even if no fragments are ever dispatched, the hardware<br>
> hangs if we<br>
> - * don't at least set the maximum number of threads.<br>
> - */<br>
> - anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS), ps) {<br>
> - ps.<wbr>MaximumNumberofThreads = devinfo->max_wm_threads - 1;<br>
> - }<br>
> } else {<br>
> - const struct anv_shader_bin *fs_bin =<br>
> - pipeline->shaders[<wbr>MESA_SHADER_FRAGMENT];<br>
> const struct brw_wm_prog_data *wm_prog_data =<br>
> get_wm_prog_data(pipeline);<br>
> <br>
> if (wm_prog_data->urb_setup[<wbr>VARYING_SLOT_BFC0] != -1 ||<br>
> @@ -135,53 +127,6 @@ genX(graphics_pipeline_create)<wbr>(<br>
> if (wm_prog_data->urb_setup[<wbr>VARYING_SLOT_PRIMITIVE_ID] != -1)<br>
> anv_finishme("<wbr>primitive_id needs sbe swizzling setup");<br>
> <br>
> - anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS), ps) {<br>
> - ps.<wbr>KernelStartPointer0 <wbr>= fs_bin->kernel.offset;<br>
> - ps.<wbr>KernelStartPointer1 <wbr>= 0;<br>
> - ps.<wbr>KernelStartPointer2 <wbr>= fs_bin->kernel.offset +<br>
> - <wbr> wm_prog_data-<br>
> >prog_offset_2;<br>
> -<br>
> - ps.<wbr>ScratchSpaceBasePointer = (struct anv_address) {<br>
> - .bo = anv_scratch_pool_alloc(device, &device-<br>
> >scratch_pool,<br>
> - <wbr> MESA_SHADER_<wbr>FRAGMENT,<br>
> - <wbr> wm_prog_data-<br>
> >base.total_scratch),<br>
> - .offset = 0,<br>
> - };<br>
> - ps.<wbr>PerThreadScratchSpace <wbr>=<br>
> scratch_space(&wm_prog_data-><wbr>base);<br>
> -<br>
> - ps.SamplerCount <wbr> =<br>
> get_sampler_count(fs_bin);<br>
> - ps.<wbr>BindingTableEntryCount <wbr>=<br>
> get_binding_table_entry_count(<wbr>fs_bin);<br>
> -<br>
> - ps.<wbr>MaximumNumberofThreads <wbr>= devinfo->max_wm_threads<br>
> - 1;<br>
> - ps.<wbr>PushConstantEnable <wbr>= wm_prog_data-<br>
> >base.nr_params > 0;<br>
> - ps.AttributeEnable <wbr> = wm_prog_data-<br>
> >num_varying_inputs > 0;<br>
> - ps.<wbr>oMaskPresenttoRenderTarget <wbr>= wm_prog_data-<br>
> >uses_omask;<br>
> -<br>
> - ps.<wbr>RenderTargetFastClearEnable <wbr>= false;<br>
> - ps.<wbr>DualSourceBlendEnable <wbr>= false;<br>
> - ps.<wbr>RenderTargetResolveEnable <wbr>= false;<br>
> -<br>
> - ps.<wbr>PositionXYOffsetSelect <wbr>= wm_prog_data-<br>
> >uses_pos_offset ?<br>
> - <wbr> POSOFFSET_<wbr>SAMPLE :<br>
> POSOFFSET_NONE;<br>
> -<br>
> - ps._<wbr>32PixelDispatchEnable = false;<br>
> - ps._<wbr>16PixelDispatchEnable = wm_prog_data-<br>
> >dispatch_16;<br>
> - ps._<wbr>8PixelDispatchEnable = wm_prog_data-<br>
> >dispatch_8;<br>
> -<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData0 =<br>
> - wm_prog_data-><wbr>base.dispatch_grf_start_reg,<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData1 = 0,<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData2 =<br>
> - wm_prog_data-><wbr>dispatch_grf_start_reg_2;<br>
> -<br>
> - /* Haswell requires the sample mask to be set in this<br>
> packet as well as<br>
> - * in 3DSTATE_SAMPLE_MASK; the values should match. */<br>
> - /* _NEW_BUFFERS, _NEW_MULTISAMPLE */<br>
> -#if GEN_IS_HASWELL<br>
> - ps.SampleMask <wbr> = 0xff;<br>
> -#endif<br>
> - }<br>
> -<br>
> uint32_t samples = pCreateInfo->pMultisampleState ?<br>
> <wbr>pCreateInfo-><wbr>pMultisampleState-<br>
> >rasterizationSamples : 1;<br>
> <br>
> diff --git a/src/intel/vulkan/gen8_<wbr>pipeline.c<br>
> b/src/intel/vulkan/gen8_<wbr>pipeline.c<br>
> index f2499dc..56eb032 100644<br>
> --- a/src/intel/vulkan/gen8_<wbr>pipeline.c<br>
> +++ b/src/intel/vulkan/gen8_<wbr>pipeline.c<br>
> @@ -112,49 +112,13 @@ genX(graphics_pipeline_create)<wbr>(<br>
> emit_3dstate_gs(pipeline);<br>
> emit_3dstate_vs(pipeline);<br>
> emit_3dstate_sbe(pipeline)<wbr>;<br>
> + emit_3dstate_ps(pipeline);<br>
> <br>
> - const int num_thread_bias = GEN_GEN == 8 ? 2 : 1;<br>
> if (!anv_pipeline_has_stage(<wbr>pipeline, MESA_SHADER_FRAGMENT)) {<br>
> - anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS), ps);<br>
> anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS_EXTRA),<br>
> extra) {<br>
> extra.<wbr>PixelShaderValid = false;<br>
> }<br>
> } else {<br>
> - const struct anv_shader_bin *fs_bin =<br>
> - pipeline->shaders[<wbr>MESA_SHADER_FRAGMENT];<br>
> -<br>
> - anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS), ps) {<br>
> - ps.<wbr>KernelStartPointer0 = fs_bin->kernel.offset;<br>
> - ps.<wbr>KernelStartPointer1 = 0;<br>
> - ps.<wbr>KernelStartPointer2 = fs_bin->kernel.offset +<br>
> - <wbr> wm_prog_data->prog_<wbr>offset_2;<br>
> - ps._<wbr>8PixelDispatchEnable = wm_prog_data->dispatch_8;<br>
> - ps._<wbr>16PixelDispatchEnable = wm_prog_data->dispatch_16;<br>
> - ps._<wbr>32PixelDispatchEnable = false;<br>
> - ps.<wbr>SingleProgramFlow = false;<br>
> - ps.VectorMaskEnable <wbr> = true;<br>
> - ps.SamplerCount <wbr> = get_sampler_count(fs_bin);<br>
> - ps.<wbr>BindingTableEntryCount =<br>
> get_binding_table_entry_count(<wbr>fs_bin);<br>
> - ps.<wbr>PushConstantEnable = wm_prog_data->base.nr_params ><br>
> 0;<br>
> - ps.<wbr>PositionXYOffsetSelect = wm_prog_data->uses_pos_offset<br>
> ?<br>
> - POSOFFSET_SAMPLE: POSOFFSET_NONE;<br>
> -<br>
> - ps.<wbr>MaximumNumberofThreadsPerPSD = 64 - num_thread_bias;<br>
> -<br>
> - ps.<wbr>ScratchSpaceBasePointer = (struct anv_address) {<br>
> - .bo = anv_scratch_pool_alloc(device, &device-<br>
> >scratch_pool,<br>
> - <wbr> MESA_SHADER_<wbr>FRAGMENT,<br>
> - <wbr> wm_prog_data-<br>
> >base.total_scratch),<br>
> - .offset = 0,<br>
> - };<br>
> - ps.<wbr>PerThreadScratchSpace = scratch_space(&wm_prog_data-<br>
> >base);<br>
> -<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData0 =<br>
> - wm_prog_data-><wbr>base.dispatch_grf_start_reg;<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData1 = 0;<br>
> - ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData2 =<br>
> - wm_prog_data-><wbr>dispatch_grf_start_reg_2;<br>
> - }<br>
> <br>
> anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS_EXTRA), ps) {<br>
> ps.PixelShaderValid <wbr> = true;<br>
> diff --git a/src/intel/vulkan/genX_<wbr>pipeline_util.h<br>
> b/src/intel/vulkan/genX_<wbr>pipeline_util.h<br>
> index 3906529..1215e9b 100644<br>
> --- a/src/intel/vulkan/genX_<wbr>pipeline_util.h<br>
> +++ b/src/intel/vulkan/genX_<wbr>pipeline_util.h<br>
> @@ -1134,4 +1134,74 @@ emit_3dstate_gs(struct anv_pipeline *pipeline)<br>
> }<br>
> }<br>
> <br>
> +static void<br>
> +emit_3dstate_ps(struct anv_pipeline *pipeline)<br>
> +{<br>
> + MAYBE_UNUSED const struct gen_device_info *devinfo = &pipeline-<br>
> >device->info;<br>
> + const struct anv_shader_bin *fs_bin =<br>
> + pipeline->shaders[MESA_<wbr>SHADER_FRAGMENT];<br>
> +<br>
> + if (!anv_pipeline_has_stage(<wbr>pipeline, MESA_SHADER_FRAGMENT)) {<br>
> + anv_batch_emit(&<wbr>pipeline->batch, GENX(3DSTATE_PS), ps) {<br>
> +#if GEN_GEN == 7<br>
> + /* Even if no fragments are ever dispatched, gen7 hardware<br>
> hangs if<br>
> + * we don't at least set the maximum number of threads.<br>
> + */<br>
> + ps.<wbr>MaximumNumberofThreads = devinfo->max_wm_threads - 1;<br>
> +#endif<br>
> + }<br>
> + return;<br>
> + }<br>
> +<br>
> + const struct brw_wm_prog_data *wm_prog_data =<br>
> get_wm_prog_data(pipeline);<br>
> +<br>
> + anv_batch_emit(&pipeline-><wbr>batch, GENX(3DSTATE_PS), ps) {<br>
> + ps.KernelStartPointer0 <wbr> = fs_bin->kernel.offset;<br>
> + ps.KernelStartPointer1 <wbr> = 0;<br>
> + ps.KernelStartPointer2 <wbr> = fs_bin->kernel.offset +<br>
> + <wbr> wm_prog_data->prog_<wbr>offset_2;<br>
> + ps._<wbr>8PixelDispatchEnable = wm_prog_data->dispatch_8;<br>
> + ps._<wbr>16PixelDispatchEnable = wm_prog_data->dispatch_16;<br>
> + ps._<wbr>32PixelDispatchEnable = false;<br>
> +<br>
> + ps.SingleProgramFlow <wbr> = false;<br>
> + ps.VectorMaskEnable <wbr> = true;<br>
> + ps.SamplerCount <wbr> = get_sampler_count(fs_bin);<br>
> + ps.<wbr>BindingTableEntryCount =<br>
> get_binding_table_entry_count(<wbr>fs_bin);<br>
> + ps.PushConstantEnable <wbr> = wm_prog_data->base.nr_params > 0;<br>
> + ps.<wbr>PositionXYOffsetSelect = wm_prog_data->uses_pos_offset ?<br>
> + <wbr> POSOFFSET_SAMPLE: POSOFFSET_NONE;<br>
> +#if GEN_GEN < 8<br>
> + ps.AttributeEnable <wbr> = wm_prog_data->num_varying_<wbr>inputs<br>
> > 0;<br>
> + ps.<wbr>oMaskPresenttoRenderTarget = wm_prog_data->uses_omask;<br>
<br>
<br>
</div></div>Again if we are going to both trying to align the = column now is the<br>
time to get this right while we are already moving everything.<span class=""><br></span></blockquote><div><br></div><div>I'll fix that up<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> + ps.<wbr>DualSourceBlendEnable = wm_prog_data->dual_src_blend;<br>
<br>
<br>
</span>This was previously false for gen7. If this is a fix maybe mention it<br>
in the commit message or even a separate patch?<span class=""><br></span></blockquote><div><br></div><div>That should go in its own patch.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> +#endif<br>
> +<br>
> +#if GEN_IS_HASWELL<br>
> + /* Haswell requires the sample mask to be set in this packet<br>
> as well<br>
> + * as in 3DSTATE_SAMPLE_MASK; the values should match.<br>
> + */<br>
> + ps.SampleMask <wbr> = 0xff;<br>
> +#endif<br>
> +<br>
> +#if GEN_GEN >= 9<br>
> + ps.<wbr>MaximumNumberofThreadsPerPSD <wbr>= 64 - 1;<br>
> +#elif GEN_GEN >= 8<br>
<br>
<br>
</span>This would make more sense as GEN_GEN == 8<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
> + ps.<wbr>MaximumNumberofThreadsPerPSD <wbr>= 64 - 2;<br>
> +#else<br>
> + ps.<wbr>MaximumNumberofThreads <wbr>= devinfo->max_wm_threads -<br>
> 1;<br>
> +#endif<br>
> +<br>
> + ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData0 =<br>
> + wm_prog_data->base.<wbr>dispatch_grf_start_reg;<br>
> + ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData1 = 0;<br>
> + ps.<wbr>DispatchGRFStartRegisterForCon<wbr>stantSetupData2 =<br>
> + wm_prog_data-><wbr>dispatch_grf_start_reg_2;<br>
> +<br>
> + ps.<wbr>PerThreadScratchSpace = get_scratch_space(fs_bin);<br>
> + ps.<wbr>ScratchSpaceBasePointer =<br>
> + get_scratch_address(<wbr>pipeline, MESA_SHADER_FRAGMENT,<br>
> fs_bin);<br>
> + }<br>
> +}<br>
> +<br>
> #endif /* GENX_PIPELINE_UTIL_H */<br>
</div></div></blockquote></div><br></div></div>