[Mesa-dev] [PATCH] intel/compiler/gen9: Pixel shader header only workaround

Eero Tamminen eero.t.tamminen at intel.com
Tue Oct 3 14:29:12 UTC 2017


Hi,

Is there some reason/problem why this isn't in upstream repo yet?

If not, could somebody push it?  (Topi is currently on vacation)


	- Eero

On 25.09.2017 17:41, Eero Tamminen wrote:
> Ran several tens of rounds on 2 different 18 EU BXT devices with the 
> patch -> no hangs. Earlier that (internal) test-case always hung within 
> 1-3 rounds.
> 
> Tested-by: Eero Tamminen <eero.t.tamminen at intel.com>
> 
> On 25.09.2017 16:39, Topi Pohjolainen wrote:
>> Fixes intermittent GPU hangs on Broxton with an Intel internal
>> test case.
>>
>> There are plenty of similar fragment shaders in piglit that do
>> not use any varyings and any uniforms. According to the
>> documentation special timing is needed between pipeline stages.
>> Apparently we just don't hit that with piglit. Even with the
>> failing test case one doesn't always get the hang.
>>
>> Moreover, according to the error states the hang happens
>> significantly later than the execution of the problematic shader.
>> There are multiple render cycles (primitive submissions) in between.
>> I've also seen error states where the ACTHD points outside the
>> batch. Almost as if the hardware writes somewhere that gets used
>> later on. That would also explain why piglit doesn't suffer from
>> this - most tests kick off one render cycle and any corruption
>> is left unseen.
>>
>> For clarity I chose to make the decision in the compiler only and
>> mark it with a boolean. In principle, constant loaders could make
>> the same decision by examing num_varying_inputs along with push
>> constant details.
>>
>> Alternatively tweaking nr_params in compiler would allow GL driver
>> to be kept as is if one did, for example:
>>
>>     static const gl_constant_value zero = { 0 };
>>     wm_prog_data->base.param[0] = &zero;
>>     wm_prog_data->base.nr_params = 1;
>>
>> This, however, doesn't work for Vulkan which would still need
>> some logic to be added in anv_cmd_buffer_push_constants().
>>
>> In the end I thought future debugging is probably easier when
>> the explicit boolean tells about this corner case.
>>
>> CC: Jason Ekstrand <jason at jlekstrand.net>
>> CC: Eero Tamminen <eero.t.tamminen at intel.com>
>> Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
>> ---
>>   src/intel/compiler/brw_compiler.h               |  7 ++++
>>   src/intel/compiler/brw_fs.cpp                   | 46 
>> +++++++++++++++++++++++++
>>   src/intel/vulkan/anv_cmd_buffer.c               | 22 ++++++++++--
>>   src/intel/vulkan/genX_pipeline.c                |  6 +++-
>>   src/mesa/drivers/dri/i965/gen6_constant_state.c | 17 +++++++--
>>   src/mesa/drivers/dri/i965/genX_state_upload.c   |  3 +-
>>   6 files changed, 93 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_compiler.h 
>> b/src/intel/compiler/brw_compiler.h
>> index 6753a8daf0..8a1c8c85ac 100644
>> --- a/src/intel/compiler/brw_compiler.h
>> +++ b/src/intel/compiler/brw_compiler.h
>> @@ -622,6 +622,13 @@ struct brw_wm_prog_data {
>>      bool contains_noperspective_varying;
>>      /**
>> +    * Tell constant uplaoders, gen6_upload_push_constants() and
>> +    * anv_cmd_buffer_push_constants(), that workaround is needed.
>> +    * See gen9_ps_header_only_workaround().
>> +    */
>> +   bool needs_gen9_ps_header_only_workaround;
>> +
>> +   /**
>>       * Mask of which interpolation modes are required by the fragment 
>> shader.
>>       * Used in hardware setup on gen6+.
>>       */
>> diff --git a/src/intel/compiler/brw_fs.cpp 
>> b/src/intel/compiler/brw_fs.cpp
>> index eb9b4c3890..5f4271fb59 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -6159,6 +6159,48 @@ fs_visitor::run_gs()
>>      return !failed;
>>   }
>> +/* From the SKL PRM, Volume 16, Workarounds:
>> + *
>> + *   0877  3D   Pixel Shader Hang possible when pixel shader 
>> dispatched with
>> + *              only header phases (R0-R2)
>> + *
>> + *   WA: Enable a non-header phase (e.g. push constant) when dispatch 
>> would
>> + *       have been header only.
>> + *
>> + * Additionally from the SKL PRM, Volume 2a, Command Reference,
>> + * 3DSTATE_PS and Push Constant Enable:
>> + *
>> + *   This field must be enabled if the sum of the PS Constant Buffer 
>> [3:0]
>> + *   Read Length fields in 3DSTATE_CONSTANT_PS is nonzero, and must be
>> + *   disabled if the sum is zero.
>> + *
>> + * Therefore one needs to prepare register space for minimum amount of
>> + * constants to be uploaded.
>> + *
>> + * Here it is assumed that assign_curb_setup() has determined the 
>> total amount
>> + * of constants (uniforms + ubos) and therefore it is safe to examine 
>> if the
>> + * workaround is needed.
>> + */
>> +static void
>> +gen9_ps_header_only_workaround(struct brw_wm_prog_data *wm_prog_data,
>> +                               int *first_non_payload_grf)
>> +{
>> +   if (wm_prog_data->num_varying_inputs)
>> +      return;
>> +
>> +   if (wm_prog_data->base.curb_read_length)
>> +      return;
>> +
>> +   assert(wm_prog_data->base.nr_params == 0);
>> +
>> +   wm_prog_data->needs_gen9_ps_header_only_workaround = true;
>> +
>> +   const unsigned wa_upload_size = DIV_ROUND_UP(1, 8);
>> +
>> +   wm_prog_data->base.curb_read_length = wa_upload_size;
>> +   *first_non_payload_grf += wa_upload_size;
>> +}
>> +
>>   bool
>>   fs_visitor::run_fs(bool allow_spilling, bool do_rep_send)
>>   {
>> @@ -6222,6 +6264,10 @@ fs_visitor::run_fs(bool allow_spilling, bool 
>> do_rep_send)
>>         optimize();
>>         assign_curb_setup();
>> +
>> +      if (devinfo->gen >= 9)
>> +         gen9_ps_header_only_workaround(wm_prog_data, 
>> &first_non_payload_grf);
>> +
>>         assign_urb_setup();
>>         fixup_3src_null_dest();
>> diff --git a/src/intel/vulkan/anv_cmd_buffer.c 
>> b/src/intel/vulkan/anv_cmd_buffer.c
>> index 3b59af8f6f..07d45bd5d4 100644
>> --- a/src/intel/vulkan/anv_cmd_buffer.c
>> +++ b/src/intel/vulkan/anv_cmd_buffer.c
>> @@ -641,9 +641,25 @@ anv_cmd_buffer_push_constants(struct 
>> anv_cmd_buffer *cmd_buffer,
>>      const struct brw_stage_prog_data *prog_data =
>>         cmd_buffer->state.pipeline->shaders[stage]->prog_data;
>> -   /* If we don't actually have any push constants, bail. */
>> -   if (data == NULL || prog_data == NULL || prog_data->nr_params == 0)
>> -      return (struct anv_state) { .offset = 0 };
>> +   if (data == NULL || prog_data == NULL || prog_data->nr_params == 0) {
>> +      /* If we don't actually have any push constants, bail. */
>> +      if (stage != MESA_SHADER_FRAGMENT)
>> +         return (struct anv_state) { .offset = 0 };
>> +
>> +      const struct brw_wm_prog_data *wm_prog_data =
>> +         (const struct brw_wm_prog_data *)prog_data;
>> +      if (!wm_prog_data->needs_gen9_ps_header_only_workaround)
>> +         return (struct anv_state) { .offset = 0 };
>> +
>> +      assert(cmd_buffer->device->info.gen >= 9);
>> +      assert(wm_prog_data->num_varying_inputs == 0);
>> +
>> +      struct anv_state workaround_state =
>> +         anv_cmd_buffer_alloc_dynamic_state(cmd_buffer, 1 * 
>> sizeof(float),
>> +                                            32 /* bottom 5 bits MBZ */);
>> +      anv_state_flush(cmd_buffer->device, workaround_state);
>> +      return workaround_state;
>> +   }
>>      struct anv_state state =
>>         anv_cmd_buffer_alloc_dynamic_state(cmd_buffer,
>> diff --git a/src/intel/vulkan/genX_pipeline.c 
>> b/src/intel/vulkan/genX_pipeline.c
>> index c2fa9c0ff7..42854b06b3 100644
>> --- a/src/intel/vulkan/genX_pipeline.c
>> +++ b/src/intel/vulkan/genX_pipeline.c
>> @@ -1506,9 +1506,13 @@ emit_3dstate_ps(struct anv_pipeline *pipeline,
>>         ps.VectorMaskEnable           = true;
>>         ps.SamplerCount               = get_sampler_count(fs_bin);
>>         ps.BindingTableEntryCount     = 
>> get_binding_table_entry_count(fs_bin);
>> -      ps.PushConstantEnable         = wm_prog_data->base.nr_params > 0;
>>         ps.PositionXYOffsetSelect     = wm_prog_data->uses_pos_offset ?
>>                                         POSOFFSET_SAMPLE: POSOFFSET_NONE;
>> +
>> +      ps.PushConstantEnable =
>> +         wm_prog_data->base.nr_params > 0 ||
>> +         wm_prog_data->needs_gen9_ps_header_only_workaround;
>> +
>>   #if GEN_GEN < 8
>>         ps.AttributeEnable            = 
>> wm_prog_data->num_varying_inputs > 0;
>>         ps.oMaskPresenttoRenderTarget = wm_prog_data->uses_omask;
>> diff --git a/src/mesa/drivers/dri/i965/gen6_constant_state.c 
>> b/src/mesa/drivers/dri/i965/gen6_constant_state.c
>> index 72f00d5640..4723cda778 100644
>> --- a/src/mesa/drivers/dri/i965/gen6_constant_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen6_constant_state.c
>> @@ -52,9 +52,9 @@ gen6_upload_push_constants(struct brw_context *brw,
>>      const struct gen_device_info *devinfo = &brw->screen->devinfo;
>>      struct gl_context *ctx = &brw->ctx;
>> -   if (prog_data->nr_params == 0) {
>> -      stage_state->push_const_size = 0;
>> -   } else {
>> +   stage_state->push_const_size = 0;
>> +
>> +   if (prog_data->nr_params) {
>>         /* Updates the ParamaterValues[i] pointers for all parameters 
>> of the
>>          * basic type of PROGRAM_STATE_VAR.
>>          */
>> @@ -119,6 +119,17 @@ gen6_upload_push_constants(struct brw_context *brw,
>>          * The other shader stages all match the VS's limits.
>>          */
>>         assert(stage_state->push_const_size <= 32);
>> +   } else if (stage_state->stage == MESA_SHADER_FRAGMENT &&
>> +              devinfo->gen >= 9) {
>> +      const struct brw_wm_prog_data *wm_prog_data =
>> +         (const struct brw_wm_prog_data *)prog_data;
>> +
>> +      if (wm_prog_data->needs_gen9_ps_header_only_workaround) {
>> +         intel_upload_space(brw, 1 * sizeof(gl_constant_value), 32,
>> +                            &stage_state->push_const_bo,
>> +                            &stage_state->push_const_offset);
>> +         stage_state->push_const_size = ALIGN(1, 8) / 8;
>> +      }
>>      }
>>      stage_state->push_constants_dirty = true;
>> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
>> b/src/mesa/drivers/dri/i965/genX_state_upload.c
>> index 612761601a..fdb5decf86 100644
>> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
>> @@ -3831,7 +3831,8 @@ genX(upload_ps)(struct brw_context *brw)
>>   #endif
>>         if (prog_data->base.nr_params > 0 ||
>> -          prog_data->base.ubo_ranges[0].length > 0)
>> +          prog_data->base.ubo_ranges[0].length > 0 ||
>> +          prog_data->needs_gen9_ps_header_only_workaround)
>>            ps.PushConstantEnable = true;
>>   #if GEN_GEN < 8
>>
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



More information about the mesa-dev mailing list