[Mesa-dev] [Mesa-stable] [PATCH] i965/gen6: Fix HiZ hang in WebGL Google maps

Fri Dec 20 15:56:37 PST 2013

On 12/20/2013 03:09 PM, Chad Versace wrote:
> On 12/20/2013 01:36 PM, Kenneth Graunke wrote:
>> On 12/20/2013 04:47 AM, Chad Versace wrote:
>>> We need to emit depth stall flushes before depth and hiz resolves.
>>> Placing them at the top of blorp's state emission fixes the hang.
>>>
>>> Fixes HiZ hang in the new WebGL Google maps on Sandybridge Chrome OS.
>>> Tested by zooming in and out continuously for 2 hours.
>>>
>>> This patch is based on
>>> https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/8bc07bb70163c3706fb4ba5f980e57dc942f56dd
>>>
>>>
>>> CC: mesa-stable at lists.freedesktop.org
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70740
>>> Signed-off-by: Stéphane Marchesin <marcheu at chromium.org>
>>> Signed-off-by: Chad Versace <chad.versace at linux.intel.com>
>>> ---
>>>   src/mesa/drivers/dri/i965/gen6_blorp.cpp | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> index 6a5841f..3a0e7ec 100644
>>> --- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> +++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> @@ -1012,6 +1012,16 @@ gen6_blorp_emit_primitive(struct brw_context
>>> *brw,
>>>      ADVANCE_BATCH();
>>>   }
>>>
>>> +static void
>>> +gen6_emit_hiz_workaround(struct brw_context *brw, enum gen6_hiz_op
>>> hiz_op)
>>> +{
>>> +   if (hiz_op == GEN6_HIZ_OP_DEPTH_RESOLVE ||
>>> +       hiz_op == GEN6_HIZ_OP_HIZ_RESOLVE) {
>>> +      brw->batch.need_workaround_flush = true;
>>> +      intel_emit_post_sync_nonzero_flush(brw);
>>> +      intel_emit_depth_stall_flushes(brw);
>>> +   }
>>> +}
>>>
>>>   /**
>>>    * \brief Execute a blit or render pass operation.
>>> @@ -1034,6 +1044,8 @@ gen6_blorp_exec(struct brw_context *brw,
>>>      uint32_t wm_bind_bo_offset = 0;
>>>
>>>      uint32_t prog_offset = params->get_wm_prog(brw, &prog_data);
>>> +
>>> +   gen6_emit_hiz_workaround(brw, params->hiz_op);
>>>      gen6_emit_3dstate_multisample(brw, params->dst.num_samples);
>>>      gen6_emit_3dstate_sample_mask(brw,
>>>                                    params->dst.num_samples > 1 ?
>>>
>>
>> I'm fine with landing this as is, since it's trivial and getting
>> Sandybridge stable is really critical.  I'm glad to see this won't
>> happen on Gen7+.
>>
>> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> Ok. Your's is the second r-b for the patch as-is. And this fixes hangs
> affecting users on all distros.  So I'll commit it as-is, with refinements
> to follow as I validate them.
> 
> Validation takes a long time, multiple hours, so I think it unwise to
> postpone
> committing this bugfix until its perfect. The needed build-test-refine
> cycles to
> attain a perfect fix may take a long time. (Thankfully the reproduction
> steps
> are automated now).
> 
>>
>> That said, it would be nice to refine it a little.
>>
>> The intel_emit_post_sync_nonzero_flush(brw) call should not be
>> necessary.  The very first BLORP call, gen6_emit_3dstate_multisample,
>> already calls it.  However, it may not have taken effect in the past,
>> since intel_emit_post_sync_nonzero_flush checks need_workaround_flush.
>>
>> I also am dubious whether we need the depth stall flushes, since we
>> already do them before 3DSTATE_DEPTH_BUFFER, which is documented as
>> necessary.
>>
>> So here's what I think is actually going on:
>>
>> We've seen strong evidence that it's 100% required to do the post-sync
>> nonzero workaround before non-pipelined commands.  (My patches which
>> added missing post-sync workarounds greatly reduced the number of GPU
>> hangs.)
>>
>> I believe that the root of the problem is that needs_workaround_flush is
>> not getting set to true sufficiently often, so the post-sync non-zero
>> workaround isn't happening when necessary.  Currently, we flag it at the
>> start of each batch, and /after/ each BLORP operation.
>>
>> That means it's missing in two cases:
>> 1. Switching from normal rendering to BLORP drawing.
>>     (...fixed by this patch.)
>> 2. Between successive draws (3DPRIMITIVEs) within a batch.
>>     (I believe this is also necessary, and is not fixed by this patch.)
>>
> 
> After thinking about this more this morning, I also suspected that the
> bug's root problem was that need_workaround_flush needed to be set
> at the top of blorp (your #1).
> 
> Why do you suspect #2? Experimentation? Hardware docs?

Intuition?  In many places, the documentation says that "you must do
this kind of flush...unless software can guarantee that the pipeline is
already flushed."  Executing a 3DPRIMITIVE command kicks off rendering,
at which point I'm fairly sure the pipeline is NOT "already flushed."

Basically, needs_workaround_flush is an attempt to avoid flushes because
we've already done them.  Some GPU commands clearly invalidate the
"already flushed" state, but it's not clear which.  3DPRIMITIVE makes
the most sense to me, as it actually makes drawing happen, using the
state that was last programmed.  Subsequent commands begin modifying
that state, so flushing seems necessary.

> 
>> I would love to see an alternate patch which sets need_workaround_flush
>> in two places:
>> 1. Batchbuffer init (for the very first draw)
>> 2. After each 3DPRIMITIVE command.
>>
>> I believe that would fix this hang and potentially other future hangs.
> 
> Agreed. I'll follow-up with patches for that.

Thanks, Chad.