[Mesa-dev] [PATCH] [RFC] i965: Rewrite the HiZ op

Chad Versace chad.versace at linux.intel.com
Sat Feb 4 09:48:30 PST 2012


Comments to your comments, and the new patch is at hiz-kill-meta-op-v12.
I'm still working on gen7.

On 02/03/2012 06:16 PM, Kenneth Graunke wrote:
> On 02/03/2012 03:58 PM, Chad Versace wrote:

>>   /**
>> - * Initialize static data needed for HiZ operations.
>> + * \return true on success
> 
> No it doesn't, it's void! :)

Comment fixed.

>> -static void
>> -gen6_hiz_init(struct brw_context *brw)
>> +void
>> +gen6_hiz_init(struct intel_context *intel)
>>   {
>> +   struct gl_context *ctx =&intel->ctx;
>> +   struct brw_context *brw = brw_context(ctx);
>>      struct brw_hiz_state *hiz =&brw->hiz;
>>
>> +   bool ok;
>> +
>> +   if (hiz->mem_ctx != NULL) {
>> +      /* Already initialized. */
>>         return;
>> +   }
> 
> Hmm.  I thought this would only be called once, at context creation time, so this could be an assert.  But it looks like you're calling it from MakeCurrent, so...I guess that's fine.

I should have added comments explaining why I called at MakeCurrent instead of the obvious places:
at CreateContext or at first call to gen6_hiz_init. I coudln't create programs in
intelCreateContext (even at the function's tail) because the context is not fully created
yet. And I can't at the first call the gen6_hiz_exec because it is illegal to call
glLinkProgram and friends during glBegin/glEnd.

So, I had to do it after eglCreateContext but do it before the first glBegin. That
left eglMakeCurrent as the best choice.

Anyway, that's all moot since that I've killed the shaders. I now just
call gen6_hiz_init() from inside gen6_hiz_exec().

>> +   /* Link GLSL program. */
>> +   struct gl_shader_program *gl_shader_program;
>> +   {
>> +      /* Compile vertex shader. */
>> +      const char *vs_source =
>> +         "attribute vec4 position;\n"
>> +         "void main()\n"
>> +         "{\n"
>> +         "   gl_Position = position;\n"
>> +         "}\n";
>> +      GLuint vs = _mesa_CreateShaderObjectARB(GL_VERTEX_SHADER);
>> +      _mesa_ShaderSourceARB(vs, 1,&vs_source, NULL);
>> +      _mesa_CompileShaderARB(vs);
> 
> Just put the VS in passthrough mode by removing the GEN6_VS_ENABLE bit from your 3DSTATE_VS packet.  Then, it will do the basic gl_Position = gl_Vertex shader for you without all the hassle of shaders.

This was *much* more difficult than just disabling 3DSTATE_VS.VsEnable. The pass-through clipper on gen6
expects to fetch some additional vertex attributes, and the vertices in the vbo happened to squash them.
I finally got this to work by stuffing the attributes into the URB by hacking 3DSTATE_VERTEX_ELEMENTS.

>> +   /* Upload programs into cache.
>> +    *
>> +    * The vs_offset and wm_offset are offsets from the cache's base address,
>> +    * to which CMD_STATE_BASE_ADDRESS.InstructionBaseAddress is set.
>> +    * Therefore, for the offsets to be valid, the cache must be initialized
>> +    * before emitting CMD_STATE_BASE_ADDRESS.
>> +    */
>> +   uint32_t vs_offset;
>> +   uint32_t wm_offset;
>> +   {
>> +      void *junk;
>> +      brw_upload_cache(&brw->cache,
>> +                       BRW_VS_PROG,
>> +                       hiz->vs_key, sizeof(*hiz->vs_key),
>> +                       hiz->vs_instr, hiz->vs_size,
>> +                       NULL, 0,
>> +&vs_offset,&junk);
>> +      brw_upload_cache(&brw->cache,
>> +                       BRW_WM_PROG,
>> +                       hiz->wm_key, sizeof(*hiz->wm_key),
>> +                       hiz->wm_instr, hiz->wm_size,
>> +                       NULL, 0,
>> +&wm_offset,&junk);
>> +   }
> 
> You can drop all this shader stuff.

I've dropped the shaders. Maybe we'll get a little performance boost by cutting
out the vs threads.

> 
>> -   /* Save state. */
>> -   GLint save_drawbuffer;
>> -   GLint save_renderbuffer;
>> -   _mesa_meta_begin(ctx, gen6_hiz_meta_save);
>> -   _mesa_GetIntegerv(fb_get_enum,&save_drawbuffer);
>> -   _mesa_GetIntegerv(GL_RENDERBUFFER_BINDING,&save_renderbuffer);
>> +   /* To ensure that the batch contains only the resolve, flush the batch
>> +    * before beginning and after finishing emitting the resolve packets.
>> +    *
>> +    * Ideally, we would not need to flush for the resolve op. But, I suspect
>> +    * that it's unsafe for CMD_PIPELINE_SELECT to occur multiple times in
>> +    * a single batch, and there is no safe way to ensure that other than by
>> +    * fencing the resolve with flushes. Ideally, we would just detect if
>> +    * a batch is in progress and do the right thing, but that would require
>> +    * the ability to safely manipulate brw_context::state::dirty::brw outside
>> +    * of brw_state_init().
>> +    */
>> +   intel_flush(ctx);
>> +
>> +   /* Emit the following packets:
>> +    *     CMD_PIPELINE_SELECT
>> +    *     3DSTATE_MULTISAMPLE
>> +    *     3DSTATE_SAMPLE_MASK
>> +    *     3DSTATE_GS_SVB_INDEX
>> +    *     CMD_STATE_SIP
>> +    *     CMD_VF_STATISTICS
>> +    */
>> +   brw_invariant_state.emit(brw);
> 
> Unnecessary, perhaps even harmful.  The BRW_NEW_CONTEXT dirty bit is already flagged on every new batch, so brw_invariant_state will already be re-emitted.  

You're assuming that the hiz op is emitted as a consequence of a draw call already in progress, but that's not guaranteed. So we can't assume that
brw_invariant_state.emit() has already been called.

> I think intel_flush followed by invarient_state.emit() will actually cause CMD_PIPELINE_SELECT to be emitted twice, which is the exact thing you were trying to avoid.

> Flushing prior to a resolve is conservative, which may be the right choice, but I don't think you need to.

Since there's no guarantee that CMD_PIPELINE_SELECT has been emitted yet, the only to ensure it is to emit it here.
But the only way to ensure that it's not emitted twice is to flush before doing so. That is, unless we rework
the way dirty bits and brw_state_init() works before Monday.

Or maybe I'm overlooking something here... you've fixed a lot of dirty bit bugs
recectly, and maybe you see something I'm missing.

PIPELINE_SELECT is already set to RENDER.  VF_STATISTICS doesn't need to change.  STATE_SIP is only for debugging and already 0.  The only interesting ones are:
> 
> - 3DSTATE_GS_SVB_INDEX (transform feedback state)
> - 3DSTATE_MULTISAMPLE
> - 3DSTATE_SAMPLE_MASK
> 
> and until we implement MSAA, the last two aren't interesting either.

You're right. I could trim some of the unneeded packets away.

> You definitely -do- need to flush /after/ a resolve, because you've just nerfed all the state and it needs to get put back.

Right. That's also why I flag all the dirty bits.

> 
>> -   /* Initialize context data for HiZ operations. */
>> -   gen6_hiz_init(brw);
>> +   /* CMD_STATE_BASE_ADDRESS
>> +    *
>> +    * From the Sandy Bridge PRM, Volume 1, Part 1, Table STATE_BASE_ADDRESS:
>> +    *
>> +    *     The following commands must be reissued following any change to the
>> +    *     base addresses:
>> +    *         3DSTATE_CC_POINTERS
>> +    *         3DSTATE_BINDING_TABLE_POINTERS
>> +    *         3DSTATE_SAMPLER_STATE_POINTERS
>> +    *         3DSTATE_VIEWPORT_STATE_POINTERS
>> +    *         MEDIA_STATE_POINTERS
>> +    */
>> +   {
>> +      BEGIN_BATCH(10);
>> +      OUT_BATCH(CMD_STATE_BASE_ADDRESS<<  16 | (10 - 2));
>> +      OUT_BATCH(1); /* GeneralStateBaseAddressModifyEnable */
>> +      /* SurfaceStateBaseAddress */
>> +      OUT_RELOC(intel->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0, 1);
>> +      /* DynamicStateBaseAddress */
>> +      OUT_RELOC(intel->batch.bo, (I915_GEM_DOMAIN_RENDER |
>> +                                  I915_GEM_DOMAIN_INSTRUCTION), 0, 1);
>> +      OUT_BATCH(1); /* IndirectObjectBaseAddress */
>> +      /* InstructionBaseAddress */
>> +      OUT_RELOC(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, 1);
>> +      OUT_BATCH(1); /* GeneralStateUpperBound */
>> +      OUT_BATCH(1); /* DynamicStateUpperBound */
>> +      OUT_BATCH(1); /* IndirectObjectUpperBound*/
>> +      OUT_BATCH(1); /* InstructionAccessUpperBound */
>> +      ADVANCE_BATCH();
>> +   }
> 
> You won't need this if you drop the shader programs.

I still do, because I have to upload DEPTH_STATE_STATE, whose offset is
relative to DynamicStateBaseAddress.

I wonder if it's safe to set SurfaceStateBaseAddress to null.

>> +   /* 3DSTATE_URB
>> +    *
>> +    * Assign the entire URB to the VS.
>> +    *
>> +    * A warning appears in the Sandybridge PRM Volume 2 Part 1 Section 1.4.7
>> +    * 3DSTATE_URB, but here we can safely ignore it:
>> +    *     Because of URB corruption caused by allocating a previous GS unit
>> +    *     URB entry to the VS unit, software is required to send a “GS NULL
>> +    *     Fence” (Send URB fence with VS URB size == 1 and GS URB size == 0)
>> +    *     plus a dummy DRAW call before any case where VS will be taking over
>> +    *     GS URB space.
>> +    * We can ignore it because this batch contains only one draw call.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +
>> +      BEGIN_BATCH(3);
>> +      OUT_BATCH(_3DSTATE_URB<<  16 | (3 - 2));
>> +      OUT_BATCH(brw->urb.max_vs_entries<<  GEN6_URB_VS_ENTRIES_SHIFT);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>>      }
> 
> I'm suspicious that this doesn't set the VS URB entry size.  Presumably it works out because, for your VS program above, it ends up being 1.
> 
> If you drop the VS program, you should be able to safely drop this too.

Nope. The VS URB size must be nonzero to prevent a GPU hang.
The spec and simulator agree. I'll add a comment quoting the
spec in the final patch.

>> +   /* 3DSTATE_SAMPLER_STATE_POINTERS
>> +    *
>> +    * The HiZ program uses no samplers.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +      BEGIN_BATCH(4);
>> +      OUT_BATCH(_3DSTATE_SAMPLER_STATE_POINTERS<<  16 |
>> +                VS_SAMPLER_STATE_CHANGE |
>> +                GS_SAMPLER_STATE_CHANGE |
>> +                PS_SAMPLER_STATE_CHANGE |
>> +                (4 - 2));
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>> +   }
> 
> I'm pretty sure it's harmless to leave the samplers programmed.  Along with SURFACE_STATE, SAMPLER_STATE makes some buffers available to your shaders, if they want to access them.  Nobody says they -have- to use them.

I removed the sampler packet from the new patch.
> 
>> +   /* 3DSTATE_CONSTANT_VS
>> +    *
>> +    * The HiZ program uses no constants.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +      assert(hiz->vs_data->nr_params == 0);
>> +      BEGIN_BATCH(5);
>> +      OUT_BATCH(_3DSTATE_CONSTANT_VS<<  16 | (5 - 2));
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>> +   }
> 
> Can probably drop this.

Dropped it.


>> +
>> +   /* 3DSTATE_CONSTANT_GS
>> +    *
>> +    * The HiZ program has no geometry shader.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +
>> +      BEGIN_BATCH(5);
>> +      OUT_BATCH(_3DSTATE_CONSTANT_GS<<  16 | (5 - 2));
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>> +   }
> 
> Can probably drop this.

Dropped that too.

> 
>> +   /* 3DSTATE_GS
>> +    *
>> +    * The HiZ program has no geometry shader.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +
>> +      BEGIN_BATCH(7);
>> +      OUT_BATCH(_3DSTATE_GS<<  16 | (7 - 2));
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0<<  GEN6_GS_SAMPLER_COUNT_SHIFT |
>> +                0<<  GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(1<<  GEN6_GS_DISPATCH_START_GRF_SHIFT |
>> +                0<<  GEN6_GS_URB_READ_LENGTH_SHIFT |
>> +                0<<  GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT);
>> +      OUT_BATCH(0<<  GEN6_GS_MAX_THREADS_SHIFT |
>> +                GEN6_GS_STATISTICS_ENABLE |
>> +                GEN6_GS_RENDERING_ENABLE);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>> +   }
> 
> I would drop all the shifting for unused values.  Pretty sure you don't need STATISTICS_ENABLE or RENDERING_ENABLE either.

I've removed all that cruft and just have a sequence of OUT_BATCH(0)'s now.

>> +   /* 3DSTATE_CONSTANT_PS
>> +    *
>> +    * Disable the push constant buffer.
>> +    */
>> +   {
>> +      /* TODO(chad): gen7 */
>> +      assert(intel->gen == 6);
>> +
>> +      assert(hiz->wm_data->nr_params == 0);
>> +      BEGIN_BATCH(5);
>> +      OUT_BATCH(_3DSTATE_CONSTANT_PS<<  16 | (5 - 2));
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      OUT_BATCH(0);
>> +      ADVANCE_BATCH();
>> +   }
> 
> Can probably drop this.

Dropped.

> 
>> +   /* 3DSTATE_WM */
>> +   {
>> +      uint32_t dw4 = 0;
>> +      uint32_t dw5 = 0;
>> +
>> +      dw4 |= GEN6_WM_STATISTICS_ENABLE;
>> +
>> +      switch (op) {
>> +      case GEN6_HIZ_OP_DEPTH_CLEAR:
>> +         assert(!"not implemented");
>> +         dw4 |= GEN6_WM_DEPTH_CLEAR;
>> +         break;
>> +      case GEN6_HIZ_OP_DEPTH_RESOLVE:
>> +         dw4 |= GEN6_WM_DEPTH_RESOLVE;
>> +         break;
>> +      case GEN6_HIZ_OP_HIZ_RESOLVE:
>> +         dw4 |= GEN6_WM_HIERARCHICAL_DEPTH_RESOLVE;
>> +         break;
>> +      default:
>> +         assert(0);
>> +         break;
>> +      }
>> +
>> +      dw5 |= (brw->max_wm_threads - 1)<<  GEN6_WM_MAX_THREADS_SHIFT;
>> +
>> +      if (hiz->wm_data->dispatch_width == 8) {
>> +         dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
>> +      } else {
>> +         dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>> +      }
>> +
>> +      if (hiz->wm_data->prog_offset_16) {
>> +         dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>> +      }
> 
> Drop the 8/16-wide dispatch enables; you're missing the bit to dispatch any threads at all, so these do nothing.

Oops. I can't believe I missed that. I've set all the dispatch bits to 0.

>> @@ -891,6 +891,11 @@ intelMakeCurrent(__DRIcontext * driContextPriv,
>>         _mesa_make_current(NULL, NULL, NULL);
>>      }
>>
>> +   if (intel->has_hiz) {
>> +      /* Explain why this is here. */
> 
> Yeah! :)
> 
>> +      intel->vtbl.hiz_init(intel);
>> +   }

I killed it, so no need to explain anymore :)


> 
> I like this approach.  Looking forward to seeing a simplified version. Thanks for your hard work, Chad.
> 

Thanks for taking the time to read it. Your suggestion to kill
the shaders has simplified the code a lot.


----
Chad Versace
chad.versace at linux.intel.com


More information about the mesa-dev mailing list