[Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled

Ian Romanick idr at freedesktop.org
Mon Jul 18 15:34:06 PDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 02:24 PM, Chad Versace wrote:
> On 07/18/2011 02:02 PM, Ian Romanick wrote:
>> On 07/18/2011 01:54 PM, Chad Versace wrote:
>>> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>>>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>>>> in several locations the PRM states that it is. However, it is actually
>>>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>>>> 4.5.2.1 W-Major Format:
>>>>>     W-Major Tile Format is used for separate stencil.
>>>>
>>>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>>>> I915_TILING_NONE and decode the tile's layout in software.
>>>>
>>>>> This fix touches the following portions of code:
>>>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>>>       buffer with I915_TILING_NONE.
>>>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>>>       not tiled.
>>>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>>>       decoded in software.
>>>>
>>>>> This commit mutually depends on the xf86-video-intel commit
>>>>>     dri: Do not tile stencil buffer
>>>>>     Author: Chad Versace <chad at chad-versace.us>
>>>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>>>
>>>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>>>     bugs/fdo23670-drawpix_stencil
>>>>>     general/stencil-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>>>
>>>>> Note: This is a candidate for the 7.11 branch.
>>>>
>>>>> CC: Eric Anholt <eric at anholt.net>
>>>>> CC: Kenneth Graunke <kenneth at whitecape.org>
>>>>> Signed-off-by: Chad Versace <chad at chad-versace.us>
>>>>> ---
>>>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>>>
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> index dfca03c..5ab9873 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>>>  	     */
>>>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>>>           }
>>>>> +	 else if (intel->has_separate_stencil &&
>>>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>>>> +	     * cannot blit to. */
>>>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>>>> +	 }
>>>>>           else {
>>>>>              /* clearing all stencil bits, use blitting */
>>>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> index 2ba1363..fe8be08 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>>>
>>>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>>>> +	 /*
>>>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>>>> +	  * fencing.
>>>>> +	  */
>>>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>>>  	 return;
>>>>>        } else {
>>>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>         * Presently, however, no verification or clean up is necessary, and
>>>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>>>         * region, then we have already set dri2_has_hiz to true after
>>>>> -       * confirming above that the stencil buffer is Y tiled.
>>>>> +       * confirming above that the stencil buffer is W tiled.
>>>>>         */
>>>>>        assert(0);
>>>>>     }
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> index 1669af2..507cc33 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>
>>>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>>>        /*
>>>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>>>> +       *
>>>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>>>         *    The pitch must be set to 2x the value computed based on width, as
>>>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>>>         * region's cpp and halving its height.
>>>>>         *
>>>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>>>> -       * maps the memory incorrectly.
>>>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>>>         */
>>>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>>>> -				       I915_TILING_Y,
>>>>> +				       I915_TILING_NONE,
>>>>>  				       cpp * 2,
>>>>> -				       width,
>>>>> -				       height / 2,
>>>>> +				       ALIGN(width, 64),
>>>>> +				       /* XXX: Maybe align to 128? */
>>>>> +				       ALIGN(height / 2, 64),
>>>>>  				       GL_TRUE);
>>>>>        if (!irb->region)
>>>>>  	return false;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> index b2013af..9dd6a52 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> @@ -63,9 +63,12 @@
>>>>>   * x8_z24 and s8).
>>>>>   *
>>>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>>>> - * nothing happend.
>>>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>>>> + * true and continue as if nothing happend.
>>>>> + *
>>>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>>>   *
>>>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>>>   * clean up.
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> index 153803f..d306432 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>>>     int miny = 0;							\
>>>>>     int maxx = rb->Width;						\
>>>>>     int maxy = rb->Height;						\
>>>>> -   int stride = rb->RowStride;						\
>>>>> -   uint8_t *buf = rb->Data;						\
>>>>> +									\
>>>>> +   /*									\
>>>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>>>> +    * manually, the region's *real* base address and stride is		\
>>>>> +    * required.								\
>>>>> +    */									\
>>>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>>>> +   unsigned stride = irb->region->pitch;				\
>>>>> +   unsigned height = 2 * irb->region->height;				\
>>>>> +   bool flip = rb->Name == 0;						\
>>>>
>>>>> -/* Don't flip y. */
>>>>>  #undef Y_FLIP
>>>>> -#define Y_FLIP(y) y
>>>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>>>
>>>>>  /**
>>>>>   * \brief Get pointer offset into stencil buffer.
>>>>>   *
>>>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>>>> - * with positive stride (s=stride).
>>>>> - *
>>>>> - *     x    | y     | byte offset
>>>>> - *     --------------------------
>>>>> - *     0    | 0     | 0
>>>>> - *     0    | 1     | 1
>>>>> - *     1    | 0     | 2
>>>>> - *     1    | 1     | 3
>>>>> - *     ...  | ...   | ...
>>>>> - *     0    | 2     | s
>>>>> - *     0    | 3     | s + 1
>>>>> - *     1    | 2     | s + 2
>>>>> - *     1    | 3     | s + 3 
>>>>> - *
>>>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>>>> + * must decode the tile's layout in software.
>>>>>   *
>>>>> + * See
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>>>> + *     Format.
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>>>   */
>>>>> -static inline intptr_t
>>>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>>>> +static inline uintptr_t
>>>>
>>>> Why the type change?
>>
>>> Two reasons.
>>
>>> 1. I redeclared the parameters as unsigned so to generate better code.
>>> Since x, y, and stride are unsigned, the division and modulo operators
>>> generate shift and bit-logic instructions rather than the slower arithmetic
>>> instructions.
>>
>> Is stride always unsigned now? 
> 
> intelSpanRenderStart still sets rb->RowStride to be negative for system
> stencil buffers. But we ignore that and use a positive stride instead.
> To quote the hunk above:
> 
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.							\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;			\
> +   unsigned stride = irb->region->pitch;
> 
> By setting the buf and stride to the *real* base address and stride,
> intel_offset_S8 can implement the exact W-tiling algorithm as described
> in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling
> Algorithm).
> 
> If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer's
> base address and stride, then intel_offset_S8 would essentially
> need two implementations, like this:
>     intptr_t
>     intel_offset_S8(...)
>     {
>         if (stride > 0) {
>             // do normal stuff;
>         } else {
>             // do upside down stuff;
>         }
>     }
> 
> I'd like to avoid such a bifurcated function.

That makes perfect sense.  Thanks for clearing that all up.

>> Will it always be in the future? 
> 
> Yes.
> 
>> This is the problem that we encountered with bug #37351 (fixed by e8b1c6d).
> 
> Ah... Thanks for recalling that bug. I believe setting the return type of
> intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. Do you agree?

It should.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4ktN4ACgkQX1gOwKyEAw/fxQCeMr45By87f0YxBNU+Fp62C1LV
asMAnitL5MjvaIKXgXUIGFDof8hNHRPL
=5kb4
-----END PGP SIGNATURE-----


More information about the mesa-dev mailing list