[Mesa-dev] [PATCH] radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI

Nicolai Hähnle nhaehnle at gmail.com
Tue Feb 14 12:48:57 UTC 2017


On 14.02.2017 11:58, Marek Olšák wrote:
>
>
> On Feb 14, 2017 9:06 AM, "Nicolai Hähnle" <nhaehnle at gmail.com
> <mailto:nhaehnle at gmail.com>> wrote:
>
>     On 13.02.2017 18:01, Marek Olšák wrote:
>
>         From: Marek Olšák <marek.olsak at amd.com <mailto:marek.olsak at amd.com>>
>
>         So that we can disable u_vbuf for GL core profiles.
>
>         This is a v2 of the previous VI-only patch.
>         It requires SH_MEM_CONFIG.ALIGNMENT_MODE = UNALIGNED on CIK-VI.
>         ---
>          src/gallium/drivers/radeonsi/si_descriptors.c |  1 -
>          src/gallium/drivers/radeonsi/si_pipe.c        | 12 +++++++++---
>          2 files changed, 9 insertions(+), 4 deletions(-)
>
>         diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
>         b/src/gallium/drivers/radeonsi/si_descriptors.c
>         index 3c98176..8abbf10 100644
>         --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>         +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>         @@ -1014,21 +1014,20 @@ bool
>         si_upload_vertex_buffer_descriptors(struct si_context *sctx)
>                                  * up so that the hardware sees four
>         components as
>                                  * being inside the buffer if and only
>         if the first
>                                  * three components are in the buffer.
>                                  *
>                                  * Since the offset and stride are
>         guaranteed to be
>                                  * 4-byte aligned, this alignment will
>         never cross the
>                                  * winsys buffer boundary.
>                                  */
>                                 size3 = (fix_size3 >> (2 * i)) & 3;
>                                 if (vb->stride && size3) {
>         -                               assert(offset % 4 == 0 &&
>         vb->stride % 4 == 0);
>                                         assert(size3 <= 2);
>                                         desc[2] = align(desc[2], size3 * 2);
>
>
>     I think there's still a bug with this. Consider the following setup:
>
>     3-element GL_UNSIGNED_BYTE vertex attribute, stride = 3, pointing at
>     an offset e.g. 12 bytes before the end of a buffer for a total of 4
>     vertices.
>
>     We use unaligned 8_8_8_8 to read this vertex attribute, but when the
>     last vertex is read, its last byte (of the extended 8_8_8_8) is
>     out-of-bounds of the buffer. The hardware does bounds checks
>     all-or-nothing (i.e., not per-channel) and drops the entire load as
>     if the vertex data were out-of-bounds.
>
>     This is admittedly a bit of a ridiculous case that should not
>     prevent us from dropping u_vbuf. It'll probably need a shader
>     work-around of loading each channel individually, like with your
>     recent GL_DOUBLE patch. At least I can't think of anything else
>     right now.
>
>     In any case, the comment above that code snippet is no longer really
>     correct.
>
>
> We can also say that such cases don't exist in the real world and call
> it a day.

I think we should strive for better than that.


> Or we can increase the buffer size and potentially risk a VM
> fault.
>
> The small problem with fix_fetch is that we have only one unused value
> left in the 4 bits, but we need one fix for 888 and another for 161616.

I see the problem. To be honest, I'd still prefer a proper fix, even if 
it requires an additional bit there, but I guess we can increase the 
buffer size and print a warning when it crosses a page boundary.

Nicolai


>
> Marek
>
>
>     Cheers,
>     Nicolai
>
>
>
>                                 }
>                         }
>
>                         desc[3] = velems->rsrc_word3[i];
>
>                         if (first_vb_use_mask & (1 << i)) {
>                                 radeon_add_to_buffer_list(&sctx->b,
>         &sctx->b.gfx,
>                                                       (struct
>         r600_resource*)vb->buffer,
>         diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
>         b/src/gallium/drivers/radeonsi/si_pipe.c
>         index 8806027..ec324b8 100644
>         --- a/src/gallium/drivers/radeonsi/si_pipe.c
>         +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>         @@ -353,23 +353,20 @@ static int si_get_param(struct
>         pipe_screen* pscreen, enum pipe_cap param)
>                 case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
>                 case PIPE_CAP_SM3:
>                 case PIPE_CAP_SEAMLESS_CUBE_MAP:
>                 case PIPE_CAP_PRIMITIVE_RESTART:
>                 case PIPE_CAP_CONDITIONAL_RENDER:
>                 case PIPE_CAP_TEXTURE_BARRIER:
>                 case PIPE_CAP_INDEP_BLEND_ENABLE:
>                 case PIPE_CAP_INDEP_BLEND_FUNC:
>                 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
>                 case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
>         -       case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
>         -       case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
>         -       case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
>                 case PIPE_CAP_USER_INDEX_BUFFERS:
>                 case PIPE_CAP_USER_CONSTANT_BUFFERS:
>                 case PIPE_CAP_START_INSTANCE:
>                 case PIPE_CAP_NPOT_TEXTURES:
>                 case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
>                 case PIPE_CAP_MIXED_COLOR_DEPTH_BITS:
>                 case PIPE_CAP_VERTEX_COLOR_CLAMPED:
>                 case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
>                  case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
>                 case PIPE_CAP_TGSI_INSTANCEID:
>         @@ -455,20 +452,29 @@ static int si_get_param(struct
>         pipe_screen* pscreen, enum pipe_cap param)
>
>                 case PIPE_CAP_GLSL_FEATURE_LEVEL:
>                         if (si_have_tgsi_compute(sscreen))
>                                 return 450;
>                         return HAVE_LLVM >= 0x0309 ? 420 :
>                                HAVE_LLVM >= 0x0307 ? 410 : 330;
>
>                 case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
>                         return MIN2(sscreen->b.info.max_alloc_size,
>         INT_MAX);
>
>         +       case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
>         +       case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
>         +       case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
>         +               /* SI doesn't support unaligned loads.
>         +                * CIK needs DRM 2.50.0 on radeon. */
>         +               return sscreen->b.chip_class == SI ||
>         +                      (sscreen->b.info.drm_major == 2 &&
>         +                       sscreen->b.info.drm_minor < 50);
>         +
>                 /* Unsupported features. */
>                 case PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY:
>                 case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
>                 case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
>                 case PIPE_CAP_USER_VERTEX_BUFFERS:
>                 case PIPE_CAP_FAKE_SW_MSAA:
>                 case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
>                 case PIPE_CAP_VERTEXID_NOBASE:
>                 case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>                 case PIPE_CAP_TGSI_VOTE:
>
>
>



More information about the mesa-dev mailing list