[Mesa-dev] [PATCH 3/5] intel/fs: Cap dst-aligned region stride to maximum representable hstride value.
Jason Ekstrand
jason at jlekstrand.net
Thu Feb 14 21:49:08 UTC 2019
On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez at riseup.net>
wrote:
> This is required in combination with the following commit, because
> otherwise if a source region with an extended 8+ stride is present in
> the instruction (which we're about to declare legal) we'll end up
> emitting code that attempts to write to such a region, even though
> strides greater than four are still illegal for the destination.
> ---
> src/intel/compiler/brw_fs_lower_regioning.cpp | 20 ++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index 6a3c23892b4..b86e95ed9eb 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -71,15 +71,25 @@ namespace {
> !is_byte_raw_mov(inst)) {
> return get_exec_type_size(inst);
> } else {
> - unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
> + /* Calculate the maximum byte stride and the minimum type size
> across
> + * all source and destination operands.
> + */
> + unsigned max_stride = inst->dst.stride * type_sz(inst->dst.type);
> + unsigned min_size = type_sz(inst->dst.type);
>
> for (unsigned i = 0; i < inst->sources; i++) {
> - if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
> - stride = MAX2(stride, inst->src[i].stride *
> - type_sz(inst->src[i].type));
> + if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
> {
> + max_stride = MAX2(max_stride, inst->src[i].stride *
> + type_sz(inst->src[i].type));
> + min_size = MIN2(min_size, type_sz(inst->src[i].type));
> + }
> }
>
> - return stride;
> + /* Attempt to use the largest byte stride among all present
> operands,
> + * but never exceed a stride of 4 since that would lead to
> illegal
> + * destination regions during lowering.
> + */
> + return MIN2(max_stride, 4 * min_size);
>
Why not just fall back to tightly packed in this case? I think I can
answer my own question: Because using something that's equal to one of the
strides reduces the liklihood that we'll need a temporary. If that's the
correct answer, then maybe what we want is the maximum of all strides with
stride_in_bytes <= 4 * type_sz?
--Jason
> }
> }
>
> --
> 2.19.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190214/e335a686/attachment.html>
More information about the mesa-dev
mailing list