[Mesa-dev] [PATCH 3/5] intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

Jason Ekstrand jason at jlekstrand.net
Thu Feb 21 16:49:15 UTC 2019


On Thu, Feb 14, 2019 at 4:53 PM Francisco Jerez <currojerez at riseup.net>
wrote:

> Jason Ekstrand <jason at jlekstrand.net> writes:
>
> > On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez at riseup.net>
> > wrote:
> >
> >> This is required in combination with the following commit, because
> >> otherwise if a source region with an extended 8+ stride is present in
> >> the instruction (which we're about to declare legal) we'll end up
> >> emitting code that attempts to write to such a region, even though
> >> strides greater than four are still illegal for the destination.
> >> ---
> >>  src/intel/compiler/brw_fs_lower_regioning.cpp | 20 ++++++++++++++-----
> >>  1 file changed, 15 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> >> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> >> index 6a3c23892b4..b86e95ed9eb 100644
> >> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> >> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> >> @@ -71,15 +71,25 @@ namespace {
> >>            !is_byte_raw_mov(inst)) {
> >>           return get_exec_type_size(inst);
> >>        } else {
> >> -         unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
> >> +         /* Calculate the maximum byte stride and the minimum type size
> >> across
> >> +          * all source and destination operands.
> >> +          */
> >> +         unsigned max_stride = inst->dst.stride *
> type_sz(inst->dst.type);
> >> +         unsigned min_size = type_sz(inst->dst.type);
> >>
> >>           for (unsigned i = 0; i < inst->sources; i++) {
> >> -            if (!is_uniform(inst->src[i]) &&
> !inst->is_control_source(i))
> >> -               stride = MAX2(stride, inst->src[i].stride *
> >> -                             type_sz(inst->src[i].type));
> >> +            if (!is_uniform(inst->src[i]) &&
> !inst->is_control_source(i))
> >> {
> >> +               max_stride = MAX2(max_stride, inst->src[i].stride *
> >> +                                 type_sz(inst->src[i].type));
> >> +               min_size = MIN2(min_size, type_sz(inst->src[i].type));
> >> +            }
> >>           }
> >>
> >> -         return stride;
> >> +         /* Attempt to use the largest byte stride among all present
> >> operands,
> >> +          * but never exceed a stride of 4 since that would lead to
> >> illegal
> >> +          * destination regions during lowering.
> >> +          */
> >> +         return MIN2(max_stride, 4 * min_size);
> >>
> >
> > Why not just fall back to tightly packed in this case?  I think I can
> > answer my own question:  Because using something that's equal to one of
> the
> > strides reduces the liklihood that we'll need a temporary.  If that's the
> > correct answer, then maybe what we want is the maximum of all strides
> with
> > stride_in_bytes <= 4 * type_sz?
> >
>
> We also want the result to be greater than or equal to the size of the
> largest non-uniform, non-control source type, since packing a vector of
> such a type into a temporary of lower byte stride than its size is
> impossible.  This patch guarantees that as long as max_size <= 4 *
> min_size, which is necessary for the lowering code that calls this
> function to work at all.
>
> It would be possible to preserve this guarantee while attempting to pick
> one of the strides of the pre-existing sources as you say -- I would be
> happy to review that change as a follow-up micro-optimization patch, but
> there are some corner cases to consider I don't necessarily want to
> bother with in the patch doing the functional change, for the sake of
> bisectability.
>

That's fine with me.

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>


> It may make sense to add an assert here that max_size <= 4 * min_size
> for the case such an instruction doesn't blow up already at the EU
> validator (it doesn't look like the validator is currently enforcing the
> lack of conversions between 8 and 64 bit types?), it will just involve
> calculating max_size in addition to max_stride and min_size above.
>
> > --Jason
> >
> >
> >>        }
> >>     }
> >>
> >> --
> >> 2.19.2
> >>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190221/a3c48fcc/attachment.html>


More information about the mesa-dev mailing list