[Mesa-dev] [PATCH 5/6] intel/fs: Handle surface opcode sample masks via predication.
Kenneth Graunke
kenneth at whitecape.org
Thu Mar 1 19:34:20 UTC 2018
On Tuesday, February 27, 2018 1:38:27 PM PST Francisco Jerez wrote:
> The main motivation is to enable HDC surface opcodes on ICL which no
> longer allows the sample mask to be provided in a message header, but
> this is enabled all the way back to IVB when possible because it
> decreases the instruction count of some shaders using HDC messages
> significantly, e.g. one of the SynMark2 CSDof compute shaders
> decreases instruction count by about 40% due to the removal of header
> setup boilerplate which in turn makes a number of send message
> payloads more easily CSE-able. Shader-db results on SKL:
>
> total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
> instructions in affected programs: 311532 -> 300597 (-3.51%)
> helped: 491
> HURT: 1
>
> Shader-db results on BDW where the optimization needs to be disabled
> in some cases due to hardware restrictions:
>
> total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
> instructions in affected programs: 220863 -> 214097 (-3.06%)
> helped: 351
> HURT: 0
>
> The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
> laptop with this change.
> ---
> src/intel/compiler/brw_fs.cpp | 42 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0b87d8ab14e..639432b4f49 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -4432,6 +4432,8 @@ static void
> lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
> const fs_reg &sample_mask)
> {
> + const gen_device_info *devinfo = bld.shader->devinfo;
> +
> /* Get the logical send arguments. */
> const fs_reg &addr = inst->src[0];
> const fs_reg &src = inst->src[1];
> @@ -4442,7 +4444,20 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
> /* Calculate the total number of components of the payload. */
> const unsigned addr_sz = inst->components_read(0);
> const unsigned src_sz = inst->components_read(1);
> - const unsigned header_sz = (sample_mask.file == BAD_FILE ? 0 : 1);
> + /* From the BDW PRM Volume 7, page 147:
> + *
> + * "For the Data Cache Data Port*, the header must be present for the
> + * following message types: [...] Typed read/write/atomics"
> + *
> + * Earlier generations have a similar wording. Because of this restriction
> + * we don't attempt to implement sample masks via predication for such
> + * messages prior to Gen9, since we have to provide a header anyway. On
> + * Gen11+ the header has been removed so we can only use predication.
> + */
> + const unsigned header_sz = devinfo->gen < 9 &&
> + (op == SHADER_OPCODE_TYPED_SURFACE_READ ||
> + op == SHADER_OPCODE_TYPED_SURFACE_WRITE ||
> + op == SHADER_OPCODE_TYPED_ATOMIC) ? 1 : 0;
> const unsigned sz = header_sz + addr_sz + src_sz;
>
> /* Allocate space for the payload. */
> @@ -4462,6 +4477,31 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
>
> bld.LOAD_PAYLOAD(payload, components, sz, header_sz);
>
> + /* Predicate the instruction on the sample mask if no header is
> + * provided.
> + */
> + if (!header_sz && sample_mask.file != BAD_FILE &&
> + sample_mask.file != IMM) {
> + const fs_builder ubld = bld.group(1, 0).exec_all();
> + if (inst->predicate) {
> + assert(inst->predicate == BRW_PREDICATE_NORMAL);
> + assert(!inst->predicate_inverse);
> + assert(inst->flag_subreg < 2);
> + /* Combine the sample mask with the existing predicate by using a
> + * vertical predication mode.
> + */
> + inst->predicate = BRW_PREDICATE_ALIGN1_ALLV;
> + ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg + 2),
> + sample_mask.type),
> + sample_mask);
I was surprised to see flag_subreg remain unchanged here, but then I
re-read how allv works, and it does f0.0 & f1.0, or f0.1 & f1.1. So,
we can leave it as 0 or 1 and it'll implicitly use 2 or 3 as well.
Series is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> + } else {
> + inst->flag_subreg = 2;
> + inst->predicate = BRW_PREDICATE_NORMAL;
> + ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg), sample_mask.type),
> + sample_mask);
> + }
> + }
> +
> /* Update the original instruction. */
> inst->opcode = op;
> inst->mlen = header_sz + (addr_sz + src_sz) * inst->exec_size / 8;
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180301/774d80e9/attachment.sig>
More information about the mesa-dev
mailing list