[Bug 104355] Ivy Bridge ignores component mappings in texture views

Thu Jan 24 19:41:22 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=104355

Jason Ekstrand <jason at jlekstrand.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |illia.iorin at gmail.com

--- Comment #12 from Jason Ekstrand <jason at jlekstrand.net> ---
Ilia Iorin asked if I could provide a description of the "better" way to do
this.  In principle, it's not that much different from what Ville did; it's
just a bit more compact and uses far fewer instructions in the shader.

What I would recommend is to push the swizzle into the shader as a single
32-bit value (four 8-bit swizzle parameters), with each parameter encoded as
(R: 0, G: 4, B: 8, A: 12, ZERO: 16, ONE: 20).  Then add a special NIR
intrinsic, nir_intrinsic_swizzle_tex_val_intel, which takes two sources (a
vec4 and a uniform swizzle value) and which would be implemented in
brw_fs_nir.cpp as something like this:

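> /* Match the register type to the texture result type. */
> const bool is_integer = nir_intrinsic_dst_type(intrin);
> dest.type = is_integer ? BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_F;
> fs_reg tmp = bld.vgrf(dest.type, 6);
> 
> /* Components 0-3 of the vec6: the texel as sampled. */
> for (unsigned i = 0; i < 4; i++)
>    bld.MOV(offset(tmp, bld, i), offset(src0, bld, i));
> 
> /* Component 4 is the ZERO constant, component 5 is the ONE constant. */
> bld.MOV(offset(tmp, bld, 4), retype(brw_imm_ud(0), dest.type));
> if (is_integer)
>    bld.MOV(offset(tmp, bld, 5), brw_imm_ud(1));
> else
>    bld.MOV(offset(tmp, bld, 5), brw_imm_f(1));
> 
> /* src1 is the packed swizzle; it selects four of the six components. */
> bld.emit(SHADER_OPCODE_VEC4_TEX_SWIZZLE, dest, tmp, src1);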

where SHADER_OPCODE_VEC4_TEX_SWIZZLE is a pseudo instruction that does the
swizzle.  In particular, it would turn into something like this:

> add(4) a0<1>:uw src1<8,8,1>:ub 62:ud
> mov(8) g5<1>:ud r[a0.0]<1,0>:ud
> mov(8) g6<1>:ud r[a0.1]<1,0>:ud
> mov(8) g7<1>:ud r[a0.2]<1,0>:ud
> mov(8) g8<1>:ud r[a0.3]<1,0>:ud

The first add re-interprets the 32-bit swizzle value as 4 bytes and adds each
to the byte offset of the register provided in src[0] of the VEC4_TEX_SWIZZLE
instruction.  (The values we chose to represent our swizzles were conveniently
in units of bytes so we get byte offsets).  The add then writes those into the
first four address registers.  It's followed by four indirect MOV instructions
to read from the specified component of src[0] and write into the destination. 
This instruction would be very similar to the INDIRECT_MOV opcode we already
have except that it does 4 MOVs at a time with a single address register set-up
instruction.  Because we set up src[0] of the VEC4_TEX_SWIZZLE to be a vec6
which also contains the 0/1 constants, we can get all the different kinds of
swizzles without doing a full matrix multiply or special-casing logic around
the 0/1 constants.
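
To make the encoding and the vec6 trick concrete, here is a minimal CPU-side
sketch in C++.  It is a single-lane scalar model only, not driver code: the
names Swz, pack_swizzle and apply_swizzle are hypothetical, and placing the
selector for the first output component in the low byte is an assumption
(chosen to match the add reading the four bytes in order).  Real SIMD
registers would need different strides than the 4-byte components used here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical names.  The values follow the encoding described above:
// each selector is a byte offset into a vec6 of 32-bit components.
enum Swz : uint8_t {
   SWZ_R = 0, SWZ_G = 4, SWZ_B = 8, SWZ_A = 12, SWZ_ZERO = 16, SWZ_ONE = 20
};

// Pack four 8-bit selectors into one 32-bit value.  Assumption: the selector
// for output component 0 lives in the low byte.
constexpr uint32_t pack_swizzle(Swz r, Swz g, Swz b, Swz a)
{
   return uint32_t(r) | (uint32_t(g) << 8) |
          (uint32_t(b) << 16) | (uint32_t(a) << 24);
}

// Scalar model of the swizzle: build the vec6 {texel, 0, 1}, then use each
// selector as a byte offset into it, like the indirect MOVs do.
std::array<float, 4> apply_swizzle(const std::array<float, 4> &texel,
                                   uint32_t packed)
{
   const std::array<float, 6> vec6 = {
      texel[0], texel[1], texel[2], texel[3], 0.0f, 1.0f
   };
   std::array<float, 4> out{};
   for (unsigned i = 0; i < 4; i++) {
      const unsigned byte_offset = (packed >> (8 * i)) & 0xff;
      // Only the six selectors defined above are valid.
      assert(byte_offset % 4 == 0 && byte_offset <= SWZ_ONE);
      out[i] = vec6[byte_offset / 4];
   }
   return out;
}
```

With this model, a (B, G, R, ONE) view is just another packed value, and the
ZERO/ONE cases need no branches because they are ordinary components 4 and 5
of the vec6.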

These swizzling instructions will likely hurt texture performance: the MOVs
immediately follow the texture instruction and consume its result, so they
stall on it.  If we want to reduce that, we could potentially compile two
versions of each shader, one which has swizzles and one which assumes all
swizzles are RGBA, and dynamically select between them at dispatch time based
on a quick walk of the descriptor set to see if swizzles are needed.
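
The dispatch-time selection could look roughly like the following C++ sketch.
Everything here is hypothetical and only illustrates the idea: ShaderVariants,
needs_swizzle and select_variant are made-up names, the descriptor set is
reduced to a plain list of packed swizzle values, and the identity constant
assumes the selector for the first component sits in the low byte.

```cpp
#include <cstdint>
#include <vector>

// RGBA identity in the packed encoding (R: 0, G: 4, B: 8, A: 12).
// Assumption: the selector for output component 0 is the low byte.
constexpr uint32_t IDENTITY_SWIZZLE = 0x0c080400u;

// Hypothetical pair of compiled variants of one shader.
struct ShaderVariants {
   const void *no_swizzle;   // assumes every bound view is RGBA
   const void *with_swizzle; // contains the swizzle instructions
};

// Quick walk over the swizzles of the bound texture views: true as soon as
// one of them is not the identity.
bool needs_swizzle(const std::vector<uint32_t> &view_swizzles)
{
   for (uint32_t s : view_swizzles) {
      if (s != IDENTITY_SWIZZLE)
         return true;
   }
   return false;
}

// Pick the cheaper variant unless some view really needs swizzling.
const void *select_variant(const ShaderVariants &v,
                           const std::vector<uint32_t> &view_swizzles)
{
   return needs_swizzle(view_swizzles) ? v.with_swizzle : v.no_swizzle;
}
```

The trade-off is roughly double the compile work in exchange for keeping the
common all-RGBA case free of the extra MOVs.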
