[Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler
Chema Casanova
jmcasanova at igalia.com
Tue Dec 5 23:46:48 UTC 2017
On 05/12/17 22:25, Chema Casanova wrote:
> On 05/12/17 19:53, Jason Ekstrand wrote:
>> On Tue, Dec 5, 2017 at 9:08 AM, Chema Casanova <jmcasanova at igalia.com
>> <mailto:jmcasanova at igalia.com>> wrote:
>>
>> El 30/11/17 a las 23:58, Jason Ekstrand escribió:
>> > On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
>> > <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>
>> <mailto:jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>>> wrote:
>> >
>> > load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
>> > surface format defined. So when reading 16-bit components with the
>> > sampler we need to unshuffle two 16-bit components from each
>> 32-bit
>> > component.
>> >
>> > Using the sampler avoids the use of the byte_scattered_read
>> message
>> > that needs one message for each component and is supposed to be
>> > slower.
>> >
>> > In the case of SKL+ we take advantage of a hardware feature that
>> > automatically defines a channel mask based on the rlen value,
>> so on
>> > SKL+ we only use half of the registers without using a header
>> in the
>> > payload.
>> > ---
>> > src/intel/compiler/brw_fs.cpp | 31
>> > +++++++++++++++++++++++++++----
>> > src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
>> > 2 files changed, 35 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_fs.cpp
>> > b/src/intel/compiler/brw_fs.cpp
>> > index 1ca4d416b2..9c543496ba 100644
>> > --- a/src/intel/compiler/brw_fs.cpp
>> > +++ b/src/intel/compiler/brw_fs.cpp
>> > @@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
>> > fs_builder &bld,
>> > * a double this means we are only loading 2 elements worth of
>> > data.
>> > * We also want to use a 32-bit data type for the dst of the
>> > load operation
>> > * so other parts of the driver don't get confused about the
>> > size of the
>> > - * result.
>> > + * result. On the case of 16-bit data we only need half of the
>> > 32-bit
>> > + * components on SKL+ as we take advance of using message
>> > return size to
>> > + * define an xy channel mask.
>> > */
>> > - fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
>> > + fs_reg vec4_result;
>> > + if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
>> > + vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
>> > + vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
>> > + } else {
>> > + vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
>> > + }
>> >
>> > fs_inst *inst =
>> > bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
>> > vec4_result, surf_index,
>> vec4_offset);
>> > inst->size_written = 4 *
>> > vec4_result.component_size(inst->exec_size);
>> > @@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
>> > fs_builder &bld,
>> > }
>> >
>> > vec4_result.type = dst.type;
>> > - bld.MOV(dst, offset(vec4_result, bld,
>> > - (const_offset & 0xf) /
>> > type_sz(vec4_result.type)));
>> > +
>> > + if (type_sz(dst.type) == 2) {
>> > + /* 16-bit types need to be unshuffled as each pair of
>> > 16-bit components
>> > + * is packed on a 32-bit compoment because we are using a
>> > 32-bit format
>> > + * in the surface of uniform that is read by the sampler.
>> > + * TODO: On BDW+ mark when an uniform has 16-bit type so we
>> > could setup a
>> > + * surface format of 16-bit and use the 16-bit return
>> > format at the
>> > + * sampler.
>> > + */
>> > + vec4_result.stride = 2;
>> > + bld.MOV(dst, byte_offset(offset(vec4_result, bld,
>> > + (const_offset & 0x7) / 4),
>> > + (const_offset & 0x7) / 2 % 2 *
>> 2));
>> > + } else {
>> > + bld.MOV(dst, offset(vec4_result, bld,
>> > + (const_offset & 0xf) /
>> > type_sz(vec4_result.type)));
>> > + }
>> >
>> >
>> > This seems overly complicated. How about something like
>>
>> > fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
>> > switch (type_sz(dst.type)) {
>> > case 2:
>> > shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
>> > bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
>> > break;
>> > case 4:
>> > bld.MOV(dst, dw);
Finally i needed to change this line for:
bld.MOV(dst, retype(dw, dst.type));
As dst isn't guaranteed to be F, we need the retype so the MOV doesn't
apply format conversions as cached by some piglit tests. I'm sending the
final patch.
>> > break;
>> > case 8:
>> > shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
>> > break;
>> > default:
>> > unreachable();
>> > }
>>
>> This implementation it is really more clear. Tested and works perfectly
>> fine.
>>
>> >
>> >
>> > }
>> >
>> > /**
>> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
>> > b/src/intel/compiler/brw_fs_generator.cpp
>> > index a3861cd68e..00a4e29147 100644
>> > --- a/src/intel/compiler/brw_fs_generator.cpp
>> > +++ b/src/intel/compiler/brw_fs_generator.cpp
>> > @@ -1381,12 +1381,18 @@
>> > fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
>> > uint32_t simd_mode, rlen, mlen;
>> > if (inst->exec_size == 16) {
>> > mlen = 2;
>> > - rlen = 8;
>> > + if (type_sz(dst.type) == 2 && (devinfo->gen >= 9))
>> > + rlen = 4;
>> > + else
>> > + rlen = 8;
>> >
>> >
>> > I'm not sure what I think of this. We intentionally use a vec4 today
>> > instead of a single float because it lets us potentially batch up a
>> > bunch of loads in a single texture operation. Are we sure that we
>> > never need more than 2 of those result registers?
>>
>> I can drop this supposed "optimization", in this hunk and also the
>> following lines. Maybe we could use the not read 3rd and 4th 32-bit
>> components to read for example a consecutive pair of 16-bit vec4 in a
>> matrix context.
>>
>> Removing this last hunks and with your previous code proposal, can I
>> count with the R-b ? At the end the patch would be just reduced to your
>> code ...
>>
>>
>> Yup, fine with me. Or, if it just reduces to what I wrote above, you
>> can re-author the patch and review it yourself. I don't care which.
>
> So let it be my first review in mesa-dev :)
>
> Chema
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
More information about the mesa-dev
mailing list