[Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Wed May 11 15:31:02 UTC 2016
On Wed, 2016-05-11 at 17:12 +0200, Samuel Iglesias Gonsálvez wrote:
> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
> >
> > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> >
> > >
> > >
> > > From: Iago Toral Quiroga <itoral at igalia.com>
> > >
> > > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4
> > > starting at a
> > > constant offset that is 16-byte aligned. If we need to access an
> > > unaligned
> > > offset we emit a load with an aligned offset and use the
> > > remaining
> > > constant
> > > offset to select the component into the vec4 result that we are
> > > interested
> > > in. This component must be computed in units of the type size,
> > > since that
> > > is what fs_reg::set_smear expects.
> > >
> > > This patch does this change in the two places where we use this
> > > message:
> > > In demote_pull_constants when we lower uniform access with
> > > constant
> > > offset
> > > into the pull constant buffer and in UBO loads with constant
> > > offset.
> > > ---
> > > src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
> > > 2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > index 0e69be8..dff13ea 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
> > > inst->src[i].file = VGRF;
> > > inst->src[i].nr = dst.nr;
> > > inst->src[i].reg_offset = 0;
> > > - inst->src[i].set_smear(pull_index & 3);
> > > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
> > > 4);
> > > + inst->src[i].set_smear((pull_index & 3) / type_slots);
> > >
> > This cannot be right, why should we care what the destination type
> > of
> > the instruction is while lowering a uniform source? Also I don't
> > think
> > the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 <
> > 1
> > you'll force type_slots to 1 and end up interpreting the pull_index
> > in
> > the wrong units. How about:
> >
> > >
> > >
> > > inst->src[i].set_smear((pull_index & 3) * 4 /
> > > type_sz(inst->src[i].type));
> > >
> OK
>
> >
> > >
> > > brw_mark_surface_used(prog_data, index);
> > > }
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > index 4cd219a..532ca65 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
> > > fs_builder &bld, nir_intrinsic_instr *instr
> > > bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
> > > packed_consts,
> > > surf_index, const_offset_reg);
> > >
> > > + unsigned component_base =
> > > + (const_offset->u32[0] % 16) / MAX2(1,
> > > type_sz(dest.type));
> > Rather than dividing by the type size only to let set_smear
> > multiply
> > by
> > the type size again, I think it would be cleaner to do something
> > like:
> >
> > >
> > >
> > > const fs_reg consts = byte_offset(packed_consts,
> > > const_offset->u32[0] % 16);
> > >
> > > for (unsigned i = 0; i < instr->num_components; i++) {
> > then here:
> >
> > >
> > >
> > > bld.MOV(offset(dest, bld, i), component(consts, i));
> > and then remove the rest of the loop.
> >
> I am having troubles with adapting patch 13/23 to this way because
> the
> following assert in component() is failing for some tests:
>
> assert(reg.subreg_offset == 0);
>
> consts.subreg is not zero thanks to byte_offset() call.
>
> So I prefer to go to a mixed solution: keep set_smear() usage, then:
>
> bld.MOV(offset(dest, bld, i), packed_consts);
>
Looking at patch 13, offset(dest, bld, i) needs to be adjusted to save
the remaining components, so I think the MOV is clearer as it is now
than the proposed change.
Sam
> and remove the rest of the loop.
>
> Sam
>
> >
> > >
> > >
> > > - packed_consts.set_smear(const_offset->u32[0] % 16 /
> > > 4
> > > + i);
> > > + packed_consts.set_smear(component_base + i);
> > >
> > > /* The std140 packing rules don't allow vectors to
> > > cross 16-byte
> > > * boundaries, and a reg is 32 bytes.
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160511/1e0e9a2d/attachment.sig>
More information about the mesa-dev
mailing list