[Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Thu May 12 05:30:36 UTC 2016
On 11/05/16 22:46, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>
>> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>>
>>>>
>>>> From: Iago Toral Quiroga <itoral at igalia.com>
>>>>
>>>> UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4
>>>> starting at a
>>>> constant offset that is 16-byte aligned. If we need to access an
>>>> unaligned
>>>> offset we emit a load with an aligned offset and use the remaining
>>>> constant
>>>> offset to select the component into the vec4 result that we are
>>>> interested
>>>> in. This component must be computed in units of the type size,
>>>> since that
>>>> is what fs_reg::set_smear expects.
>>>>
>>>> This patch does this change in the two places where we use this
>>>> message:
>>>> In demote_pull_constants when we lower uniform access with constant
>>>> offset
>>>> into the pull constant buffer and in UBO loads with constant
>>>> offset.
>>>> ---
>>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
>>>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> index 0e69be8..dff13ea 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
>>>> inst->src[i].file = VGRF;
>>>> inst->src[i].nr = dst.nr;
>>>> inst->src[i].reg_offset = 0;
>>>> - inst->src[i].set_smear(pull_index & 3);
>>>> + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
>>>> 4);
>>>> + inst->src[i].set_smear((pull_index & 3) / type_slots);
>>>>
>>> This cannot be right, why should we care what the destination type of
>>> the instruction is while lowering a uniform source? Also I don't
>>> think
>>> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
>>> you'll force type_slots to 1 and end up interpreting the pull_index
>>> in
>>> the wrong units. How about:
>>>
>>>>
>>>> inst->src[i].set_smear((pull_index & 3) * 4 /
>>>> type_sz(inst->src[i].type));
>>>>
>>
>> OK
>>
>>>> brw_mark_surface_used(prog_data, index);
>>>> }
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> index 4cd219a..532ca65 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
>>>> fs_builder &bld, nir_intrinsic_instr *instr
>>>> bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>>>> packed_consts,
>>>> surf_index, const_offset_reg);
>>>>
>>>> + unsigned component_base =
>>>> + (const_offset->u32[0] % 16) / MAX2(1,
>>>> type_sz(dest.type));
>>> Rather than dividing by the type size only to let set_smear multiply
>>> by
>>> the type size again, I think it would be cleaner to do something
>>> like:
>>>
>>>>
>>>> const fs_reg consts = byte_offset(packed_consts,
>>>> const_offset->u32[0] % 16);
>>>>
>>>> for (unsigned i = 0; i < instr->num_components; i++) {
>>> then here:
>>>
>>>>
>>>> bld.MOV(offset(dest, bld, i), component(consts, i));
>>> and then remove the rest of the loop.
>>>
>>
>> I am having troubles with adapting patch 13/23 to this way because the
>> following assert in component() is failing for some tests:
>>
>> assert(reg.subreg_offset == 0);
>>
>
> Ouch, that seems pretty broken, let's fix it (see attachment).
>
Oh thanks! That fixes my problems. I am going to pick your patch and add
it to our version 2 of the branch.
Sam
>> consts.subreg is not zero thanks to byte_offset() call.
>>
>> So I prefer to go to a mixed solution: keep set_smear() usage, then:
>>
>> bld.MOV(offset(dest, bld, i), packed_consts);
>>
>> and remove the rest of the loop.
>>
>> Sam
>>
>>>>
>>>> - packed_consts.set_smear(const_offset->u32[0] % 16 / 4
>>>> + i);
>>>> + packed_consts.set_smear(component_base + i);
>>>>
>>>> /* The std140 packing rules don't allow vectors to
>>>> cross 16-byte
>>>> * boundaries, and a reg is 32 bytes.
>
More information about the mesa-dev
mailing list