[Mesa-dev] [PATCH 2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.
Jason Ekstrand
jason at jlekstrand.net
Fri Feb 15 04:21:00 UTC 2019
On February 14, 2019 21:59:18 Francisco Jerez <currojerez at riseup.net> wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez at riseup.net>
>> wrote:
>>
>>> Because the "low" temporary needs to be accessed with word type and
>>> twice the original stride, attempting to preserve the alignment of the
>>> original destination can potentially lead to instructions with illegal
>>> destination stride greater than four. Because the CHV/BXT alignment
>>> restrictions are now being enforced by the regioning lowering pass run
>>> after lower_integer_multiplication(), there is no real need to
>>> preserve the original strides anymore.
>>>
>>>
>>> Note that this bug can be reproduced on stable branches, but
>>> back-porting would be non-trivial, because the fix relies on the
>>> regioning lowering pass recently introduced.
>>> ---
>>> src/intel/compiler/brw_fs.cpp | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>>> index f475b617df2..5768e0d6542 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>>> regions_overlap(inst->dst, inst->size_written,
>>> inst->src[0], inst->size_read(0)) ||
>>> regions_overlap(inst->dst, inst->size_written,
>>> - inst->src[1], inst->size_read(1))) {
>>> + inst->src[1], inst->size_read(1)) ||
>>> + inst->dst.stride >= 4) {
>>
>> It would be nice to throw in a quick comment as to why we're adding a
>> temporary when stride >= 4.
>
> There seemed to be no pre-existing comment about any of the other
> conditions to allocate a temporary, so I've added the following:
>
> + /* Get a new VGRF for the "low" 32x16-bit multiplication result if
> + * reusing the original destination is impossible due to hardware
> + * restrictions, source/destination overlap, or it being the null
> + * register.
> + */
Works for me
Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
>
>>
>>> needs_mov = true;
>>> - /* Get a new VGRF but keep the same stride as inst->dst */
>>> low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>>> inst->dst.type);
>>> - low.stride = inst->dst.stride;
>>> - low.offset = inst->dst.offset % REG_SIZE;
>>> }
>>>
>>>
>>> /* Get a new VGRF but keep the same stride as inst->dst */
>>> --
>>> 2.19.2
>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list