[Mesa-dev] [PATCH 1/2] i965/vec4: Lower integer multiplication after optimizations.
Matt Turner
mattst88 at gmail.com
Wed Apr 27 22:02:57 UTC 2016
On Mon, Apr 18, 2016 at 5:18 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Apr 18, 2016 at 5:08 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 04/18/2016 04:14 PM, Matt Turner wrote:
>>> Analogous to commit 1e4e17fbd in the i965/fs backend.
>>>
>>> Because the copy propagation pass in the vec4 backend is strictly local,
>>> we look at the immediate values coming from NIR and emit the multiplies
>>> we need directly. If the copy propagation pass becomes smarter in the
>>> future, we can reduce the nir_op_imul case in brw_vec4_nir.cpp to a
>>> single multiply.
>>>
>>> total instructions in shared programs: 7082311 -> 7081953 (-0.01%)
>>> instructions in affected programs: 59581 -> 59223 (-0.60%)
>>> helped: 293
>>>
>>> total cycles in shared programs: 65765712 -> 65764796 (-0.00%)
>>> cycles in affected programs: 854112 -> 853196 (-0.11%)
>>> helped: 154
>>> HURT: 73
>>> ---
>>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 ++++++++++++++++++++++++++++++
>>> src/mesa/drivers/dri/i965/brw_vec4.h | 1 +
>>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 48 +++++++++------------
>>> 3 files changed, 88 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index b9cf3f6..1644d4d 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -1671,6 +1671,71 @@ vec4_visitor::lower_minmax()
>>> return progress;
>>> }
>>>
>>> +bool
>>> +vec4_visitor::lower_integer_multiplication()
>>> +{
>>> + bool progress = false;
>>> +
>>> + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
>>> + const vec4_builder ibld(this, block, inst);
>>> +
>>> + if (inst->opcode == BRW_OPCODE_MUL) {
>>> + if (inst->dst.is_accumulator() ||
>>> + (inst->src[1].type != BRW_REGISTER_TYPE_D &&
>>> + inst->src[1].type != BRW_REGISTER_TYPE_UD))
>>> + continue;
>>> +
>>> + /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit
>>> + * operation directly, but CHV/BXT cannot.
>>> + */
>>> + if (devinfo->gen >= 8 &&
>>> + !devinfo->is_cherryview && !devinfo->is_broxton)
>>> + continue;
>>
>> Shouldn't this whole method just bail if we're Gen >= 8 and !CHV and
>> !BXT? Or does this structure simplify future changes?
>
> Oh, I hadn't noticed.
>
> The FS code was originally as you suggest, with the function returning
> early under those conditions. Curro changed that in commit 2e731264382
> in order to add lowering support for the multiply-high instruction on
> all platforms. We may want to do that in the vec4 backend as well.
>
> The other thing I need to fix is Cherryview multiplications, where we
> need to change the type of src1 to UW. I'm not sure if it's better to
> do that here, or at a lower level. Maybe in brw_MUL itself since
> that's called in a few places...
>
> Depending on whether people think that code should go here or
> elsewhere, I'll move the block to the beginning of the function.
Ken, Curro: any opinion where we should change src1's type to UW on CHV/BXT?
More information about the mesa-dev
mailing list