[Mesa-dev] [PATCH 1/2] i965/vec4: Lower integer multiplication after optimizations.

Matt Turner mattst88 at gmail.com
Tue Apr 19 00:18:38 UTC 2016


On Mon, Apr 18, 2016 at 5:08 PM, Ian Romanick <idr at freedesktop.org> wrote:
> On 04/18/2016 04:14 PM, Matt Turner wrote:
>> Analogous to commit 1e4e17fbd in the i965/fs backend.
>>
>> Because the copy propagation pass in the vec4 backend is strictly local,
>> we look at the immediate values coming from NIR and emit the multiplies
>> we need directly. If the copy propagation pass becomes smarter in the
>> future, we can reduce the nir_op_imul case in brw_vec4_nir.cpp to a
>> single multiply.
>>
>> total instructions in shared programs: 7082311 -> 7081953 (-0.01%)
>> instructions in affected programs: 59581 -> 59223 (-0.60%)
>> helped: 293
>>
>> total cycles in shared programs: 65765712 -> 65764796 (-0.00%)
>> cycles in affected programs: 854112 -> 853196 (-0.11%)
>> helped: 154
>> HURT: 73
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp     | 67 ++++++++++++++++++++++++++++++
>>  src/mesa/drivers/dri/i965/brw_vec4.h       |  1 +
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 48 +++++++++------------
>>  3 files changed, 88 insertions(+), 28 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index b9cf3f6..1644d4d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1671,6 +1671,71 @@ vec4_visitor::lower_minmax()
>>     return progress;
>>  }
>>
>> +bool
>> +vec4_visitor::lower_integer_multiplication()
>> +{
>> +   bool progress = false;
>> +
>> +   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
>> +      const vec4_builder ibld(this, block, inst);
>> +
>> +      if (inst->opcode == BRW_OPCODE_MUL) {
>> +         if (inst->dst.is_accumulator() ||
>> +             (inst->src[1].type != BRW_REGISTER_TYPE_D &&
>> +              inst->src[1].type != BRW_REGISTER_TYPE_UD))
>> +            continue;
>> +
>> +         /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit
>> +          * operation directly, but CHV/BXT cannot.
>> +          */
>> +         if (devinfo->gen >= 8 &&
>> +             !devinfo->is_cherryview && !devinfo->is_broxton)
>> +            continue;
>
> Shouldn't this whole method just bail if we're Gen >= 8 and !CHV and
> !BXT?  Or does this structure simplify future changes?

Oh, I hadn't noticed.

The FS code was originally as you suggest, with the function returning
early under those conditions. Curro changed that in commit 2e731264382
in order to add lowering support for the multiply-high instruction on
all platforms. We may want to do that in the vec4 backend as well.

The other thing I need to fix is Cherryview multiplications, where we
need to change the type of src1 to UW. I'm not sure if it's better to
do that here, or at a lower level. Maybe in brw_MUL itself since
that's called in a few places...

Depending on whether people think that code should go here or
elsewhere, I'll move the block to the beginning of the function.


More information about the mesa-dev mailing list