[Mesa-dev] [PATCH 1/2] i965/vec4: Lower integer multiplication after optimizations.

Wed Apr 27 22:02:57 UTC 2016

On Mon, Apr 18, 2016 at 5:18 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Apr 18, 2016 at 5:08 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 04/18/2016 04:14 PM, Matt Turner wrote:
>>> Analogous to commit 1e4e17fbd in the i965/fs backend.
>>>
>>> Because the copy propagation pass in the vec4 backend is strictly local,
>>> we look at the immediate values coming from NIR and emit the multiplies
>>> we need directly. If the copy propagation pass becomes smarter in the
>>> future, we can reduce the nir_op_imul case in brw_vec4_nir.cpp to a
>>> single multiply.
>>>
>>> total instructions in shared programs: 7082311 -> 7081953 (-0.01%)
>>> instructions in affected programs: 59581 -> 59223 (-0.60%)
>>> helped: 293
>>>
>>> total cycles in shared programs: 65765712 -> 65764796 (-0.00%)
>>> cycles in affected programs: 854112 -> 853196 (-0.11%)
>>> helped: 154
>>> HURT: 73
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp     | 67 ++++++++++++++++++++++++++++++
>>>  src/mesa/drivers/dri/i965/brw_vec4.h       |  1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 48 +++++++++------------
>>>  3 files changed, 88 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index b9cf3f6..1644d4d 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -1671,6 +1671,71 @@ vec4_visitor::lower_minmax()
>>>     return progress;
>>>  }
>>>
>>> +bool
>>> +vec4_visitor::lower_integer_multiplication()
>>> +{
>>> +   bool progress = false;
>>> +
>>> +   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
>>> +      const vec4_builder ibld(this, block, inst);
>>> +
>>> +      if (inst->opcode == BRW_OPCODE_MUL) {
>>> +         if (inst->dst.is_accumulator() ||
>>> +             (inst->src[1].type != BRW_REGISTER_TYPE_D &&
>>> +              inst->src[1].type != BRW_REGISTER_TYPE_UD))
>>> +            continue;
>>> +
>>> +         /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit
>>> +          * operation directly, but CHV/BXT cannot.
>>> +          */
>>> +         if (devinfo->gen >= 8 &&
>>> +             !devinfo->is_cherryview && !devinfo->is_broxton)
>>> +            continue;
>>
>> Shouldn't this whole method just bail if we're Gen >= 8 and !CHV and
>> !BXT?  Or does this structure simplify future changes?
>
> Oh, I hadn't noticed.
>
> The FS code was originally as you suggest, with the function returning
> early under those conditions. Curro changed that in commit 2e731264382
> in order to add lowering support for the multiply-high instruction on
> all platforms. We may want to do that in the vec4 backend as well.
>
> The other thing I need to fix is Cherryview multiplications, where we
> need to change the type of src1 to UW. I'm not sure if it's better to
> do that here, or at a lower level. Maybe in brw_MUL itself since
> that's called in a few places...
>
> Depending on whether people think that code should go here or
> elsewhere, I'll move the block to the beginning of the function.

Ken, Curro: any opinion on where we should change src1's type to UW on CHV/BXT?
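
For anyone following the thread, here is a minimal standalone sketch of the two ideas under discussion. It is not driver code. DevInfoSketch, needs_mul_lowering() and mul32_via_uw() are hypothetical names made up for illustration; only the platform condition is taken from the quoted hunk. The arithmetic assumes that retyping src1 to UW means each MUL consumes only 16 bits of src1, so a full 32-bit product has to be assembled from two partial products.

```cpp
#include <cstdint>

// Hypothetical stand-in for the few device-info fields the quoted check reads.
struct DevInfoSketch {
   int gen;
   bool is_cherryview;
   bool is_broxton;
};

// Same condition as the quoted hunk, expressed as a predicate: Gen8+ can do
// a 32-bit x 32-bit -> 32-bit MUL directly, except on CHV/BXT.
bool
needs_mul_lowering(const DevInfoSketch &devinfo)
{
   return !(devinfo.gen >= 8 &&
            !devinfo.is_cherryview && !devinfo.is_broxton);
}

// Low 32 bits of a * b, built from two 32-bit x 16-bit partial products.
// Assumption: this models the arithmetic only, not the instruction sequence
// the backend actually emits.
uint32_t
mul32_via_uw(uint32_t a, uint32_t b)
{
   const uint32_t lo = a * (b & 0xffffu);   // a * low 16 bits of b
   const uint32_t hi = a * (b >> 16);       // a * high 16 bits of b
   return lo + (hi << 16);                  // wraps modulo 2^32
}
```

A predicate like this could be tested once before the loop if the function bails early as Ian suggests, or per instruction if multiply-high lowering is added later as in the FS backend.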