[Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

Tue Sep 15 12:51:01 PDT 2015

On Tue, Sep 15, 2015 at 7:00 AM, He Junyan <junyan.he at inbox.com> wrote:
> On Tue, Sep 15, 2015 at 06:00:57AM -0700, Matt Turner wrote:
>> Date: Tue, 15 Sep 2015 06:00:57 -0700
>> From: Matt Turner <mattst88 at gmail.com>
>> To: "junyan.he" <junyan.he at inbox.com>
>> Cc: "beignet at lists.freedesktop.org" <beignet at lists.freedesktop.org>
>> Subject: Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
>>
>> On Tue, Sep 15, 2015 at 4:15 AM,  <junyan.he at inbox.com> wrote:
>> > From: Junyan He <junyan.he at linux.intel.com>
>> >
>> > According to the document, we use a set of instructions
>> > to implement double type division.
>> >
>> > Signed-off-by: Junyan He <junyan.he at linux.intel.com>
>> > ---
>> >  backend/src/backend/gen8_context.cpp | 68 ++++++++++++++++++++++++++++++++++++
>> >  backend/src/backend/gen8_context.hpp |  2 ++
>> >  2 files changed, 70 insertions(+)
>> >
>> > diff --git a/backend/src/backend/gen8_context.cpp b/backend/src/backend/gen8_context.cpp
>> > index b497ee5..f465832 100644
>> > --- a/backend/src/backend/gen8_context.cpp
>> > +++ b/backend/src/backend/gen8_context.cpp
>> > @@ -924,6 +924,74 @@ namespace gbe
>> >      this->unpackLongVec(src, dst, p->curr.execWidth);
>> >    }
>> >
>> > +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
>> > +    /* Macro for Double Precision IEEE Compliant fdiv
>> > +
>> > +       Set Rounding Mode in CR to RNE
>> > +       GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
>> > +       The default data type for the macro is :df
>> > +
>> > +       math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
>> > +       (-f0.0) if
>> > +       madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2       // Step(1), q0=a*y0
>> > +       madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2     // Step(2), e0=(1-b*y0)
>> > +       madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3     // Step(3), r0=a-b*q0
>> > +       madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2       // Step(4), y1=y0+e0*y0
>> > +       madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6    // Step(5), e1=(1-b*y1)
>> > +       madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6       // Step(6), y2=y0+e0*y1
>> > +       madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6       // Step(7), q1=q0+r0*y1
>> > +       madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7      // Step(8), y3=y1+e1*y2
>> > +       madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9     // Step(9), r1=a-b*q1
>> > +
>> > +       Change Rounding Mode in CR if required
>> > +       Implicit Accumulator for destination is NULL
>> > +
>> > +       madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2      // Step(10), q=q1+r1*y3
>> > +       endif */
>>
>> I don't see an IF or an ENDIF instruction emitted in the code below.
>> Is that intentional, or am I misreading the code?
>>
> Here, we use f0.1 as the predicate for all the instructions, like:
> (-f0.1) madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2
> (-f0.1) madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2
> .....
> I avoid using IF/ENDIF here because we would need to calculate the number
> of instructions inside the IF clause, which is not convenient.

Ah, I see.

While that works, I think it does not take advantage of the "early
out" capability of the INVM math instruction. As I understand it, for
some input values it can calculate a full double-precision result
without any of the MADM sequence, so using IF/ENDIF would allow the EU
to jump over all of the MADM instructions. If you just predicate the
instructions, the EU cannot jump over them; it must send each one down
the pipeline.
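
To make the trade-off above concrete, here is a rough host-side counting
model, not anything taken from Beignet. It assumes the 10 MADM steps of the
quoted macro, assumes IF and ENDIF cost one issued instruction each, and
assumes the IF only jumps when every channel took the early out. The function
names and the flat "one issue per instruction" cost are illustrative
assumptions only.

```cpp
#include <vector>

// Number of MADM steps in the quoted macro (Step(1) .. Step(10)).
constexpr int kMadmCount = 10;

// early_out[i] is true when channel i needs no MADM refinement.
// Predication only: every MADM is still issued, whatever the flags say.
int issued_predicated(const std::vector<bool>& early_out) {
  (void)early_out;  // the flags mask channels but do not skip instructions
  return kMadmCount;
}

// IF/ENDIF (assumed model): IF and ENDIF are always issued; the MADM body
// is skipped only when no channel needs it.
int issued_with_if(const std::vector<bool>& early_out) {
  bool any_needs_refinement = false;
  for (bool e : early_out) {
    if (!e) any_needs_refinement = true;
  }
  return any_needs_refinement ? kMadmCount + 2 : 2;
}
```

Under these assumptions, IF/ENDIF wins only when whole SIMD groups take the
early out; a single channel needing refinement makes it two instructions
more expensive than plain predication.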

Just something to consider. I don't know whether the difficulties of
using IF/ENDIF are great enough to avoid using them.
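
For readers following the arithmetic, the ten steps in the macro quoted in
the patch comment can be sketched on the host with fused multiply-adds. This
is only a model: approx_recip() is a hypothetical stand-in for the initial
estimate produced by the math instruction (assumed here to be roughly
single-precision accurate), and the sketch ignores the extended-range
accumulators, the rounding-mode changes, and the special-case handling the
real sequence depends on.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the hardware's initial reciprocal estimate y0.
// Assumption: about single-precision accuracy; the real instruction may differ.
static double approx_recip(double b) {
  return static_cast<double>(1.0f / static_cast<float>(b));
}

// Host-side model of Step(1)..Step(10) from the quoted macro comment.
double fdiv64_sketch(double a, double b) {
  // Only ordinary, moderately sized inputs are modelled here.
  assert(std::isnormal(a) && std::isnormal(b));
  const double y0 = approx_recip(b);
  assert(std::isnormal(y0));

  const double q0 = std::fma(a, y0, 0.0);   // Step(1)  q0 = a*y0
  const double e0 = std::fma(-b, y0, 1.0);  // Step(2)  e0 = 1 - b*y0
  const double r0 = std::fma(-b, q0, a);    // Step(3)  r0 = a - b*q0
  const double y1 = std::fma(e0, y0, y0);   // Step(4)  y1 = y0 + e0*y0
  const double e1 = std::fma(-b, y1, 1.0);  // Step(5)  e1 = 1 - b*y1
  const double y2 = std::fma(e0, y1, y0);   // Step(6)  y2 = y0 + e0*y1
  const double q1 = std::fma(r0, y1, q0);   // Step(7)  q1 = q0 + r0*y1
  const double y3 = std::fma(e1, y2, y1);   // Step(8)  y3 = y1 + e1*y2
  const double r1 = std::fma(-b, q1, a);    // Step(9)  r1 = a - b*q1
  return std::fma(r1, y3, q1);              // Step(10) q  = q1 + r1*y3
}
```

Calling fdiv64_sketch(a, b) for ordinary inputs should land within a few
units in the last place of a / b; it makes no claim about matching the
hardware bit for bit.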