[Beignet] [PATCH 2/3] support 64bit-integer AND(&), OR(|), XOR(^) arithmetic
Xing, Homer
homer.xing at intel.com
Tue Aug 6 18:40:27 PDT 2013
Haha, I missed "with (DF) double precision source and/or destination" when read spec. Yes, you are right, spec also says "NibCtrl is only used for SIMD4 instructions with a 64bit datatype as source or destination".
But the fact is NibCtrl works even if source/destination is not 64bit. Interesting...
From: beignet-bounces+homer.xing=intel.com at lists.freedesktop.org [mailto:beignet-bounces+homer.xing=intel.com at lists.freedesktop.org] On Behalf Of zhigang gong
Sent: Wednesday, August 7, 2013 9:24 AM
To: Xing, Homer
Cc: beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH 2/3] support 64bit-integer AND(&), OR(|), XOR(^) arithmetic
On Wed, Aug 7, 2013 at 8:26 AM, Xing, Homer <homer.xing at intel.com<mailto:homer.xing at intel.com>> wrote:
Bspec says, nibControl + quarterControl can be used when the real execSize is four. For example, when execSize is eight and stride is two, the real execSize is four.
I really hope so and in practice, it works like above. But I just can't find the statement in the spec. I only found the following statement for IVB:
[DevIVB]:
NibCtrl is only allowed for SIMD4 instructions with (DF) double precision source and/or destination.
Could you tell me where can I find the description which support the same behavior with a non DW destination with 2-horizontal stride?
That will be helpful as I also used this "undocumented" feature in my patch when I implemented 64 bit data reading. The only difference is I disabled the predication
there.
Concentrating different function calls into one place is difficult. But I can change the four times repeated code inside each "case SEL_OP_XX" into a "for-loop".
That's good, let's reduce part of the duplication. And fix the rest in the future.
From: zhigang gong [mailto:zhigang.gong at gmail.com<mailto:zhigang.gong at gmail.com>]
Sent: Tuesday, August 6, 2013 4:38 PM
To: Xing, Homer
Cc: beignet at lists.freedesktop.org<mailto:beignet at lists.freedesktop.org>
Subject: Re: [Beignet] [PATCH 2/3] support 64bit-integer AND(&), OR(|), XOR(^) arithmetic
Sorry that, the mutt crashed and sent out an incomplete email. Here is the complete version:
Homer,
The logical of handling Long/ULong for many operations are very similar to each other. If you can concentrate
all of them into one place, then it will avoid to write duplicate code and it will reduce the maintenance effort.
And even in the same function, the logical seems repeat four times to handle each 4 elements. The different
is the register's offset and the quarterControl and nibCotnrol.
Another interesting issue is which I ignored in your first patch which is about the nibControl usage.
It seems you are using quarterControl + nibControl to handle predication for a DW type with a
2-element-horizontal. I haven't found anything in the Gen spec which support this type of usage.
>From the spec, the only use case for nibControl is when the destination is a DF type. Right?
Any comments here?
On Tue, Aug 6, 2013 at 4:16 PM, Zhigang Gong <zhigang.gong at gmail.com<mailto:zhigang.gong at gmail.com>> wrote:
Homer,
The logical of handling Long/ULong for many operations are
very similar to each other.
handle
On Tue, Aug 06, 2013 at 04:01:30PM +0800, Homer Hsing wrote:
>
> Signed-off-by: Homer Hsing <homer.xing at intel.com<mailto:homer.xing at intel.com>>
> ---
> backend/src/backend/gen_context.cpp | 102 +++++++++++++++++++++++++++++
> backend/src/backend/gen_insn_selection.cpp | 24 ++++++-
> backend/src/backend/gen_insn_selection.hxx | 3 +
> backend/src/ir/instruction.cpp | 1 +
> 4 files changed, 127 insertions(+), 3 deletions(-)
>
> diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
> index 69dab85..bbe16d0 100644
> --- a/backend/src/backend/gen_context.cpp
> +++ b/backend/src/backend/gen_context.cpp
> @@ -162,6 +162,108 @@ namespace gbe
> case SEL_OP_AND: p->AND(dst, src0, src1); break;
> case SEL_OP_OR: p->OR (dst, src0, src1); break;
> case SEL_OP_XOR: p->XOR(dst, src0, src1); break;
> + case SEL_OP_I64AND:
> + {
> + GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),
> + xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),
> + xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);
> + int execWidth = p->curr.execWidth;
> + p->push();
> + p->curr.execWidth = 8;
> + p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + if (execWidth == 16) {
> + p->curr.quarterControl = 1;
> + p->curr.nibControl = 0;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + }
> + p->pop();
> + }
> + break;
> + case SEL_OP_I64OR:
> + {
> + GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),
> + xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),
> + xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);
> + int execWidth = p->curr.execWidth;
> + p->push();
> + p->curr.execWidth = 8;
> + p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + if (execWidth == 16) {
> + p->curr.quarterControl = 1;
> + p->curr.nibControl = 0;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + }
> + p->pop();
> + }
> + break;
> + case SEL_OP_I64XOR:
> + {
> + GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),
> + xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),
> + xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);
> + int execWidth = p->curr.execWidth;
> + p->push();
> + p->curr.execWidth = 8;
> + p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + if (execWidth == 16) {
> + p->curr.quarterControl = 1;
> + p->curr.nibControl = 0;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + p->curr.nibControl = 1;
> + xdst = GenRegister::suboffset(xdst, 4),
> + xsrc0 = GenRegister::suboffset(xsrc0, 4),
> + xsrc1 = GenRegister::suboffset(xsrc1, 4);
> + p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());
> + p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());
> + }
> + p->pop();
> + }
> + break;
> case SEL_OP_SHR: p->SHR(dst, src0, src1); break;
> case SEL_OP_SHL: p->SHL(dst, src0, src1); break;
> case SEL_OP_RSR: p->RSR(dst, src0, src1); break;
> diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selection.cpp
> index 66cfa31..7e9402d 100644
> --- a/backend/src/backend/gen_insn_selection.cpp
> +++ b/backend/src/backend/gen_insn_selection.cpp
> @@ -423,6 +423,9 @@ namespace gbe
> ALU2(AND)
> ALU2(OR)
> ALU2(XOR)
> + ALU2(I64AND)
> + ALU2(I64OR)
> + ALU2(I64XOR)
> ALU2(SHR)
> ALU2(SHL)
> ALU2(RSR)
> @@ -1434,9 +1437,24 @@ namespace gbe
> sel.ADD(dst, src0, src1);
> sel.pop();
> break;
> - case OP_XOR: sel.XOR(dst, src0, src1); break;
> - case OP_OR: sel.OR(dst, src0, src1); break;
> - case OP_AND: sel.AND(dst, src0, src1); break;
> + case OP_XOR:
> + if (type == Type::TYPE_U64 || type == Type::TYPE_S64)
> + sel.I64XOR(dst, src0, src1);
> + else
> + sel.XOR(dst, src0, src1);
> + break;
> + case OP_OR:
> + if (type == Type::TYPE_U64 || type == Type::TYPE_S64)
> + sel.I64OR(dst, src0, src1);
> + else
> + sel.OR(dst, src0, src1);
> + break;
> + case OP_AND:
> + if (type == Type::TYPE_U64 || type == Type::TYPE_S64)
> + sel.I64AND(dst, src0, src1);
> + else
> + sel.AND(dst, src0, src1);
> + break;
> case OP_SUB:
> if (type == Type::TYPE_U64 || type == Type::TYPE_S64) {
> GenRegister t = sel.selReg(sel.reg(RegisterFamily::FAMILY_QWORD), Type::TYPE_S64);
> diff --git a/backend/src/backend/gen_insn_selection.hxx b/backend/src/backend/gen_insn_selection.hxx
> index 8eeb19f..7664c8f 100644
> --- a/backend/src/backend/gen_insn_selection.hxx
> +++ b/backend/src/backend/gen_insn_selection.hxx
> @@ -14,6 +14,9 @@ DECL_SELECTION_IR(SEL, BinaryInstruction)
> DECL_SELECTION_IR(AND, BinaryInstruction)
> DECL_SELECTION_IR(OR, BinaryInstruction)
> DECL_SELECTION_IR(XOR, BinaryInstruction)
> +DECL_SELECTION_IR(I64AND, BinaryInstruction)
> +DECL_SELECTION_IR(I64OR, BinaryInstruction)
> +DECL_SELECTION_IR(I64XOR, BinaryInstruction)
> DECL_SELECTION_IR(SHR, BinaryInstruction)
> DECL_SELECTION_IR(SHL, BinaryInstruction)
> DECL_SELECTION_IR(RSR, BinaryInstruction)
> diff --git a/backend/src/ir/instruction.cpp b/backend/src/ir/instruction.cpp
> index 2589848..f58757b 100644
> --- a/backend/src/ir/instruction.cpp
> +++ b/backend/src/ir/instruction.cpp
> @@ -672,6 +672,7 @@ namespace ir {
> static const Type logicalType[] = {TYPE_S8, TYPE_U8,
> TYPE_S16, TYPE_U16,
> TYPE_S32, TYPE_U32,
> + TYPE_S64, TYPE_U64,
> TYPE_BOOL};
> static const uint32_t logicalTypeNum = ARRAY_ELEM_NUM(logicalType);
>
> --
> 1.8.1.2
>
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org<mailto:Beignet at lists.freedesktop.org>
> http://lists.freedesktop.org/mailman/listinfo/beignet
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/beignet/attachments/20130807/f6f15450/attachment-0001.html>
More information about the Beignet
mailing list