<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Aug 7, 2013 at 8:26 AM, Xing, Homer <span dir="ltr"><<a href="mailto:homer.xing@intel.com" target="_blank">homer.xing@intel.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">





<div lang="ZH-CN" link="blue" vlink="purple">
<div>
<p class=""><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">The Bspec says that nibControl + quarterControl can be used when the real execSize is four. For example, when execSize is eight and the stride is two, the
 real execSize is four.</span></p></div></div></blockquote><div style> I really hope so, and in practice it does work that way. But I can't find that statement in the spec. The only statement I found, for IVB, is:</div>
<div style><br></div><div style> <span style="color:black">[DevIVB]: </span></div><p class=""><span lang="EN-US" style="color:black"><span class=""> NibCtrl</span> is only allowed for SIMD4 instructions with (<span class="">DF</span>) double precision source and/or destination.</span><span lang="EN-US"></span></p>
<div style> Could you tell me where I can find the description that supports the same behavior for a non-DW destination with a horizontal stride of 2? </div><div style> That would be helpful, as I also used this "undocumented" feature in my patch when I implemented 64-bit data reading. The only difference is that I disabled predication</div>
<div style> there.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div lang="ZH-CN" link="blue" vlink="purple">
<div><p class=""><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p>
<p class=""><a name="140562b6fb57a25e__MailEndCompose"><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></a></p>
<p class=""><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Consolidating the different function calls into one place is difficult, but I can turn the code that is repeated four times inside each “case SEL_OP_XX”
 into a for loop.</span></p></div></div></blockquote><div style> That's good. Let's remove part of the duplication now and fix the rest later. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div lang="ZH-CN" link="blue" vlink="purple"><div><p class=""><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p>
<p class=""><span lang="EN-US" style="font-size:10.5pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class=""><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> zhigang gong [mailto:<a href="mailto:zhigang.gong@gmail.com" target="_blank">zhigang.gong@gmail.com</a>]
<br>
<b>Sent:</b> Tuesday, August 6, 2013 4:38 PM<br>
<b>To:</b> Xing, Homer<br>
<b>Cc:</b> <a href="mailto:beignet@lists.freedesktop.org" target="_blank">beignet@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [Beignet] [PATCH 2/3] support 64bit-integer AND(&), OR(|), XOR(^) arithmetic<u></u><u></u></span></p><div><div class="h5">
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
<div>
<p class=""><span lang="EN-US">Sorry, mutt crashed and sent out an incomplete email. Here is the complete version:<u></u><u></u></span></p>
<div>
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">Homer,<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">The logic for handling Long/ULong is very similar across many operations. If you can consolidate<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">all of it in one place, you will avoid writing duplicate code and reduce the maintenance effort.<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">Even within the same function, the logic seems to be repeated four times, once for each group of 4 elements. The only differences<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">are the register offset, the quarterControl, and the nibControl.<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">Another interesting issue, which I overlooked in your first patch, concerns the nibControl usage.<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">It seems you are using quarterControl + nibControl to handle predication for a DW type with a <u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">2-element horizontal stride. I haven't found anything in the Gen spec that supports this kind of usage.<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">According to the spec, the only use case for nibControl is when the destination is a DF type. Right?<u></u><u></u></span></p>
</div>
<div>
<p class=""><span lang="EN-US">Any comments here? <u></u><u></u></span></p>
</div>
</div>
<div>
<p class="" style="margin-bottom:12pt"><span lang="EN-US"><u></u> <u></u></span></p>
<div>
<p class=""><span lang="EN-US">On Tue, Aug 6, 2013 at 4:16 PM, Zhigang Gong <<a href="mailto:zhigang.gong@gmail.com" target="_blank">zhigang.gong@gmail.com</a>> wrote:<u></u><u></u></span></p>
<blockquote style="border-style:none none none solid;border-left-color:rgb(204,204,204);border-left-width:1pt;padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<p class=""><span lang="EN-US">Homer,<br>
<br>
The logic for handling Long/ULong is very similar across<br>
many operations.<br>
<br>
handle<u></u><u></u></span></p>
<div>
<div>
<p class=""><span lang="EN-US">On Tue, Aug 06, 2013 at 04:01:30PM +0800, Homer Hsing wrote:<br>
><br>
> Signed-off-by: Homer Hsing <<a href="mailto:homer.xing@intel.com" target="_blank">homer.xing@intel.com</a>><br>
> ---<br>
>  backend/src/backend/gen_context.cpp        | 102 +++++++++++++++++++++++++++++<br>
>  backend/src/backend/gen_insn_selection.cpp |  24 ++++++-<br>
>  backend/src/backend/gen_insn_selection.hxx |   3 +<br>
>  backend/src/ir/instruction.cpp             |   1 +<br>
>  4 files changed, 127 insertions(+), 3 deletions(-)<br>
><br>
> diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp<br>
> index 69dab85..bbe16d0 100644<br>
> --- a/backend/src/backend/gen_context.cpp<br>
> +++ b/backend/src/backend/gen_context.cpp<br>
> @@ -162,6 +162,108 @@ namespace gbe<br>
>        case SEL_OP_AND:  p->AND(dst, src0, src1); break;<br>
>        case SEL_OP_OR:   p->OR (dst, src0, src1);  break;<br>
>        case SEL_OP_XOR:  p->XOR(dst, src0, src1); break;<br>
> +      case SEL_OP_I64AND:<br>
> +        {<br>
> +          GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),<br>
> +                      xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),<br>
> +                      xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);<br>
> +          int execWidth = p->curr.execWidth;<br>
> +          p->push();<br>
> +          p->curr.execWidth = 8;<br>
> +          p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          p->curr.nibControl = 1;<br>
> +          xdst = GenRegister::suboffset(xdst, 4),<br>
> +          xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +          xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +          p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          if (execWidth == 16) {<br>
> +            p->curr.quarterControl = 1;<br>
> +            p->curr.nibControl = 0;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +            p->curr.nibControl = 1;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->AND(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->AND(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          }<br>
> +          p->pop();<br>
> +        }<br>
> +        break;<br>
> +      case SEL_OP_I64OR:<br>
> +        {<br>
> +          GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),<br>
> +                      xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),<br>
> +                      xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);<br>
> +          int execWidth = p->curr.execWidth;<br>
> +          p->push();<br>
> +          p->curr.execWidth = 8;<br>
> +          p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          p->curr.nibControl = 1;<br>
> +          xdst = GenRegister::suboffset(xdst, 4),<br>
> +          xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +          xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +          p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          if (execWidth == 16) {<br>
> +            p->curr.quarterControl = 1;<br>
> +            p->curr.nibControl = 0;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +            p->curr.nibControl = 1;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->OR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->OR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          }<br>
> +          p->pop();<br>
> +        }<br>
> +        break;<br>
> +      case SEL_OP_I64XOR:<br>
> +        {<br>
> +          GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL),<br>
> +                      xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL),<br>
> +                      xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL);<br>
> +          int execWidth = p->curr.execWidth;<br>
> +          p->push();<br>
> +          p->curr.execWidth = 8;<br>
> +          p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          p->curr.nibControl = 1;<br>
> +          xdst = GenRegister::suboffset(xdst, 4),<br>
> +          xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +          xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +          p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +          p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          if (execWidth == 16) {<br>
> +            p->curr.quarterControl = 1;<br>
> +            p->curr.nibControl = 0;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +            p->curr.nibControl = 1;<br>
> +            xdst = GenRegister::suboffset(xdst, 4),<br>
> +            xsrc0 = GenRegister::suboffset(xsrc0, 4),<br>
> +            xsrc1 = GenRegister::suboffset(xsrc1, 4);<br>
> +            p->XOR(xdst.bottom_half(), xsrc0.bottom_half(), xsrc1.bottom_half());<br>
> +            p->XOR(xdst.top_half(), xsrc0.top_half(), xsrc1.top_half());<br>
> +          }<br>
> +          p->pop();<br>
> +        }<br>
> +        break;<br>
>        case SEL_OP_SHR:  p->SHR(dst, src0, src1); break;<br>
>        case SEL_OP_SHL:  p->SHL(dst, src0, src1); break;<br>
>        case SEL_OP_RSR:  p->RSR(dst, src0, src1); break;<br>
> diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selection.cpp<br>
> index 66cfa31..7e9402d 100644<br>
> --- a/backend/src/backend/gen_insn_selection.cpp<br>
> +++ b/backend/src/backend/gen_insn_selection.cpp<br>
> @@ -423,6 +423,9 @@ namespace gbe<br>
>      ALU2(AND)<br>
>      ALU2(OR)<br>
>      ALU2(XOR)<br>
> +    ALU2(I64AND)<br>
> +    ALU2(I64OR)<br>
> +    ALU2(I64XOR)<br>
>      ALU2(SHR)<br>
>      ALU2(SHL)<br>
>      ALU2(RSR)<br>
> @@ -1434,9 +1437,24 @@ namespace gbe<br>
>              sel.ADD(dst, src0, src1);<br>
>            sel.pop();<br>
>            break;<br>
> -        case OP_XOR: sel.XOR(dst, src0, src1); break;<br>
> -        case OP_OR:  sel.OR(dst, src0,  src1); break;<br>
> -        case OP_AND: sel.AND(dst, src0, src1); break;<br>
> +        case OP_XOR:<br>
> +          if (type == Type::TYPE_U64 || type == Type::TYPE_S64)<br>
> +            sel.I64XOR(dst, src0, src1);<br>
> +          else<br>
> +            sel.XOR(dst, src0, src1);<br>
> +          break;<br>
> +        case OP_OR:<br>
> +          if (type == Type::TYPE_U64 || type == Type::TYPE_S64)<br>
> +            sel.I64OR(dst, src0, src1);<br>
> +          else<br>
> +            sel.OR(dst, src0, src1);<br>
> +          break;<br>
> +        case OP_AND:<br>
> +          if (type == Type::TYPE_U64 || type == Type::TYPE_S64)<br>
> +            sel.I64AND(dst, src0, src1);<br>
> +          else<br>
> +            sel.AND(dst, src0, src1);<br>
> +          break;<br>
>          case OP_SUB:<br>
>            if (type == Type::TYPE_U64 || type == Type::TYPE_S64) {<br>
>              GenRegister t = sel.selReg(sel.reg(RegisterFamily::FAMILY_QWORD), Type::TYPE_S64);<br>
> diff --git a/backend/src/backend/gen_insn_selection.hxx b/backend/src/backend/gen_insn_selection.hxx<br>
> index 8eeb19f..7664c8f 100644<br>
> --- a/backend/src/backend/gen_insn_selection.hxx<br>
> +++ b/backend/src/backend/gen_insn_selection.hxx<br>
> @@ -14,6 +14,9 @@ DECL_SELECTION_IR(SEL, BinaryInstruction)<br>
>  DECL_SELECTION_IR(AND, BinaryInstruction)<br>
>  DECL_SELECTION_IR(OR, BinaryInstruction)<br>
>  DECL_SELECTION_IR(XOR, BinaryInstruction)<br>
> +DECL_SELECTION_IR(I64AND, BinaryInstruction)<br>
> +DECL_SELECTION_IR(I64OR, BinaryInstruction)<br>
> +DECL_SELECTION_IR(I64XOR, BinaryInstruction)<br>
>  DECL_SELECTION_IR(SHR, BinaryInstruction)<br>
>  DECL_SELECTION_IR(SHL, BinaryInstruction)<br>
>  DECL_SELECTION_IR(RSR, BinaryInstruction)<br>
> diff --git a/backend/src/ir/instruction.cpp b/backend/src/ir/instruction.cpp<br>
> index 2589848..f58757b 100644<br>
> --- a/backend/src/ir/instruction.cpp<br>
> +++ b/backend/src/ir/instruction.cpp<br>
> @@ -672,6 +672,7 @@ namespace ir {<br>
>      static const Type logicalType[] = {TYPE_S8,  TYPE_U8,<br>
>                                         TYPE_S16, TYPE_U16,<br>
>                                         TYPE_S32, TYPE_U32,<br>
> +                                       TYPE_S64, TYPE_U64,<br>
>                                         TYPE_BOOL};<br>
>      static const uint32_t logicalTypeNum = ARRAY_ELEM_NUM(logicalType);<br>
><br>
> --<br>
> 1.8.1.2<br>
><br>
> _______________________________________________<br>
> Beignet mailing list<br>
> <a href="mailto:Beignet@lists.freedesktop.org" target="_blank">Beignet@lists.freedesktop.org</a><br>
> <a href="http://lists.freedesktop.org/mailman/listinfo/beignet" target="_blank">
http://lists.freedesktop.org/mailman/listinfo/beignet</a><u></u><u></u></span></p>
</div>
</div>
</blockquote>
</div>
<p class=""><span lang="EN-US"><u></u> <u></u></span></p>
</div>
</div></div></div>
</div>

</blockquote></div><br></div></div>
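<div dir="ltr"><p class="">As a rough illustration of the for-loop idea discussed above: the four unrolled blocks in each SEL_OP_I64* case differ only in quarterControl, nibControl, and the cumulative sub-offset, so all three can be derived from a single group index. The sketch below is not Beignet code; <code>LaneGroup</code>, <code>laneGroup</code>, and <code>laneGroupCount</code> are hypothetical names, and it assumes groups of 4 channels as in the quoted patch.</p>
<pre>
```cpp
// Hypothetical stand-in for the per-group state the quoted patch sets by hand.
struct LaneGroup {
  int quarterControl; // which group of 8 channels (0 or 1)
  int nibControl;     // which group of 4 channels inside that quarter (0 or 1)
  int subOffset;      // cumulative offset, in the units the patch passes to suboffset()
};

// Number of 4-channel groups: 2 for execWidth 8, 4 for execWidth 16.
constexpr int laneGroupCount(int execWidth) { return execWidth / 4; }

// Map a group index to the values used by the unrolled code:
// (0,0,0), (0,1,4), (1,0,8), (1,1,12).
constexpr LaneGroup laneGroup(int i) { return LaneGroup{i / 2, i % 2, 4 * i}; }

// Usage sketch (emit() is hypothetical and stands for the paired
// bottom_half()/top_half() instructions in the patch):
//   for (int i = 0; i != laneGroupCount(execWidth); ++i) emit(laneGroup(i));
```
</pre>
<p class="">With a mapping like this, each case body would shrink to one loop, and the AND/OR/XOR cases would differ only in the instruction emitted inside it.</p></div>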