[Beignet] [PATCH V2] gbe: Implement a new BTI solution to support dynamic bti
Zhigang Gong
zhigang.gong at linux.intel.com
Fri May 22 03:46:30 PDT 2015
Most of the important issues have been addressed in this version,
and I agree to postpone some minor performance issues.
I just pushed this patch to master branch.
Thanks,
Zhigang Gong.
On Thu, May 21, 2015 at 11:07:30AM +0800, Ruiling Song wrote:
> The old implementation statically analyzes the pointer base and thus
> assigns compile-time BTIs to all memory access instructions. The new implementation
> introduces a virtual register to hold the BTI value for each memory access instruction.
> The main benefit of this new method is that it can handle pointers whose BTI is only known at run time (mixed pointers).
> This is a big step towards supporting storing/loading pointers.
>
> Consider the following example:
> void @compiler_mixed_pointer1(i32 addrspace(1)* readonly %src, i32 addrspace(1)* %dst1, i32 addrspace(1)* %dst2) {
> %cmp = icmp slt i32 %add4.i, 5
> %cond = select i1 %cmp, i32 addrspace(1)* %dst1, i32 addrspace(1)* %dst2
> store i32 %6, i32 addrspace(1)* %10, align 4, !tbaa !31
> }
>
> It will be changed to:
>
> void @compiler_mixed_pointer1(i32 addrspace(1)* readonly %src, i32 addrspace(1)* %dst1, i32 addrspace(1)* %dst2) {
> %cmp = icmp slt i32 %add4.i, 5
>
> // newly added instructions:
> // %0 holds the BTI value: '3' is the bti of dst1, '4' is the bti of dst2
> // %1 holds the starting address of the surface for that BTI, which will be subtracted.
>
> %0 = select i1 %cmp, i32 3, i32 4
> %1 = select i1 %cmp, i32 addrspace(1)* %dst1, i32 addrspace(1)* %dst2
>
> %cond = select i1 %cmp, i32 addrspace(1)* %dst1, i32 addrspace(1)* %dst2
> store i32 %cond, i32 addrspace(1)* %10, align 4
> }
>
> The idea of the solution is: check the bti register and select one lane whose bti has not been accessed yet (through 'lzd'),
> issue the send message to that bti, then pick the next un-accessed lane and repeat these steps until no lanes are left.
>
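To make the loop above easier to follow, here is a minimal host-side C++ sketch of the same idea. It is not part of the patch: `Send`, `splitByBti` and the software `lzd` are hypothetical names used only for illustration, and SIMD16 is assumed. It picks the highest pending lane through a leading-zero count, groups every pending lane that shares that lane's BTI into one "send", and repeats until no lane is pending.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Software stand-in for the Gen 'lzd' instruction: number of leading zero
// bits in a 32-bit value (32 when the value is zero).
static uint32_t lzd(uint32_t v) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1) ++n;
  return n;
}

// One simulated send message: the BTI it targets and the lanes it covers.
struct Send {
  uint32_t bti;
  uint32_t laneMask;
};

// Hypothetical helper: split the active lanes of a SIMD16 access into one
// send per distinct BTI, mirroring the lzd / cmp / send / while loop.
static std::vector<Send> splitByBti(const std::array<uint32_t, 16> &laneBti,
                                    uint32_t activeMask) {
  std::vector<Send> sends;
  uint32_t pending = activeMask & 0xFFFFu;  // only 16 lanes exist
  while (pending != 0) {
    const uint32_t lane = 31 - lzd(pending);  // highest lane still pending
    assert(lane < laneBti.size());
    const uint32_t bti = laneBti[lane];
    uint32_t match = 0;
    for (uint32_t l = 0; l < 16; ++l)
      if (((pending >> l) & 1u) != 0 && laneBti[l] == bti)
        match |= 1u << l;                     // lanes served by this send
    sends.push_back({bti, match});
    pending &= ~match;                        // these lanes are done
  }
  return sends;
}
```

With lanes 0-7 using bti 3 and lanes 8-15 using bti 4, the sketch issues two sends; when every lane shares one bti, the loop body runs only once.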
> For a mixed pointer, the final asm looks like below:
> (g118 (offset 0xec0) is the register that holds the bti of all lanes)
> ((31-lzd(active_lane_mask))*4 + bti_reg_start) is the address of the target bti for this iteration
>
> As the gen backend currently only allows one flag register per SelectionInstruction,
> I have to save the flag at (54) and restore it at (64) in the example below.
>
> ( 38) mov(1) f0.1<2>:UW 0x0UW { align1 WE_all };
> ( 40) cmp.ne.f0.1(16) null:F f0.1<0,1,0>:UW 0x1UW { align1 WE_normal 1H switch };
> ( 42) and(1) g8.2<1>:UD f0.1<0,1,0>:UW 0xffffffffUD { align1 WE_all };
> ( 44) lzd(1) g8.2<1>:UD g8.2<0,1,0>:UD { align1 WE_all };
> ( 46) add(1) g8.4<2>:UW -g8.4<0,1,0>:UW 0x1fUW { align1 WE_all };
> ( 48) mul(1) g8.4<2>:UW g8.4<0,1,0>:UW 0x4UW { align1 WE_all };
> ( 50) add(1) a0<2>:UW g8.4<0,1,0>:UW 0xec0UD { align1 WE_all };
> ( 52) mov(1) g8.2<1>:UD g[a0]<0,1,0>:UD { align1 WE_all };
> ( 54) mov(1) g121.14<2>:UW f0.1<0,1,0>:UW { align1 WE_all };
> ( 56) cmp.e.f0.1(8) null:F g118<8,8,1>:UD g8.2<0,1,0>:UD { align1 WE_normal 1Q switch };
> ( 58) cmp.e.f0.1(8) null:F g119<8,8,1>:UD g8.2<0,1,0>:UD { align1 WE_normal 2Q switch };
> ( 60) or(1) a0<1>:UD g8.8<0,1,0>:UB 0x8035e00UD { align1 WE_all };
> ( 62) (+f0.1) send(16) null:UW g104<8,8,1>:UD a0<0,1,0>:UW
> data { align1 WE_normal 1H };
> ( 64) mov(1) f0.1<2>:UW g121.14<0,1,0>:UW { align1 WE_all };
> ( 66) (+f0.1) cmp.ne.f0.1(8) null:F g118<8,8,1>:UD g8.2<0,1,0>:UD { align1 WE_normal 1Q switch };
> ( 68) (+f0.1) cmp.ne.f0.1(8) null:F g119<8,8,1>:UD g8.2<0,1,0>:UD { align1 WE_normal 2Q switch };
> ( 70) (+f0.1) while(16) -28 { align1 WE_normal 1H };
>
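As a cross-check of the address arithmetic in instructions (44) to (50), here is a small C++ sketch of the offset computation. The names `kGrfBytes`, `lzd32` and `btiByteOffset` are hypothetical; the 32-byte register size is taken from the `bti.nr*32` term in beforeMessage() in the diff, and each lane's bti is assumed to occupy one 4-byte dword.

```cpp
#include <cassert>
#include <cstdint>

// Size of one GRF register in bytes (assumption taken from 'bti.nr*32').
constexpr uint32_t kGrfBytes = 32;

// Software stand-in for 'lzd': leading zero count, 32 for a zero input.
static uint32_t lzd32(uint32_t v) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1) ++n;
  return n;
}

// Byte offset, inside the register file, of the bti that belongs to the
// highest lane still set in pendingMask:
//   (31 - lzd(mask)) * 4 + bti_reg_start
static uint32_t btiByteOffset(uint32_t btiRegNr, uint32_t pendingMask) {
  assert(pendingMask != 0);  // the loop only runs while lanes are pending
  const uint32_t lane = 31 - lzd32(pendingMask);
  return lane * 4 + btiRegNr * kGrfBytes;
}
```

For g118 the start offset is 118 * 32 = 0xec0, which matches the immediate added at (50); with all 16 lanes pending the first bti is read from lane 15, at offset 0xec0 + 60 = 0xefc, which lies in g119.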
> v2:
> 1. remove markAllChildrenExceptBTI, instead detach child0 before marking children.
> 2. fix a signed/unsigned warning in instruction.cpp
> 3. when the pointer operand of a load/store instruction is the same as its origin,
> don't add it to pointerOrigMap
> 4. unify elementType when creating PHINode in getPointerBase
> 5. make separate api setXXXMessageDesc() and generateXXXMessageDesc()
> 6. refine GenContext::emitStackPointer() and Gen75Context::emitStackPointer().
> 7. reuse isImageType in function.hpp
> 8. add function getKernelFunctionMetadata(Function *)
>
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
> backend/src/backend/gen/gen_mesa_disasm.c | 100 ++---
> backend/src/backend/gen75_context.cpp | 4 +-
> backend/src/backend/gen75_encoder.cpp | 80 +++-
> backend/src/backend/gen75_encoder.hpp | 9 +-
> backend/src/backend/gen8_context.cpp | 45 +-
> backend/src/backend/gen8_encoder.cpp | 79 +++-
> backend/src/backend/gen8_encoder.hpp | 9 +-
> backend/src/backend/gen_context.cpp | 192 ++++++++-
> backend/src/backend/gen_context.hpp | 2 +
> backend/src/backend/gen_encoder.cpp | 172 ++++++--
> backend/src/backend/gen_encoder.hpp | 22 +-
> backend/src/backend/gen_insn_selection.cpp | 462 +++++++++++++--------
> backend/src/backend/gen_insn_selection.hpp | 20 +-
> backend/src/backend/program.h | 1 +
> backend/src/ir/context.hpp | 8 +-
> backend/src/ir/instruction.cpp | 109 +++--
> backend/src/ir/instruction.hpp | 36 +-
> backend/src/ir/profile.cpp | 4 +-
> backend/src/ir/profile.hpp | 3 +-
> backend/src/llvm/llvm_gen_backend.cpp | 639 +++++++++++++++++++++++------
> 20 files changed, 1439 insertions(+), 557 deletions(-)
>
> diff --git a/backend/src/backend/gen/gen_mesa_disasm.c b/backend/src/backend/gen/gen_mesa_disasm.c
> index 705f5e2..adf4e58 100644
> --- a/backend/src/backend/gen/gen_mesa_disasm.c
> +++ b/backend/src/backend/gen/gen_mesa_disasm.c
> @@ -99,8 +99,8 @@ static const struct {
> [GEN_OPCODE_CMP] = { .name = "cmp", .nsrc = 2, .ndst = 1 },
> [GEN_OPCODE_CMPN] = { .name = "cmpn", .nsrc = 2, .ndst = 1 },
>
> - [GEN_OPCODE_SEND] = { .name = "send", .nsrc = 1, .ndst = 1 },
> - [GEN_OPCODE_SENDC] = { .name = "sendc", .nsrc = 1, .ndst = 1 },
> + [GEN_OPCODE_SEND] = { .name = "send", .nsrc = 2, .ndst = 1 },
> + [GEN_OPCODE_SENDC] = { .name = "sendc", .nsrc = 2, .ndst = 1 },
> [GEN_OPCODE_NOP] = { .name = "nop", .nsrc = 0, .ndst = 0 },
> [GEN_OPCODE_JMPI] = { .name = "jmpi", .nsrc = 0, .ndst = 0 },
> [GEN_OPCODE_BRD] = { .name = "brd", .nsrc = 0, .ndst = 0 },
> @@ -1258,59 +1258,61 @@ int gen_disasm (FILE *file, const void *inst, uint32_t deviceID, uint32_t compac
> target, &space);
> }
>
> - switch (target) {
> - case GEN_SFID_SAMPLER:
> - format(file, " (%d, %d, %d, %d)",
> - SAMPLE_BTI(inst),
> - SAMPLER(inst),
> - SAMPLER_MSG_TYPE(inst),
> - SAMPLER_SIMD_MODE(inst));
> - break;
> - case GEN_SFID_DATAPORT_DATA:
> - if(UNTYPED_RW_CATEGORY(inst) == 0) {
> + if (GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_file, bits2.da1.src1_reg_file) == GEN_IMMEDIATE_VALUE) {
> + switch (target) {
> + case GEN_SFID_SAMPLER:
> + format(file, " (%d, %d, %d, %d)",
> + SAMPLE_BTI(inst),
> + SAMPLER(inst),
> + SAMPLER_MSG_TYPE(inst),
> + SAMPLER_SIMD_MODE(inst));
> + break;
> + case GEN_SFID_DATAPORT_DATA:
> + if(UNTYPED_RW_CATEGORY(inst) == 0) {
> + format(file, " (bti: %d, rgba: %d, %s, %s, %s)",
> + UNTYPED_RW_BTI(inst),
> + UNTYPED_RW_RGBA(inst),
> + data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)],
> + data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)],
> + data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]);
> + } else {
> + format(file, " (addr: %d, blocks: %s, %s, mode: %s, %s)",
> + SCRATCH_RW_OFFSET(inst),
> + data_port_scratch_block_size[SCRATCH_RW_BLOCK_SIZE(inst)],
> + data_port_scratch_invalidate[SCRATCH_RW_INVALIDATE_AFTER_READ(inst)],
> + data_port_scratch_channel_mode[SCRATCH_RW_CHANNEL_MODE(inst)],
> + data_port_scratch_msg_type[SCRATCH_RW_MSG_TYPE(inst)]);
> + }
> + break;
> + case GEN_SFID_DATAPORT1_DATA:
> format(file, " (bti: %d, rgba: %d, %s, %s, %s)",
> UNTYPED_RW_BTI(inst),
> UNTYPED_RW_RGBA(inst),
> data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)],
> data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)],
> - data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]);
> - } else {
> - format(file, " (addr: %d, blocks: %s, %s, mode: %s, %s)",
> - SCRATCH_RW_OFFSET(inst),
> - data_port_scratch_block_size[SCRATCH_RW_BLOCK_SIZE(inst)],
> - data_port_scratch_invalidate[SCRATCH_RW_INVALIDATE_AFTER_READ(inst)],
> - data_port_scratch_channel_mode[SCRATCH_RW_CHANNEL_MODE(inst)],
> - data_port_scratch_msg_type[SCRATCH_RW_MSG_TYPE(inst)]);
> - }
> - break;
> - case GEN_SFID_DATAPORT1_DATA:
> - format(file, " (bti: %d, rgba: %d, %s, %s, %s)",
> - UNTYPED_RW_BTI(inst),
> - UNTYPED_RW_RGBA(inst),
> - data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)],
> - data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)],
> - data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]);
> - break;
> - case GEN_SFID_DATAPORT_CONSTANT:
> - format(file, " (bti: %d, %s)",
> - DWORD_RW_BTI(inst),
> - data_port_data_cache_msg_type[DWORD_RW_MSG_TYPE(inst)]);
> - break;
> - case GEN_SFID_MESSAGE_GATEWAY:
> - format(file, " (subfunc: %s, notify: %d, ackreq: %d)",
> - gateway_sub_function[MSG_GW_SUBFUNC(inst)],
> - MSG_GW_NOTIFY(inst),
> - MSG_GW_ACKREQ(inst));
> - break;
> -
> - default:
> - format(file, "unsupported target %d", target);
> - break;
> + data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]);
> + break;
> + case GEN_SFID_DATAPORT_CONSTANT:
> + format(file, " (bti: %d, %s)",
> + DWORD_RW_BTI(inst),
> + data_port_data_cache_msg_type[DWORD_RW_MSG_TYPE(inst)]);
> + break;
> + case GEN_SFID_MESSAGE_GATEWAY:
> + format(file, " (subfunc: %s, notify: %d, ackreq: %d)",
> + gateway_sub_function[MSG_GW_SUBFUNC(inst)],
> + MSG_GW_NOTIFY(inst),
> + MSG_GW_ACKREQ(inst));
> + break;
> +
> + default:
> + format(file, "unsupported target %d", target);
> + break;
> + }
> + if (space)
> + string(file, " ");
> + format(file, "mlen %d", GENERIC_MSG_LENGTH(inst));
> + format(file, " rlen %d", GENERIC_RESPONSE_LENGTH(inst));
> }
> - if (space)
> - string(file, " ");
> - format(file, "mlen %d", GENERIC_MSG_LENGTH(inst));
> - format(file, " rlen %d", GENERIC_RESPONSE_LENGTH(inst));
> }
> pad(file, 64);
> if (OPCODE(inst) != GEN_OPCODE_NOP) {
> diff --git a/backend/src/backend/gen75_context.cpp b/backend/src/backend/gen75_context.cpp
> index a830260..caf7043 100644
> --- a/backend/src/backend/gen75_context.cpp
> +++ b/backend/src/backend/gen75_context.cpp
> @@ -84,10 +84,9 @@ namespace gbe
> GenRegister::ud8grf(ir::ocl::stackptr) :
> GenRegister::ud16grf(ir::ocl::stackptr);
> const GenRegister stackptr = ra->genReg(selStatckPtr);
> - const GenRegister selStackBuffer = GenRegister::ud1grf(ir::ocl::stackbuffer);
> - const GenRegister bufferptr = ra->genReg(selStackBuffer);
>
> // We compute the per-lane stack pointer here
> + // private address start from zero
> p->push();
> p->curr.execWidth = 1;
> p->curr.predicate = GEN_PREDICATE_NONE;
> @@ -102,7 +101,6 @@ namespace gbe
> p->ADD(GenRegister::ud1grf(126,0), GenRegister::ud1grf(126,0), GenRegister::ud1grf(126, 4));
> p->SHL(GenRegister::ud1grf(126,0), GenRegister::ud1grf(126,0), GenRegister::immud(perThreadShift));
> p->curr.execWidth = this->simdWidth;
> - p->ADD(stackptr, stackptr, bufferptr);
> p->ADD(stackptr, stackptr, GenRegister::ud1grf(126,0));
> p->pop();
> }
> diff --git a/backend/src/backend/gen75_encoder.cpp b/backend/src/backend/gen75_encoder.cpp
> index c77ce4d..602f9c7 100644
> --- a/backend/src/backend/gen75_encoder.cpp
> +++ b/backend/src/backend/gen75_encoder.cpp
> @@ -96,8 +96,7 @@ namespace gbe
> gen7_insn->bits3.gen7_typed_rw.slot = 1;
> }
>
> - void Gen75Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum) {
> - GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + unsigned Gen75Encoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) {
> Gen7NativeInstruction *gen7_insn = &insn->gen7_insn;
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> @@ -111,11 +110,6 @@ namespace gbe
> } else
> NOT_IMPLEMENTED;
>
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> -
> const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA;
> setMessageDescriptor(insn, sfid, msg_length, response_length);
> gen7_insn->bits3.gen7_atomic_op.msg_type = GEN75_P1_UNTYPED_ATOMIC_OP;
> @@ -129,11 +123,26 @@ namespace gbe
> gen7_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16;
> else
> NOT_SUPPORTED;
> + return gen7_insn->bits3.ud;
> }
>
> - void Gen75Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum) {
> + void Gen75Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> - assert(elemNum >= 1 || elemNum <= 4);
> +
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA;
> +
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setAtomicMessageDesc(insn, function, bti.value.ud, srcNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned Gen75Encoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> if (this->curr.execWidth == 8) {
> @@ -144,44 +153,75 @@ namespace gbe
> response_length = 2 * elemNum;
> } else
> NOT_IMPLEMENTED;
> -
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN75_P1_UNTYPED_READ,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> - void Gen75Encoder::UNTYPED_WRITE(GenRegister msg, uint32_t bti, uint32_t elemNum) {
> + void Gen75Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> assert(elemNum >= 1 || elemNum <= 4);
> +
> + this->setHeader(insn);
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> + this->setSrc1(insn, GenRegister::immud(0));
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA;
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedReadMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned Gen75Encoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> - this->setHeader(insn);
> if (this->curr.execWidth == 8) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> msg_length = 1 + elemNum;
> } else if (this->curr.execWidth == 16) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> msg_length = 2 * (1 + elemNum);
> }
> else
> NOT_IMPLEMENTED;
> - this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN75_P1_UNTYPED_SURFACE_WRITE,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> + void Gen75Encoder::UNTYPED_WRITE(GenRegister msg, GenRegister bti, uint32_t elemNum) {
> + GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + assert(elemNum >= 1 || elemNum <= 4);
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> + if (this->curr.execWidth == 8) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> + } else if (this->curr.execWidth == 16) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> + }
> + else
> + NOT_IMPLEMENTED;
> + this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> +
> void Gen75Encoder::LOAD_DF_IMM(GenRegister dest, GenRegister tmp, double value) {
> union { double d; unsigned u[2]; } u;
> u.d = value;
> diff --git a/backend/src/backend/gen75_encoder.hpp b/backend/src/backend/gen75_encoder.hpp
> index 9545157..5d80bbd 100644
> --- a/backend/src/backend/gen75_encoder.hpp
> +++ b/backend/src/backend/gen75_encoder.hpp
> @@ -48,15 +48,18 @@ namespace gbe
> virtual int getDoubleExecWidth(void) { return GEN75_DOUBLE_EXEC_WIDTH; }
> virtual void MOV_DF(GenRegister dest, GenRegister src0, GenRegister tmp = GenRegister::null());
> virtual void LOAD_DF_IMM(GenRegister dest, GenRegister tmp, double value);
> - virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum);
> - virtual void UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum);
> - virtual void UNTYPED_WRITE(GenRegister src, uint32_t bti, uint32_t elemNum);
> + virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum);
> + virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum);
> + virtual void UNTYPED_WRITE(GenRegister src, GenRegister bti, uint32_t elemNum);
> virtual void setHeader(GenNativeInstruction *insn);
> virtual void setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba,
> uint32_t msg_type, uint32_t msg_length, uint32_t response_length);
> virtual void setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti,
> unsigned char msg_type, uint32_t msg_length,
> bool header_present);
> + virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum);
> + virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> + virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> };
> }
> #endif /* __GBE_GEN75_ENCODER_HPP__ */
> diff --git a/backend/src/backend/gen8_context.cpp b/backend/src/backend/gen8_context.cpp
> index 834a3be..69d3916 100644
> --- a/backend/src/backend/gen8_context.cpp
> +++ b/backend/src/backend/gen8_context.cpp
> @@ -817,19 +817,33 @@ namespace gbe
> p->pop();
> }
> }
> -
> void Gen8Context::emitRead64Instruction(const SelectionInstruction &insn)
> {
> - const uint32_t bti = insn.getbti();
> const uint32_t elemNum = insn.extra.elem;
> GBE_ASSERT(elemNum == 1);
>
> - const GenRegister addr = ra->genReg(insn.src(0));
> - const GenRegister tmp_dst = ra->genReg(insn.dst(0));
> + const GenRegister dst = ra->genReg(insn.dst(0));
> + const GenRegister src = ra->genReg(insn.src(0));
> + const GenRegister bti = ra->genReg(insn.src(1));
>
> /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless,
> which we can not accept. We just fallback to 2 DW untyperead here. */
> - p->UNTYPED_READ(tmp_dst, addr, bti, elemNum*2);
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_READ(dst, src, bti, 2*elemNum);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(2*elemNum));
> + unsigned desc = p->generateUntypedReadMessageDesc(0, 2*elemNum);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated load
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), 2*elemNum);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
>
> for (uint32_t elemID = 0; elemID < elemNum; elemID++) {
> GenRegister long_tmp = ra->genReg(insn.dst(elemID));
> @@ -840,11 +854,10 @@ namespace gbe
>
> void Gen8Context::emitWrite64Instruction(const SelectionInstruction &insn)
> {
> - const uint32_t bti = insn.getbti();
> const uint32_t elemNum = insn.extra.elem;
> GBE_ASSERT(elemNum == 1);
> -
> const GenRegister addr = ra->genReg(insn.src(elemNum));
> + const GenRegister bti = ra->genReg(insn.src(elemNum*2+1));
>
> /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless,
> which we can not accept. We just fallback to 2 DW untypewrite here. */
> @@ -854,9 +867,23 @@ namespace gbe
> this->unpackLongVec(the_long, long_tmp, p->curr.execWidth);
> }
>
> - p->UNTYPED_WRITE(addr, bti, elemNum*2);
> - }
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_WRITE(addr, bti, elemNum*2);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(elemNum));
> + unsigned desc = p->generateUntypedWriteMessageDesc(0, elemNum*2);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
>
> + //predicated store
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_WRITE(addr, GenRegister::addr1(0), elemNum*2);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> + }
> void Gen8Context::emitPackLongInstruction(const SelectionInstruction &insn) {
> const GenRegister src = ra->genReg(insn.src(0));
> const GenRegister dst = ra->genReg(insn.dst(0));
> diff --git a/backend/src/backend/gen8_encoder.cpp b/backend/src/backend/gen8_encoder.cpp
> index f02a2ca..fd35838 100644
> --- a/backend/src/backend/gen8_encoder.cpp
> +++ b/backend/src/backend/gen8_encoder.cpp
> @@ -103,9 +103,7 @@ namespace gbe
> void Gen8Encoder::F32TO16(GenRegister dest, GenRegister src0) {
> MOV(GenRegister::retype(dest, GEN_TYPE_HF), GenRegister::retype(src0, GEN_TYPE_F));
> }
> -
> - void Gen8Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum) {
> - GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + unsigned Gen8Encoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) {
> Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> @@ -119,11 +117,6 @@ namespace gbe
> } else
> NOT_IMPLEMENTED;
>
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> -
> const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA;
> setMessageDescriptor(insn, sfid, msg_length, response_length);
> gen8_insn->bits3.gen7_atomic_op.msg_type = GEN75_P1_UNTYPED_ATOMIC_OP;
> @@ -137,11 +130,26 @@ namespace gbe
> gen8_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16;
> else
> NOT_SUPPORTED;
> + return gen8_insn->bits3.ud;
> }
>
> - void Gen8Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum) {
> + void Gen8Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> - assert(elemNum >= 1 || elemNum <= 4);
> +
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA;
> +
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setAtomicMessageDesc(insn, function, bti.value.ud, srcNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> + unsigned Gen8Encoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> if (this->curr.execWidth == 8) {
> @@ -152,44 +160,73 @@ namespace gbe
> response_length = 2 * elemNum;
> } else
> NOT_IMPLEMENTED;
> -
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN75_P1_UNTYPED_READ,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> - void Gen8Encoder::UNTYPED_WRITE(GenRegister msg, uint32_t bti, uint32_t elemNum) {
> + void Gen8Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> assert(elemNum >= 1 || elemNum <= 4);
> +
> + this->setHeader(insn);
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> + this->setSrc1(insn, GenRegister::immud(0));
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA;
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedReadMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned Gen8Encoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> - this->setHeader(insn);
> if (this->curr.execWidth == 8) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> msg_length = 1 + elemNum;
> } else if (this->curr.execWidth == 16) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> msg_length = 2 * (1 + elemNum);
> }
> else
> NOT_IMPLEMENTED;
> - this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN75_P1_UNTYPED_SURFACE_WRITE,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> + void Gen8Encoder::UNTYPED_WRITE(GenRegister msg, GenRegister bti, uint32_t elemNum) {
> + GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + assert(elemNum >= 1 || elemNum <= 4);
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> + if (this->curr.execWidth == 8) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> + } else if (this->curr.execWidth == 16) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> + }
> + else
> + NOT_IMPLEMENTED;
> + this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> void Gen8Encoder::LOAD_DF_IMM(GenRegister dest, GenRegister tmp, double value) {
> union { double d; unsigned u[2]; } u;
> u.d = value;
> diff --git a/backend/src/backend/gen8_encoder.hpp b/backend/src/backend/gen8_encoder.hpp
> index 4c5e556..504e13d 100644
> --- a/backend/src/backend/gen8_encoder.hpp
> +++ b/backend/src/backend/gen8_encoder.hpp
> @@ -49,9 +49,9 @@ namespace gbe
> virtual void MOV_DF(GenRegister dest, GenRegister src0, GenRegister tmp = GenRegister::null());
> virtual void LOAD_DF_IMM(GenRegister dest, GenRegister tmp, double value);
> virtual void LOAD_INT64_IMM(GenRegister dest, GenRegister value);
> - virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum);
> - virtual void UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum);
> - virtual void UNTYPED_WRITE(GenRegister src, uint32_t bti, uint32_t elemNum);
> + virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum);
> + virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum);
> + virtual void UNTYPED_WRITE(GenRegister src, GenRegister bti, uint32_t elemNum);
> virtual void setHeader(GenNativeInstruction *insn);
> virtual void setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba,
> uint32_t msg_type, uint32_t msg_length, uint32_t response_length);
> @@ -66,6 +66,9 @@ namespace gbe
> GenRegister src0, GenRegister src1, GenRegister src2);
> virtual bool canHandleLong(uint32_t opcode, GenRegister dst, GenRegister src0,
> GenRegister src1 = GenRegister::null());
> + virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum);
> + virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> + virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> };
> }
> #endif /* __GBE_GEN8_ENCODER_HPP__ */
> diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
> index 94094fc..43d14d2 100644
> --- a/backend/src/backend/gen_context.cpp
> +++ b/backend/src/backend/gen_context.cpp
> @@ -192,10 +192,10 @@ namespace gbe
> GenRegister::ud8grf(ir::ocl::stackptr) :
> GenRegister::ud16grf(ir::ocl::stackptr);
> const GenRegister stackptr = ra->genReg(selStatckPtr);
> - const GenRegister selStackBuffer = GenRegister::ud1grf(ir::ocl::stackbuffer);
> - const GenRegister bufferptr = ra->genReg(selStackBuffer);
>
> // We compute the per-lane stack pointer here
> + // threadId * perThreadSize + laneId*perLaneSize
> + // let private address start from zero
> p->push();
> p->curr.execWidth = 1;
> p->curr.predicate = GEN_PREDICATE_NONE;
> @@ -205,7 +205,6 @@ namespace gbe
> p->curr.execWidth = 1;
> p->SHL(GenRegister::ud1grf(126,0), GenRegister::ud1grf(126,0), GenRegister::immud(perThreadShift));
> p->curr.execWidth = this->simdWidth;
> - p->ADD(stackptr, stackptr, bufferptr);
> p->ADD(stackptr, stackptr, GenRegister::ud1grf(126,0));
> p->pop();
> }
> @@ -1721,9 +1720,25 @@ namespace gbe
> const GenRegister src = ra->genReg(insn.src(0));
> const GenRegister dst = ra->genReg(insn.dst(0));
> const uint32_t function = insn.extra.function;
> - const uint32_t bti = insn.getbti();
> + unsigned srcNum = insn.extra.elem;
> +
> + const GenRegister bti = ra->genReg(insn.src(srcNum));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->ATOMIC(dst, function, src, bti, srcNum);
> + } else {
> + GenRegister flagTemp = ra->genReg(insn.dst(1));
> +
> + unsigned desc = p->generateAtomicMessageDesc(function, 0, srcNum);
>
> - p->ATOMIC(dst, function, src, bti, insn.srcNum);
> + unsigned jip0 = beforeMessage(insn, bti, flagTemp, desc);
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->ATOMIC(dst, function, src, GenRegister::addr1(0), srcNum);
> + p->pop();
> + afterMessage(insn, bti, flagTemp, jip0);
> + }
> }
>
> void GenContext::emitIndirectMoveInstruction(const SelectionInstruction &insn) {
> @@ -1855,48 +1870,188 @@ namespace gbe
> }
>
> void GenContext::emitRead64Instruction(const SelectionInstruction &insn) {
> - const uint32_t elemNum = insn.extra.elem;
> + const uint32_t elemNum = insn.extra.elem * 2;
> const GenRegister dst = ra->genReg(insn.dst(0));
> const GenRegister src = ra->genReg(insn.src(0));
> - const uint32_t bti = insn.getbti();
> - p->UNTYPED_READ(dst, src, bti, elemNum*2);
> + const GenRegister bti = ra->genReg(insn.src(1));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_READ(dst, src, bti, elemNum);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(elemNum));
> + unsigned desc = p->generateUntypedReadMessageDesc(0, elemNum);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated load
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), elemNum);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> + }
> + unsigned GenContext::beforeMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister tmp, unsigned desc) {
> + const GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag);
> + setFlag(flagReg, GenRegister::immuw(0));
> + p->CMP(GEN_CONDITIONAL_NZ, flagReg, GenRegister::immuw(1));
> +
> + GenRegister btiUD = ra->genReg(GenRegister::ud1grf(ir::ocl::btiUtil));
> + GenRegister btiUW = ra->genReg(GenRegister::uw1grf(ir::ocl::btiUtil));
> + GenRegister btiUB = ra->genReg(GenRegister::ub1grf(ir::ocl::btiUtil));
> + unsigned jip0 = p->n_instruction();
> + p->push();
> + p->curr.execWidth = 1;
> + p->curr.noMask = 1;
> + p->AND(btiUD, flagReg, GenRegister::immud(0xffffffff));
> + p->LZD(btiUD, btiUD);
> + p->ADD(btiUW, GenRegister::negate(btiUW), GenRegister::immuw(0x1f));
> + p->MUL(btiUW, btiUW, GenRegister::immuw(0x4));
> + p->ADD(GenRegister::addr1(0), btiUW, GenRegister::immud(bti.nr*32));
> + p->MOV(btiUD, GenRegister::indirect(GEN_TYPE_UD, 0, GEN_WIDTH_1, GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL, GEN_HORIZONTAL_STRIDE_0));
> + //save flag
> + p->MOV(tmp, flagReg);
> + p->pop();
> +
> + p->CMP(GEN_CONDITIONAL_Z, bti, btiUD);
> + p->push();
> + p->curr.execWidth = 1;
> + p->curr.noMask = 1;
> + p->OR(GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), btiUB, GenRegister::immud(desc));
> + p->pop();
> + return jip0;
> + }
> + void GenContext::afterMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister tmp, unsigned jip0) {
> + const GenRegister btiUD = ra->genReg(GenRegister::ud1grf(ir::ocl::btiUtil));
> + //restore flag
> + setFlag(GenRegister::flag(insn.state.flag, insn.state.subFlag), tmp);
> + // get active channel
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->CMP(GEN_CONDITIONAL_NZ, bti, btiUD);
> + unsigned jip1 = p->n_instruction();
> + p->WHILE(GenRegister::immud(0));
> + p->pop();
> + p->patchJMPI(jip1, jip0 - jip1, 0);
> }
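
For readers following the beforeMessage()/afterMessage() pair above, here is a minimal host-side sketch of the loop they appear to emit: pick the highest pending lane (the lzd step), read that lane's bti, send for every lane that shares it, then loop over the lanes that are left. The function name and the vector-based model are hypothetical and are not part of Beignet; this only illustrates the control flow, not the generated Gen assembly.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side model; none of these names exist in Beignet.
// Returns one (bti, laneMask) pair per SEND the emitted loop would issue.
std::vector<std::pair<uint32_t, uint32_t>>
splitLanesByBti(const std::vector<uint32_t> &laneBti, uint32_t activeMask) {
  assert(laneBti.size() <= 32);
  std::vector<std::pair<uint32_t, uint32_t>> sends;
  uint32_t pending = activeMask;
  // Ignore mask bits that have no corresponding lane.
  if (laneBti.size() < 32) pending &= (1u << laneBti.size()) - 1u;
  while (pending != 0) {
    // Models LZD followed by (31 - lzd): index of the highest pending lane.
    uint32_t lane = 31;
    while (((pending >> lane) & 1u) == 0) --lane;
    const uint32_t chosen = laneBti[lane];
    // Models CMP.Z bti, chosen: lanes with the same BTI join this SEND.
    uint32_t group = 0;
    for (uint32_t i = 0; i < laneBti.size(); ++i)
      if (((pending >> i) & 1u) && laneBti[i] == chosen) group |= 1u << i;
    sends.emplace_back(chosen, group);
    // Models CMP.NZ + WHILE: only lanes with a different BTI loop again.
    pending &= ~group;
  }
  return sends;
}
```

With two distinct BTIs across the active lanes, the model issues two sends, so the number of loop iterations follows the number of distinct BTIs rather than the SIMD width.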
>
> void GenContext::emitUntypedReadInstruction(const SelectionInstruction &insn) {
> const GenRegister dst = ra->genReg(insn.dst(0));
> const GenRegister src = ra->genReg(insn.src(0));
> - const uint32_t bti = insn.getbti();
> + const GenRegister bti = ra->genReg(insn.src(1));
> +
> const uint32_t elemNum = insn.extra.elem;
> - p->UNTYPED_READ(dst, src, bti, elemNum);
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_READ(dst, src, bti, elemNum);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(elemNum));
> + unsigned desc = p->generateUntypedReadMessageDesc(0, elemNum);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated load
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), elemNum);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> }
>
> void GenContext::emitWrite64Instruction(const SelectionInstruction &insn) {
> const GenRegister src = ra->genReg(insn.dst(0));
> const uint32_t elemNum = insn.extra.elem;
> - const uint32_t bti = insn.getbti();
> - p->UNTYPED_WRITE(src, bti, elemNum*2);
> + const GenRegister bti = ra->genReg(insn.src(elemNum+1));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_WRITE(src, bti, elemNum*2);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(0));
> + unsigned desc = p->generateUntypedWriteMessageDesc(0, elemNum*2);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated store
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_WRITE(src, GenRegister::addr1(0), elemNum*2);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> }
>
> void GenContext::emitUntypedWriteInstruction(const SelectionInstruction &insn) {
> const GenRegister src = ra->genReg(insn.src(0));
> - const uint32_t bti = insn.getbti();
> const uint32_t elemNum = insn.extra.elem;
> - p->UNTYPED_WRITE(src, bti, elemNum);
> + const GenRegister bti = ra->genReg(insn.src(elemNum+1));
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->UNTYPED_WRITE(src, bti, elemNum);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(0));
> + unsigned desc = p->generateUntypedWriteMessageDesc(0, elemNum);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated store
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->UNTYPED_WRITE(src, GenRegister::addr1(0), elemNum);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> }
>
> void GenContext::emitByteGatherInstruction(const SelectionInstruction &insn) {
> const GenRegister dst = ra->genReg(insn.dst(0));
> const GenRegister src = ra->genReg(insn.src(0));
> - const uint32_t bti = insn.getbti();
> + const GenRegister bti = ra->genReg(insn.src(1));
> const uint32_t elemSize = insn.extra.elem;
> - p->BYTE_GATHER(dst, src, bti, elemSize);
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->BYTE_GATHER(dst, src, bti, elemSize);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(1));
> + unsigned desc = p->generateByteGatherMessageDesc(0, elemSize);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated load
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->BYTE_GATHER(dst, src, GenRegister::addr1(0), elemSize);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> }
>
> void GenContext::emitByteScatterInstruction(const SelectionInstruction &insn) {
> const GenRegister src = ra->genReg(insn.src(0));
> - const uint32_t bti = insn.getbti();
> const uint32_t elemSize = insn.extra.elem;
> - p->BYTE_SCATTER(src, bti, elemSize);
> + const GenRegister bti = ra->genReg(insn.src(2));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + p->BYTE_SCATTER(src, bti, elemSize);
> + } else {
> + const GenRegister tmp = ra->genReg(insn.dst(0));
> + unsigned desc = p->generateByteScatterMessageDesc(0, elemSize);
> +
> + unsigned jip0 = beforeMessage(insn, bti, tmp, desc);
> +
> + //predicated store
> + p->push();
> + p->curr.predicate = GEN_PREDICATE_NORMAL;
> + p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> + p->BYTE_SCATTER(src, GenRegister::addr1(0), elemSize);
> + p->pop();
> + afterMessage(insn, bti, tmp, jip0);
> + }
> +
> }
>
> void GenContext::emitUnpackByteInstruction(const SelectionInstruction &insn) {
> @@ -2032,6 +2187,7 @@ namespace gbe
> allocCurbeReg(lid2, GBE_CURBE_LOCAL_ID_Z);
> allocCurbeReg(zero, GBE_CURBE_ZERO);
> allocCurbeReg(one, GBE_CURBE_ONE);
> + allocCurbeReg(btiUtil, GBE_CURBE_BTI_UTIL);
> if (stackUse.size() != 0)
> allocCurbeReg(stackbuffer, GBE_CURBE_EXTRA_ARGUMENT, GBE_STACK_BUFFER);
> // Go over the arguments and find the related patch locations
> diff --git a/backend/src/backend/gen_context.hpp b/backend/src/backend/gen_context.hpp
> index 560248a..a85657c 100644
> --- a/backend/src/backend/gen_context.hpp
> +++ b/backend/src/backend/gen_context.hpp
> @@ -169,6 +169,8 @@ namespace gbe
> virtual void emitI64DIVREMInstruction(const SelectionInstruction &insn);
> void scratchWrite(const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
> void scratchRead(const GenRegister dst, const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
> + unsigned beforeMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister flagTemp, unsigned desc);
> + void afterMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister flagTemp, unsigned jip0);
>
> /*! Implements base class */
> virtual Kernel *allocateKernel(void);
> diff --git a/backend/src/backend/gen_encoder.cpp b/backend/src/backend/gen_encoder.cpp
> index 5aa8c5c..cac29e8 100644
> --- a/backend/src/backend/gen_encoder.cpp
> +++ b/backend/src/backend/gen_encoder.cpp
> @@ -329,10 +329,13 @@ namespace gbe
> GEN_UNTYPED_ALPHA,
> 0
> };
> + unsigned GenEncoder::generateUntypedReadMessageDesc(unsigned bti, unsigned elemNum) {
> + GenNativeInstruction insn;
> + memset(&insn, 0, sizeof(GenNativeInstruction));
> + return setUntypedReadMessageDesc(&insn, bti, elemNum);
> + }
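
A small sketch of how the generate*MessageDesc() helpers seem to be used: the descriptor is built once with bti 0, and the runtime bti is ORed in later (the OR into a0 in beforeMessage()). The helper name below is hypothetical, and the sketch assumes the binding table index occupies the low 8 bits of the descriptor, which the patch suggests by using the UB view of btiUtil but does not state outright.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the binding table index sits in the low 8 bits of the descriptor.
constexpr uint32_t kBtiMask = 0xffu;

// Hypothetical helper, not a Beignet function.
// Combines a descriptor that was generated with bti == 0 with a runtime bti.
uint32_t patchBti(uint32_t descWithZeroBti, uint32_t runtimeBti) {
  assert((descWithZeroBti & kBtiMask) == 0);  // descriptor built with bti 0
  assert(runtimeBti <= kBtiMask);             // bti must fit in one byte
  return descWithZeroBti | runtimeBti;
}
```

Because the bti field is zero in the generated descriptor, a plain OR is enough and no masking of the old value is needed.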
>
> - void GenEncoder::UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum) {
> - GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> - assert(elemNum >= 1 || elemNum <= 4);
> + unsigned GenEncoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> if (this->curr.execWidth == 8) {
> @@ -340,49 +343,88 @@ namespace gbe
> response_length = elemNum;
> } else if (this->curr.execWidth == 16) {
> msg_length = 2;
> - response_length = 2*elemNum;
> + response_length = 2 * elemNum;
> } else
> NOT_IMPLEMENTED;
> -
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN7_UNTYPED_READ,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> - void GenEncoder::UNTYPED_WRITE(GenRegister msg, uint32_t bti, uint32_t elemNum) {
> + void GenEncoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> assert(elemNum >= 1 && elemNum <= 4);
> +
> + this->setHeader(insn);
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedReadMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned GenEncoder::generateUntypedWriteMessageDesc(unsigned bti, unsigned elemNum) {
> + GenNativeInstruction insn;
> + memset(&insn, 0, sizeof(GenNativeInstruction));
> + return setUntypedWriteMessageDesc(&insn, bti, elemNum);
> + }
> +
> + unsigned GenEncoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> - this->setHeader(insn);
> if (this->curr.execWidth == 8) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> - msg_length = 1+elemNum;
> + msg_length = 1 + elemNum;
> } else if (this->curr.execWidth == 16) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> - msg_length = 2*(1+elemNum);
> + msg_length = 2 * (1 + elemNum);
> }
> else
> NOT_IMPLEMENTED;
> - this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPUntypedRW(insn,
> bti,
> untypedRWMask[elemNum],
> GEN7_UNTYPED_WRITE,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> }
>
> - void GenEncoder::BYTE_GATHER(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemSize) {
> + void GenEncoder::UNTYPED_WRITE(GenRegister msg, GenRegister bti, uint32_t elemNum) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + assert(elemNum >= 1 && elemNum <= 4);
> + this->setHeader(insn);
> + if (this->curr.execWidth == 8) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> + } else if (this->curr.execWidth == 16) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> + }
> + else
> + NOT_IMPLEMENTED;
> + this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned GenEncoder::generateByteGatherMessageDesc(unsigned bti, unsigned elemSize) {
> + GenNativeInstruction insn;
> + memset(&insn, 0, sizeof(GenNativeInstruction));
> + return setByteGatherMessageDesc(&insn, bti, elemSize);
> + }
> +
> + unsigned GenEncoder::setByteGatherMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> if (this->curr.execWidth == 8) {
> @@ -393,11 +435,6 @@ namespace gbe
> response_length = 2;
> } else
> NOT_IMPLEMENTED;
> -
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> setDPByteScatterGather(this,
> insn,
> bti,
> @@ -405,23 +442,42 @@ namespace gbe
> GEN7_BYTE_GATHER,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> +
> }
>
> - void GenEncoder::BYTE_SCATTER(GenRegister msg, uint32_t bti, uint32_t elemSize) {
> + void GenEncoder::BYTE_GATHER(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemSize) {
> GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> +
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setByteGatherMessageDesc(insn, bti.value.ud, elemSize);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> + }
> +
> + unsigned GenEncoder::generateByteScatterMessageDesc(unsigned bti, unsigned elemSize) {
> + GenNativeInstruction insn;
> + memset(&insn, 0, sizeof(GenNativeInstruction));
> + return setByteScatterMessageDesc(&insn, bti, elemSize);
> + }
> +
> + unsigned GenEncoder::setByteScatterMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
> - this->setHeader(insn);
> if (this->curr.execWidth == 8) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> msg_length = 2;
> } else if (this->curr.execWidth == 16) {
> - this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> msg_length = 4;
> } else
> NOT_IMPLEMENTED;
> - this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> +
> setDPByteScatterGather(this,
> insn,
> bti,
> @@ -429,6 +485,30 @@ namespace gbe
> GEN7_BYTE_SCATTER,
> msg_length,
> response_length);
> + return insn->bits3.ud;
> + }
> +
> + void GenEncoder::BYTE_SCATTER(GenRegister msg, GenRegister bti, uint32_t elemSize) {
> + GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> +
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> +
> + if (this->curr.execWidth == 8) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
> + } else if (this->curr.execWidth == 16) {
> + this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW));
> + } else
> + NOT_IMPLEMENTED;
> +
> + this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
> +
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setByteScatterMessageDesc(insn, bti.value.ud, elemSize);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> }
>
> void GenEncoder::DWORD_GATHER(GenRegister dst, GenRegister src, uint32_t bti) {
> @@ -461,8 +541,13 @@ namespace gbe
>
> }
>
> - void GenEncoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum) {
> - GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
> + unsigned GenEncoder::generateAtomicMessageDesc(unsigned function, unsigned bti, unsigned srcNum) {
> + GenNativeInstruction insn;
> + memset(&insn, 0, sizeof(GenNativeInstruction));
> + return setAtomicMessageDesc(&insn, function, bti, srcNum);
> + }
> +
> + unsigned GenEncoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) {
> uint32_t msg_length = 0;
> uint32_t response_length = 0;
>
> @@ -470,16 +555,11 @@ namespace gbe
> msg_length = srcNum;
> response_length = 1;
> } else if (this->curr.execWidth == 16) {
> - msg_length = 2*srcNum;
> + msg_length = 2 * srcNum;
> response_length = 2;
> } else
> NOT_IMPLEMENTED;
>
> - this->setHeader(insn);
> - this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> - this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> - this->setSrc1(insn, GenRegister::immud(0));
> -
> const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA;
> setMessageDescriptor(insn, sfid, msg_length, response_length);
> insn->bits3.gen7_atomic_op.msg_type = GEN7_UNTYPED_ATOMIC_READ;
> @@ -493,7 +573,23 @@ namespace gbe
> insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16;
> else
> NOT_SUPPORTED;
> + return insn->bits3.ud;
> + }
> +
> + void GenEncoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum) {
> + GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
>
> + this->setHeader(insn);
> + insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA;
> +
> + this->setDst(insn, GenRegister::uw16grf(dst.nr, 0));
> + this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0));
> + if (bti.file == GEN_IMMEDIATE_VALUE) {
> + this->setSrc1(insn, GenRegister::immud(0));
> + setAtomicMessageDesc(insn, function, bti.value.ud, srcNum);
> + } else {
> + this->setSrc1(insn, bti);
> + }
> }
> GenCompactInstruction *GenEncoder::nextCompact(uint32_t opcode) {
> GenCompactInstruction insn;
> @@ -893,6 +989,8 @@ namespace gbe
> ALU2_BRA(BRD)
> ALU2_BRA(BRC)
>
> + // jip is the distance between the jump instruction and the jump target. The
> + // pre/post-increment is handled in the patchJMPI() function body.
> void GenEncoder::patchJMPI(uint32_t insnID, int32_t jip, int32_t uip) {
> GenNativeInstruction &insn = *(GenNativeInstruction *)&this->store[insnID];
> GBE_ASSERT(insnID < this->store.size());
> diff --git a/backend/src/backend/gen_encoder.hpp b/backend/src/backend/gen_encoder.hpp
> index 21faabc..79e7b6e 100644
> --- a/backend/src/backend/gen_encoder.hpp
> +++ b/backend/src/backend/gen_encoder.hpp
> @@ -169,15 +169,15 @@ namespace gbe
> /*! Wait instruction (used for the barrier) */
> void WAIT(void);
> /*! Atomic instructions */
> - virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, uint32_t bti, uint32_t srcNum);
> + virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum);
> /*! Untyped read (up to 4 channels) */
> - virtual void UNTYPED_READ(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemNum);
> + virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum);
> /*! Untyped write (up to 4 channels) */
> - virtual void UNTYPED_WRITE(GenRegister src, uint32_t bti, uint32_t elemNum);
> + virtual void UNTYPED_WRITE(GenRegister src, GenRegister bti, uint32_t elemNum);
> /*! Byte gather (for unaligned bytes, shorts and ints) */
> - void BYTE_GATHER(GenRegister dst, GenRegister src, uint32_t bti, uint32_t elemSize);
> + void BYTE_GATHER(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemSize);
> /*! Byte scatter (for unaligned bytes, shorts and ints) */
> - void BYTE_SCATTER(GenRegister src, uint32_t bti, uint32_t elemSize);
> + void BYTE_SCATTER(GenRegister src, GenRegister bti, uint32_t elemSize);
> /*! DWord gather (for constant cache read) */
> void DWORD_GATHER(GenRegister dst, GenRegister src, uint32_t bti);
> /*! for scratch memory read */
> @@ -230,6 +230,18 @@ namespace gbe
> void setMessageDescriptor(GenNativeInstruction *inst, enum GenMessageTarget sfid,
> unsigned msg_length, unsigned response_length,
> bool header_present = false, bool end_of_thread = false);
> + virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum);
> + virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> + virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
> + unsigned setByteGatherMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize);
> + unsigned setByteScatterMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize);
> +
> + unsigned generateAtomicMessageDesc(unsigned function, unsigned bti, unsigned srcNum);
> + unsigned generateUntypedReadMessageDesc(unsigned bti, unsigned elemNum);
> + unsigned generateUntypedWriteMessageDesc(unsigned bti, unsigned elemNum);
> + unsigned generateByteGatherMessageDesc(unsigned bti, unsigned elemSize);
> + unsigned generateByteScatterMessageDesc(unsigned bti, unsigned elemSize);
> +
> virtual void setHeader(GenNativeInstruction *insn) = 0;
> virtual void setDst(GenNativeInstruction *insn, GenRegister dest) = 0;
> virtual void setSrc0(GenNativeInstruction *insn, GenRegister reg) = 0;
> diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selection.cpp
> index 7d4ea00..a68d0ce 100644
> --- a/backend/src/backend/gen_insn_selection.cpp
> +++ b/backend/src/backend/gen_insn_selection.cpp
> @@ -598,19 +598,19 @@ namespace gbe
> /*! Wait instruction (used for the barrier) */
> void WAIT(void);
> /*! Atomic instruction */
> - void ATOMIC(Reg dst, uint32_t function, uint32_t srcNum, Reg src0, Reg src1, Reg src2, uint32_t bti);
> + void ATOMIC(Reg dst, uint32_t function, uint32_t srcNum, Reg src0, Reg src1, Reg src2, GenRegister bti, GenRegister *flagTemp);
> /*! Read 64 bits float/int array */
> - void READ64(Reg addr, const GenRegister *dst, const GenRegister *tmp, uint32_t elemNum, uint32_t bti, bool native_long);
> + void READ64(Reg addr, const GenRegister *dst, const GenRegister *tmp, uint32_t elemNum, const GenRegister bti, bool native_long, GenRegister *flagTemp);
> /*! Write 64 bits float/int array */
> - void WRITE64(Reg addr, const GenRegister *src, const GenRegister *tmp, uint32_t srcNum, uint32_t bti, bool native_long);
> + void WRITE64(Reg addr, const GenRegister *src, const GenRegister *tmp, uint32_t srcNum, GenRegister bti, bool native_long, GenRegister *flagTemp);
> /*! Untyped read (up to 4 elements) */
> - void UNTYPED_READ(Reg addr, const GenRegister *dst, uint32_t elemNum, uint32_t bti);
> + void UNTYPED_READ(Reg addr, const GenRegister *dst, uint32_t elemNum, GenRegister bti, GenRegister *flagTemp);
> /*! Untyped write (up to 4 elements) */
> - void UNTYPED_WRITE(Reg addr, const GenRegister *src, uint32_t elemNum, uint32_t bti);
> + void UNTYPED_WRITE(Reg addr, const GenRegister *src, uint32_t elemNum, GenRegister bti, GenRegister *flagTemp);
> /*! Byte gather (for unaligned bytes, shorts and ints) */
> - void BYTE_GATHER(Reg dst, Reg addr, uint32_t elemSize, uint32_t bti);
> + void BYTE_GATHER(Reg dst, Reg addr, uint32_t elemSize, GenRegister bti, GenRegister *flagTemp);
> /*! Byte scatter (for unaligned bytes, shorts and ints) */
> - void BYTE_SCATTER(Reg addr, Reg src, uint32_t elemSize, uint32_t bti);
> + void BYTE_SCATTER(Reg addr, Reg src, uint32_t elemSize, GenRegister bti, GenRegister *flagTemp);
> /*! DWord gather (for constant cache read) */
> void DWORD_GATHER(Reg dst, Reg addr, uint32_t bti);
> /*! Unpack the uint to charN */
> @@ -1204,16 +1204,26 @@ namespace gbe
>
> void Selection::Opaque::ATOMIC(Reg dst, uint32_t function,
> uint32_t srcNum, Reg src0,
> - Reg src1, Reg src2, uint32_t bti) {
> - SelectionInstruction *insn = this->appendInsn(SEL_OP_ATOMIC, 1, srcNum);
> + Reg src1, Reg src2, GenRegister bti, GenRegister *flagTemp) {
> + unsigned dstNum = flagTemp == NULL ? 1 : 2;
> + SelectionInstruction *insn = this->appendInsn(SEL_OP_ATOMIC, dstNum, srcNum + 1);
> +
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> +
> insn->dst(0) = dst;
> + if(flagTemp) insn->dst(1) = *flagTemp;
> +
> insn->src(0) = src0;
> if(srcNum > 1) insn->src(1) = src1;
> if(srcNum > 2) insn->src(2) = src2;
> + insn->src(srcNum) = bti;
> insn->extra.function = function;
> - insn->setbti(bti);
> - SelectionVector *vector = this->appendVector();
> + insn->extra.elem = srcNum;
>
> + SelectionVector *vector = this->appendVector();
> vector->regNum = srcNum;
> vector->reg = &insn->src(0);
> vector->isSrc = 1;
> @@ -1227,22 +1237,29 @@ namespace gbe
> const GenRegister *dst,
> const GenRegister *tmp,
> uint32_t elemNum,
> - uint32_t bti,
> - bool native_long)
> + const GenRegister bti,
> + bool native_long,
> + GenRegister *flagTemp)
> {
> SelectionInstruction *insn = NULL;
> SelectionVector *srcVector = NULL;
> SelectionVector *dstVector = NULL;
>
> if (!native_long) {
> - insn = this->appendInsn(SEL_OP_READ64, elemNum, 1);
> + unsigned dstNum = flagTemp == NULL ? elemNum : elemNum+1;
> + insn = this->appendInsn(SEL_OP_READ64, dstNum, 2);
> srcVector = this->appendVector();
> dstVector = this->appendVector();
> // Regular instruction to encode
> for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
> insn->dst(elemID) = dst[elemID];
> +
> + // flagTemp doesn't need to be put in a SelectionVector
> + if (flagTemp)
> + insn->dst(elemNum) = *flagTemp;
> } else {
> - insn = this->appendInsn(SEL_OP_READ64, elemNum*2, 1);
> + unsigned dstNum = flagTemp == NULL ? elemNum*2 : elemNum*2+1;
> + insn = this->appendInsn(SEL_OP_READ64, dstNum, 2);
> srcVector = this->appendVector();
> dstVector = this->appendVector();
>
> @@ -1251,10 +1268,20 @@ namespace gbe
>
> for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
> insn->dst(elemID + elemNum) = dst[elemID];
> +
> + // flagTemp doesn't need to be put in a SelectionVector
> + if (flagTemp)
> + insn->dst(2*elemNum) = *flagTemp;
> + }
> +
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> }
>
> insn->src(0) = addr;
> - insn->setbti(bti);
> + insn->src(1) = bti;
> +
> insn->extra.elem = elemNum;
>
> dstVector->regNum = elemNum;
> @@ -1269,9 +1296,11 @@ namespace gbe
> void Selection::Opaque::UNTYPED_READ(Reg addr,
> const GenRegister *dst,
> uint32_t elemNum,
> - uint32_t bti)
> + GenRegister bti,
> + GenRegister *flagTemp)
> {
> - SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_READ, elemNum, 1);
> + unsigned dstNum = flagTemp == NULL ? elemNum : elemNum+1;
> + SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_READ, dstNum, 2);
> SelectionVector *srcVector = this->appendVector();
> SelectionVector *dstVector = this->appendVector();
> if (this->isScalarReg(dst[0].reg()))
> @@ -1279,8 +1308,16 @@ namespace gbe
> // Regular instruction to encode
> for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
> insn->dst(elemID) = dst[elemID];
> + if (flagTemp)
> + insn->dst(elemNum) = *flagTemp;
> +
> insn->src(0) = addr;
> - insn->setbti(bti);
> + insn->src(1) = bti;
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> +
> insn->extra.elem = elemNum;
>
> // Sends require contiguous allocation
> @@ -1297,31 +1334,40 @@ namespace gbe
> const GenRegister *src,
> const GenRegister *tmp,
> uint32_t srcNum,
> - uint32_t bti,
> - bool native_long)
> + GenRegister bti,
> + bool native_long,
> + GenRegister *flagTemp)
> {
> SelectionVector *vector = NULL;
> SelectionInstruction *insn = NULL;
>
> if (!native_long) {
> - insn = this->appendInsn(SEL_OP_WRITE64, 0, srcNum + 1);
> + unsigned dstNum = flagTemp == NULL ? 0 : 1;
> + insn = this->appendInsn(SEL_OP_WRITE64, dstNum, srcNum + 2);
> vector = this->appendVector();
> - // Regular instruction to encode
> + // Register layout:
> + // dst: (flagTemp)
> + // src: addr, srcNum, bti
> insn->src(0) = addr;
> for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
> insn->src(elemID + 1) = src[elemID];
>
> - insn->setbti(bti);
> + insn->src(srcNum+1) = bti;
> + if (flagTemp)
> + insn->dst(0) = *flagTemp;
> insn->extra.elem = srcNum;
>
> vector->regNum = srcNum + 1;
> vector->reg = &insn->src(0);
> vector->isSrc = 1;
> } else { // handle the native long case
> - insn = this->appendInsn(SEL_OP_WRITE64, srcNum, srcNum*2 + 1);
> + unsigned dstNum = flagTemp == NULL ? srcNum : srcNum+1;
> + // Register layout:
> + // dst: srcNum, (flagTemp)
> + // src: srcNum, addr, srcNum, bti.
> + insn = this->appendInsn(SEL_OP_WRITE64, dstNum, srcNum*2 + 2);
> vector = this->appendVector();
>
> - insn->src(0) = addr;
> for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
> insn->src(elemID) = src[elemID];
>
> @@ -1329,33 +1375,50 @@ namespace gbe
> for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
> insn->src(srcNum + 1 + elemID) = tmp[0];
>
> + insn->src(srcNum*2+1) = bti;
> /* We also need to add the tmp register to dst, in order
> to avoid the post schedule error. */
> for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
> insn->dst(elemID) = tmp[0];
>
> - insn->setbti(bti);
> + if (flagTemp)
> + insn->dst(srcNum) = *flagTemp;
> insn->extra.elem = srcNum;
>
> vector->regNum = srcNum + 1;
> vector->reg = &insn->src(srcNum);
> vector->isSrc = 1;
> }
> +
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> }
>
> void Selection::Opaque::UNTYPED_WRITE(Reg addr,
> const GenRegister *src,
> uint32_t elemNum,
> - uint32_t bti)
> + GenRegister bti,
> + GenRegister *flagTemp)
> {
> - SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_WRITE, 0, elemNum+1);
> + unsigned dstNum = flagTemp == NULL ? 0 : 1;
> + SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_WRITE, dstNum, elemNum+2);
> SelectionVector *vector = this->appendVector();
>
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> +
> + if (flagTemp) insn->dst(0) = *flagTemp;
> // Regular instruction to encode
> insn->src(0) = addr;
> for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
> insn->src(elemID+1) = src[elemID];
> - insn->setbti(bti);
> + insn->src(elemNum+1) = bti;
> + if (flagTemp)
> + insn->src(elemNum+2) = *flagTemp;
> insn->extra.elem = elemNum;
>
> // Sends require contiguous allocation for the sources
> @@ -1364,17 +1427,26 @@ namespace gbe
> vector->isSrc = 1;
> }
>
> - void Selection::Opaque::BYTE_GATHER(Reg dst, Reg addr, uint32_t elemSize, uint32_t bti) {
> - SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_GATHER, 1, 1);
> + void Selection::Opaque::BYTE_GATHER(Reg dst, Reg addr, uint32_t elemSize, GenRegister bti, GenRegister *flagTemp) {
> + unsigned dstNum = flagTemp == NULL ? 1 : 2;
> + SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_GATHER, dstNum, 2);
> SelectionVector *srcVector = this->appendVector();
> SelectionVector *dstVector = this->appendVector();
>
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> +
> if (this->isScalarReg(dst.reg()))
> insn->state.noMask = 1;
> // Instruction to encode
> insn->src(0) = addr;
> + insn->src(1) = bti;
> insn->dst(0) = dst;
> - insn->setbti(bti);
> + if (flagTemp)
> + insn->dst(1) = *flagTemp;
> +
> insn->extra.elem = elemSize;
>
> // byte gather requires vector in the sense that scalar are not allowed
> @@ -1387,14 +1459,22 @@ namespace gbe
> srcVector->reg = &insn->src(0);
> }
>
> - void Selection::Opaque::BYTE_SCATTER(Reg addr, Reg src, uint32_t elemSize, uint32_t bti) {
> - SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_SCATTER, 0, 2);
> + void Selection::Opaque::BYTE_SCATTER(Reg addr, Reg src, uint32_t elemSize, GenRegister bti, GenRegister *flagTemp) {
> + unsigned dstNum = flagTemp == NULL ? 0 : 1;
> + SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_SCATTER, dstNum, 3);
> SelectionVector *vector = this->appendVector();
>
> + if (bti.file != GEN_IMMEDIATE_VALUE) {
> + insn->state.flag = 0;
> + insn->state.subFlag = 1;
> + }
> +
> + if (flagTemp)
> + insn->dst(0) = *flagTemp;
> // Instruction to encode
> insn->src(0) = addr;
> insn->src(1) = src;
> - insn->setbti(bti);
> + insn->src(2) = bti;
> insn->extra.elem = elemSize;
>
> // value and address are contiguous in the send
> @@ -3122,34 +3202,24 @@ namespace gbe
> }
> }
>
> - /*! Load instruction pattern */
> - DECL_PATTERN(LoadInstruction)
> + class LoadInstructionPattern : public SelectionPattern
> {
> + public:
> + /*! Register the pattern for all opcodes of the family */
> + LoadInstructionPattern(void) : SelectionPattern(1, 1) {
> + this->opcodes.push_back(ir::OP_LOAD);
> + }
> void readDWord(Selection::Opaque &sel,
> vector<GenRegister> &dst,
> - vector<GenRegister> &dst2,
> GenRegister addr,
> uint32_t valueNum,
> ir::BTI bti) const
> {
> - for (uint32_t x = 0; x < bti.count; x++) {
> - if(x > 0)
> - for (uint32_t dstID = 0; dstID < valueNum; ++dstID)
> - dst2[dstID] = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32);
> -
> - GenRegister temp = getRelativeAddress(sel, addr, bti.bti[x]);
> - sel.UNTYPED_READ(temp, dst2.data(), valueNum, bti.bti[x]);
> - if(x > 0) {
> - sel.push();
> - if(sel.isScalarReg(dst[0].reg())) {
> - sel.curr.noMask = 1;
> - sel.curr.execWidth = 1;
> - }
> - for (uint32_t y = 0; y < valueNum; y++)
> - sel.ADD(dst[y], dst[y], dst2[y]);
> - sel.pop();
> - }
> - }
> + //GenRegister temp = getRelativeAddress(sel, addr, sel.selReg(bti.base, ir::TYPE_U32));
> +
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
> + GenRegister tmp = sel.selReg(sel.reg(ir::FAMILY_WORD, true), ir::TYPE_U16);
> + sel.UNTYPED_READ(addr, dst.data(), valueNum, b, bti.isConst ? NULL : &tmp);
> }
>
> void emitUntypedRead(Selection::Opaque &sel,
> @@ -3160,10 +3230,9 @@ namespace gbe
> using namespace ir;
> const uint32_t valueNum = insn.getValueNum();
> vector<GenRegister> dst(valueNum);
> - vector<GenRegister> dst2(valueNum);
> for (uint32_t dstID = 0; dstID < valueNum; ++dstID)
> - dst2[dstID] = dst[dstID] = sel.selReg(insn.getValue(dstID), TYPE_U32);
> - readDWord(sel, dst, dst2, addr, valueNum, bti);
> + dst[dstID] = sel.selReg(insn.getValue(dstID), TYPE_U32);
> + readDWord(sel, dst, addr, valueNum, bti);
> }
>
> void emitDWordGather(Selection::Opaque &sel,
> @@ -3172,15 +3241,15 @@ namespace gbe
> ir::BTI bti) const
> {
> using namespace ir;
> - GBE_ASSERT(bti.count == 1);
> - const uint32_t isUniform = sel.isScalarReg(insn.getValue(0));
> + GBE_ASSERT(bti.isConst == 1);
> GBE_ASSERT(insn.getValueNum() == 1);
> + const uint32_t isUniform = sel.isScalarReg(insn.getValue(0));
>
> if(isUniform) {
> GenRegister dst = sel.selReg(insn.getValue(0), ir::TYPE_U32);
> sel.push();
> sel.curr.noMask = 1;
> - sel.SAMPLE(&dst, 1, &addr, 1, bti.bti[0], 0, true, true);
> + sel.SAMPLE(&dst, 1, &addr, 1, bti.imm, 0, true, true);
> sel.pop();
> return;
> }
> @@ -3196,7 +3265,7 @@ namespace gbe
> sel.SHR(addrDW, GenRegister::retype(addr, GEN_TYPE_UD), GenRegister::immud(2));
> sel.pop();
>
> - sel.DWORD_GATHER(dst, addrDW, bti.bti[0]);
> + sel.DWORD_GATHER(dst, addrDW, bti.imm);
> }
>
> void emitRead64(Selection::Opaque &sel,
> @@ -3208,9 +3277,10 @@ namespace gbe
> const uint32_t valueNum = insn.getValueNum();
> /* XXX support scalar only right now. */
> GBE_ASSERT(valueNum == 1);
> - GBE_ASSERT(bti.count == 1);
> + GBE_ASSERT(bti.isConst == 1);
> vector<GenRegister> dst(valueNum);
> - GenRegister tmpAddr = getRelativeAddress(sel, addr, bti.bti[0]);
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
> + GenRegister tmpFlag = sel.selReg(sel.reg(ir::FAMILY_WORD, true), ir::TYPE_U16);
> for ( uint32_t dstID = 0; dstID < valueNum; ++dstID)
> dst[dstID] = sel.selReg(insn.getValue(dstID), ir::TYPE_U64);
>
> @@ -3220,9 +3290,9 @@ namespace gbe
> tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL);
> }
>
> - sel.READ64(tmpAddr, dst.data(), tmp.data(), valueNum, bti.bti[0], true);
> + sel.READ64(addr, dst.data(), tmp.data(), valueNum, b, true, bti.isConst ? NULL : &tmpFlag);
> } else {
> - sel.READ64(tmpAddr, dst.data(), NULL, valueNum, bti.bti[0], false);
> + sel.READ64(addr, dst.data(), NULL, valueNum, b, false, bti.isConst ? NULL : &tmpFlag);
> }
> }
>
> @@ -3231,12 +3301,16 @@ namespace gbe
> GenRegister address,
> GenRegister dst,
> bool isUniform,
> - uint8_t bti) const
> + ir::BTI bti) const
> {
> using namespace ir;
> Register tmpReg = sel.reg(FAMILY_DWORD, isUniform);
> GenRegister tmpAddr = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32);
> GenRegister tmpData = sel.selReg(tmpReg, ir::TYPE_U32);
> +
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
> + GenRegister tmpFlag = sel.selReg(sel.reg(ir::FAMILY_WORD, true), ir::TYPE_U16);
> +
> // Get dword aligned addr
> sel.push();
> if (isUniform) {
> @@ -3248,7 +3322,7 @@ namespace gbe
> sel.push();
> if (isUniform)
> sel.curr.noMask = 1;
> - sel.UNTYPED_READ(tmpAddr, &tmpData, 1, bti);
> + sel.UNTYPED_READ(tmpAddr, &tmpData, 1, b, bti.isConst ? NULL : &tmpFlag);
>
> if (isUniform)
> sel.curr.execWidth = 1;
> @@ -3284,14 +3358,11 @@ namespace gbe
>
> uint32_t tmpRegNum = (typeSize*valueNum + 3) / 4;
> vector<GenRegister> tmp(tmpRegNum);
> - vector<GenRegister> tmp2(tmpRegNum);
> - vector<Register> tmpReg(tmpRegNum);
> for(uint32_t i = 0; i < tmpRegNum; i++) {
> - tmpReg[i] = sel.reg(FAMILY_DWORD, isUniform);
> - tmp2[i] = tmp[i] = sel.selReg(tmpReg[i], ir::TYPE_U32);
> + tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32);
> }
>
> - readDWord(sel, tmp, tmp2, address, tmpRegNum, bti);
> + readDWord(sel, tmp, address, tmpRegNum, bti);
>
> for(uint32_t i = 0; i < tmpRegNum; i++) {
> unsigned int elemNum = (valueNum - i * (4 / typeSize)) > 4/typeSize ?
> @@ -3396,7 +3467,7 @@ namespace gbe
> sel.ADD(alignedAddr, alignedAddr, GenRegister::immud(pos * 4));
> sel.pop();
> }
> - readDWord(sel, t1, t2, alignedAddr, width, bti);
> + readDWord(sel, t1, alignedAddr, width, bti);
> remainedReg -= width;
> pos += width;
> } while(remainedReg);
> @@ -3415,51 +3486,39 @@ namespace gbe
> GBE_ASSERT(insn.getValueNum() == 1);
> const GenRegister value = sel.selReg(insn.getValue(0), insn.getValueType());
> GBE_ASSERT(elemSize == GEN_BYTE_SCATTER_WORD || elemSize == GEN_BYTE_SCATTER_BYTE);
> - GenRegister tmp = value;
> -
> - for (int x = 0; x < bti.count; x++) {
> - if (x > 0)
> - tmp = sel.selReg(sel.reg(family, isUniform), insn.getValueType());
>
> - GenRegister addr = getRelativeAddress(sel, address, bti.bti[x]);
> - readByteAsDWord(sel, elemSize, addr, tmp, isUniform, bti.bti[x]);
> - if (x > 0) {
> - sel.push();
> - if (isUniform) {
> - sel.curr.noMask = 1;
> - sel.curr.execWidth = 1;
> - }
> - sel.ADD(value, value, tmp);
> - sel.pop();
> - }
> - }
> + readByteAsDWord(sel, elemSize, address, value, isUniform, bti);
> }
> }
>
> - INLINE GenRegister getRelativeAddress(Selection::Opaque &sel, GenRegister address, uint8_t bti) const {
> - if (bti == 0xfe || bti == BTI_CONSTANT)
> - return address;
> -
> - sel.push();
> - sel.curr.noMask = 1;
> - if (GenRegister::hstride_size(address) == 0)
> - sel.curr.execWidth = 1;
> - GenRegister temp = sel.selReg(sel.reg(ir::FAMILY_DWORD, sel.curr.execWidth == 1), ir::TYPE_U32);
> - sel.ADD(temp, address, GenRegister::negate(sel.selReg(sel.ctx.getSurfaceBaseReg(bti), ir::TYPE_U32)));
> - sel.pop();
> - return temp;
> - }
> // check whether all binded table index point to constant memory
> INLINE bool isAllConstant(const ir::BTI &bti) const {
> - for (int x = 0; x < bti.count; x++) {
> - if (bti.bti[x] != BTI_CONSTANT)
> - return false;
> + if (bti.isConst && bti.imm == BTI_CONSTANT)
> + return true;
> + return false;
> + }
> +
> + INLINE ir::BTI getBTI(SelectionDAG &dag, const ir::LoadInstruction &insn) const {
> + using namespace ir;
> + SelectionDAG *child0 = dag.child[0];
> + ir::BTI b;
> + if (insn.isFixedBTI()) {
> + const auto &immInsn = cast<LoadImmInstruction>(child0->insn);
> + const auto imm = immInsn.getImmediate();
> + b.isConst = 1;
> + b.imm = imm.getIntegerValue();
> + } else {
> + b.isConst = 0;
> + b.reg = insn.getBTI();
> }
> - return true;
> + return b;
> }
>
> - INLINE bool emitOne(Selection::Opaque &sel, const ir::LoadInstruction &insn, bool &markChildren) const {
> + /*! Implements base class */
> + virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
> + {
> using namespace ir;
> + const ir::LoadInstruction &insn = cast<ir::LoadInstruction>(dag.insn);
> GenRegister address = sel.selReg(insn.getAddress(), ir::TYPE_U32);
> GBE_ASSERT(insn.getAddressSpace() == MEM_GLOBAL ||
> insn.getAddressSpace() == MEM_CONSTANT ||
> @@ -3467,9 +3526,11 @@ namespace gbe
> insn.getAddressSpace() == MEM_LOCAL ||
> insn.getAddressSpace() == MEM_MIXED);
> //GBE_ASSERT(sel.isScalarReg(insn.getValue(0)) == false);
> +
> + BTI bti = getBTI(dag, insn);
> +
> const Type type = insn.getValueType();
> const uint32_t elemSize = getByteScatterGatherSize(type);
> - const BTI &bti = insn.getBTI();
> bool allConstant = isAllConstant(bti);
>
> if (allConstant) {
> @@ -3494,65 +3555,79 @@ namespace gbe
> else
> this->emitUnalignedByteGather(sel, insn, elemSize, address, bti);
> }
> +
> +
> + // for fixed bti, don't generate the useless loadi
> + if (insn.isFixedBTI())
> + dag.child[0] = NULL;
> + markAllChildren(dag);
> +
> return true;
> }
> - DECL_CTOR(LoadInstruction, 1, 1);
> };
> -
> - /*! Store instruction pattern */
> - DECL_PATTERN(StoreInstruction)
> + class StoreInstructionPattern : public SelectionPattern
> {
> + public:
> + /*! Register the pattern for all opcodes of the family */
> + StoreInstructionPattern(void) : SelectionPattern(1, 1) {
> + this->opcodes.push_back(ir::OP_STORE);
> + }
> void emitUntypedWrite(Selection::Opaque &sel,
> const ir::StoreInstruction &insn,
> - GenRegister addr,
> - uint32_t bti) const
> + GenRegister address,
> + ir::BTI &bti) const
> {
> using namespace ir;
> const uint32_t valueNum = insn.getValueNum();
> vector<GenRegister> value(valueNum);
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
>
> - addr = GenRegister::retype(addr, GEN_TYPE_F);
> for (uint32_t valueID = 0; valueID < valueNum; ++valueID)
> - value[valueID] = GenRegister::retype(sel.selReg(insn.getValue(valueID)), GEN_TYPE_F);
> - sel.UNTYPED_WRITE(addr, value.data(), valueNum, bti);
> + value[valueID] = GenRegister::retype(sel.selReg(insn.getValue(valueID)), GEN_TYPE_UD);
> + GenRegister tmp = sel.selReg(sel.reg(FAMILY_WORD, true), ir::TYPE_U16);
> + sel.UNTYPED_WRITE(address, value.data(), valueNum, b, bti.isConst? NULL : &tmp);
> }
>
> void emitWrite64(Selection::Opaque &sel,
> const ir::StoreInstruction &insn,
> - GenRegister addr,
> - uint32_t bti) const
> + GenRegister address,
> + ir::BTI &bti) const
> {
> using namespace ir;
> const uint32_t valueNum = insn.getValueNum();
> /* XXX support scalar only right now. */
> GBE_ASSERT(valueNum == 1);
> - addr = GenRegister::retype(addr, GEN_TYPE_UD);
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
> vector<GenRegister> src(valueNum);
>
> for (uint32_t valueID = 0; valueID < valueNum; ++valueID)
> src[valueID] = sel.selReg(insn.getValue(valueID), ir::TYPE_U64);
>
> + GenRegister tmpFlag = sel.selReg(sel.reg(FAMILY_WORD, true), ir::TYPE_U16);
> +
> if (sel.hasLongType()) {
> vector<GenRegister> tmp(valueNum);
> for (uint32_t valueID = 0; valueID < valueNum; ++valueID) {
> tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL);
> }
> - sel.WRITE64(addr, src.data(), tmp.data(), valueNum, bti, true);
> + sel.WRITE64(address, src.data(), tmp.data(), valueNum, b, true, bti.isConst? NULL : &tmpFlag);
> } else {
> - sel.WRITE64(addr, src.data(), NULL, valueNum, bti, false);
> + sel.WRITE64(address, src.data(), NULL, valueNum, b, false, bti.isConst? NULL : &tmpFlag);
> }
> }
>
> void emitByteScatter(Selection::Opaque &sel,
> const ir::StoreInstruction &insn,
> const uint32_t elemSize,
> - GenRegister addr,
> - uint32_t bti,
> + GenRegister address,
> + ir::BTI &bti,
> bool isUniform) const
> {
> using namespace ir;
> uint32_t valueNum = insn.getValueNum();
>
> + GenRegister b = bti.isConst ? GenRegister::immud(bti.imm) : sel.selReg(bti.reg, ir::TYPE_U32);
> + GenRegister tmpFlag = sel.selReg(sel.reg(FAMILY_WORD, true), ir::TYPE_U16);
> if(valueNum > 1) {
> const uint32_t typeSize = getFamilySize(getFamily(insn.getValueType()));
> vector<GenRegister> value(valueNum);
> @@ -3572,11 +3647,12 @@ namespace gbe
> sel.PACK_BYTE(tmp[i], value.data() + i * 4/typeSize, typeSize, 4/typeSize);
> }
>
> - sel.UNTYPED_WRITE(addr, tmp.data(), tmpRegNum, bti);
> + sel.UNTYPED_WRITE(address, tmp.data(), tmpRegNum, b, bti.isConst ? NULL : &tmpFlag);
> } else {
> const GenRegister value = sel.selReg(insn.getValue(0));
> GBE_ASSERT(insn.getValueNum() == 1);
> const GenRegister tmp = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32);
> +
> sel.push();
> if (isUniform) {
> sel.curr.noMask = 1;
> @@ -3588,47 +3664,52 @@ namespace gbe
> else if (elemSize == GEN_BYTE_SCATTER_BYTE)
> sel.MOV(tmp, GenRegister::retype(value, GEN_TYPE_UB));
> sel.pop();
> - sel.BYTE_SCATTER(addr, tmp, elemSize, bti);
> + sel.BYTE_SCATTER(address, tmp, elemSize, b, bti.isConst ? NULL : &tmpFlag);
> }
> }
>
> - INLINE GenRegister getRelativeAddress(Selection::Opaque &sel, GenRegister address, uint8_t bti, bool isUniform) const {
> - if(bti == 0xfe)
> - return address;
>
> - sel.push();
> - sel.curr.noMask = 1;
> - if (isUniform)
> - sel.curr.execWidth = 1;
> - GenRegister temp = sel.selReg(sel.reg(ir::FAMILY_DWORD, isUniform), ir::TYPE_U32);
> - sel.ADD(temp, address, GenRegister::negate(sel.selReg(sel.ctx.getSurfaceBaseReg(bti), ir::TYPE_U32)));
> - sel.pop();
> - return temp;
> + INLINE ir::BTI getBTI(SelectionDAG &dag, const ir::StoreInstruction &insn) const {
> + using namespace ir;
> + SelectionDAG *child0 = dag.child[0];
> + ir::BTI b;
> + if (insn.isFixedBTI()) {
> + const auto &immInsn = cast<LoadImmInstruction>(child0->insn);
> + const auto imm = immInsn.getImmediate();
> + b.isConst = 1;
> + b.imm = imm.getIntegerValue();
> + } else {
> + b.isConst = 0;
> + b.reg = insn.getBTI();
> + }
> + return b;
> }
> -
> - INLINE bool emitOne(Selection::Opaque &sel, const ir::StoreInstruction &insn, bool &markChildren) const
> + virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
> {
> using namespace ir;
> + const ir::StoreInstruction &insn = cast<ir::StoreInstruction>(dag.insn);
> + GenRegister address = sel.selReg(insn.getAddress(), ir::TYPE_U32);
> const Type type = insn.getValueType();
> const uint32_t elemSize = getByteScatterGatherSize(type);
> - GenRegister address = sel.selReg(insn.getAddress(), ir::TYPE_U32);
>
> const bool isUniform = sel.isScalarReg(insn.getAddress()) && sel.isScalarReg(insn.getValue(0));
> + BTI bti = getBTI(dag, insn);
>
> - BTI bti = insn.getBTI();
> - for (int x = 0; x < bti.count; x++) {
> - GenRegister temp = getRelativeAddress(sel, address, bti.bti[x], isUniform);
> - if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_QWORD)
> - this->emitWrite64(sel, insn, temp, bti.bti[x]);
> - else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_DWORD)
> - this->emitUntypedWrite(sel, insn, temp, bti.bti[x]);
> - else {
> - this->emitByteScatter(sel, insn, elemSize, temp, bti.bti[x], isUniform);
> - }
> + if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_QWORD)
> + this->emitWrite64(sel, insn, address, bti);
> + else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_DWORD)
> + this->emitUntypedWrite(sel, insn, address, bti);
> + else {
> + this->emitByteScatter(sel, insn, elemSize, address, bti, isUniform);
> }
> +
> + // for fixed bti, don't generate the useless loadi
> + if (insn.isFixedBTI())
> + dag.child[0] = NULL;
> + markAllChildren(dag);
> +
> return true;
> }
> - DECL_CTOR(StoreInstruction, 1, 1);
> };
>
> /*! Compare instruction pattern */
> @@ -4226,38 +4307,61 @@ namespace gbe
> DECL_CTOR(ConvertInstruction, 1, 1);
> };
>
> - /*! Convert instruction pattern */
> - DECL_PATTERN(AtomicInstruction)
> + /*! atomic instruction pattern */
> + class AtomicInstructionPattern : public SelectionPattern
> {
> - INLINE bool emitOne(Selection::Opaque &sel, const ir::AtomicInstruction &insn, bool &markChildren) const
> - {
> + public:
> + AtomicInstructionPattern(void) : SelectionPattern(1,1) {
> + for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
> + if (ir::isOpcodeFrom<ir::AtomicInstruction>(ir::Opcode(op)) == true)
> + this->opcodes.push_back(ir::Opcode(op));
> + }
> +
> + INLINE ir::BTI getBTI(SelectionDAG &dag, const ir::AtomicInstruction &insn) const {
> + using namespace ir;
> + SelectionDAG *child0 = dag.child[0];
> + ir::BTI b;
> + if (insn.isFixedBTI()) {
> + const auto &immInsn = cast<LoadImmInstruction>(child0->insn);
> + const auto imm = immInsn.getImmediate();
> + b.isConst = 1;
> + b.imm = imm.getIntegerValue();
> + } else {
> + b.isConst = 0;
> + b.reg = insn.getBTI();
> + }
> + return b;
> + }
> +
> + INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const {
> using namespace ir;
> + const ir::AtomicInstruction &insn = cast<ir::AtomicInstruction>(dag.insn);
> +
> + ir::BTI b = getBTI(dag, insn);
> const AtomicOps atomicOp = insn.getAtomicOpcode();
> - const AddressSpace space = insn.getAddressSpace();
> - const uint32_t srcNum = insn.getSrcNum();
> + unsigned srcNum = insn.getSrcNum();
> + unsigned opNum = srcNum - 1;
>
> - GenRegister src0 = sel.selReg(insn.getSrc(0), TYPE_U32); //address
> - GenRegister src1 = src0, src2 = src0;
> - if(srcNum > 1) src1 = sel.selReg(insn.getSrc(1), TYPE_U32);
> - if(srcNum > 2) src2 = sel.selReg(insn.getSrc(2), TYPE_U32);
> GenRegister dst = sel.selReg(insn.getDst(0), TYPE_U32);
> + GenRegister bti = b.isConst ? GenRegister::immud(b.imm) : sel.selReg(b.reg, ir::TYPE_U32);
> + GenRegister src0 = sel.selReg(insn.getSrc(1), TYPE_U32); //address
> + GenRegister src1 = src0, src2 = src0;
> + if(srcNum > 2) src1 = sel.selReg(insn.getSrc(2), TYPE_U32);
> + if(srcNum > 3) src2 = sel.selReg(insn.getSrc(3), TYPE_U32);
> +
> + GenRegister flagTemp = sel.selReg(sel.reg(FAMILY_WORD, true), TYPE_U16);
> +
> GenAtomicOpCode genAtomicOp = (GenAtomicOpCode)atomicOp;
> - if(space == MEM_LOCAL) {
> - sel.ATOMIC(dst, genAtomicOp, srcNum, src0, src1, src2, 0xfe);
> - } else {
> - ir::BTI b = insn.getBTI();
> - for (int x = 0; x < b.count; x++) {
> - sel.push();
> - sel.curr.noMask = 1;
> - GenRegister temp = sel.selReg(sel.reg(FAMILY_DWORD), ir::TYPE_U32);
> - sel.ADD(temp, src0, GenRegister::negate(sel.selReg(sel.ctx.getSurfaceBaseReg(b.bti[x]), ir::TYPE_U32)));
> - sel.pop();
> - sel.ATOMIC(dst, genAtomicOp, srcNum, temp, src1, src2, b.bti[x]);
> - }
> - }
> +
> + sel.ATOMIC(dst, genAtomicOp, opNum, src0, src1, src2, bti, b.isConst ? NULL : &flagTemp);
> +
> + // for fixed bti, don't generate the useless loadi
> + if (insn.isFixedBTI())
> + dag.child[0] = NULL;
> + markAllChildren(dag);
> +
> return true;
> }
> - DECL_CTOR(AtomicInstruction, 1, 1);
> };
>
> /*! Select instruction pattern */
> diff --git a/backend/src/backend/gen_insn_selection.hpp b/backend/src/backend/gen_insn_selection.hpp
> index 2262ef9..8c6caac 100644
> --- a/backend/src/backend/gen_insn_selection.hpp
> +++ b/backend/src/backend/gen_insn_selection.hpp
> @@ -100,7 +100,7 @@ namespace gbe
> struct {
> /*! Store bti for loads/stores and function for math, atomic and compares */
> uint16_t function:8;
> - /*! elemSize for byte scatters / gathers, elemNum for untyped msg, bti for atomic */
> + /*! elemSize for byte scatters / gathers, elemNum for untyped msg, operand number for atomic */
> uint16_t elem:8;
> };
> struct {
> @@ -150,14 +150,7 @@ namespace gbe
> INLINE uint32_t getbti() const {
> GBE_ASSERT(isRead() || isWrite());
> switch (opcode) {
> - case SEL_OP_ATOMIC: return extra.elem;
> - case SEL_OP_BYTE_SCATTER:
> - case SEL_OP_WRITE64:
> - case SEL_OP_DWORD_GATHER:
> - case SEL_OP_UNTYPED_WRITE:
> - case SEL_OP_UNTYPED_READ:
> - case SEL_OP_BYTE_GATHER:
> - case SEL_OP_READ64: return extra.function;
> + case SEL_OP_DWORD_GATHER: return extra.function;
> case SEL_OP_SAMPLE: return extra.rdbti;
> case SEL_OP_TYPED_WRITE: return extra.bti;
> default:
> @@ -169,14 +162,7 @@ namespace gbe
> INLINE void setbti(uint32_t bti) {
> GBE_ASSERT(isRead() || isWrite());
> switch (opcode) {
> - case SEL_OP_ATOMIC: extra.elem = bti; return;
> - case SEL_OP_BYTE_SCATTER:
> - case SEL_OP_WRITE64:
> - case SEL_OP_UNTYPED_WRITE:
> - case SEL_OP_DWORD_GATHER:
> - case SEL_OP_UNTYPED_READ:
> - case SEL_OP_BYTE_GATHER:
> - case SEL_OP_READ64: extra.function = bti; return;
> + case SEL_OP_DWORD_GATHER: extra.function = bti; return;
> case SEL_OP_SAMPLE: extra.rdbti = bti; return;
> case SEL_OP_TYPED_WRITE: extra.bti = bti; return;
> default:
> diff --git a/backend/src/backend/program.h b/backend/src/backend/program.h
> index 8c171f5..3637ebb 100644
> --- a/backend/src/backend/program.h
> +++ b/backend/src/backend/program.h
> @@ -103,6 +103,7 @@ enum gbe_curbe_type {
> GBE_CURBE_ONE,
> GBE_CURBE_LANE_ID,
> GBE_CURBE_SLM_OFFSET,
> + GBE_CURBE_BTI_UTIL,
> };
>
> /*! Extra arguments use the negative range of sub-values */
> diff --git a/backend/src/ir/context.hpp b/backend/src/ir/context.hpp
> index af65ff3..54265d0 100644
> --- a/backend/src/ir/context.hpp
> +++ b/backend/src/ir/context.hpp
> @@ -190,22 +190,22 @@ namespace ir {
>
> /*! LOAD with the destinations directly specified */
> template <typename... Args>
> - void LOAD(Type type, Register offset, AddressSpace space, bool dwAligned, BTI bti, Args...values)
> + void LOAD(Type type, Register offset, AddressSpace space, bool dwAligned, bool fixedBTI, Register bti, Args...values)
> {
> const Tuple index = this->tuple(values...);
> const uint16_t valueNum = std::tuple_size<std::tuple<Args...>>::value;
> GBE_ASSERT(valueNum > 0);
> - this->LOAD(type, index, offset, space, valueNum, dwAligned, bti);
> + this->LOAD(type, index, offset, space, valueNum, dwAligned, fixedBTI, bti);
> }
>
> /*! STORE with the sources directly specified */
> template <typename... Args>
> - void STORE(Type type, Register offset, AddressSpace space, bool dwAligned, BTI bti, Args...values)
> + void STORE(Type type, Register offset, AddressSpace space, bool dwAligned, bool fixedBTI, Register bti, Args...values)
> {
> const Tuple index = this->tuple(values...);
> const uint16_t valueNum = std::tuple_size<std::tuple<Args...>>::value;
> GBE_ASSERT(valueNum > 0);
> - this->STORE(type, index, offset, space, valueNum, dwAligned, bti);
> + this->STORE(type, index, offset, space, valueNum, dwAligned, fixedBTI, bti);
> }
> void appendSurface(uint8_t bti, Register reg) { fn->appendSurface(bti, reg); }
>
> diff --git a/backend/src/ir/instruction.cpp b/backend/src/ir/instruction.cpp
> index 784ae9c..e2c4a14 100644
> --- a/backend/src/ir/instruction.cpp
> +++ b/backend/src/ir/instruction.cpp
> @@ -318,14 +318,14 @@ namespace ir {
>
> class ALIGNED_INSTRUCTION AtomicInstruction :
> public BasePolicy,
> - public TupleSrcPolicy<AtomicInstruction>,
> public NDstPolicy<AtomicInstruction, 1>
> {
> public:
> AtomicInstruction(AtomicOps atomicOp,
> Register dst,
> AddressSpace addrSpace,
> - BTI bti,
> + Register bti,
> + bool fixedBTI,
> Tuple src)
> {
> this->opcode = OP_ATOMIC;
> @@ -334,23 +334,43 @@ namespace ir {
> this->src = src;
> this->addrSpace = addrSpace;
> this->bti = bti;
> + this->fixedBTI = fixedBTI ? 1: 0;
> srcNum = 2;
> if((atomicOp == ATOMIC_OP_INC) ||
> (atomicOp == ATOMIC_OP_DEC))
> srcNum = 1;
> if(atomicOp == ATOMIC_OP_CMPXCHG)
> srcNum = 3;
> + srcNum++;
> }
> + INLINE Register getSrc(const Function &fn, uint32_t ID) const {
> + GBE_ASSERTM(ID < srcNum, "Out-of-bound source register for atomic");
> + if (ID == 0u)
> + return bti;
> + else
> + return fn.getRegister(src, ID -1);
> + }
> + INLINE void setSrc(Function &fn, uint32_t ID, Register reg) {
> + GBE_ASSERTM(ID < srcNum, "Out-of-bound source register for atomic");
> + if (ID == 0u)
> + bti = reg;
> + else
> + fn.setRegister(src, ID - 1, reg);
> + }
> + INLINE uint32_t getSrcNum(void) const { return srcNum; }
> +
> INLINE AddressSpace getAddressSpace(void) const { return this->addrSpace; }
> - INLINE BTI getBTI(void) const { return bti; }
> + INLINE Register getBTI(void) const { return bti; }
> + INLINE bool isFixedBTI(void) const { return !!fixedBTI; }
> INLINE AtomicOps getAtomicOpcode(void) const { return this->atomicOp; }
> INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
> INLINE void out(std::ostream &out, const Function &fn) const;
> Register dst[1];
> Tuple src;
> AddressSpace addrSpace; //!< Address space
> - BTI bti; //!< bti
> - uint8_t srcNum:2; //!<Source Number
> + Register bti; //!< bti
> + uint8_t fixedBTI:1; //!< fixed bti or not
> + uint8_t srcNum:3; //!<Source Number
> AtomicOps atomicOp:6; //!<Source Number
> };
>
> @@ -410,7 +430,7 @@ namespace ir {
>
> class ALIGNED_INSTRUCTION LoadInstruction :
> public BasePolicy,
> - public NSrcPolicy<LoadInstruction, 1>
> + public NSrcPolicy<LoadInstruction, 2>
> {
> public:
> LoadInstruction(Type type,
> @@ -419,7 +439,8 @@ namespace ir {
> AddressSpace addrSpace,
> uint32_t valueNum,
> bool dwAligned,
> - BTI bti)
> + bool fixedBTI,
> + Register bti)
> {
> GBE_ASSERT(valueNum < 128);
> this->opcode = OP_LOAD;
> @@ -429,6 +450,7 @@ namespace ir {
> this->addrSpace = addrSpace;
> this->valueNum = valueNum;
> this->dwAligned = dwAligned ? 1 : 0;
> + this->fixedBTI = fixedBTI ? 1 : 0;
> this->bti = bti;
> }
> INLINE Register getDst(const Function &fn, uint32_t ID) const {
> @@ -443,16 +465,18 @@ namespace ir {
> INLINE Type getValueType(void) const { return type; }
> INLINE uint32_t getValueNum(void) const { return valueNum; }
> INLINE AddressSpace getAddressSpace(void) const { return addrSpace; }
> - INLINE BTI getBTI(void) const { return bti; }
> + INLINE Register getBTI(void) const { return bti; }
> INLINE bool wellFormed(const Function &fn, std::string &why) const;
> INLINE void out(std::ostream &out, const Function &fn) const;
> INLINE bool isAligned(void) const { return !!dwAligned; }
> + INLINE bool isFixedBTI(void) const { return !!fixedBTI; }
> Type type; //!< Type to store
> Register src[0]; //!< Address where to load from
> + Register bti;
> Register offset; //!< Alias to make it similar to store
> Tuple values; //!< Values to load
> AddressSpace addrSpace; //!< Where to load
> - BTI bti;
> + uint8_t fixedBTI:1;
> uint8_t valueNum:7; //!< Number of values to load
> uint8_t dwAligned:1; //!< DWORD aligned is what matters with GEN
> };
> @@ -467,7 +491,8 @@ namespace ir {
> AddressSpace addrSpace,
> uint32_t valueNum,
> bool dwAligned,
> - BTI bti)
> + bool fixedBTI,
> + Register bti)
> {
> GBE_ASSERT(valueNum < 255);
> this->opcode = OP_STORE;
> @@ -477,35 +502,42 @@ namespace ir {
> this->addrSpace = addrSpace;
> this->valueNum = valueNum;
> this->dwAligned = dwAligned ? 1 : 0;
> + this->fixedBTI = fixedBTI ? 1 : 0;
> this->bti = bti;
> }
> INLINE Register getSrc(const Function &fn, uint32_t ID) const {
> - GBE_ASSERTM(ID < valueNum + 1u, "Out-of-bound source register for store");
> + GBE_ASSERTM(ID < valueNum + 2u, "Out-of-bound source register for store");
> if (ID == 0u)
> + return bti;
> + else if (ID == 1u)
> return offset;
> else
> - return fn.getRegister(values, ID - 1);
> + return fn.getRegister(values, ID - 2);
> }
> INLINE void setSrc(Function &fn, uint32_t ID, Register reg) {
> - GBE_ASSERTM(ID < valueNum + 1u, "Out-of-bound source register for store");
> + GBE_ASSERTM(ID < valueNum + 2u, "Out-of-bound source register for store");
> if (ID == 0u)
> + bti = reg;
> + else if (ID == 1u)
> offset = reg;
> else
> - fn.setRegister(values, ID - 1, reg);
> + fn.setRegister(values, ID - 2, reg);
> }
> - INLINE uint32_t getSrcNum(void) const { return valueNum + 1u; }
> + INLINE uint32_t getSrcNum(void) const { return valueNum + 2u; }
> INLINE uint32_t getValueNum(void) const { return valueNum; }
> INLINE Type getValueType(void) const { return type; }
> INLINE AddressSpace getAddressSpace(void) const { return addrSpace; }
> - INLINE BTI getBTI(void) const { return bti; }
> + INLINE Register getBTI(void) const { return bti; }
> INLINE bool wellFormed(const Function &fn, std::string &why) const;
> INLINE void out(std::ostream &out, const Function &fn) const;
> INLINE bool isAligned(void) const { return !!dwAligned; }
> + INLINE bool isFixedBTI(void) const { return !!fixedBTI; }
> Type type; //!< Type to store
> + Register bti;
> Register offset; //!< First source is the offset where to store
> Tuple values; //!< Values to store
> AddressSpace addrSpace; //!< Where to store
> - BTI bti; //!< Which btis need access
> + uint8_t fixedBTI:1; //!< fixed bti or not
> uint8_t valueNum:7; //!< Number of values to store
> uint8_t dwAligned:1; //!< DWORD aligned is what matters with GEN
> Register dst[0]; //!< No destination
> @@ -985,10 +1017,12 @@ namespace ir {
> return false;
> if (UNLIKELY(checkRegisterData(FAMILY_DWORD, dst[0], fn, whyNot) == false))
> return false;
> - for (uint32_t srcID = 0; srcID < srcNum; ++srcID)
> - if (UNLIKELY(checkRegisterData(FAMILY_DWORD, getSrc(fn, srcID), fn, whyNot) == false))
> + for (uint32_t srcID = 0; srcID < srcNum-1u; ++srcID)
> + if (UNLIKELY(checkRegisterData(FAMILY_DWORD, getSrc(fn, srcID+1u), fn, whyNot) == false))
> return false;
>
> + if (UNLIKELY(checkRegisterData(FAMILY_DWORD, bti, fn, whyNot) == false))
> + return false;
> return true;
> }
>
> @@ -1199,12 +1233,10 @@ namespace ir {
> this->outOpcode(out);
> out << "." << addrSpace;
> out << " %" << this->getDst(fn, 0);
> - out << " {" << "%" << this->getSrc(fn, 0) << "}";
> - for (uint32_t i = 1; i < srcNum; ++i)
> + out << " {" << "%" << this->getSrc(fn, 1) << "}";
> + for (uint32_t i = 2; i < srcNum; ++i)
> out << " %" << this->getSrc(fn, i);
> - out << " bti";
> - for (uint32_t i = 0; i < bti.count; ++i)
> - out << ": " << (int)bti.bti[i];
> + out << (fixedBTI ? " bti" : " bti(mixed)") << " %" << this->getBTI();
> }
>
>
> @@ -1238,22 +1270,18 @@ namespace ir {
> for (uint32_t i = 0; i < valueNum; ++i)
> out << "%" << this->getDst(fn, i) << (i != (valueNum-1u) ? " " : "");
> out << "}";
> - out << " %" << this->getSrc(fn, 0);
> - out << " bti";
> - for (uint32_t i = 0; i < bti.count; ++i)
> - out << ": " << (int)bti.bti[i];
> + out << " %" << this->getSrc(fn, 1);
> + out << (fixedBTI ? " bti" : " bti(mixed)") << " %" << this->getBTI();
> }
>
> INLINE void StoreInstruction::out(std::ostream &out, const Function &fn) const {
> this->outOpcode(out);
> out << "." << type << "." << addrSpace << (dwAligned ? "." : ".un") << "aligned";
> - out << " %" << this->getSrc(fn, 0) << " {";
> + out << " %" << this->getSrc(fn, 1) << " {";
> for (uint32_t i = 0; i < valueNum; ++i)
> - out << "%" << this->getSrc(fn, i+1) << (i != (valueNum-1u) ? " " : "");
> + out << "%" << this->getSrc(fn, i+2) << (i != (valueNum-1u) ? " " : "");
> out << "}";
> - out << " bti";
> - for (uint32_t i = 0; i < bti.count; ++i)
> - out << ": " << (int)bti.bti[i];
> + out << (fixedBTI ? " bti" : " bti(mixed)") << " %" << this->getBTI();
> }
>
> INLINE void ReadARFInstruction::out(std::ostream &out, const Function &fn) const {
> @@ -1604,18 +1632,18 @@ DECL_MEM_FN(BitCastInstruction, Type, getDstType(void), getDstType())
> DECL_MEM_FN(ConvertInstruction, Type, getSrcType(void), getSrcType())
> DECL_MEM_FN(ConvertInstruction, Type, getDstType(void), getDstType())
> DECL_MEM_FN(AtomicInstruction, AddressSpace, getAddressSpace(void), getAddressSpace())
> -DECL_MEM_FN(AtomicInstruction, BTI, getBTI(void), getBTI())
> DECL_MEM_FN(AtomicInstruction, AtomicOps, getAtomicOpcode(void), getAtomicOpcode())
> +DECL_MEM_FN(AtomicInstruction, bool, isFixedBTI(void), isFixedBTI())
> DECL_MEM_FN(StoreInstruction, Type, getValueType(void), getValueType())
> DECL_MEM_FN(StoreInstruction, uint32_t, getValueNum(void), getValueNum())
> DECL_MEM_FN(StoreInstruction, AddressSpace, getAddressSpace(void), getAddressSpace())
> -DECL_MEM_FN(StoreInstruction, BTI, getBTI(void), getBTI())
> DECL_MEM_FN(StoreInstruction, bool, isAligned(void), isAligned())
> +DECL_MEM_FN(StoreInstruction, bool, isFixedBTI(void), isFixedBTI())
> DECL_MEM_FN(LoadInstruction, Type, getValueType(void), getValueType())
> DECL_MEM_FN(LoadInstruction, uint32_t, getValueNum(void), getValueNum())
> DECL_MEM_FN(LoadInstruction, AddressSpace, getAddressSpace(void), getAddressSpace())
> -DECL_MEM_FN(LoadInstruction, BTI, getBTI(void), getBTI())
> DECL_MEM_FN(LoadInstruction, bool, isAligned(void), isAligned())
> +DECL_MEM_FN(LoadInstruction, bool, isFixedBTI(void), isFixedBTI())
> DECL_MEM_FN(LoadImmInstruction, Type, getType(void), getType())
> DECL_MEM_FN(LabelInstruction, LabelIndex, getLabelIndex(void), getLabelIndex())
> DECL_MEM_FN(BranchInstruction, bool, isPredicated(void), isPredicated())
> @@ -1782,8 +1810,8 @@ DECL_MEM_FN(GetImageInfoInstruction, uint8_t, getImageIndex(void), getImageIndex
> }
>
> // For all unary functions with given opcode
> - Instruction ATOMIC(AtomicOps atomicOp, Register dst, AddressSpace space, BTI bti, Tuple src) {
> - return internal::AtomicInstruction(atomicOp, dst, space, bti, src).convert();
> + Instruction ATOMIC(AtomicOps atomicOp, Register dst, AddressSpace space, Register bti, bool fixedBTI, Tuple src) {
> + return internal::AtomicInstruction(atomicOp, dst, space, bti, fixedBTI, src).convert();
> }
>
> // BRA
> @@ -1831,9 +1859,10 @@ DECL_MEM_FN(GetImageInfoInstruction, uint8_t, getImageIndex(void), getImageIndex
> AddressSpace space, \
> uint32_t valueNum, \
> bool dwAligned, \
> - BTI bti) \
> + bool fixedBTI, \
> + Register bti) \
> { \
> - return internal::CLASS(type,tuple,offset,space,valueNum,dwAligned,bti).convert(); \
> + return internal::CLASS(type,tuple,offset,space,valueNum,dwAligned,fixedBTI,bti).convert(); \
> }
>
> DECL_EMIT_FUNCTION(LOAD, LoadInstruction)
> diff --git a/backend/src/ir/instruction.hpp b/backend/src/ir/instruction.hpp
> index 343d12a..ec4d00d 100644
> --- a/backend/src/ir/instruction.hpp
> +++ b/backend/src/ir/instruction.hpp
> @@ -36,10 +36,13 @@
> namespace gbe {
> namespace ir {
> struct BTI {
> - uint8_t bti[MAX_MIXED_POINTER];
> - uint8_t count;
> - BTI() : count(0) {
> - memset(bti, 0, MAX_MIXED_POINTER);
> + uint8_t isConst; // whether fixed bti
> + union {
> + Register reg; // mixed reg
> + unsigned short imm; // fixed bti
> + };
> +
> + BTI() : isConst(0) {
> }
> ~BTI() {}
> };
> @@ -289,10 +292,12 @@ namespace ir {
> class AtomicInstruction : public Instruction {
> public:
> /*! Where the address register goes */
> - static const uint32_t addressIndex = 0;
> + static const uint32_t btiIndex = 0;
> + static const uint32_t addressIndex = 1;
> /*! Address space that is manipulated here */
> AddressSpace getAddressSpace(void) const;
> - BTI getBTI(void) const;
> + Register getBTI(void) const { return this->getSrc(btiIndex); }
> + bool isFixedBTI(void) const;
> /*! Return the atomic function code */
> AtomicOps getAtomicOpcode(void) const;
> /*! Return the register that contains the addresses */
> @@ -307,12 +312,14 @@ namespace ir {
> class StoreInstruction : public Instruction {
> public:
> /*! Where the address register goes */
> - static const uint32_t addressIndex = 0;
> + static const uint32_t btiIndex = 0;
> + static const uint32_t addressIndex = 1;
> /*! Return the types of the values to store */
> Type getValueType(void) const;
> /*! Give the number of values the instruction is storing (srcNum-1) */
> uint32_t getValueNum(void) const;
> - BTI getBTI(void) const;
> + Register getBTI(void) const { return this->getSrc(btiIndex); }
> + bool isFixedBTI(void) const;
> /*! Address space that is manipulated here */
> AddressSpace getAddressSpace(void) const;
> /*! DWORD aligned means untyped read for Gen. That is what matters */
> @@ -322,7 +329,7 @@ namespace ir {
> /*! Return the register that contain value valueID */
> INLINE Register getValue(uint32_t valueID) const {
> GBE_ASSERT(valueID < this->getValueNum());
> - return this->getSrc(valueID + 1u);
> + return this->getSrc(valueID + 2u);
> }
> /*! Return true if the given instruction is an instance of this class */
> static bool isClassOf(const Instruction &insn);
> @@ -343,8 +350,9 @@ namespace ir {
> /*! DWORD aligned means untyped read for Gen. That is what matters */
> bool isAligned(void) const;
> /*! Return the register that contains the addresses */
> - INLINE Register getAddress(void) const { return this->getSrc(0u); }
> - BTI getBTI(void) const;
> + INLINE Register getAddress(void) const { return this->getSrc(1u); }
> + Register getBTI(void) const {return this->getSrc(0u);}
> + bool isFixedBTI(void) const;
> /*! Return the register that contain value valueID */
> INLINE Register getValue(uint32_t valueID) const {
> return this->getDst(valueID);
> @@ -708,7 +716,7 @@ namespace ir {
> /*! F32TO16.{dstType <- srcType} dst src */
> Instruction F32TO16(Type dstType, Type srcType, Register dst, Register src);
> /*! atomic dst addr.space {src1 {src2}} */
> - Instruction ATOMIC(AtomicOps opcode, Register dst, AddressSpace space, BTI bti, Tuple src);
> + Instruction ATOMIC(AtomicOps opcode, Register dst, AddressSpace space, Register bti, bool fixedBTI, Tuple src);
> /*! bra labelIndex */
> Instruction BRA(LabelIndex labelIndex);
> /*! (pred) bra labelIndex */
> @@ -724,9 +732,9 @@ namespace ir {
> /*! ret */
> Instruction RET(void);
> /*! load.type.space {dst1,...,dst_valueNum} offset value */
> - Instruction LOAD(Type type, Tuple dst, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, BTI bti);
> + Instruction LOAD(Type type, Tuple dst, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, bool fixedBTI, Register bti);
> /*! store.type.space offset {src1,...,src_valueNum} value */
> - Instruction STORE(Type type, Tuple src, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, BTI bti);
> + Instruction STORE(Type type, Tuple src, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, bool fixedBTI, Register bti);
> /*! loadi.type dst value */
> Instruction LOADI(Type type, Register dst, ImmediateIndex value);
> /*! sync.params... (see Sync instruction) */
> diff --git a/backend/src/ir/profile.cpp b/backend/src/ir/profile.cpp
> index 2f6539a..af9f698 100644
> --- a/backend/src/ir/profile.cpp
> +++ b/backend/src/ir/profile.cpp
> @@ -45,7 +45,8 @@ namespace ir {
> "printf_buffer_pointer", "printf_index_buffer_pointer",
> "dwblockip",
> "lane_id",
> - "invalid"
> + "invalid",
> + "bti_utility"
> };
>
> #if GBE_DEBUG
> @@ -91,6 +92,7 @@ namespace ir {
> DECL_NEW_REG(FAMILY_DWORD, dwblockip, 0);
> DECL_NEW_REG(FAMILY_DWORD, laneid, 0);
> DECL_NEW_REG(FAMILY_DWORD, invalid, 1);
> + DECL_NEW_REG(FAMILY_DWORD, btiUtil, 1);
> }
> #undef DECL_NEW_REG
>
> diff --git a/backend/src/ir/profile.hpp b/backend/src/ir/profile.hpp
> index 4de6fe0..9323824 100644
> --- a/backend/src/ir/profile.hpp
> +++ b/backend/src/ir/profile.hpp
> @@ -74,7 +74,8 @@ namespace ir {
> static const Register dwblockip = Register(30); // blockip
> static const Register laneid = Register(31); // lane id.
> static const Register invalid = Register(32); // used for valid comparation.
> - static const uint32_t regNum = 33; // number of special registers
> + static const Register btiUtil = Register(33); // used for mixed pointer as bti utility.
> + static const uint32_t regNum = 34; // number of special registers
> extern const char *specialRegMean[]; // special register name.
> } /* namespace ocl */
>
> diff --git a/backend/src/llvm/llvm_gen_backend.cpp b/backend/src/llvm/llvm_gen_backend.cpp
> index 61b66b6..aec04fb 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -87,6 +87,7 @@
> #endif /* LLVM_VERSION_MINOR <= 2 */
> #include "llvm/Pass.h"
> #include "llvm/PassManager.h"
> +#include "llvm/IR/IRBuilder.h"
> #if LLVM_VERSION_MINOR <= 2
> #include "llvm/Intrinsics.h"
> #include "llvm/IntrinsicInst.h"
> @@ -290,11 +291,8 @@ namespace gbe
> return ir::MEM_GLOBAL;
> }
>
> - static INLINE ir::AddressSpace btiToGen(const ir::BTI &bti) {
> - if (bti.count > 1)
> - return ir::MEM_MIXED;
> - uint8_t singleBti = bti.bti[0];
> - switch (singleBti) {
> + static INLINE ir::AddressSpace btiToGen(const unsigned bti) {
> + switch (bti) {
> case BTI_CONSTANT: return ir::MEM_CONSTANT;
> case BTI_PRIVATE: return ir::MEM_PRIVATE;
> case BTI_LOCAL: return ir::MEM_LOCAL;
> @@ -485,7 +483,14 @@ namespace gbe
>
> map<Value *, SmallVector<Value *, 4>> pointerOrigMap;
> typedef map<Value *, SmallVector<Value *, 4>>::iterator PtrOrigMapIter;
> -
> + // map pointer source to bti
> + map<Value *, unsigned> BtiMap;
> + // map ptr to its bti register
> + map<Value *, Value *> BtiValueMap;
> + // map ptr to its base
> + map<Value *, Value *> pointerBaseMap;
> +
> + typedef map<Value *, Value *>::iterator PtrBaseMapIter;
> /*! We visit each function twice. Once to allocate the registers and once to
> * emit the Gen IR instructions
> */
> @@ -501,6 +506,7 @@ namespace gbe
> } ConstTypeId;
>
> LoopInfo *LI;
> + Function *Func;
> const Module *TheModule;
> int btiBase;
> public:
> @@ -547,23 +553,35 @@ namespace gbe
> bool bKernel = isKernelFunction(F);
> if(!bKernel) return false;
>
> + Func = &F;
> + assignBti(F);
> analyzePointerOrigin(F);
> +
> LI = &getAnalysis<LoopInfo>();
> emitFunction(F);
> phiMap.clear();
> globalPointer.clear();
> pointerOrigMap.clear();
> + BtiMap.clear();
> + BtiValueMap.clear();
> + pointerBaseMap.clear();
> // Reset for next function
> btiBase = BTI_RESERVED_NUM;
> return false;
> }
> /*! Given a possible pointer value, find out the interested escape like
> load/store or atomic instruction */
> - void findPointerEscape(Value *ptr);
> + void findPointerEscape(Value *ptr, std::set<Value *> &mixedPtr, bool recordMixed);
> /*! For all possible pointers, GlobalVariable, function pointer argument,
> alloca instruction, find their pointer escape points */
> void analyzePointerOrigin(Function &F);
> + unsigned getNewBti(Value *origin);
> + void assignBti(Function &F);
> + bool isSingleBti(Value *Val);
> + Value *getBtiRegister(Value *v);
> + Value *getPointerBase(Value *ptr);
>
> + MDNode *getKernelFunctionMetadata(Function *F);
> virtual bool doFinalization(Module &M) { return false; }
> /*! handle global variable register allocation (local, constant space) */
> void allocateGlobalVariableRegister(Function &F);
> @@ -660,10 +678,10 @@ namespace gbe
> // batch vec4/8/16 load/store
> INLINE void emitBatchLoadOrStore(const ir::Type type, const uint32_t elemNum,
> Value *llvmValue, const ir::Register ptr,
> - const ir::AddressSpace addrSpace, Type * elemType, bool isLoad, ir::BTI bti,
> - bool dwAligned);
> + const ir::AddressSpace addrSpace, Type * elemType, bool isLoad, ir::Register bti,
> + bool dwAligned, bool fixedBTI);
> // handle load of dword/qword with unaligned address
> - void emitUnalignedDQLoadStore(Value *llvmPtr, Value *llvmValues, ir::AddressSpace addrSpace, ir::BTI &binding, bool isLoad, bool dwAligned);
> + void emitUnalignedDQLoadStore(ir::Register ptr, Value *llvmValues, ir::AddressSpace addrSpace, ir::Register bti, bool isLoad, bool dwAligned, bool fixedBTI);
> void visitInstruction(Instruction &I) {NOT_SUPPORTED;}
> private:
> ir::ImmediateIndex processConstantImmIndexImpl(Constant *CPV, int32_t index = 0u);
> @@ -675,7 +693,44 @@ namespace gbe
>
> char GenWriter::ID = 0;
>
> - void GenWriter::findPointerEscape(Value *ptr) {
> + static void updatePointerSource(Value *parent, Value *theUser, Value *source, SmallVector<Value *, 4> &pointers) {
> + if (isa<SelectInst>(theUser)) {
> + SelectInst *si = dyn_cast<SelectInst>(theUser);
> + if (si->getTrueValue() == parent)
> + pointers[0] = source;
> + else
> + pointers[1] = source;
> + } else if (isa<PHINode>(theUser)) {
> + PHINode *phi = dyn_cast<PHINode>(theUser);
> + unsigned opNum = phi->getNumIncomingValues();
> + for (unsigned j = 0; j < opNum; j++) {
> + if (phi->getIncomingValue(j) == parent) {
> + pointers[j] = source;
> + }
> + }
> + } else {
> + pointers[0] = source;
> + }
> + }
> +
> + bool isMixedPoint(Value *val, SmallVector<Value *, 4> &pointers) {
> + Value *validSrc = NULL;
> + unsigned i = 0;
> + if (pointers.size() < 2) return false;
> + while(i < pointers.size()) {
> + if (pointers[i] != NULL && validSrc != NULL && pointers[i] != validSrc)
> + return true;
> + // when source is same as itself, we don't treat it as a new source
> + // this often occurs for PHINode
> + if (pointers[i] != NULL && validSrc == NULL && pointers[i] != val) {
> + validSrc = pointers[i];
> + }
> + i++;
> + }
> + return false;
> + }
> +
> + void GenWriter::findPointerEscape(Value *ptr, std::set<Value *> &mixedPtr, bool bFirstPass) {
> std::vector<Value*> workList;
> std::set<Value *> visited;
>
> @@ -695,7 +750,52 @@ namespace gbe
> #else
> User *theUser = iter->getUser();
> #endif
> - if (visited.find(theUser) != visited.end()) continue;
> + bool visitedInThisSource = visited.find(theUser) != visited.end();
> +
> + if (isa<SelectInst>(theUser) || isa<PHINode>(theUser))
> + {
> + // reached from another source, update pointer source
> + PtrOrigMapIter ptrIter = pointerOrigMap.find(theUser);
> + if (ptrIter == pointerOrigMap.end()) {
> + // create new one
> + unsigned capacity = 1;
> + if (isa<SelectInst>(theUser)) capacity = 2;
> + if (isa<PHINode>(theUser)) {
> + PHINode *phi = dyn_cast<PHINode>(theUser);
> + capacity = phi->getNumIncomingValues();
> + }
> +
> + SmallVector<Value *, 4> pointers;
> +
> + unsigned k = 0;
> + while (k++ < capacity) {
> + pointers.push_back(NULL);
> + }
> +
> + updatePointerSource(work, theUser, ptr, pointers);
> + pointerOrigMap.insert(std::make_pair(theUser, pointers));
> + } else {
> + // update pointer source
> + updatePointerSource(work, theUser, ptr, (*ptrIter).second);
> + }
> + ptrIter = pointerOrigMap.find(theUser);
> +
> + if (isMixedPoint(theUser, (*ptrIter).second)) {
> + // In the first pass, record the mixed-pointer instruction.
> + // In the second pass, stop here: we always use a pointer's
> + // 'direct mixed pointer parent' as its origin, so going further
> + // could set a wrong pointer origin.
> + if (bFirstPass)
> + mixedPtr.insert(theUser);
> + else
> + continue;
> + }
> + // avoid an infinite loop: skip users already visited from this source
> + if (visitedInThisSource || theUser == ptr) {
> + continue;
> + }
> + }
> +
> // pointer address is used as the ValueOperand in store instruction, should be skipped
> if (StoreInst *load = dyn_cast<StoreInst>(theUser)) {
> if (load->getValueOperand() == work) {
> @@ -710,16 +810,33 @@ namespace gbe
> Function *F = dyn_cast<CallInst>(theUser)->getCalledFunction();
> if (!F || F->getIntrinsicID() != 0) continue;
> }
> + Value *pointer = NULL;
> + if (isa<LoadInst>(theUser)) {
> + pointer = dyn_cast<LoadInst>(theUser)->getPointerOperand();
> + } else if (isa<StoreInst>(theUser)) {
> + pointer = dyn_cast<StoreInst>(theUser)->getPointerOperand();
> + } else if (isa<CallInst>(theUser)) {
> + // atomic/read(write)image
> + CallInst *ci = dyn_cast<CallInst>(theUser);
> + pointer = ci->getArgOperand(0);
> + } else {
> + theUser->dump();
> + GBE_ASSERT(0 && "Unknown instruction operating on pointers\n");
> + }
>
> - PtrOrigMapIter ptrIter = pointerOrigMap.find(theUser);
> + // the pointer operand is same as pointer origin, don't add to pointerOrigMap
> + if (ptr == pointer) continue;
> +
> + // load/store/atomic instruction, we have reached the end, stop further traversing
> + PtrOrigMapIter ptrIter = pointerOrigMap.find(pointer);
> if (ptrIter == pointerOrigMap.end()) {
> // create new one
> SmallVector<Value *, 4> pointers;
> pointers.push_back(ptr);
> - pointerOrigMap.insert(std::make_pair(theUser, pointers));
> + pointerOrigMap.insert(std::make_pair(pointer, pointers));
> } else {
> - // append it
> - (*ptrIter).second.push_back(ptr);
> + // update the pointer source here,
> + (*ptrIter).second[0] = ptr;
> }
> } else {
> workList.push_back(theUser);
> @@ -727,28 +844,307 @@ namespace gbe
> }
> }
> }
> + bool GenWriter::isSingleBti(Value *Val) {
> + // single if all sources are the same,
> + // ignoring any source that is Val itself
> + if (!isa<SelectInst>(Val) && !isa<PHINode>(Val)) {
> + return true;
> + } else {
> + PtrOrigMapIter iter = pointerOrigMap.find(Val);
> + SmallVector<Value *, 4> &pointers = (*iter).second;
> + unsigned srcNum = pointers.size();
> + Value *source = NULL;
> + for (unsigned x = 0; x < srcNum; x++) {
> + // often happens for a PHINode where one source is the PHINode itself; skip it
> + if (pointers[x] == Val) continue;
> +
> + if (source == NULL) source = pointers[x];
> + else {
> + if (source != pointers[x])
> + return false;
> + }
> + }
> + return true;
> + }
> + }
> + Value *GenWriter::getPointerBase(Value *ptr) {
> + PtrBaseMapIter baseIter = pointerBaseMap.find(ptr);
> + if (baseIter != pointerBaseMap.end()) {
> + return baseIter->second;
> + }
> + typedef std::map<Value *, unsigned>::iterator BtiIter;
> + // a pointer that was already assigned a bti is itself the base pointer,
> + BtiIter found = BtiMap.find(ptr);
> + if (found != BtiMap.end()) {
> + if (isa<PointerType>(ptr->getType())) {
> + PointerType *ty = cast<PointerType>(ptr->getType());
> + // only global pointer will have starting address
> + if (ty->getAddressSpace() == 1) {
> + return ptr;
> + } else {
> + return ConstantPointerNull::get(ty);
> + }
> + } else {
> + PointerType *ty = PointerType::get(ptr->getType(), 0);
> + return ConstantPointerNull::get(ty);
> + }
> + }
> +
> + PtrOrigMapIter iter = pointerOrigMap.find(ptr);
> + SmallVector<Value *, 4> &pointers = (*iter).second;
> + if (isSingleBti(ptr)) {
> + Value *base = getPointerBase(pointers[0]);
> + pointerBaseMap.insert(std::make_pair(ptr, base));
> + return base;
> + } else {
> + if (isa<SelectInst>(ptr)) {
> + SelectInst *si = dyn_cast<SelectInst>(ptr);
> + IRBuilder<> Builder(si->getParent());
> +
> + Value *trueVal = getPointerBase((*iter).second[0]);
> + Value *falseVal = getPointerBase((*iter).second[1]);
> + Builder.SetInsertPoint(si);
> + Value *base = Builder.CreateSelect(si->getCondition(), trueVal, falseVal);
> + pointerBaseMap.insert(std::make_pair(ptr, base));
> + return base;
> + } else if (isa<PHINode>(ptr)) {
> + PHINode *phi = dyn_cast<PHINode>(ptr);
> + IRBuilder<> Builder(phi->getParent());
> + Builder.SetInsertPoint(phi);
> +
> + PHINode *basePhi = Builder.CreatePHI(ptr->getType(), phi->getNumIncomingValues());
> + unsigned srcNum = pointers.size();
> + for (unsigned x = 0; x < srcNum; x++) {
> + Value *base = NULL;
> + if (pointers[x] != ptr) {
> + base = getPointerBase(pointers[x]);
> + } else {
> + base = basePhi;
> + }
> + IRBuilder<> Builder2(phi->getIncomingBlock(x));
> + BasicBlock *predBB = phi->getIncomingBlock(x);
> + if (predBB->getTerminator())
> + Builder2.SetInsertPoint(predBB->getTerminator());
> +
> +#if (LLVM_VERSION_MAJOR== 3 && LLVM_VERSION_MINOR < 6)
> + // LLVM 3.5 and older do not define CreateBitOrPointerCast()
> + Type *srcTy = base->getType();
> + Type *dstTy = ptr->getType();
> + if (srcTy->isPointerTy() && dstTy->isIntegerTy())
> + base = Builder2.CreatePtrToInt(base, dstTy);
> + else if (srcTy->isIntegerTy() && dstTy->isPointerTy())
> + base = Builder2.CreateIntToPtr(base, dstTy);
> + else if (srcTy != dstTy)
> + base = Builder2.CreateBitCast(base, dstTy);
> +#else
> + base = Builder2.CreateBitOrPointerCast(base, ptr->getType());
> +#endif
> + basePhi->addIncoming(base, phi->getIncomingBlock(x));
> + }
> + pointerBaseMap.insert(std::make_pair(ptr, basePhi));
> + return basePhi;
> + } else {
> + ptr->dump();
> + GBE_ASSERT(0 && "Unhandled instruction in getPointerBase\n");
> + return ptr;
> + }
> + }
> + }
> +
> + Value *GenWriter::getBtiRegister(Value *Val) {
> + typedef std::map<Value *, unsigned>::iterator BtiIter;
> + typedef std::map<Value *, Value *>::iterator BtiValueIter;
> + BtiIter found = BtiMap.find(Val);
> + BtiValueIter valueIter = BtiValueMap.find(Val);
> + if (valueIter != BtiValueMap.end())
> + return valueIter->second;
> +
> + if (found != BtiMap.end()) {
> + // Val was already assigned a BTI, return it
> + Value *bti = ConstantInt::get(IntegerType::get(Val->getContext(), 32), found->second);
> + BtiValueMap.insert(std::make_pair(Val, bti));
> + return bti;
> + } else {
> + if (isSingleBti(Val)) {
> + PtrOrigMapIter iter = pointerOrigMap.find(Val);
> + Value * bti = getBtiRegister((*iter).second[0]);
> + BtiValueMap.insert(std::make_pair(Val, bti));
> + return bti;
> + } else {
> + if (isa<SelectInst>(Val)) {
> + SelectInst *si = dyn_cast<SelectInst>(Val);
> +
> + IRBuilder<> Builder(si->getParent());
> + PtrOrigMapIter iter = pointerOrigMap.find(Val);
> + Value *trueVal = getBtiRegister((*iter).second[0]);
> + Value *falseVal = getBtiRegister((*iter).second[1]);
> + Builder.SetInsertPoint(si);
> + Value *bti = Builder.CreateSelect(si->getCondition(), trueVal, falseVal);
> + BtiValueMap.insert(std::make_pair(Val, bti));
> + return bti;
> + } else if (isa<PHINode>(Val)) {
> + PHINode *phi = dyn_cast<PHINode>(Val);
> + IRBuilder<> Builder(phi->getParent());
> + Builder.SetInsertPoint(phi);
> +
> + PHINode *btiPhi = Builder.CreatePHI(IntegerType::get(Val->getContext(), 32), phi->getNumIncomingValues());
> + PtrOrigMapIter iter = pointerOrigMap.find(Val);
> + SmallVector<Value *, 4> &pointers = (*iter).second;
> + unsigned srcNum = pointers.size();
> + for (unsigned x = 0; x < srcNum; x++) {
> + Value *bti = NULL;
> + if (pointers[x] != Val) {
> + bti = getBtiRegister(pointers[x]);
> + } else {
> + bti = btiPhi;
> + }
> + btiPhi->addIncoming(bti, phi->getIncomingBlock(x));
> + }
> + BtiValueMap.insert(std::make_pair(Val, btiPhi));
> + return btiPhi;
> + } else {
> + Val->dump();
> + GBE_ASSERT(0 && "Unhandled instruction in getBtiRegister\n");
> + return Val;
> + }
> + }
> + }
> + }
> +
> + unsigned GenWriter::getNewBti(Value *origin) {
> + unsigned new_bti = 0;
> + if(origin->getName().equals(StringRef("__gen_ocl_printf_buf"))) {
> + new_bti = btiBase;
> + incBtiBase();
> + } else if (origin->getName().equals(StringRef("__gen_ocl_printf_index_buf"))) {
> + new_bti = btiBase;
> + incBtiBase();
> + }
> + else if (isa<GlobalVariable>(origin)
> + && dyn_cast<GlobalVariable>(origin)->isConstant()) {
> + new_bti = BTI_CONSTANT;
> + } else {
> + unsigned space = origin->getType()->getPointerAddressSpace();
> + switch (space) {
> + case 0:
> + new_bti = BTI_PRIVATE;
> + break;
> + case 1:
> + {
> + new_bti = btiBase;
> + incBtiBase();
> + break;
> + }
> + case 2:
> + new_bti = BTI_CONSTANT;
> +
> + break;
> + case 3:
> + new_bti = BTI_LOCAL;
> + break;
> + default:
> + GBE_ASSERT(0);
> + break;
> + }
> + }
> + return new_bti;
> + }
> +
> + MDNode *GenWriter::getKernelFunctionMetadata(Function *F) {
> + NamedMDNode *clKernels = TheModule->getNamedMetadata("opencl.kernels");
> + uint32_t ops = clKernels->getNumOperands();
> + for(uint32_t x = 0; x < ops; x++) {
> + MDNode* node = clKernels->getOperand(x);
> +#if LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR <= 5
> + Value * op = node->getOperand(0);
> +#else
> + auto *V = cast<ValueAsMetadata>(node->getOperand(0));
> + Value *op = V ? V->getValue() : NULL;
> +#endif
> + if(op == F) {
> + return node;
> + }
> + }
> + return NULL;
> + }
> +
> + void GenWriter::assignBti(Function &F) {
> + Module::GlobalListType &globalList = const_cast<Module::GlobalListType &> (TheModule->getGlobalList());
> + for(auto i = globalList.begin(); i != globalList.end(); i ++) {
> + GlobalVariable &v = *i;
> + if(!v.isConstantUsed()) continue;
> +
> + BtiMap.insert(std::make_pair(&v, getNewBti(&v)));
> + }
> + MDNode *typeNameNode = NULL;
> + MDNode *node = getKernelFunctionMetadata(&F);
> + for(uint j = 0; j < node->getNumOperands() - 1; j++) {
> + MDNode *attrNode = dyn_cast_or_null<MDNode>(node->getOperand(1 + j));
> + if (attrNode == NULL) break;
> + MDString *attrName = dyn_cast_or_null<MDString>(attrNode->getOperand(0));
> + if (!attrName) continue;
> + if (attrName->getString() == "kernel_arg_type") {
> + typeNameNode = attrNode;
> + }
> + }
> +
> + unsigned argID = 0;
> + ir::FunctionArgument::InfoFromLLVM llvmInfo;
> + for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I, argID++) {
> + llvmInfo.typeName= (cast<MDString>(typeNameNode->getOperand(1 + argID)))->getString();
> + if (I->getType()->isPointerTy() || llvmInfo.isImageType()) {
> + BtiMap.insert(std::make_pair(I, getNewBti(I)));
> + }
> + }
> +
> + BasicBlock &bb = F.getEntryBlock();
> + for (BasicBlock::iterator iter = bb.begin(), iterE = bb.end(); iter != iterE; ++iter) {
> + if (AllocaInst *ai = dyn_cast<AllocaInst>(iter)) {
> + BtiMap.insert(std::make_pair(ai, BTI_PRIVATE));
> + }
> + }
> + }
>
> void GenWriter::analyzePointerOrigin(Function &F) {
> + // used to record where the pointers get mixed (i.e. select or phi instruction)
> + std::set<Value *> mixedPtr;
> + // This is a two-pass algorithm. The first pass updates the pointer sources for every
> + // instruction reachable from the pointers and records the mix points.
> + // The second pass starts from the truly mixed-pointer instructions (select or phi node)
> + // and updates the sources correctly. For a pointer reachable from a mixed pointer, we set
> + // its direct mixed-pointer parent as its pointer origin.
> +
> // GlobalVariable
> Module::GlobalListType &globalList = const_cast<Module::GlobalListType &> (TheModule->getGlobalList());
> for(auto i = globalList.begin(); i != globalList.end(); i ++) {
> GlobalVariable &v = *i;
> if(!v.isConstantUsed()) continue;
> - findPointerEscape(&v);
> + findPointerEscape(&v, mixedPtr, true);
> }
> // function argument
> for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {
> if (I->getType()->isPointerTy()) {
> - findPointerEscape(I);
> + findPointerEscape(I, mixedPtr, true);
> }
> }
> // alloca
> BasicBlock &bb = F.getEntryBlock();
> for (BasicBlock::iterator iter = bb.begin(), iterE = bb.end(); iter != iterE; ++iter) {
> if (AllocaInst *ai = dyn_cast<AllocaInst>(iter)) {
> - findPointerEscape(ai);
> + findPointerEscape(ai, mixedPtr, true);
> }
> }
> + // the second pass starts from mixed pointer
> + for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
> + findPointerEscape(*iter, mixedPtr, false);
> + }
> +
> + for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
> + getBtiRegister(*iter);
> + }
> + for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
> + getPointerBase(*iter);
> + }
> }
>
> void getSequentialData(const ConstantDataSequential *cda, void *ptr, uint32_t &offset) {
> @@ -1253,11 +1649,9 @@ namespace gbe
> "Returned value for kernel functions is forbidden");
>
> // Loop over the kernel metadatas to set the required work group size.
> - NamedMDNode *clKernelMetaDatas = TheModule->getNamedMetadata("opencl.kernels");
> size_t reqd_wg_sz[3] = {0, 0, 0};
> size_t hint_wg_sz[3] = {0, 0, 0};
> ir::FunctionArgument::InfoFromLLVM llvmInfo;
> - MDNode *node = NULL;
> MDNode *addrSpaceNode = NULL;
> MDNode *typeNameNode = NULL;
> MDNode *accessQualNode = NULL;
> @@ -1267,16 +1661,7 @@ namespace gbe
> std::string functionAttributes;
>
> /* First find the meta data belong to this function. */
> - for(uint i = 0; i < clKernelMetaDatas->getNumOperands(); i++) {
> - node = clKernelMetaDatas->getOperand(i);
> -#if LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR <= 5
> - if (node->getOperand(0) == &F) break;
> -#else
> - auto *V = cast<ValueAsMetadata>(node->getOperand(0));
> - if (V && V->getValue() == &F) break;
> -#endif
> - node = NULL;
> - }
> + MDNode *node = getKernelFunctionMetadata(&F);
>
> /* because "-cl-kernel-arg-info", should always have meta data. */
> if (!F.arg_empty())
> @@ -1362,7 +1747,6 @@ namespace gbe
> functionAttributes += " ";
> }
> }
> - ctx.appendSurface(1, ir::ocl::stackbuffer);
>
> ctx.getFunction().setCompileWorkGroupSize(reqd_wg_sz[0], reqd_wg_sz[1], reqd_wg_sz[2]);
>
> @@ -1419,7 +1803,7 @@ namespace gbe
> const ir::Register reg = getRegister(I);
> if (llvmInfo.isImageType()) {
> ctx.input(argName, ir::FunctionArgument::IMAGE, reg, llvmInfo, 4, 4, 0);
> - ctx.getFunction().getImageSet()->append(reg, &ctx, incBtiBase());
> + ctx.getFunction().getImageSet()->append(reg, &ctx, BtiMap.find(I)->second);
> collectImageArgs(llvmInfo.accessQual, imageArgsInfo);
> continue;
> }
> @@ -1452,10 +1836,7 @@ namespace gbe
> const uint32_t align = getAlignmentByte(unit, pointed);
> switch (addrSpace) {
> case ir::MEM_GLOBAL:
> - globalPointer.insert(std::make_pair(I, btiBase));
> - ctx.appendSurface(btiBase, reg);
> - ctx.input(argName, ir::FunctionArgument::GLOBAL_POINTER, reg, llvmInfo, ptrSize, align, btiBase);
> - incBtiBase();
> + ctx.input(argName, ir::FunctionArgument::GLOBAL_POINTER, reg, llvmInfo, ptrSize, align, BtiMap.find(I)->second);
> break;
> case ir::MEM_LOCAL:
> ctx.input(argName, ir::FunctionArgument::LOCAL_POINTER, reg, llvmInfo, ptrSize, align, BTI_LOCAL);
> @@ -1806,14 +2187,10 @@ namespace gbe
> ctx.LOADI(ir::TYPE_S32, reg, ctx.newIntegerImmediate(con.getOffset(), ir::TYPE_S32));
> } else {
> if(v.getName().equals(StringRef("__gen_ocl_printf_buf"))) {
> - ctx.appendSurface(btiBase, ir::ocl::printfbptr);
> - ctx.getFunction().getPrintfSet()->setBufBTI(btiBase);
> - globalPointer.insert(std::make_pair(&v, incBtiBase()));
> + ctx.getFunction().getPrintfSet()->setBufBTI(BtiMap.find(const_cast<GlobalVariable*>(&v))->second);
> regTranslator.newScalarProxy(ir::ocl::printfbptr, const_cast<GlobalVariable*>(&v));
> } else if(v.getName().equals(StringRef("__gen_ocl_printf_index_buf"))) {
> - ctx.appendSurface(btiBase, ir::ocl::printfiptr);
> - ctx.getFunction().getPrintfSet()->setIndexBufBTI(btiBase);
> - globalPointer.insert(std::make_pair(&v, incBtiBase()));
> + ctx.getFunction().getPrintfSet()->setIndexBufBTI(BtiMap.find(const_cast<GlobalVariable*>(&v))->second);
> regTranslator.newScalarProxy(ir::ocl::printfiptr, const_cast<GlobalVariable*>(&v));
> } else if(v.getName().str().substr(0, 4) == ".str") {
> /* When there are multi printf statements in multi kernel fucntions within the same
> @@ -2045,6 +2422,7 @@ namespace gbe
> }
>
> ctx.startFunction(F.getName());
> +
> ir::Function &fn = ctx.getFunction();
> this->regTranslator.clear();
> this->labelMap.clear();
> @@ -2838,19 +3216,46 @@ namespace gbe
> CallSite::arg_iterator AE = CS.arg_end();
> GBE_ASSERT(AI != AE);
>
> + ir::AddressSpace addrSpace;
> +
> + Value *llvmPtr = *AI;
> + Value *bti = getBtiRegister(llvmPtr);
> + Value *ptrBase = getPointerBase(llvmPtr);
> + ir::Register pointer = this->getRegister(llvmPtr);
> + ir::Register baseReg = this->getRegister(ptrBase);
> +
> + ir::Register btiReg;
> + bool fixedBTI = false;
> + if (isa<ConstantInt>(bti)) {
> + fixedBTI = true;
> + unsigned index = cast<ConstantInt>(bti)->getZExtValue();
> + addrSpace = btiToGen(index);
> + ir::ImmediateIndex immIndex = ctx.newImmediate((uint32_t)index);
> + btiReg = ctx.reg(ir::FAMILY_DWORD);
> + ctx.LOADI(ir::TYPE_U32, btiReg, immIndex);
> + } else {
> + addrSpace = ir::MEM_MIXED;
> + btiReg = this->getRegister(bti);
> + }
> +
> + const ir::RegisterFamily pointerFamily = ctx.getPointerFamily();
> + const ir::Register ptr = ctx.reg(pointerFamily);
> + ctx.SUB(ir::TYPE_U32, ptr, pointer, baseReg);
> +
> const ir::Register dst = this->getRegister(&I);
>
> - ir::BTI bti;
> - gatherBTI(&I, bti);
> - const ir::AddressSpace addrSpace = btiToGen(bti);
> - vector<ir::Register> src;
> uint32_t srcNum = 0;
> + vector<ir::Register> src;
> + src.push_back(ptr);
> + srcNum++;
> + AI++;
> +
> while(AI != AE) {
> src.push_back(this->getRegister(*(AI++)));
> srcNum++;
> }
> const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], srcNum);
> - ctx.ATOMIC(opcode, dst, addrSpace, bti, srcTuple);
> + ctx.ATOMIC(opcode, dst, addrSpace, btiReg, fixedBTI, srcTuple);
> }
>
> /* append a new sampler. should be called before any reference to
> @@ -3555,8 +3960,8 @@ namespace gbe
> void GenWriter::emitBatchLoadOrStore(const ir::Type type, const uint32_t elemNum,
> Value *llvmValues, const ir::Register ptr,
> const ir::AddressSpace addrSpace,
> - Type * elemType, bool isLoad, ir::BTI bti,
> - bool dwAligned) {
> + Type * elemType, bool isLoad, ir::Register bti,
> + bool dwAligned, bool fixedBTI) {
> const ir::RegisterFamily pointerFamily = ctx.getPointerFamily();
> uint32_t totalSize = elemNum * getFamilySize(getFamily(type));
> uint32_t msgNum = totalSize > 16 ? totalSize / 16 : 1;
> @@ -3602,79 +4007,18 @@ namespace gbe
>
> // Emit the instruction
> if (isLoad)
> - ctx.LOAD(type, tuple, addr, addrSpace, perMsgNum, dwAligned, bti);
> + ctx.LOAD(type, tuple, addr, addrSpace, perMsgNum, dwAligned, fixedBTI, bti);
> else
> - ctx.STORE(type, tuple, addr, addrSpace, perMsgNum, dwAligned, bti);
> - }
> - }
> -
> - // The idea behind is to search along the use-def chain, and find out all
> - // possible sources of the pointer. Then in later codeGen, we can emit
> - // read/store instructions to these BTIs gathered.
> - void GenWriter::gatherBTI(Value *insn, ir::BTI &bti) {
> - PtrOrigMapIter iter = pointerOrigMap.find(insn);
> - if (iter != pointerOrigMap.end()) {
> - SmallVectorImpl<Value *> &origins = iter->second;
> - uint8_t nBTI = 0;
> - for (unsigned i = 0; i < origins.size(); i++) {
> - uint8_t new_bti = 0;
> - Value *origin = origins[i];
> - // all constant put into constant cache, including __constant & const __private
> - if (isa<GlobalVariable>(origin)
> - && dyn_cast<GlobalVariable>(origin)->isConstant()) {
> - new_bti = BTI_CONSTANT;
> - } else {
> - unsigned space = origin->getType()->getPointerAddressSpace();
> - switch (space) {
> - case 0:
> - new_bti = BTI_PRIVATE;
> - break;
> - case 1:
> - {
> - GlobalPtrIter iter = globalPointer.find(origin);
> - GBE_ASSERT(iter != globalPointer.end());
> - new_bti = iter->second;
> - break;
> - }
> - case 2:
> - new_bti = BTI_CONSTANT;
> - break;
> - case 3:
> - new_bti = BTI_LOCAL;
> - break;
> - default:
> - GBE_ASSERT(0 && "address space not unhandled in gatherBTI()\n");
> - break;
> - }
> - }
> -
> - // avoid duplicate
> - bool bFound = false;
> - for (int j = 0; j < nBTI; j++) {
> - if (bti.bti[j] == new_bti) {
> - bFound = true; break;
> - }
> - }
> - if (bFound == false) {
> - bti.bti[nBTI++] = new_bti;
> - bti.count = nBTI;
> - }
> - }
> - } else {
> - insn->dump();
> - std::cerr << "Illegal pointer which is not from a valid memory space." << std::endl;
> - std::cerr << "Aborting..." << std::endl;
> - exit(-1);
> + ctx.STORE(type, tuple, addr, addrSpace, perMsgNum, dwAligned, fixedBTI, bti);
> }
> - GBE_ASSERT(bti.count <= MAX_MIXED_POINTER);
> }
> +
> // handle load of dword/qword with unaligned address
> - void GenWriter::emitUnalignedDQLoadStore(Value *llvmPtr, Value *llvmValues, ir::AddressSpace addrSpace, ir::BTI &binding, bool isLoad, bool dwAligned)
> + void GenWriter::emitUnalignedDQLoadStore(ir::Register ptr, Value *llvmValues, ir::AddressSpace addrSpace, ir::Register bti, bool isLoad, bool dwAligned, bool fixedBTI)
> {
> Type *llvmType = llvmValues->getType();
> const ir::Type type = getType(ctx, llvmType);
> unsigned byteSize = getTypeByteSize(unit, llvmType);
> - const ir::Register ptr = this->getRegister(llvmPtr);
>
> Type *elemType = llvmType;
> unsigned elemNum = 1;
> @@ -3704,13 +4048,13 @@ namespace gbe
> const ir::Tuple byteTuple = ctx.arrayTuple(&byteTupleData[0], byteSize);
>
> if (isLoad) {
> - ctx.LOAD(ir::TYPE_U8, byteTuple, ptr, addrSpace, byteSize, dwAligned, binding);
> + ctx.LOAD(ir::TYPE_U8, byteTuple, ptr, addrSpace, byteSize, dwAligned, fixedBTI, bti);
> ctx.BITCAST(type, ir::TYPE_U8, tuple, byteTuple, elemNum, byteSize);
> } else {
> ctx.BITCAST(ir::TYPE_U8, type, byteTuple, tuple, byteSize, elemNum);
> // FIXME: byte scatter does not handle correctly vector store, after fix that,
> // we can directly use on store instruction like:
> - // ctx.STORE(ir::TYPE_U8, byteTuple, ptr, addrSpace, byteSize, dwAligned, binding);
> + // ctx.STORE(ir::TYPE_U8, byteTuple, ptr, addrSpace, byteSize, dwAligned, fixedBTI, bti);
> const ir::RegisterFamily pointerFamily = ctx.getPointerFamily();
> for (uint32_t elemID = 0; elemID < byteSize; elemID++) {
> const ir::Register reg = byteTupleData[elemID];
> @@ -3725,7 +4069,7 @@ namespace gbe
> ctx.LOADI(ir::TYPE_S32, offset, immIndex);
> ctx.ADD(ir::TYPE_S32, addr, ptr, offset);
> }
> - ctx.STORE(type, addr, addrSpace, dwAligned, binding, reg);
> + ctx.STORE(type, addr, addrSpace, dwAligned, fixedBTI, bti, reg);
> }
> }
> }
> @@ -3738,10 +4082,31 @@ namespace gbe
> Value *llvmValues = getLoadOrStoreValue(I);
> Type *llvmType = llvmValues->getType();
> const bool dwAligned = (I.getAlignment() % 4) == 0;
> - const ir::Register ptr = this->getRegister(llvmPtr);
> - ir::BTI binding;
> - gatherBTI(&I, binding);
> - const ir::AddressSpace addrSpace = btiToGen(binding);
> + ir::AddressSpace addrSpace;
> + const ir::Register pointer = this->getRegister(llvmPtr);
> + const ir::RegisterFamily pointerFamily = ctx.getPointerFamily();
> +
> + Value *bti = getBtiRegister(llvmPtr);
> + Value *ptrBase = getPointerBase(llvmPtr);
> + ir::Register baseReg = this->getRegister(ptrBase);
> + bool zeroBase = false;
> + if (isa<ConstantPointerNull>(ptrBase)) {
> + zeroBase = true;
> + }
> +
> + ir::Register btiReg;
> + bool fixedBTI = false;
> + if (isa<ConstantInt>(bti)) {
> + fixedBTI = true;
> + unsigned index = cast<ConstantInt>(bti)->getZExtValue();
> + addrSpace = btiToGen(index);
> + ir::ImmediateIndex immIndex = ctx.newImmediate((uint32_t)index);
> + btiReg = ctx.reg(ir::FAMILY_DWORD);
> + ctx.LOADI(ir::TYPE_U32, btiReg, immIndex);
> + } else {
> + addrSpace = ir::MEM_MIXED;
> + btiReg = this->getRegister(bti);
> + }
>
> Type *scalarType = llvmType;
> if (!isScalarType(llvmType)) {
> @@ -3749,11 +4114,20 @@ namespace gbe
> scalarType = vectorType->getElementType();
> }
>
> + ir::Register ptr = ctx.reg(pointerFamily);
> + // FIXME: skipping the subtraction of a zero base at this stage is not a good idea,
> + // but the later ArgumentLower pass needs to match the exact load/addImm pattern,
> + // so we avoid subtracting a zero base to satisfy the ArgumentLower pass.
> + if (!zeroBase)
> + ctx.SUB(ir::TYPE_U32, ptr, pointer, baseReg);
> + else
> + ptr = pointer;
> +
> if (!dwAligned
> && (scalarType == IntegerType::get(I.getContext(), 64)
> || scalarType == IntegerType::get(I.getContext(), 32))
> ) {
> - emitUnalignedDQLoadStore(llvmPtr, llvmValues, addrSpace, binding, isLoad, dwAligned);
> + emitUnalignedDQLoadStore(ptr, llvmValues, addrSpace, btiReg, isLoad, dwAligned, fixedBTI);
> return;
> }
> // Scalar is easy. We neednot build register tuples
> @@ -3761,9 +4135,9 @@ namespace gbe
> const ir::Type type = getType(ctx, llvmType);
> const ir::Register values = this->getRegister(llvmValues);
> if (isLoad)
> - ctx.LOAD(type, ptr, addrSpace, dwAligned, binding, values);
> + ctx.LOAD(type, ptr, addrSpace, dwAligned, fixedBTI, btiReg, values);
> else
> - ctx.STORE(type, ptr, addrSpace, dwAligned, binding, values);
> + ctx.STORE(type, ptr, addrSpace, dwAligned, fixedBTI, btiReg, values);
> }
> // A vector type requires to build a tuple
> else {
> @@ -3785,10 +4159,9 @@ namespace gbe
> // The code is going to be fairly different from types to types (based on
> // size of each vector element)
> const ir::Type type = getType(ctx, elemType);
> - const ir::RegisterFamily pointerFamily = ctx.getPointerFamily();
> const ir::RegisterFamily dataFamily = getFamily(type);
>
> - if(dataFamily == ir::FAMILY_DWORD && addrSpace != ir::MEM_CONSTANT && addrSpace != ir::MEM_MIXED) {
> + if(dataFamily == ir::FAMILY_DWORD && addrSpace != ir::MEM_CONSTANT) {
> // One message is enough here. Nothing special to do
> if (elemNum <= 4) {
> // Build the tuple data in the vector
> @@ -3807,19 +4180,19 @@ namespace gbe
>
> // Emit the instruction
> if (isLoad)
> - ctx.LOAD(type, tuple, ptr, addrSpace, elemNum, dwAligned, binding);
> + ctx.LOAD(type, tuple, ptr, addrSpace, elemNum, dwAligned, fixedBTI, btiReg);
> else
> - ctx.STORE(type, tuple, ptr, addrSpace, elemNum, dwAligned, binding);
> + ctx.STORE(type, tuple, ptr, addrSpace, elemNum, dwAligned, fixedBTI, btiReg);
> }
> // Not supported by the hardware. So, we split the message and we use
> // strided loads and stores
> else {
> - emitBatchLoadOrStore(type, elemNum, llvmValues, ptr, addrSpace, elemType, isLoad, binding, dwAligned);
> + emitBatchLoadOrStore(type, elemNum, llvmValues, ptr, addrSpace, elemType, isLoad, btiReg, dwAligned, fixedBTI);
> }
> }
> else if((dataFamily == ir::FAMILY_WORD && (isLoad || elemNum % 2 == 0)) ||
> (dataFamily == ir::FAMILY_BYTE && (isLoad || elemNum % 4 == 0))) {
> - emitBatchLoadOrStore(type, elemNum, llvmValues, ptr, addrSpace, elemType, isLoad, binding, dwAligned);
> + emitBatchLoadOrStore(type, elemNum, llvmValues, ptr, addrSpace, elemType, isLoad, btiReg, dwAligned, fixedBTI);
> } else {
> for (uint32_t elemID = 0; elemID < elemNum; elemID++) {
> if(regTranslator.isUndefConst(llvmValues, elemID))
> @@ -3839,9 +4212,9 @@ namespace gbe
> ctx.ADD(ir::TYPE_S32, addr, ptr, offset);
> }
> if (isLoad)
> - ctx.LOAD(type, addr, addrSpace, dwAligned, binding, reg);
> + ctx.LOAD(type, addr, addrSpace, dwAligned, fixedBTI, btiReg, reg);
> else
> - ctx.STORE(type, addr, addrSpace, dwAligned, binding, reg);
> + ctx.STORE(type, addr, addrSpace, dwAligned, fixedBTI, btiReg, reg);
> }
> }
> }
> --
> 2.3.6
>
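For readers skimming the diff, the atomic and load/store paths above share one pattern: a constant BTI is materialized into a fresh register and marked fixed, while a non-constant BTI is used as-is and the address space becomes mixed; the address sent to the hardware is the pointer minus its base. The load/store path skips the subtraction when the base is a null constant, whereas the atomic path always subtracts. Below is a minimal standalone sketch of that pattern, not the Beignet code itself. Emitter, MemOperand, AddrSpace and lowerAddress are hypothetical names invented for illustration, and instructions are recorded as plain strings.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the IR context; none of these are Beignet types.
enum class AddrSpace { Static, Mixed };

struct Emitter {
  std::vector<std::string> insns;  // emitted instructions, as text
  uint32_t nextReg = 100;          // next free virtual register id
  uint32_t newReg() { return nextReg++; }
};

struct MemOperand {
  uint32_t ptrReg;   // register holding the offset inside the surface
  uint32_t btiReg;   // register holding the BTI (same for all lanes if fixed)
  bool fixedBTI;     // true when the BTI is a compile-time constant
  AddrSpace space;   // Mixed when the BTI is only known at run time
};

// Exactly one of constBti / dynBtiReg must be set.
// baseReg is empty when the pointer base is a null constant.
MemOperand lowerAddress(Emitter &e, uint32_t pointerReg,
                        std::optional<uint32_t> baseReg,
                        std::optional<uint32_t> constBti,
                        std::optional<uint32_t> dynBtiReg) {
  assert(constBti.has_value() != dynBtiReg.has_value());
  MemOperand out{};
  if (constBti) {
    // Constant BTI: load it as an immediate into a new register.
    out.fixedBTI = true;
    out.space = AddrSpace::Static;
    out.btiReg = e.newReg();
    e.insns.push_back("LOADI r" + std::to_string(out.btiReg) + ", " +
                      std::to_string(*constBti));
  } else {
    // Dynamic BTI: reuse the register that already carries it.
    out.fixedBTI = false;
    out.space = AddrSpace::Mixed;
    out.btiReg = *dynBtiReg;
  }
  if (baseReg) {
    // Turn the absolute pointer into an offset relative to the surface base.
    out.ptrReg = e.newReg();
    e.insns.push_back("SUB r" + std::to_string(out.ptrReg) + ", r" +
                      std::to_string(pointerReg) + ", r" +
                      std::to_string(*baseReg));
  } else {
    // Zero base: keep the original pointer so no extra SUB is emitted.
    out.ptrReg = pointerReg;
  }
  return out;
}
```

In this sketch a fixed BTI costs one LOADI plus one SUB, and a dynamic BTI with a zero base emits nothing extra, which mirrors why the patch passes both the register and the fixedBTI flag down to LOAD, STORE and ATOMIC.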