[Libva] [PATCH 1/3] Add old version vme shaders
Zhao, Yakui
yakui.zhao at intel.com
Thu Jun 5 00:26:47 PDT 2014
On Wed, 2014-06-04 at 23:53 -0600, Li, Zhong wrote:
> On 06/05/2014 01:25 PM, Zhao, Yakui wrote:
> > On Wed, 2014-06-04 at 07:37 -0600, Zhong Li wrote:
> >> Old version vme shaders have lower gpu usage and better performance,
> >> but worse encoding quality.
> >> They can be used for encoding quality configuration.
> > Hi, Zhong
> >
> > Not sure whether you are recalling the older vme shader on Ivy?
> >
> > As far as I know, the older vme shader on Ivy can handle multiple
> > macroblocks when one GPU thread is spawned. But your shader only
> > handles one macroblock.
> >
> > Thanks
> > Yakui
> [Zhong] Thanks for your reply.
> Can you tell me which older vme shader you mean: which
> version number on master, or which commit ID on staging?
> These shaders come from v1.0.20, which should meet
> the performance and GPU usage requirements of our customer.
> BTW, is there a distinct benefit to handling multiple MBs in
> one GPU thread? Generally speaking, a performance or quality
> benefit comes at the cost of more GPU usage, which is not
> what I am looking for.
Sorry, I misunderstood: the older VME shader you mean is the VME shader
on Sandybridge.
If you plan to use the older VME shader that handles one MB per GPU
thread, it may be enough to add one extra flag to the current shader
that disables some of its complex features. That would also be easier.
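
A rough sketch of the flag idea, in C. Every name here (VME_FLAG_LOW_GPU_USAGE, the VME_FEATURE_* bits, and both helper functions) is hypothetical and does not exist in the driver. It only illustrates how a single flag could switch off the complex features, and how the number of spawned GPU threads depends on how many MBs each thread handles.

```c
#include <assert.h>

/* Hypothetical feature bits for the current VME shader path.
 * These names are illustrative only; they are not driver symbols. */
#define VME_FEATURE_MULTI_MB_PER_THREAD (1u << 0)
#define VME_FEATURE_OTHER_COMPLEX       (1u << 1)

/* Hypothetical flag requesting the low-GPU-usage mode. */
#define VME_FLAG_LOW_GPU_USAGE (1u << 0)

/* Return the feature set to use for the given flags. */
static unsigned int vme_enabled_features(unsigned int flags)
{
    unsigned int features = VME_FEATURE_MULTI_MB_PER_THREAD |
                            VME_FEATURE_OTHER_COMPLEX;

    if (flags & VME_FLAG_LOW_GPU_USAGE)
        features = 0; /* simple path: one MB per GPU thread */

    return features;
}

/* Number of GPU threads needed to cover a frame, rounding up. */
static unsigned int vme_threads_needed(unsigned int width_in_mbs,
                                       unsigned int height_in_mbs,
                                       unsigned int mbs_per_thread)
{
    unsigned int total_mbs = width_in_mbs * height_in_mbs;

    assert(mbs_per_thread > 0);
    return (total_mbs + mbs_per_thread - 1) / mbs_per_thread;
}
```

For example, a 1920x1088 frame is 120x68 MBs, so vme_threads_needed(120, 68, 1) gives 8160 threads when each thread handles one MB.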
>
>
> >> Signed-off-by: Zhong Li <zhong.li at intel.com>
> >> ---
> >> configure.ac | 1 +
> >> src/shaders/Makefile.am | 2 +-
> >> src/shaders/vme_old/Makefile.am | 70 +++++
> >> src/shaders/vme_old/gen6_vme_header.inc | 160 ++++++++++
> >> src/shaders/vme_old/gen7_vme_header.inc | 164 +++++++++++
> >> src/shaders/vme_old/inter_frame.asm | 104 +++++++
> >> src/shaders/vme_old/inter_frame.g6a | 2 +
> >> src/shaders/vme_old/inter_frame.g6b | 28 ++
> >> src/shaders/vme_old/inter_frame.g7a | 2 +
> >> src/shaders/vme_old/inter_frame.g7b | 28 ++
> >> src/shaders/vme_old/inter_frame_haswell.asm | 405 ++++++++++++++++++++++++++
> >> src/shaders/vme_old/inter_frame_haswell.g75a | 2 +
> >> src/shaders/vme_old/inter_frame_haswell.g75b | 137 +++++++++
> >> src/shaders/vme_old/intra_frame.asm | 130 +++++++++
> >> src/shaders/vme_old/intra_frame.g6a | 3 +
> >> src/shaders/vme_old/intra_frame.g6b | 47 +++
> >> src/shaders/vme_old/intra_frame.g7a | 2 +
> >> src/shaders/vme_old/intra_frame.g7b | 47 +++
> >> src/shaders/vme_old/intra_frame_haswell.asm | 160 ++++++++++
> >> src/shaders/vme_old/intra_frame_haswell.g75a | 2 +
> >> src/shaders/vme_old/intra_frame_haswell.g75b | 57 ++++
> >> src/shaders/vme_old/vme75.inc | 268 +++++++++++++++++
> >> 22 files changed, 1820 insertions(+), 1 deletion(-)
> >> create mode 100644 src/shaders/vme_old/Makefile.am
> >> create mode 100644 src/shaders/vme_old/gen6_vme_header.inc
> >> create mode 100644 src/shaders/vme_old/gen7_vme_header.inc
> >> create mode 100644 src/shaders/vme_old/inter_frame.asm
> >> create mode 100644 src/shaders/vme_old/inter_frame.g6a
> >> create mode 100644 src/shaders/vme_old/inter_frame.g6b
> >> create mode 100644 src/shaders/vme_old/inter_frame.g7a
> >> create mode 100644 src/shaders/vme_old/inter_frame.g7b
> >> create mode 100644 src/shaders/vme_old/inter_frame_haswell.asm
> >> create mode 100644 src/shaders/vme_old/inter_frame_haswell.g75a
> >> create mode 100644 src/shaders/vme_old/inter_frame_haswell.g75b
> >> create mode 100644 src/shaders/vme_old/intra_frame.asm
> >> create mode 100644 src/shaders/vme_old/intra_frame.g6a
> >> create mode 100644 src/shaders/vme_old/intra_frame.g6b
> >> create mode 100644 src/shaders/vme_old/intra_frame.g7a
> >> create mode 100644 src/shaders/vme_old/intra_frame.g7b
> >> create mode 100644 src/shaders/vme_old/intra_frame_haswell.asm
> >> create mode 100644 src/shaders/vme_old/intra_frame_haswell.g75a
> >> create mode 100644 src/shaders/vme_old/intra_frame_haswell.g75b
> >> create mode 100644 src/shaders/vme_old/vme75.inc
> >>
> >> diff --git a/configure.ac b/configure.ac
> >> index 3c0ab7c..65a521f 100644
> >> --- a/configure.ac
> >> +++ b/configure.ac
> >> @@ -182,6 +182,7 @@ AC_OUTPUT([
> >> src/shaders/render/Makefile
> >> src/shaders/utils/Makefile
> >> src/shaders/vme/Makefile
> >> + src/shaders/vme_old/Makefile
> >> src/wayland/Makefile
> >> ])
> >>
> >> diff --git a/src/shaders/Makefile.am b/src/shaders/Makefile.am
> >> index 9e3ec94..ce52857 100644
> >> --- a/src/shaders/Makefile.am
> >> +++ b/src/shaders/Makefile.am
> >> @@ -1,4 +1,4 @@
> >> -SUBDIRS = h264 mpeg2 render post_processing vme utils
> >> +SUBDIRS = h264 mpeg2 render post_processing vme vme_old utils
> >>
> >> EXTRA_DIST = gpp.py
> >>
> >> diff --git a/src/shaders/vme_old/Makefile.am b/src/shaders/vme_old/Makefile.am
> >> new file mode 100644
> >> index 0000000..12f0e28
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/Makefile.am
> >> @@ -0,0 +1,70 @@
> >> +VME_CORE = intra_frame.asm inter_frame.asm
> >> +VME75_CORE = intra_frame_haswell.asm inter_frame_haswell.asm
> >> +
> >> +INTEL_G6B = intra_frame.g6b inter_frame.g6b
> >> +INTEL_G6A = intra_frame.g6a inter_frame.g6a
> >> +INTEL_GEN6_INC = gen6_vme_header.inc
> >> +INTEL_GEN6_ASM = $(INTEL_G6A:%.g6a=%.gen6.asm)
> >> +
> >> +INTEL_G7B = intra_frame.g7b inter_frame.g7b
> >> +INTEL_G7A = intra_frame.g7a inter_frame.g7a
> >> +INTEL_GEN7_INC = gen7_vme_header.inc
> >> +INTEL_GEN7_ASM = $(INTEL_G7A:%.g7a=%.gen7.asm)
> >> +
> >> +INTEL_G75B = intra_frame_haswell.g75b inter_frame_haswell.g75b
> >> +INTEL_G75A = intra_frame_haswell.g75a inter_frame_haswell.g75a
> >> +INTEL_GEN75_INC = vme75.inc
> >> +INTEL_GEN75_ASM = $(INTEL_G75A:%.g75a=%.gen75.asm)
> >> +
> >> +TARGETS =
> >> +if HAVE_GEN4ASM
> >> +TARGETS += $(INTEL_G6B)
> >> +TARGETS += $(INTEL_G7B)
> >> +TARGETS += $(INTEL_G75B)
> >> +endif
> >> +
> >> +all-local: $(TARGETS)
> >> +
> >> +SUFFIXES = .g6a .g6b .g7a .g7b .gen6.asm .gen7.asm .g75a .g75b .gen75.asm
> >> +
> >> +if HAVE_GEN4ASM
> >> +$(INTEL_GEN6_ASM): $(VME_CORE) $(INTEL_GEN6_INC)
> >> +.g6a.gen6.asm:
> >> + $(AM_V_GEN)m4 $< > $@
> >> +.gen6.asm.g6b:
> >> + $(AM_V_GEN)$(GEN4ASM) -g 6 -o $@ $<
> >> +
> >> +$(INTEL_GEN7_ASM): $(VME_CORE) $(INTEL_GEN7_INC)
> >> +.g7a.gen7.asm:
> >> + $(AM_V_GEN)m4 $< > $@
> >> +.gen7.asm.g7b:
> >> + $(AM_V_GEN)$(GEN4ASM) -g 7 -o $@ $<
> >> +
> >> +
> >> +$(INTEL_GEN75_ASM): $(VME75_CORE) $(INTEL_GEN75_INC)
> >> +.g75a.gen75.asm:
> >> + $(AM_V_GEN)cpp -P $< > _vme0.$@ && \
> >> + m4 _vme0.$@ > $@ && \
> >> + rm _vme0.$@
> >> +.gen75.asm.g75b:
> >> + $(AM_V_GEN)$(GEN4ASM) -g 7.5 -o $@ $<
> >> +endif
> >> +
> >> +CLEANFILES = $(INTEL_GEN6_ASM) $(INTEL_GEN7_ASM) $(INTEL_GEN75_ASM)
> >> +
> >> +EXTRA_DIST = \
> >> + $(INTEL_G6A) \
> >> + $(INTEL_G6B) \
> >> + $(INTEL_G7A) \
> >> + $(INTEL_G7B) \
> >> + $(INTEL_G75A) \
> >> + $(INTEL_G75B) \
> >> + $(INTEL_GEN6_INC) \
> >> + $(INTEL_GEN7_INC) \
> >> + $(INTEL_GEN75_INC) \
> >> + $(VME_CORE) \
> >> + $(VME75_CORE) \
> >> + $(NULL)
> >> +
> >> +# Extra clean files so that maintainer-clean removes *everything*
> >> +MAINTAINERCLEANFILES = Makefile.in
> >> diff --git a/src/shaders/vme_old/gen6_vme_header.inc b/src/shaders/vme_old/gen6_vme_header.inc
> >> new file mode 100644
> >> index 0000000..b73e11c
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/gen6_vme_header.inc
> >> @@ -0,0 +1,160 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Modual name: ME_header.inc
> >> +//
> >> +// Global symbols define
> >> +//
> >> +
> >> +/*
> >> + * Constant
> >> + */
> >> +define(`VME_MESSAGE_TYPE_INTER', `1')
> >> +define(`VME_MESSAGE_TYPE_INTRA', `2')
> >> +define(`VME_MESSAGE_TYPE_MIXED', `3')
> >> +
> >> +define(`BLOCK_32X1', `0x0000001F')
> >> +define(`BLOCK_4X16', `0x000F0003')
> >> +
> >> +define(`LUMA_INTRA_16x16_DISABLE', `0x1')
> >> +define(`LUMA_INTRA_8x8_DISABLE', `0x2')
> >> +define(`LUMA_INTRA_4x4_DISABLE', `0x4')
> >> +
> >> +define(`INTRA_PRED_AVAIL_FLAG_AE', `0x60')
> >> +define(`INTRA_PRED_AVAIL_FLAG_B', `0x10')
> >> +define(`INTRA_PRED_AVAIL_FLAG_C', `0x8')
> >> +define(`INTRA_PRED_AVAIL_FLAG_D', `0x4')
> >> +
> >> +define(`BIND_IDX_VME', `0')
> >> +define(`BIND_IDX_VME_REF0', `1')
> >> +define(`BIND_IDX_VME_REF1', `2')
> >> +define(`BIND_IDX_OUTPUT', `3')
> >> +define(`BIND_IDX_INEP', `4')
> >> +
> >> +define(`SUB_PEL_MODE_INTEGER', `0x00000000')
> >> +define(`SUB_PEL_MODE_HALF', `0x00001000')
> >> +define(`SUB_PEL_MODE_QUARTER', `0x00003000')
> >> +
> >> +define(`INTER_SAD_NONE', `0x00000000')
> >> +define(`INTER_SAD_HAAR', `0x00200000')
> >> +
> >> +define(`INTRA_SAD_NONE', `0x00000000')
> >> +define(`INTRA_SAD_HAAR', `0x00800000')
> >> +
> >> +define(`INTER_PART_MASK', `0x7E000000')
> >> +
> >> +define(`REF_REGION_SIZE', `0x2020:UW')
> >> +
> >> +define(`BI_SUB_MB_PART_MASK', `0x0c000000')
> >> +define(`MAX_NUM_MV', `0x00000020')
> >> +define(`SEARCH_PATH_LEN', `0x00003F3F')
> >> +
> >> +define(`INTRA_PREDICTORE_MODE', `0x11111111:UD')
> >> +
> >> +define(`OBW_CACHE_TYPE', `5')
> >> +
> >> +define(`OBW_MESSAGE_TYPE', `8')
> >> +
> >> +define(`OBW_BIND_IDX', `BIND_IDX_OUTPUT')
> >> +
> >> +define(`OBW_CONTROL_0', `0') /* 1 OWord, low 128 bits */
> >> +define(`OBW_CONTROL_1', `1') /* 1 OWord, high 128 bits */
> >> +define(`OBW_CONTROL_2', `2') /* 2 OWords */
> >> +define(`OBW_CONTROL_3', `3') /* 4 OWords */
> >> +
> >> +define(`OBW_WRITE_COMMIT_CATEGORY', `1') /* write commit on Sandybrige */
> >> +
> >> +define(`OBW_HEADER_PRESENT', `1')
> >> +
> >> +/* GRF registers
> >> + * r0 header
> >> + * r1~r4 constant buffer (reserved)
> >> + * r5 inline data
> >> + * r6~r11 reserved
> >> + * r12 write back of VME message
> >> + * r13 write back of Oword Block Write
> >> + */
> >> +/*
> >> + * GRF 0 -- header
> >> + */
> >> +define(`thread_id_ub', `r0.20<0,1,0>:UB') /* thread id in payload */
> >> +
> >> +/*
> >> + * GRF 1~4 -- Constant Buffer (reserved)
> >> + */
> >> +
> >> +/*
> >> + * GRF 5 -- inline data
> >> + */
> >> +define(`inline_reg0', `r5')
> >> +define(`w_in_mb_uw', `inline_reg0.2')
> >> +define(`orig_xy_ub', `inline_reg0.0')
> >> +define(`orig_x_ub', `inline_reg0.0') /* in macroblock */
> >> +define(`orig_y_ub', `inline_reg0.1')
> >> +
> >> +/*
> >> + * GRF 6~11 -- reserved
> >> + */
> >> +
> >> +/*
> >> + * GRF 12~15 -- write back for VME message
> >> + */
> >> +define(`vme_wb', `r12')
> >> +define(`vme_wb0', `r12')
> >> +define(`vme_wb1', `r13')
> >> +define(`vme_wb2', `r14')
> >> +define(`vme_wb3', `r15')
> >> +
> >> +/*
> >> + * GRF 16 -- write back for Oword Block Write message with write commit bit
> >> + */
> >> +define(`obw_wb', `r16')
> >> +define(`obw_wb_length', `1')
> >> +
> >> +/*
> >> + * GRF 18~21 -- Intra Neighbor Edge Pixels
> >> + */
> >> +define(`INEP_ROW', `r18')
> >> +define(`INEP_COL0', `r20')
> >> +define(`INEP_COL1', `r21')
> >> +
> >> +/*
> >> + * temporary registers
> >> + */
> >> +define(`tmp_reg0', `r32')
> >> +define(`tmp_reg1', `r33')
> >> +define(`intra_part_mask_ub', `tmp_reg1.28')
> >> +define(`mb_intra_struct_ub', `tmp_reg1.29')
> >> +define(`tmp_reg2', `r34')
> >> +define(`tmp_x_w', `tmp_reg2.0')
> >> +define(`tmp_reg3', `r35')
> >> +
> >> +/*
> >> + * MRF registers
> >> + */
> >> +define(`msg_ind', `0')
> >> +define(`msg_reg0', `m0') /* m0 */
> >> +define(`msg_reg1', `m1') /* m1 */
> >> +define(`msg_reg2', `m2') /* m2 */
> >> +define(`msg_reg3', `m3') /* m3 */
> >> +define(`msg_reg4', `m4') /* m4 */
> >> +
> >> +/*
> >> + * VME message payload
> >> + */
> >> +define(`vme_msg_length', `4')
> >> +define(`vme_intra_wb_length', `1')
> >> +define(`vme_inter_wb_length', `4')
> >> +define(`vme_msg_ind', `msg_ind')
> >> +define(`vme_msg_0', `msg_reg0')
> >> +define(`vme_msg_1', `msg_reg1')
> >> +define(`vme_msg_2', `msg_reg2')
> >> +define(`vme_msg_3', `vme_msg_2')
> >> +define(`vme_msg_4', `msg_reg3')
> >> +
> >> +
> >> diff --git a/src/shaders/vme_old/gen7_vme_header.inc b/src/shaders/vme_old/gen7_vme_header.inc
> >> new file mode 100644
> >> index 0000000..9cec738
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/gen7_vme_header.inc
> >> @@ -0,0 +1,164 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Modual name: ME_header.inc
> >> +//
> >> +// Global symbols define
> >> +//
> >> +
> >> +/*
> >> + * Constant
> >> + */
> >> +define(`VME_MESSAGE_TYPE_INTER', `1')
> >> +define(`VME_MESSAGE_TYPE_INTRA', `2')
> >> +define(`VME_MESSAGE_TYPE_MIXED', `3')
> >> +
> >> +define(`BLOCK_32X1', `0x0000001F')
> >> +define(`BLOCK_4X16', `0x000F0003')
> >> +
> >> +define(`LUMA_INTRA_16x16_DISABLE', `0x1')
> >> +define(`LUMA_INTRA_8x8_DISABLE', `0x2')
> >> +define(`LUMA_INTRA_4x4_DISABLE', `0x4')
> >> +
> >> +define(`INTRA_PRED_AVAIL_FLAG_AE', `0x60')
> >> +define(`INTRA_PRED_AVAIL_FLAG_B', `0x10')
> >> +define(`INTRA_PRED_AVAIL_FLAG_C', `0x8')
> >> +define(`INTRA_PRED_AVAIL_FLAG_D', `0x4')
> >> +
> >> +define(`BIND_IDX_VME', `1')
> >> +define(`BIND_IDX_VME_REF0', `2')
> >> +define(`BIND_IDX_VME_REF1', `3')
> >> +define(`BIND_IDX_OUTPUT', `0')
> >> +define(`BIND_IDX_INEP', `4')
> >> +
> >> +define(`SUB_PEL_MODE_INTEGER', `0x00000000')
> >> +define(`SUB_PEL_MODE_HALF', `0x00001000')
> >> +define(`SUB_PEL_MODE_QUARTER', `0x00003000')
> >> +
> >> +define(`INTER_SAD_NONE', `0x00000000')
> >> +define(`INTER_SAD_HAAR', `0x00200000')
> >> +
> >> +define(`INTRA_SAD_NONE', `0x00000000')
> >> +define(`INTRA_SAD_HAAR', `0x00800000')
> >> +
> >> +define(`INTER_PART_MASK', `0x7E000000')
> >> +
> >> +define(`REF_REGION_SIZE', `0x2020:UW')
> >> +
> >> +define(`BI_SUB_MB_PART_MASK', `0x0c000000')
> >> +define(`MAX_NUM_MV', `0x00000020')
> >> +define(`SEARCH_PATH_LEN', `0x00003F3F')
> >> +
> >> +define(`INTRA_PREDICTORE_MODE', `0x11111111:UD')
> >> +
> >> +define(`OBW_CACHE_TYPE', `10')
> >> +
> >> +define(`OBW_MESSAGE_TYPE', `8')
> >> +
> >> +define(`OBW_BIND_IDX', `BIND_IDX_OUTPUT')
> >> +
> >> +define(`OBW_CONTROL_0', `0') /* 1 OWord, low 128 bits */
> >> +define(`OBW_CONTROL_1', `1') /* 1 OWord, high 128 bits */
> >> +define(`OBW_CONTROL_2', `2') /* 2 OWords */
> >> +define(`OBW_CONTROL_3', `3') /* 4 OWords */
> >> +
> >> +define(`OBW_WRITE_COMMIT_CATEGORY', `0') /* category on Ivybridge */
> >> +
> >> +define(`OBW_HEADER_PRESENT', `1')
> >> +
> >> +/* GRF registers
> >> + * r0 header
> >> + * r1~r4 constant buffer (reserved)
> >> + * r5 inline data
> >> + * r6~r11 reserved
> >> + * r12 write back of VME message
> >> + * r13 write back of Oword Block Write
> >> + */
> >> +/*
> >> + * GRF 0 -- header
> >> + */
> >> +define(`thread_id_ub', `r0.20<0,1,0>:UB') /* thread id in payload */
> >> +
> >> +/*
> >> + * GRF 1~4 -- Constant Buffer (reserved)
> >> + */
> >> +
> >> +/*
> >> + * GRF 5 -- inline data
> >> + */
> >> +define(`inline_reg0', `r5')
> >> +define(`w_in_mb_uw', `inline_reg0.2')
> >> +define(`orig_xy_ub', `inline_reg0.0')
> >> +define(`orig_x_ub', `inline_reg0.0') /* in macroblock */
> >> +define(`orig_y_ub', `inline_reg0.1')
> >> +
> >> +/*
> >> + * GRF 6~11 -- reserved
> >> + */
> >> +
> >> +/*
> >> + * GRF 12~15 -- write back for VME message
> >> + */
> >> +define(`vme_wb', `r12')
> >> +define(`vme_wb0', `r12')
> >> +define(`vme_wb1', `r13')
> >> +define(`vme_wb2', `r14')
> >> +define(`vme_wb3', `r15')
> >> +
> >> +/*
> >> + * GRF 16 -- reserved
> >> + */
> >> +/*
> >> + * write commit is removed on Ivybridge
> >> + */
> >> +define(`obw_wb', `null<1>:W')
> >> +define(`obw_wb_length', `0')
> >> +/*
> >> + * GRF 18~21 -- Intra Neighbor Edge Pixels
> >> + */
> >> +define(`INEP_ROW', `r18')
> >> +define(`INEP_COL0', `r20')
> >> +define(`INEP_COL1', `r21')
> >> +
> >> +/*
> >> + * temporary registers
> >> + */
> >> +define(`tmp_reg0', `r32')
> >> +define(`tmp_reg1', `r33')
> >> +define(`intra_part_mask_ub', `tmp_reg1.28')
> >> +define(`mb_intra_struct_ub', `tmp_reg1.29')
> >> +define(`tmp_reg2', `r34')
> >> +define(`tmp_x_w', `tmp_reg2.0')
> >> +define(`tmp_reg3', `r35')
> >> +
> >> +/*
> >> + * Message Payload registers
> >> + */
> >> +define(`msg_ind', `64')
> >> +define(`msg_reg0', `g64')
> >> +define(`msg_reg1', `g65')
> >> +define(`msg_reg2', `g66')
> >> +define(`msg_reg3', `g67')
> >> +define(`msg_reg4', `g68')
> >> +
> >> +/*
> >> + * VME message payload
> >> + */
> >> +define(`vme_msg_length', `5')
> >> +define(`vme_intra_wb_length', `1')
> >> +define(`vme_inter_wb_length', `6')
> >> +define(`vme_msg_ind', `msg_ind')
> >> +define(`vme_msg_0', `msg_reg0')
> >> +define(`vme_msg_1', `msg_reg1')
> >> +define(`vme_msg_2', `msg_reg2')
> >> +define(`vme_msg_3', `msg_reg3')
> >> +define(`vme_msg_4', `msg_reg4')
> >> +
> >> +
> >> +
> >> +
> >> diff --git a/src/shaders/vme_old/inter_frame.asm b/src/shaders/vme_old/inter_frame.asm
> >> new file mode 100644
> >> index 0000000..b42ecd9
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame.asm
> >> @@ -0,0 +1,104 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Modual name: IntraFrame.asm
> >> +//
> >> +// Make intra predition estimation for Intra frame
> >> +//
> >> +
> >> +//
> >> +// Now, begin source code....
> >> +//
> >> +
> >> +/*
> >> + * __START
> >> + */
> >> +__INTER_START:
> >> +mov (16) tmp_reg0.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg2.0<1>:UD 0x0:UD {align1};
> >> +
> >> +/*
> >> + * VME message
> >> + */
> >> +/* m0 */
> >> +mul (2) tmp_reg0.8<1>:UW orig_xy_ub<2,2,1>:UB 16:UW {align1}; /* Source = (x, y) * 16 */
> >> +mul (2) tmp_reg0.0<1>:UW orig_xy_ub<2,2,1>:UB 16:UW {align1};
> >> +add (2) tmp_reg0.0<1>:W tmp_reg0.0<2,2,1>:W -8:W {align1}; /* Reference = (x-8,y-8)-(x+24,y+24) */
> >> +mov (1) tmp_reg0.12<1>:UD INTER_PART_MASK + INTER_SAD_HAAR + SUB_PEL_MODE_QUARTER:UD {align1}; /* 16x16 Source, 1/4 pixel, harr */
> >> +
> >> +mov (1) tmp_reg0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (1) tmp_reg0.22<1>:UW REF_REGION_SIZE {align1}; /* Reference Width&Height, 32x32 */
> >> +mov (8) vme_msg_0.0<1>:UD tmp_reg0.0<8,8,1>:UD {align1};
> >> +
> >> +/* m1 */
> >> +mov (1) tmp_reg1.4<1>:UD MAX_NUM_MV:UD {align1}; /* Default value MAX 32 MVs */
> >> +mov (1) tmp_reg1.8<1>:UD SEARCH_PATH_LEN:UD {align1};
> >> +
> >> +mov (8) vme_msg_1<1>:UD tmp_reg1.0<8,8,1>:UD {align1};
> >> +
> >> +/* m2 */
> >> +mov (8) vme_msg_2<1>:UD 0x0:UD {align1};
> >> +
> >> +/* m3 */
> >> +mov (8) vme_msg_3<1>:UD 0x0:UD {align1};
> >> +
> >> +/* m4 */
> >> +mov (8) vme_msg_4<1>:UD 0x0:UD {align1};
> >> +
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb
> >> + null
> >> + vme(
> >> + BIND_IDX_VME,
> >> + 0,
> >> + 0,
> >> + VME_MESSAGE_TYPE_INTER
> >> + )
> >> + mlen vme_msg_length
> >> + rlen vme_inter_wb_length
> >> + {align1};
> >> +
> >> +/*
> >> + * Oword Block Write message
> >> + */
> >> +mul (1) tmp_reg3.8<1>:UD w_in_mb_uw<0,1,0>:UW orig_y_ub<0,1,0>:UB {align1};
> >> +add (1) tmp_reg3.8<1>:UD tmp_reg3.8<0,1,0>:UD orig_x_ub<0,1,0>:UB {align1};
> >> +mul (1) tmp_reg3.8<1>:UD tmp_reg3.8<0,1,0>:UD 0x4:UD {align1};
> >> +mov (1) tmp_reg3.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (8) msg_reg0.0<1>:UD tmp_reg3.0<8,8,1>:UD {align1};
> >> +
> >> +mov (2) tmp_reg3.0<1>:UW vme_wb1.0<2,2,1>:UB {align1};
> >> +add (2) tmp_reg3.0<1>:W tmp_reg3.0<2,2,1>:W -32:W {align1};
> >> +
> >> +mov (8) msg_reg1.0<1>:UD tmp_reg3.0<8,8,0>:UD {align1};
> >> +
> >> +mov (8) msg_reg2.0<1>:UD tmp_reg3.0<8,8,0>:UD {align1};
> >> +
> >> +/* bind index 3, write 4 oword, msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_3,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 3
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/*
> >> + * kill thread
> >> + */
> >> +mov (8) msg_reg0<1>:UD r0<8,8,1>:UD {align1};
> >> +send (16) msg_ind acc0<1>UW null thread_spawner(0, 0, 1) mlen 1 rlen 0 {align1 EOT};
> >> diff --git a/src/shaders/vme_old/inter_frame.g6a b/src/shaders/vme_old/inter_frame.g6a
> >> new file mode 100644
> >> index 0000000..d89588f
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame.g6a
> >> @@ -0,0 +1,2 @@
> >> +include(`gen6_vme_header.inc')
> >> +include(`inter_frame.asm')
> >> diff --git a/src/shaders/vme_old/inter_frame.g6b b/src/shaders/vme_old/inter_frame.g6b
> >> new file mode 100644
> >> index 0000000..02dd806
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame.g6b
> >> @@ -0,0 +1,28 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00200041, 0x24082e29, 0x004500a0, 0x00100010 },
> >> + { 0x00200041, 0x24002e29, 0x004500a0, 0x00100010 },
> >> + { 0x00200040, 0x24003dad, 0x00450400, 0xfff8fff8 },
> >> + { 0x00000001, 0x240c0061, 0x00000000, 0x7e203000 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00000001, 0x24160169, 0x00000000, 0x20202020 },
> >> + { 0x00600001, 0x20000022, 0x008d0400, 0x00000000 },
> >> + { 0x00000001, 0x24240061, 0x00000000, 0x00000020 },
> >> + { 0x00000001, 0x24280061, 0x00000000, 0x00003f3f },
> >> + { 0x00600001, 0x20200022, 0x008d0420, 0x00000000 },
> >> + { 0x00600001, 0x20400062, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x20400062, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x20600062, 0x00000000, 0x00000000 },
> >> + { 0x08600031, 0x21801cdd, 0x00000000, 0x08482000 },
> >> + { 0x00000041, 0x24684521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24684421, 0x00000468, 0x000000a0 },
> >> + { 0x00000041, 0x24680c21, 0x00000468, 0x00000004 },
> >> + { 0x00000001, 0x24740231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x20000022, 0x008d0460, 0x00000000 },
> >> + { 0x00200001, 0x24600229, 0x004501a0, 0x00000000 },
> >> + { 0x00200040, 0x24603dad, 0x00450460, 0xffe0ffe0 },
> >> + { 0x00600001, 0x20200022, 0x008c0460, 0x00000000 },
> >> + { 0x00600001, 0x20400022, 0x008c0460, 0x00000000 },
> >> + { 0x05800031, 0x22001cdd, 0x00000000, 0x061b0303 },
> >> + { 0x00600001, 0x20000022, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001cc8, 0x00000000, 0x82000010 },
> >> diff --git a/src/shaders/vme_old/inter_frame.g7a b/src/shaders/vme_old/inter_frame.g7a
> >> new file mode 100644
> >> index 0000000..cb51f52
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame.g7a
> >> @@ -0,0 +1,2 @@
> >> +include(`gen7_vme_header.inc')
> >> +include(`inter_frame.asm')
> >> diff --git a/src/shaders/vme_old/inter_frame.g7b b/src/shaders/vme_old/inter_frame.g7b
> >> new file mode 100644
> >> index 0000000..3d4fbb4
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame.g7b
> >> @@ -0,0 +1,28 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00200041, 0x24082e29, 0x004500a0, 0x00100010 },
> >> + { 0x00200041, 0x24002e29, 0x004500a0, 0x00100010 },
> >> + { 0x00200040, 0x24003dad, 0x00450400, 0xfff8fff8 },
> >> + { 0x00000001, 0x240c0061, 0x00000000, 0x7e203000 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00000001, 0x24160169, 0x00000000, 0x20202020 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x00000001, 0x24240061, 0x00000000, 0x00000020 },
> >> + { 0x00000001, 0x24280061, 0x00000000, 0x00003f3f },
> >> + { 0x00600001, 0x28200021, 0x008d0420, 0x00000000 },
> >> + { 0x00600001, 0x28400061, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x28600061, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x28800061, 0x00000000, 0x00000000 },
> >> + { 0x08600031, 0x21801cbd, 0x00000800, 0x0a682001 },
> >> + { 0x00000041, 0x24684521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24684421, 0x00000468, 0x000000a0 },
> >> + { 0x00000041, 0x24680c21, 0x00000468, 0x00000004 },
> >> + { 0x00000001, 0x24740231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0460, 0x00000000 },
> >> + { 0x00200001, 0x24600229, 0x004501a0, 0x00000000 },
> >> + { 0x00200040, 0x24603dad, 0x00450460, 0xffe0ffe0 },
> >> + { 0x00600001, 0x28200021, 0x008c0460, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008c0460, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x060a0300 },
> >> + { 0x00600001, 0x28000021, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001ca8, 0x00000800, 0x82000010 },
> >> diff --git a/src/shaders/vme_old/inter_frame_haswell.asm b/src/shaders/vme_old/inter_frame_haswell.asm
> >> new file mode 100644
> >> index 0000000..b6f8eb5
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame_haswell.asm
> >> @@ -0,0 +1,405 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Modual name: IntraFrame.asm
> >> +//
> >> +// Make intra predition estimation for Intra frame
> >> +//
> >> +
> >> +//
> >> +// Now, begin source code....
> >> +//
> >> +
> >> +/*
> >> + * __START
> >> + */
> >> +__INTRA_START:
> >> +mov (16) tmp_reg0.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg2.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg4.0<1>:UD 0x0:UD {align1} ;
> >> +mov (16) tmp_reg6.0<1>:UD 0x0:UD {align1} ;
> >> +
> >> +shl (2) read0_header.0<1>:D orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +add (1) read0_header.0<1>:D read0_header.0<0,1,0>:D -8:W {align1}; /* X offset */
> >> +add (1) read0_header.4<1>:D read0_header.4<0,1,0>:D -1:W {align1}; /* Y offset */
> >> +mov (1) read0_header.8<1>:UD BLOCK_32X1 {align1};
> >> +mov (1) read0_header.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +shl (2) read1_header.0<1>:D orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +add (1) read1_header.0<1>:D read1_header.0<0,1,0>:D -4:W {align1}; /* X offset */
> >> +mov (1) read1_header.8<1>:UD BLOCK_4X16 {align1};
> >> +mov (1) read1_header.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +shl (2) vme_m0.8<1>:UW orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +mov (1) vme_m0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +mul (1) obw_m0.8<1>:UD w_in_mb_uw<0,1,0>:UW orig_y_ub<0,1,0>:UB {align1};
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD orig_x_ub<0,1,0>:UB {align1};
> >> +mul (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 24:UD {align1};
> >> +mov (1) obw_m0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +/*
> >> + * Media Read Message -- fetch Luma neighbor edge pixels
> >> + */
> >> +/* ROW */
> >> +mov (8) msg_reg0.0<1>:UD read0_header.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_ROW<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 1 {align1};
> >> +
> >> +/* COL */
> >> +mov (8) msg_reg0.0<1>:UD read1_header.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_COL0<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 2 {align1};
> >> +
> >> +/* m2, get the MV/Mb cost passed from constant buffer when
> >> +spawning thread by MEDIA_OBJECT */
> >> +mov (8) vme_m2<1>:UD r1.0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) vme_msg_2<1>:UD vme_m2.0<8,8,1>:UD {align1};
> >> +
> >> +/* m3 */
> >> +mov (8) vme_msg_3<1>:UD 0x0:UD {align1};
> >> +
> >> +/* m4 */
> >> +mov (1) INEP_ROW.0<1>:UD 0x0:UD {align1};
> >> +and (1) INEP_ROW.4<1>:UD INEP_ROW.4<0,1,0>:UD 0xFF000000:UD {align1};
> >> +mov (8) vme_msg_4<1>:UD INEP_ROW.0<8,8,1>:UD {align1};
> >> +
> >> +/* m5 */
> >> +mov (8) vme_msg_5<1>:UD 0x0:UD {align1};
> >> +mov (16) vme_msg_5.0<1>:UB INEP_COL0.3<32,8,4>:UB {align1};
> >> +mov (1) vme_msg_5.16<1>:UD INTRA_PREDICTORE_MODE {align1};
> >> +
> >> +/* the penalty for Intra mode */
> >> +mov (1) vme_msg_5.28<1>:UD 0x010101:UD {align1};
> >> +
> >> +
> >> +/* m6 */
> >> +
> >> +mov (8) vme_msg_6<1>:UD 0x0:UD {align1};
> >> +
> >> +/*
> >> + * SIC VME message
> >> + */
> >> +/* m0 */
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +mov (1) tmp_reg0.0<1>:UW LUMA_INTRA_MODE:UW {align1};
> >> +/* Use the Luma mode */
> >> +mov (1) vme_msg_4.5<1>:UB tmp_reg0.0<0,1,0>:UB {align1};
> >> +
> >> +/* m1 */
> >> +mov (1) intra_flag<1>:UW 0x0:UW {align1} ;
> >> +and.z.f0.0 (1) null<1>:UW transform_8x8_ub<0,1,0>:UB 1:UW {align1};
> >> +(f0.0) mov (1) intra_part_mask_ub<1>:UB LUMA_INTRA_8x8_DISABLE {align1};
> >> +
> >> +/* assign MB intra struct from the thread payload*/
> >> +mov (1) mb_intra_struct_ub<1>:UB input_mb_intra_ub<0,1,0>:UB {align1};
> >> +
> >> +/* Disable DC HAAR component when calculating HARR SATD block */
> >> +mov (1) tmp_reg0.0<1>:UW DC_HARR_DISABLE:UW {align1};
> >> +mov (1) vme_m1.30<1>:UB tmp_reg0.0<0,1,0>:UB {align1};
> >> +
> >> +mov (1) vme_m0.12<1>:UD INTRA_SAD_HAAR:UD {align1}; /* 16x16 Source, Intra_harr */
> >> +/* m0 */
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +mov (8) vme_msg_1<1>:UD vme_m1.0<8,8,1>:UD {align1};
> >> +
> >> +/* after verification it will be passed by using payload */
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb<1>:UD
> >> + null
> >> + cre(
> >> + BIND_IDX_VME,
> >> + VME_SIC_MESSAGE_TYPE
> >> + )
> >> + mlen sic_vme_msg_length
> >> + rlen vme_wb_length
> >> + {align1};
> >> +/*
> >> + * Oword Block Write message
> >> + */
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.0<1>:UD vme_wb.0<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.4<1>:UD vme_wb.16<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.8<1>:UD vme_wb.20<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.12<1>:UD vme_wb.24<0,1,0>:UD {align1};
> >> +
> >> +/* Distortion, Intra (17-16), */
> >> +mov (1) msg_reg1.16<1>:UW vme_wb.12<0,1,0>:UW {align1};
> >> +
> >> +mov (1) msg_reg1.20<1>:UD vme_wb.8<0,1,0>:UD {align1};
> >> +/* VME clock counts */
> >> +mov (1) msg_reg1.24<1>:UD vme_wb.28<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.28<1>:UD obw_m0.8<0,1,0>:UD {align1};
> >> +
> >> +/* bind index 3, write 2 oword (32bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_2,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* IME search */
> >> +mov (1) vme_m0.12<1>:UD SEARCH_CTRL_SINGLE + INTER_PART_MASK + INTER_SAD_HAAR:UD {align1}; /* 16x16 Source, harr */
> >> +mov (1) vme_m0.22<1>:UW REF_REGION_SIZE {align1}; /* Reference Width&Height, 48x40 */
> >> +
> >> +mov (1) vme_m0.0<1>:UD vme_m0.8<0,1,0>:UD {align1};
> >> +
> >> +add (1) vme_m0.0<1>:W vme_m0.0<0,1,0>:W -16:W {align1}; /* Reference = (x-16,y-12)-(x+32,y+28) */
> >> +add (1) vme_m0.2<1>:W vme_m0.2<0,1,0>:W -12:W {align1};
> >> +
> >> +mov (1) vme_m0.0<1>:W -16:W {align1};
> >> +mov (1) vme_m0.2<1>:W -12:W {align1};
> >> +
> >> +mov (1) vme_m0.4<1>:UD vme_m0.0<0,1,0>:UD {align1};
> >> +
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +
> >> +mov (1) vme_m1.0<1>:UD ADAPTIVE_SEARCH_ENABLE:ud {align1} ;
> >> +mov (1) vme_m1.4<1>:UD MAX_NUM_MV:UD {align1}; /* Default value MAX 32 MVs */
> >> +mov (1) vme_m1.8<1>:UD START_CENTER + SEARCH_PATH_LEN:UD {align1};
> >> +mov (8) vme_msg_1.0<1>:UD vme_m1.0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) vme_msg_2<1>:UD vme_m2.0<8,8,1>:UD {align1};
> >> +/* M3/M4 search path */
> >> +
> >> +mov (1) vme_msg_3.0<1>:UD 0x01010101:UD {align1};
> >> +mov (1) vme_msg_3.4<1>:UD 0x10010101:UD {align1};
> >> +mov (1) vme_msg_3.8<1>:UD 0x0F0F0F0F:UD {align1};
> >> +mov (1) vme_msg_3.12<1>:UD 0x100F0F0F:UD {align1};
> >> +mov (1) vme_msg_3.16<1>:UD 0x01010101:UD {align1};
> >> +mov (1) vme_msg_3.20<1>:UD 0x10010101:UD {align1};
> >> +mov (1) vme_msg_3.24<1>:UD 0x0F0F0F0F:UD {align1};
> >> +mov (1) vme_msg_3.28<1>:UD 0x100F0F0F:UD {align1};
> >> +
> >> +mov (1) vme_msg_4.0<1>:UD 0x01010101:UD {align1};
> >> +mov (1) vme_msg_4.4<1>:UD 0x10010101:UD {align1};
> >> +mov (1) vme_msg_4.8<1>:UD 0x0F0F0F0F:UD {align1};
> >> +mov (1) vme_msg_4.12<1>:UD 0x000F0F0F:UD {align1};
> >> +
> >> +mov (4) vme_msg_4.16<1>:UD 0x0:UD {align1};
> >> +
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb<1>:UD
> >> + null
> >> + vme(
> >> + BIND_IDX_VME,
> >> + 0,
> >> + 0,
> >> + VME_IME_MESSAGE_TYPE
> >> + )
> >> + mlen ime_vme_msg_length
> >> + rlen vme_wb_length {align1};
> >> +
> >> +/* Set Macroblock-shape/mode for FBR */
> >> +
> >> +mov (1) vme_m2.20<1>:UD 0x0:UD {align1};
> >> +mov (1) vme_m2.21<1>:UB vme_wb.25<0,1,0>:UB {align1};
> >> +mov (1) vme_m2.22<1>:UB vme_wb.26<0,1,0>:UB {align1};
> >> +
> >> +and (1) tmp_reg0.0<1>:UW vme_wb.0<0,1,0>:UW 0x03:UW {align1};
> >> +mov (1) vme_m2.20<1>:UB tmp_reg0.0<0,1,0>:UB {align1};
> >> +
> >> +/* Write IME inter info */
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x02:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.0<1>:UD vme_wb.0<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.4<1>:UD vme_wb.24<0,1,0>:UD {align1};
> >> +/* Inter distortion of IME */
> >> +mov (1) msg_reg1.8<1>:UD vme_wb.8<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.12<1>:UD obw_m0.8<0,1,0>:UD {align1};
> >> +
> >> +/* bind index 3, write oword (16bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_0,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* Write IME MV */
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x01:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) msg_reg1.0<1>:UD vme_wb1.0<8,8,1>:UD {align1};
> >> +mov (8) msg_reg2.0<1>:ud vme_wb2.0<8,8,1>:ud {align1};
> >> +mov (8) msg_reg3.0<1>:ud vme_wb3.0<8,8,1>:ud {align1};
> >> +mov (8) msg_reg4.0<1>:ud vme_wb4.0<8,8,1>:ud {align1};
> >> +/* bind index 3, write 8 oword (128 bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_8,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 5
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* Write IME RefID */
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x08:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) msg_reg1.0<1>:UD vme_wb6.0<8,8,1>:UD {align1};
> >> +
> >> +/* bind index 3, write 2 oword (32bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_2,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* Send FBR message into CRE */
> >> +
> >> +mov (8) vme_msg_3.0<1>:UD vme_wb1.0<8,8,1>:UD {align1};
> >> +mov (8) vme_msg_4.0<1>:ud vme_wb2.0<8,8,1>:ud {align1};
> >> +mov (8) vme_msg_5.0<1>:ud vme_wb3.0<8,8,1>:ud {align1};
> >> +mov (8) vme_msg_6.0<1>:ud vme_wb4.0<8,8,1>:ud {align1};
> >> +
> >> +mov (1) vme_m0.12<1>:UD INTER_SAD_HAAR + SUB_PEL_MODE_QUARTER + FBR_BME_DISABLE:UD {align1}; /* 16x16 Source, 1/4 pixel, Haar, BME disable */
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +mov (8) vme_msg_1.0<1>:UD vme_m1.0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) vme_msg_2.0<1>:UD vme_m2.0<8,8,1>:UD {align1};
> >> +
> >> +/* after verification it will be passed by using payload */
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb<1>:UD
> >> + null
> >> + cre(
> >> + BIND_IDX_VME,
> >> + VME_FBR_MESSAGE_TYPE
> >> + )
> >> + mlen fbr_vme_msg_length
> >> + rlen vme_wb_length
> >> + {align1};
> >> +
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x02:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +/* write FME info */
> >> +mov (1) msg_reg1.0<1>:UD vme_wb.0<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.4<1>:UD vme_wb.24<0,1,0>:UD {align1};
> >> +/* Inter distortion of FME */
> >> +mov (1) msg_reg1.8<1>:UD vme_wb.8<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.12<1>:UD vme_m2.20<0,1,0>:UD {align1};
> >> +
> >> +/* bind index 3, write oword (16bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_0,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* Write FME/BME MV */
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x01:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0.0<8,8,1>:UD {align1};
> >> +
> >> +
> >> +mov (8) msg_reg1.0<1>:UD vme_wb1.0<8,8,1>:UD {align1};
> >> +mov (8) msg_reg2.0<1>:ud vme_wb2.0<8,8,1>:ud {align1};
> >> +mov (8) msg_reg3.0<1>:ud vme_wb3.0<8,8,1>:ud {align1};
> >> +mov (8) msg_reg4.0<1>:ud vme_wb4.0<8,8,1>:ud {align1};
> >> +/* bind index 3, write 8 oword (128 bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_8,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 5
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/* Write FME/BME RefID */
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x08:UD {align1};
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (8) msg_reg1.0<1>:UD vme_wb6.0<8,8,1>:UD {align1};
> >> +
> >> +/* bind index 3, write 2 oword (32bytes), msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_2,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +__EXIT:
> >> +/*
> >> + * kill thread
> >> + */
> >> +mov (8) ts_msg_reg0<1>:UD r0<8,8,1>:UD {align1};
> >> +send (16) ts_msg_ind acc0<1>UW null thread_spawner(0, 0, 1) mlen 1 rlen 0 {align1 EOT};
> >> diff --git a/src/shaders/vme_old/inter_frame_haswell.g75a b/src/shaders/vme_old/inter_frame_haswell.g75a
> >> new file mode 100644
> >> index 0000000..e95ed93
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame_haswell.g75a
> >> @@ -0,0 +1,2 @@
> >> +#include "vme75.inc"
> >> +#include "inter_frame_haswell.asm"
> >> diff --git a/src/shaders/vme_old/inter_frame_haswell.g75b b/src/shaders/vme_old/inter_frame_haswell.g75b
> >> new file mode 100644
> >> index 0000000..86971d4
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/inter_frame_haswell.g75b
> >> @@ -0,0 +1,137 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24800061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24c00061, 0x00000000, 0x00000000 },
> >> + { 0x00200009, 0x24002e25, 0x004500a0, 0x00040004 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfff8fff8 },
> >> + { 0x00000040, 0x24043ca5, 0x00000404, 0xffffffff },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x0000001f },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00200009, 0x24202e25, 0x004500a0, 0x00040004 },
> >> + { 0x00000040, 0x24203ca5, 0x00000420, 0xfffcfffc },
> >> + { 0x00000001, 0x242800e1, 0x00000000, 0x000f0003 },
> >> + { 0x00000001, 0x24340231, 0x00000014, 0x00000000 },
> >> + { 0x00200009, 0x24482e29, 0x004500a0, 0x00040004 },
> >> + { 0x00000001, 0x24540231, 0x00000014, 0x00000000 },
> >> + { 0x00000041, 0x24884521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24884421, 0x00000488, 0x000000a0 },
> >> + { 0x00000041, 0x24880c21, 0x00000488, 0x00000018 },
> >> + { 0x00000001, 0x24940231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x23801cb1, 0x00000800, 0x02190004 },
> >> + { 0x00600001, 0x28000021, 0x008d0420, 0x00000000 },
> >> + { 0x04600031, 0x23a01cb1, 0x00000800, 0x02290004 },
> >> + { 0x00600001, 0x25600021, 0x008d0020, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008d0560, 0x00000000 },
> >> + { 0x00600001, 0x28600061, 0x00000000, 0x00000000 },
> >> + { 0x00000001, 0x23800061, 0x00000000, 0x00000000 },
> >> + { 0x00000005, 0x23840c21, 0x00000384, 0xff000000 },
> >> + { 0x00600001, 0x28800021, 0x008d0380, 0x00000000 },
> >> + { 0x00600001, 0x28a00061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x28a00231, 0x00cf03a3, 0x00000000 },
> >> + { 0x00000001, 0x28b00061, 0x00000000, 0x11111111 },
> >> + { 0x00000001, 0x28bc0061, 0x00000000, 0x00010101 },
> >> + { 0x00600001, 0x28c00061, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00000001, 0x24000169, 0x00000000, 0x00010001 },
> >> + { 0x00000001, 0x28850231, 0x00000400, 0x00000000 },
> >> + { 0x00000001, 0x247c0169, 0x00000000, 0x00000000 },
> >> + { 0x01000005, 0x20002e28, 0x000000a4, 0x00010001 },
> >> + { 0x00010001, 0x247c00f1, 0x00000000, 0x00000002 },
> >> + { 0x00000001, 0x247d0231, 0x000000a5, 0x00000000 },
> >> + { 0x00000001, 0x24000169, 0x00000000, 0x00200020 },
> >> + { 0x00000001, 0x247e0231, 0x00000400, 0x00000000 },
> >> + { 0x00000001, 0x244c0061, 0x00000000, 0x00800000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d0460, 0x00000000 },
> >> + { 0x0d600031, 0x21801ca1, 0x00000800, 0x0e782000 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00000001, 0x28200021, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x28240021, 0x00000190, 0x00000000 },
> >> + { 0x00000001, 0x28280021, 0x00000194, 0x00000000 },
> >> + { 0x00000001, 0x282c0021, 0x00000198, 0x00000000 },
> >> + { 0x00000001, 0x28300129, 0x0000018c, 0x00000000 },
> >> + { 0x00000001, 0x28340021, 0x00000188, 0x00000000 },
> >> + { 0x00000001, 0x28380021, 0x0000019c, 0x00000000 },
> >> + { 0x00000001, 0x283c0021, 0x00000488, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0203 },
> >> + { 0x00000001, 0x244c0061, 0x00000000, 0x00200000 },
> >> + { 0x00000001, 0x24560169, 0x00000000, 0x28302830 },
> >> + { 0x00000001, 0x24400021, 0x00000448, 0x00000000 },
> >> + { 0x00000040, 0x24403dad, 0x00000440, 0xfff0fff0 },
> >> + { 0x00000040, 0x24423dad, 0x00000442, 0xfff4fff4 },
> >> + { 0x00000001, 0x244001ed, 0x00000000, 0xfff0fff0 },
> >> + { 0x00000001, 0x244201ed, 0x00000000, 0xfff4fff4 },
> >> + { 0x00000001, 0x24440021, 0x00000440, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00000001, 0x24600061, 0x00000000, 0x00000002 },
> >> + { 0x00000001, 0x24640061, 0x00000000, 0x00000020 },
> >> + { 0x00000001, 0x24680061, 0x00000000, 0x30003030 },
> >> + { 0x00600001, 0x28200021, 0x008d0460, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008d0560, 0x00000000 },
> >> + { 0x00000001, 0x28600061, 0x00000000, 0x01010101 },
> >> + { 0x00000001, 0x28640061, 0x00000000, 0x10010101 },
> >> + { 0x00000001, 0x28680061, 0x00000000, 0x0f0f0f0f },
> >> + { 0x00000001, 0x286c0061, 0x00000000, 0x100f0f0f },
> >> + { 0x00000001, 0x28700061, 0x00000000, 0x01010101 },
> >> + { 0x00000001, 0x28740061, 0x00000000, 0x10010101 },
> >> + { 0x00000001, 0x28780061, 0x00000000, 0x0f0f0f0f },
> >> + { 0x00000001, 0x287c0061, 0x00000000, 0x100f0f0f },
> >> + { 0x00000001, 0x28800061, 0x00000000, 0x01010101 },
> >> + { 0x00000001, 0x28840061, 0x00000000, 0x10010101 },
> >> + { 0x00000001, 0x28880061, 0x00000000, 0x0f0f0f0f },
> >> + { 0x00000001, 0x288c0061, 0x00000000, 0x000f0f0f },
> >> + { 0x00400001, 0x28900061, 0x00000000, 0x00000000 },
> >> + { 0x08600031, 0x21801ca1, 0x00000800, 0x0a784000 },
> >> + { 0x00000001, 0x25740061, 0x00000000, 0x00000000 },
> >> + { 0x00000001, 0x25750231, 0x00000199, 0x00000000 },
> >> + { 0x00000001, 0x25760231, 0x0000019a, 0x00000000 },
> >> + { 0x00000005, 0x24002d29, 0x00000180, 0x00030003 },
> >> + { 0x00000001, 0x25740231, 0x00000400, 0x00000000 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000002 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00000001, 0x28200021, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x28240021, 0x00000198, 0x00000000 },
> >> + { 0x00000001, 0x28280021, 0x00000188, 0x00000000 },
> >> + { 0x00000001, 0x282c0021, 0x00000488, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0003 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000001 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d01a0, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008d01c0, 0x00000000 },
> >> + { 0x00600001, 0x28600021, 0x008d01e0, 0x00000000 },
> >> + { 0x00600001, 0x28800021, 0x008d0200, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x0a0a0403 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000008 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d0240, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0203 },
> >> + { 0x00600001, 0x28600021, 0x008d01a0, 0x00000000 },
> >> + { 0x00600001, 0x28800021, 0x008d01c0, 0x00000000 },
> >> + { 0x00600001, 0x28a00021, 0x008d01e0, 0x00000000 },
> >> + { 0x00600001, 0x28c00021, 0x008d0200, 0x00000000 },
> >> + { 0x00000001, 0x244c0061, 0x00000000, 0x00243000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d0460, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008d0560, 0x00000000 },
> >> + { 0x0d600031, 0x21801ca1, 0x00000800, 0x0e786000 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000002 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00000001, 0x28200021, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x28240021, 0x00000198, 0x00000000 },
> >> + { 0x00000001, 0x28280021, 0x00000188, 0x00000000 },
> >> + { 0x00000001, 0x282c0021, 0x00000574, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0003 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000001 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d01a0, 0x00000000 },
> >> + { 0x00600001, 0x28400021, 0x008d01c0, 0x00000000 },
> >> + { 0x00600001, 0x28600021, 0x008d01e0, 0x00000000 },
> >> + { 0x00600001, 0x28800021, 0x008d0200, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x0a0a0403 },
> >> + { 0x00000040, 0x24880c21, 0x00000488, 0x00000008 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d0240, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0203 },
> >> + { 0x00600001, 0x2e000021, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001ca8, 0x00000e00, 0x82000010 },
> >> diff --git a/src/shaders/vme_old/intra_frame.asm b/src/shaders/vme_old/intra_frame.asm
> >> new file mode 100644
> >> index 0000000..809b5f3
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame.asm
> >> @@ -0,0 +1,130 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Module name: IntraFrame.asm
> >> +//
> >> +// Perform intra prediction estimation for an intra frame
> >> +//
> >> +
> >> +//
> >> +// Now, begin source code....
> >> +//
> >> +
> >> +/*
> >> + * __START
> >> + */
> >> +__INTRA_START:
> >> +mov (16) tmp_reg0.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg2.0<1>:UD 0x0:UD {align1};
> >> +
> >> +/*
> >> + * Media Read Message -- fetch neighbor edge pixels
> >> + */
> >> +/* ROW */
> >> +mul (2) tmp_reg0.0<1>:D orig_xy_ub<2,2,1>:UB 16:UW {align1}; /* (x, y) * 16 */
> >> +add (1) tmp_reg0.0<1>:D tmp_reg0.0<0,1,0>:D -8:W {align1}; /* X offset */
> >> +add (1) tmp_reg0.4<1>:D tmp_reg0.4<0,1,0>:D -1:W {align1}; /* Y offset */
> >> +mov (1) tmp_reg0.8<1>:UD BLOCK_32X1 {align1};
> >> +mov (1) tmp_reg0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (8) msg_reg0.0<1>:UD tmp_reg0.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_ROW<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 1 {align1};
> >> +
> >> +/* COL */
> >> +mul (2) tmp_reg0.0<1>:D orig_xy_ub<2,2,1>:UB 16:UW {align1}; /* (x, y) * 16 */
> >> +add (1) tmp_reg0.0<1>:D tmp_reg0.0<0,1,0>:D -4:W {align1}; /* X offset */
> >> +mov (1) tmp_reg0.8<1>:UD BLOCK_4X16 {align1};
> >> +mov (1) tmp_reg0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (8) msg_reg0.0<1>:UD tmp_reg0.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_COL0<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 2 {align1};
> >> +
> >> +/*
> >> + * VME message
> >> + */
> >> +/* m0 */
> >> +mul (2) tmp_reg0.8<1>:UW orig_xy_ub<2,2,1>:UB 16:UW {align1}; /* (x, y) * 16 */
> >> +mov (1) tmp_reg0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (8) vme_msg_0.0<1>:UD tmp_reg0.0<8,8,1>:UD {align1};
> >> +
> >> +/* m1 */
> >> +mov (1) intra_part_mask_ub<1>:UB LUMA_INTRA_8x8_DISABLE + LUMA_INTRA_4x4_DISABLE {align1};
> >> +
> >> +cmp.nz.f0.0 (1) null<1>:UW orig_x_ub<0,1,0>:UB 0:UW {align1}; /* X != 0 */
> >> +(f0.0) add (1) mb_intra_struct_ub<1>:UB mb_intra_struct_ub<0,1,0>:UB INTRA_PRED_AVAIL_FLAG_AE {align1}; /* A */
> >> +
> >> +cmp.nz.f0.0 (1) null<1>:UW orig_y_ub<0,1,0>:UB 0:UW {align1}; /* Y != 0 */
> >> +(f0.0) add (1) mb_intra_struct_ub<1>:UB mb_intra_struct_ub<0,1,0>:UB INTRA_PRED_AVAIL_FLAG_B {align1}; /* B */
> >> +
> >> +mul.nz.f0.0 (1) null<1>:UW orig_x_ub<0,1,0>:UB orig_y_ub<0,1,0>:UB {align1}; /* X * Y != 0 */
> >> +(f0.0) add (1) mb_intra_struct_ub<1>:UB mb_intra_struct_ub<0,1,0>:UB INTRA_PRED_AVAIL_FLAG_D {align1}; /* D */
> >> +
> >> +add (1) tmp_x_w<1>:W orig_x_ub<0,1,0>:UB 1:UW {align1}; /* X + 1 */
> >> +add (1) tmp_x_w<1>:W w_in_mb_uw<0,1,0>:UW -tmp_x_w<0,1,0>:W {align1}; /* width - (X + 1) */
> >> +mul.nz.f0.0 (1) null<1>:UD tmp_x_w<0,1,0>:W orig_y_ub<0,1,0>:UB {align1}; /* (width - (X + 1)) * Y != 0 */
> >> +(f0.0) add (1) mb_intra_struct_ub<1>:UB mb_intra_struct_ub<0,1,0>:UB INTRA_PRED_AVAIL_FLAG_C {align1}; /* C */
> >> +
> >> +mov (8) vme_msg_1<1>:UD tmp_reg1.0<8,8,1>:UD {align1};
> >> +
> >> +/* m2 */
> >> +mov (8) vme_msg_2<1>:UD 0x0:UD {align1};
> >> +
> >> +/* m3 */
> >> +mov (8) vme_msg_3<1>:UD INEP_ROW.0<8,8,1>:UD {align1};
> >> +
> >> +/* m4 */
> >> +mov (8) vme_msg_4<1>:UD 0x0 {align1};
> >> +mov (16) vme_msg_4.0<1>:UB INEP_COL0.3<32,8,4>:UB {align1};
> >> +mov (1) vme_msg_4.16<1>:UD INTRA_PREDICTORE_MODE {align1};
> >> +
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb
> >> + null
> >> + vme(
> >> + BIND_IDX_VME,
> >> + 0,
> >> + 0,
> >> + VME_MESSAGE_TYPE_INTRA
> >> + )
> >> + mlen vme_msg_length
> >> + rlen vme_intra_wb_length
> >> + {align1};
> >> +
> >> +/*
> >> + * Oword Block Write message
> >> + */
> >> +mul (1) tmp_reg3.8<1>:UD w_in_mb_uw<0,1,0>:UW orig_y_ub<0,1,0>:UB {align1};
> >> +add (1) tmp_reg3.8<1>:UD tmp_reg3.8<0,1,0>:UD orig_x_ub<0,1,0>:UB {align1};
> >> +mov (1) tmp_reg3.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +mov (8) msg_reg0.0<1>:UD tmp_reg3<8,8,1>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.0<1>:UD vme_wb.0<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.4<1>:UD vme_wb.16<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.8<1>:UD vme_wb.20<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.12<1>:UD vme_wb.24<0,1,0>:UD {align1};
> >> +/* bind index 3, write 1 oword, msg type: 8(OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_0,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
> >> +
> >> +/*
> >> + * kill thread
> >> + */
> >> +mov (8) msg_reg0<1>:UD r0<8,8,1>:UD {align1};
> >> +send (16) msg_ind acc0<1>UW null thread_spawner(0, 0, 1) mlen 1 rlen 0 {align1 EOT};
> >> diff --git a/src/shaders/vme_old/intra_frame.g6a b/src/shaders/vme_old/intra_frame.g6a
> >> new file mode 100644
> >> index 0000000..d39118c
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame.g6a
> >> @@ -0,0 +1,3 @@
> >> +include(`gen6_vme_header.inc')
> >> +include(`intra_frame.asm')
> >> +
> >> diff --git a/src/shaders/vme_old/intra_frame.g6b b/src/shaders/vme_old/intra_frame.g6b
> >> new file mode 100644
> >> index 0000000..90ee252
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame.g6b
> >> @@ -0,0 +1,47 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00200041, 0x24002e25, 0x004500a0, 0x00100010 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfff8fff8 },
> >> + { 0x00000040, 0x24043ca5, 0x00000404, 0xffffffff },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x0000001f },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x20000022, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x22401cd1, 0x00000000, 0x02188004 },
> >> + { 0x00200041, 0x24002e25, 0x004500a0, 0x00100010 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfffcfffc },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x000f0003 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x20000022, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x22801cd1, 0x00000000, 0x02288004 },
> >> + { 0x00200041, 0x24082e29, 0x004500a0, 0x00100010 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x20000022, 0x008d0400, 0x00000000 },
> >> + { 0x00000001, 0x243c00f1, 0x00000000, 0x00000006 },
> >> + { 0x02000010, 0x20002e28, 0x000000a0, 0x00000000 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000060 },
> >> + { 0x02000010, 0x20002e28, 0x000000a1, 0x00000000 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000010 },
> >> + { 0x02000041, 0x20004628, 0x000000a0, 0x000000a1 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000004 },
> >> + { 0x00000040, 0x24402e2d, 0x000000a0, 0x00010001 },
> >> + { 0x00000040, 0x2440352d, 0x000000a2, 0x00004440 },
> >> + { 0x02000041, 0x200045a0, 0x00000440, 0x000000a1 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000008 },
> >> + { 0x00600001, 0x20200022, 0x008d0420, 0x00000000 },
> >> + { 0x00600001, 0x20400062, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x20400022, 0x008d0240, 0x00000000 },
> >> + { 0x00600001, 0x206000e2, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x20600232, 0x00cf0283, 0x00000000 },
> >> + { 0x00000001, 0x20700062, 0x00000000, 0x11111111 },
> >> + { 0x08600031, 0x21801cdd, 0x00000000, 0x08184000 },
> >> + { 0x00000041, 0x24684521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24684421, 0x00000468, 0x000000a0 },
> >> + { 0x00000001, 0x24740231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x20000022, 0x008d0460, 0x00000000 },
> >> + { 0x00000001, 0x20200022, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x20240022, 0x00000190, 0x00000000 },
> >> + { 0x00000001, 0x20280022, 0x00000194, 0x00000000 },
> >> + { 0x00000001, 0x202c0022, 0x00000198, 0x00000000 },
> >> + { 0x05800031, 0x22001cdd, 0x00000000, 0x041b0003 },
> >> + { 0x00600001, 0x20000022, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001cc8, 0x00000000, 0x82000010 },
> >> diff --git a/src/shaders/vme_old/intra_frame.g7a b/src/shaders/vme_old/intra_frame.g7a
> >> new file mode 100644
> >> index 0000000..c43e739
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame.g7a
> >> @@ -0,0 +1,2 @@
> >> +include(`gen7_vme_header.inc')
> >> +include(`intra_frame.asm')
> >> diff --git a/src/shaders/vme_old/intra_frame.g7b b/src/shaders/vme_old/intra_frame.g7b
> >> new file mode 100644
> >> index 0000000..cc063d8
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame.g7b
> >> @@ -0,0 +1,47 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00200041, 0x24002e25, 0x004500a0, 0x00100010 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfff8fff8 },
> >> + { 0x00000040, 0x24043ca5, 0x00000404, 0xffffffff },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x0000001f },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x22401cb1, 0x00000800, 0x02190004 },
> >> + { 0x00200041, 0x24002e25, 0x004500a0, 0x00100010 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfffcfffc },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x000f0003 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x22801cb1, 0x00000800, 0x02290004 },
> >> + { 0x00200041, 0x24082e29, 0x004500a0, 0x00100010 },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x00000001, 0x243c00f1, 0x00000000, 0x00000006 },
> >> + { 0x02000010, 0x20002e28, 0x000000a0, 0x00000000 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000060 },
> >> + { 0x02000010, 0x20002e28, 0x000000a1, 0x00000000 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000010 },
> >> + { 0x02000041, 0x20004628, 0x000000a0, 0x000000a1 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000004 },
> >> + { 0x00000040, 0x24402e2d, 0x000000a0, 0x00010001 },
> >> + { 0x00000040, 0x2440352d, 0x000000a2, 0x00004440 },
> >> + { 0x02000041, 0x200045a0, 0x00000440, 0x000000a1 },
> >> + { 0x00010040, 0x243d1e31, 0x0000043d, 0x00000008 },
> >> + { 0x00600001, 0x28200021, 0x008d0420, 0x00000000 },
> >> + { 0x00600001, 0x28400061, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x28600021, 0x008d0240, 0x00000000 },
> >> + { 0x00600001, 0x288000e1, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x28800231, 0x00cf0283, 0x00000000 },
> >> + { 0x00000001, 0x28900061, 0x00000000, 0x11111111 },
> >> + { 0x08600031, 0x21801cbd, 0x00000800, 0x0a184001 },
> >> + { 0x00000041, 0x24684521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24684421, 0x00000468, 0x000000a0 },
> >> + { 0x00000001, 0x24740231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0460, 0x00000000 },
> >> + { 0x00000001, 0x28200021, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x28240021, 0x00000190, 0x00000000 },
> >> + { 0x00000001, 0x28280021, 0x00000194, 0x00000000 },
> >> + { 0x00000001, 0x282c0021, 0x00000198, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0000 },
> >> + { 0x00600001, 0x28000021, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001ca8, 0x00000800, 0x82000010 },
> >> diff --git a/src/shaders/vme_old/intra_frame_haswell.asm b/src/shaders/vme_old/intra_frame_haswell.asm
> >> new file mode 100644
> >> index 0000000..64efd55
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame_haswell.asm
> >> @@ -0,0 +1,160 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Module name: IntraFrame.asm
> >> +//
> >> +// Perform intra prediction estimation for an intra frame
> >> +//
> >> +
> >> +//
> >> +// Now, begin source code....
> >> +//
> >> +
> >> +/*
> >> + * __START
> >> + */
> >> +__INTRA_START:
> >> +mov (16) tmp_reg0.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg2.0<1>:UD 0x0:UD {align1};
> >> +mov (16) tmp_reg4.0<1>:UD 0x0:UD {align1} ;
> >> +mov (16) tmp_reg6.0<1>:UD 0x0:UD {align1} ;
> >> +
> >> +shl (2) read0_header.0<1>:D orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +add (1) read0_header.0<1>:D read0_header.0<0,1,0>:D -8:W {align1}; /* X offset */
> >> +add (1) read0_header.4<1>:D read0_header.4<0,1,0>:D -1:W {align1}; /* Y offset */
> >> +mov (1) read0_header.8<1>:UD BLOCK_32X1 {align1};
> >> +mov (1) read0_header.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +shl (2) read1_header.0<1>:D orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +add (1) read1_header.0<1>:D read1_header.0<0,1,0>:D -4:W {align1}; /* X offset */
> >> +mov (1) read1_header.8<1>:UD BLOCK_4X16 {align1};
> >> +mov (1) read1_header.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +shl (2) vme_m0.8<1>:UW orig_xy_ub<2,2,1>:UB 4:UW {align1}; /* (x, y) * 16 */
> >> +mov (1) vme_m0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +mul (1) obw_m0.8<1>:UD w_in_mb_uw<0,1,0>:UW orig_y_ub<0,1,0>:UB {align1};
> >> +add (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD orig_x_ub<0,1,0>:UB {align1};
> >> +mul (1) obw_m0.8<1>:UD obw_m0.8<0,1,0>:UD 0x02:UD {align1};
> >> +mov (1) obw_m0.20<1>:UB thread_id_ub {align1}; /* dispatch id */
> >> +
> >> +/*
> >> + * Media Read Message -- fetch Luma neighbor edge pixels
> >> + */
> >> +/* ROW */
> >> +mov (8) msg_reg0.0<1>:UD read0_header.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_ROW<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 1 {align1};
> >> +
> >> +/* COL */
> >> +mov (8) msg_reg0.0<1>:UD read1_header.0<8,8,1>:UD {align1};
> >> +send (8) msg_ind INEP_COL0<1>:UB null read(BIND_IDX_INEP, 0, 0, 4) mlen 1 rlen 2 {align1};
> >> +
> >> +/* m2, get the MV/Mb cost passed by constant buffer
> >> +when creating EU thread by MEDIA_OBJECT */
> >> +mov (8) vme_msg_2<1>:UD r1.0<8,8,1>:UD {align1};
> >> +
> >> +/* m3 */
> >> +mov (8) vme_msg_3<1>:UD 0x0:UD {align1};
> >> +
> >> +/* m4 */
> >> +mov (1) INEP_ROW.0<1>:UD 0x0:UD {align1};
> >> +and (1) INEP_ROW.4<1>:UD INEP_ROW.4<0,1,0>:UD 0xFF000000:UD {align1};
> >> +mov (8) vme_msg_4<1>:UD INEP_ROW.0<8,8,1>:UD {align1};
> >> +
> >> +/* m5 */
> >> +mov (8) vme_msg_5<1>:UD 0x0:UD {align1};
> >> +mov (16) vme_msg_5.0<1>:UB INEP_COL0.3<32,8,4>:UB {align1};
> >> +mov (1) vme_msg_5.16<1>:UD INTRA_PREDICTORE_MODE {align1};
> >> +
> >> +/* the penalty for Intra mode */
> >> +mov (1) vme_msg_5.28<1>:UD 0x010101:UD {align1};
> >> +
> >> +
> >> +/* m6 */
> >> +
> >> +mov (8) vme_msg_6<1>:UD 0x0:UD {align1};
> >> +
> >> +/*
> >> + * VME message
> >> + */
> >> +/* m0 */
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +mov (1) tmp_reg0.0<1>:UW LUMA_INTRA_MODE:UW {align1};
> >> +/* Use the Luma mode */
> >> +mov (1) vme_msg_4.5<1>:UB tmp_reg0.0<0,1,0>:UB {align1};
> >> +
> >> +/* m1 */
> >> +mov (1) intra_flag<1>:UW 0x0:UW {align1} ;
> >> +and.z.f0.0 (1) null<1>:UW transform_8x8_ub<0,1,0>:UB 1:UW {align1};
> >> +(f0.0) mov (1) intra_part_mask_ub<1>:UB LUMA_INTRA_8x8_DISABLE {align1};
> >> +
> >> +/* assign MB intra struct from the thread payload */
> >> +mov (1) mb_intra_struct_ub<1>:UB input_mb_intra_ub<0,1,0>:UB {align1};
> >> +
> >> +/* Disable the DC Haar component when calculating the Haar SATD of a block */
> >> +mov (1) tmp_reg0.0<1>:UW DC_HARR_DISABLE:UW {align1};
> >> +mov (1) vme_m1.30<1>:UB tmp_reg0.0<0,1,0>:UB {align1};
> >> +
> >> +/* m0 */
> >> +mov (8) vme_msg_0.0<1>:UD vme_m0.0<8,8,1>:UD {align1};
> >> +mov (8) vme_msg_1<1>:UD vme_m1.0<8,8,1>:UD {align1};
> >> +
> >> +/* after verification it will be passed by using payload */
> >> +send (8)
> >> + vme_msg_ind
> >> + vme_wb<1>:UD
> >> + null
> >> + cre(
> >> + BIND_IDX_VME,
> >> + VME_SIC_MESSAGE_TYPE
> >> + )
> >> + mlen sic_vme_msg_length
> >> + rlen vme_wb_length
> >> + {align1};
> >> +/*
> >> + * Oword Block Write message
> >> + */
> >> +mov (8) msg_reg0.0<1>:UD obw_m0<8,8,1>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.0<1>:UD vme_wb.0<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.4<1>:UD vme_wb.16<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.8<1>:UD vme_wb.20<0,1,0>:UD {align1};
> >> +mov (1) msg_reg1.12<1>:UD vme_wb.24<0,1,0>:UD {align1};
> >> +
> >> +/* Distortion, Intra (17-16), */
> >> +mov (1) msg_reg1.16<1>:UW vme_wb.12<0,1,0>:UW {align1};
> >> +
> >> +mov (1) msg_reg1.20<1>:UD vme_wb.8<0,1,0>:UD {align1};
> >> +/* VME clock counts */
> >> +mov (1) msg_reg1.24<1>:UD vme_wb.28<0,1,0>:UD {align1};
> >> +
> >> +mov (1) msg_reg1.28<1>:UD obw_m0.8<0,1,0>:UD {align1};
> >> +
> >> +/* bind index 3, write 2 OWords (32 bytes), msg type: 8 (OWord Block Write) */
> >> +send (16)
> >> + msg_ind
> >> + obw_wb
> >> + null
> >> + data_port(
> >> + OBW_CACHE_TYPE,
> >> + OBW_MESSAGE_TYPE,
> >> + OBW_CONTROL_2,
> >> + OBW_BIND_IDX,
> >> + OBW_WRITE_COMMIT_CATEGORY,
> >> + OBW_HEADER_PRESENT
> >> + )
> >> + mlen 2
> >> + rlen obw_wb_length
> >> + {align1};
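The mov sequence above fills the single data register (msg_reg1) that this OWord Block Write stores for each macroblock. As a reading aid, the resulting 32-byte record can be sketched in Python as below. This is only an illustration of the register shuffling in the quoted assembly, not driver code. The names pack_intra_record and FIELDS are made up for this note, sub-register numbers are read as byte offsets (which matches the tmp_ud*/tmp_uw* defines in vme75.inc), and the two bytes at offset 18 are zeroed here although the shader leaves whatever msg_reg1 already held.

```python
# Each entry: (destination byte offset in msg_reg1, source, source byte offset, size).
# Offsets are copied from the mov sequence in the quoted intra_frame_haswell.asm.
FIELDS = [
    (0,  "vme_wb", 0,  4),
    (4,  "vme_wb", 16, 4),
    (8,  "vme_wb", 20, 4),
    (12, "vme_wb", 24, 4),
    (16, "vme_wb", 12, 2),   # UW move, commented "Distortion, Intra" in the asm
    (20, "vme_wb", 8,  4),
    (24, "vme_wb", 28, 4),   # commented "VME clock counts" in the asm
    (28, "obw_m0", 8,  4),
]


def pack_intra_record(vme_wb: bytes, obw_m0: bytes) -> bytes:
    """Return the 32-byte record written per macroblock (hypothetical helper)."""
    sources = {"vme_wb": vme_wb, "obw_m0": obw_m0}
    record = bytearray(32)  # assumption: bytes not written by a mov start as zero
    for dst, name, src, size in FIELDS:
        record[dst:dst + size] = sources[name][src:src + size]
    return bytes(record)
```

Since mlen is 2, the message carries one header register plus this one data register, which agrees with the "2 oword (32bytes)" comment.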
> >> +
> >> +__EXIT:
> >> +/*
> >> + * kill thread
> >> + */
> >> +mov (8) ts_msg_reg0<1>:UD r0<8,8,1>:UD {align1};
> >> +send (16) ts_msg_ind acc0<1>UW null thread_spawner(0, 0, 1) mlen 1 rlen 0 {align1 EOT};
> >> diff --git a/src/shaders/vme_old/intra_frame_haswell.g75a b/src/shaders/vme_old/intra_frame_haswell.g75a
> >> new file mode 100644
> >> index 0000000..a690fdd
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame_haswell.g75a
> >> @@ -0,0 +1,2 @@
> >> +#include "vme75.inc"
> >> +#include "intra_frame_haswell.asm"
> >> diff --git a/src/shaders/vme_old/intra_frame_haswell.g75b b/src/shaders/vme_old/intra_frame_haswell.g75b
> >> new file mode 100644
> >> index 0000000..5ae7a99
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/intra_frame_haswell.g75b
> >> @@ -0,0 +1,57 @@
> >> + { 0x00800001, 0x24000061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24400061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24800061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x24c00061, 0x00000000, 0x00000000 },
> >> + { 0x00200009, 0x24002e25, 0x004500a0, 0x00040004 },
> >> + { 0x00000040, 0x24003ca5, 0x00000400, 0xfff8fff8 },
> >> + { 0x00000040, 0x24043ca5, 0x00000404, 0xffffffff },
> >> + { 0x00000001, 0x240800e1, 0x00000000, 0x0000001f },
> >> + { 0x00000001, 0x24140231, 0x00000014, 0x00000000 },
> >> + { 0x00200009, 0x24202e25, 0x004500a0, 0x00040004 },
> >> + { 0x00000040, 0x24203ca5, 0x00000420, 0xfffcfffc },
> >> + { 0x00000001, 0x242800e1, 0x00000000, 0x000f0003 },
> >> + { 0x00000001, 0x24340231, 0x00000014, 0x00000000 },
> >> + { 0x00200009, 0x24482e29, 0x004500a0, 0x00040004 },
> >> + { 0x00000001, 0x24540231, 0x00000014, 0x00000000 },
> >> + { 0x00000041, 0x24884521, 0x000000a2, 0x000000a1 },
> >> + { 0x00000040, 0x24884421, 0x00000488, 0x000000a0 },
> >> + { 0x00000041, 0x24880c21, 0x00000488, 0x00000002 },
> >> + { 0x00000001, 0x24940231, 0x00000014, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0400, 0x00000000 },
> >> + { 0x04600031, 0x23801cb1, 0x00000800, 0x02190004 },
> >> + { 0x00600001, 0x28000021, 0x008d0420, 0x00000000 },
> >> + { 0x04600031, 0x23a01cb1, 0x00000800, 0x02290004 },
> >> + { 0x00600001, 0x28400021, 0x008d0020, 0x00000000 },
> >> + { 0x00600001, 0x28600061, 0x00000000, 0x00000000 },
> >> + { 0x00000001, 0x23800061, 0x00000000, 0x00000000 },
> >> + { 0x00000005, 0x23840c21, 0x00000384, 0xff000000 },
> >> + { 0x00600001, 0x28800021, 0x008d0380, 0x00000000 },
> >> + { 0x00600001, 0x28a00061, 0x00000000, 0x00000000 },
> >> + { 0x00800001, 0x28a00231, 0x00cf03a3, 0x00000000 },
> >> + { 0x00000001, 0x28b00061, 0x00000000, 0x11111111 },
> >> + { 0x00000001, 0x28bc0061, 0x00000000, 0x00010101 },
> >> + { 0x00600001, 0x28c00061, 0x00000000, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00000001, 0x24000169, 0x00000000, 0x00010001 },
> >> + { 0x00000001, 0x28850231, 0x00000400, 0x00000000 },
> >> + { 0x00000001, 0x247c0169, 0x00000000, 0x00000000 },
> >> + { 0x01000005, 0x20002e28, 0x000000a4, 0x00010001 },
> >> + { 0x00010001, 0x247c00f1, 0x00000000, 0x00000002 },
> >> + { 0x00000001, 0x247d0231, 0x000000a5, 0x00000000 },
> >> + { 0x00000001, 0x24000169, 0x00000000, 0x00200020 },
> >> + { 0x00000001, 0x247e0231, 0x00000400, 0x00000000 },
> >> + { 0x00600001, 0x28000021, 0x008d0440, 0x00000000 },
> >> + { 0x00600001, 0x28200021, 0x008d0460, 0x00000000 },
> >> + { 0x0d600031, 0x21801ca1, 0x00000800, 0x0e782000 },
> >> + { 0x00600001, 0x28000021, 0x008d0480, 0x00000000 },
> >> + { 0x00000001, 0x28200021, 0x00000180, 0x00000000 },
> >> + { 0x00000001, 0x28240021, 0x00000190, 0x00000000 },
> >> + { 0x00000001, 0x28280021, 0x00000194, 0x00000000 },
> >> + { 0x00000001, 0x282c0021, 0x00000198, 0x00000000 },
> >> + { 0x00000001, 0x28300129, 0x0000018c, 0x00000000 },
> >> + { 0x00000001, 0x28340021, 0x00000188, 0x00000000 },
> >> + { 0x00000001, 0x28380021, 0x0000019c, 0x00000000 },
> >> + { 0x00000001, 0x283c0021, 0x00000488, 0x00000000 },
> >> + { 0x0a800031, 0x20001cac, 0x00000800, 0x040a0203 },
> >> + { 0x00600001, 0x2e000021, 0x008d0000, 0x00000000 },
> >> + { 0x07800031, 0x24001ca8, 0x00000e00, 0x82000010 },
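Each row of the .g75b file above is one assembled instruction, stored as four 32-bit words. For anyone comparing it against the .asm source, a rough Python sketch for picking out the opcode follows. It assumes the opcode sits in the low 7 bits of the first word and that, for a send, bit 31 of the last word is the EOT flag. The OPCODES table lists only the mnemonics used in this shader, and its values are my reading of the encoding, so please check them against the PRM before relying on them. The helper names are made up for this note.

```python
# A few rows copied verbatim from the quoted intra_frame_haswell.g75b.
ROWS = [
    (0x00800001, 0x24000061, 0x00000000, 0x00000000),
    (0x00200009, 0x24002e25, 0x004500a0, 0x00040004),
    (0x00000040, 0x24003ca5, 0x00000400, 0xfff8fff8),
    (0x00000041, 0x24884521, 0x000000a2, 0x000000a1),
    (0x00000005, 0x23840c21, 0x00000384, 0xff000000),
    (0x0d600031, 0x21801ca1, 0x00000800, 0x0e782000),
    (0x07800031, 0x24001ca8, 0x00000e00, 0x82000010),
]

# Assumed opcode values; only the instructions this shader uses are listed.
OPCODES = {0x01: "mov", 0x05: "and", 0x09: "shl",
           0x31: "send", 0x40: "add", 0x41: "mul"}


def mnemonic(row):
    """Map the low 7 bits of the first word to a mnemonic (hypothetical helper)."""
    return OPCODES.get(row[0] & 0x7F, "unknown")


def is_eot(row):
    """True for a send whose last word has bit 31 set (assumed EOT position)."""
    return mnemonic(row) == "send" and bool(row[3] >> 31)
```

With these assumptions the last row decodes as a send with EOT set, which lines up with the thread_spawner send marked {align1 EOT} at the end of the .asm file.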
> >> diff --git a/src/shaders/vme_old/vme75.inc b/src/shaders/vme_old/vme75.inc
> >> new file mode 100644
> >> index 0000000..d48daa0
> >> --- /dev/null
> >> +++ b/src/shaders/vme_old/vme75.inc
> >> @@ -0,0 +1,268 @@
> >> +/*
> >> + * Copyright © <2010>, Intel Corporation.
> >> + *
> >> + * This program is licensed under the terms and conditions of the
> >> + * Eclipse Public License (EPL), version 1.0. The full text of the EPL is at
> >> + * http://www.opensource.org/licenses/eclipse-1.0.php.
> >> + *
> >> + */
> >> +// Module name: ME_header.inc
> >> +//
> >> +// Global symbols define
> >> +//
> >> +
> >> +/*
> >> + * Constant
> >> + */
> >> +define(`VME_MESSAGE_TYPE_INTER', `1')
> >> +define(`VME_MESSAGE_TYPE_INTRA', `2')
> >> +define(`VME_MESSAGE_TYPE_MIXED', `3')
> >> +
> >> +define(`VME_SIC_MESSAGE_TYPE', `1')
> >> +define(`VME_IME_MESSAGE_TYPE', `2')
> >> +define(`VME_FBR_MESSAGE_TYPE', `3')
> >> +
> >> +define(`BLOCK_32X1', `0x0000001F')
> >> +define(`BLOCK_4X16', `0x000F0003')
> >> +define(`BLOCK_8X4', `0x00070003')
> >> +
> >> +define(`LUMA_INTRA_16x16_DISABLE', `0x1')
> >> +define(`LUMA_INTRA_8x8_DISABLE', `0x2')
> >> +define(`LUMA_INTRA_4x4_DISABLE', `0x4')
> >> +
> >> +define(`INTRA_PRED_AVAIL_FLAG_AE', `0x60')
> >> +define(`INTRA_PRED_AVAIL_FLAG_B', `0x10')
> >> +define(`INTRA_PRED_AVAIL_FLAG_C', `0x8')
> >> +define(`INTRA_PRED_AVAIL_FLAG_D', `0x4')
> >> +
> >> +define(`BIND_IDX_VME', `0')
> >> +define(`BIND_IDX_VME_REF0', `1')
> >> +define(`BIND_IDX_VME_REF1', `2')
> >> +define(`BIND_IDX_OUTPUT', `3')
> >> +define(`BIND_IDX_INEP', `4')
> >> +
> >> +define(`SUB_PEL_MODE_INTEGER', `0x00000000')
> >> +define(`SUB_PEL_MODE_HALF', `0x00001000')
> >> +define(`SUB_PEL_MODE_QUARTER', `0x00003000')
> >> +
> >> +define(`INTER_SAD_NONE', `0x00000000')
> >> +define(`INTER_SAD_HAAR', `0x00200000')
> >> +
> >> +define(`INTRA_SAD_NONE', `0x00000000')
> >> +define(`INTRA_SAD_HAAR', `0x00800000')
> >> +
> >> +define(`INTER_PART_MASK', `0x00000000')
> >> +
> >> +define(`SEARCH_CTRL_SINGLE', `0x00000000')
> >> +define(`SEARCH_CTRL_DUAL_START', `0x00000100')
> >> +define(`SEARCH_CTRL_DUAL_RECORD', `0x00000300')
> >> +define(`SEARCH_CTRL_DUAL_REFERENCE', `0x00000700')
> >> +
> >> +define(`REF_REGION_SIZE', `0x2830:UW')
> >> +
> >> +define(`BI_SUB_MB_PART_MASK', `0x0c000000')
> >> +define(`MAX_NUM_MV', `0x00000020')
> >> +define(`FB_PRUNING_ENABLE', `0x40000000')
> >> +
> >> +define(`SEARCH_PATH_LEN', `0x00003030')
> >> +define(`START_CENTER', `0x30000000')
> >> +
> >> +define(`ADAPTIVE_SEARCH_ENABLE', `0x00000002')
> >> +define(`INTRA_PREDICTORE_MODE', `0x11111111:UD')
> >> +
> >> +define(`INTER_VME_OUTPUT_IN_OWS', `10')
> >> +define(`INTER_VME_OUTPUT_MV_IN_OWS', `8')
> >> +
> >> +define(`INTRAMBFLAG_MASK', `0x00002000')
> >> +define(`MVSIZE_UW_BASE', `0x0040')
> >> +define(`MFC_MV32_BIT_SHIFT', `5')
> >> +define(`CBP_DC_YUV_UW', `0x000E')
> >> +
> >> +define(`DC_HARR_ENABLE', `0x0000')
> >> +define(`DC_HARR_DISABLE', `0x0020')
> >> +
> >> +define(`MV32_BIT_MASK', `0x0020')
> >> +define(`MV32_BIT_SHIFT', `5')
> >> +
> >> +define(`OBW_CACHE_TYPE', `10')
> >> +
> >> +
> >> +define(`OBW_MESSAGE_TYPE', `8')
> >> +
> >> +define(`OBW_BIND_IDX', `BIND_IDX_OUTPUT')
> >> +
> >> +define(`OBW_CONTROL_0', `0') /* 1 OWord, low 128 bits */
> >> +define(`OBW_CONTROL_1', `1') /* 1 OWord, high 128 bits */
> >> +define(`OBW_CONTROL_2', `2') /* 2 OWords */
> >> +define(`OBW_CONTROL_3', `3') /* 4 OWords */
> >> +define(`OBW_CONTROL_8', `4') /* 8 OWords */
> >> +
> >> +define(`FBR_BME_ENABLE', `0x00000000')
> >> +define(`FBR_BME_DISABLE', `0x00040000')
> >> +
> >> +define(`OBW_WRITE_COMMIT_CATEGORY', `0') /* category on Ivybridge */
> >> +
> >> +
> >> +define(`OBW_HEADER_PRESENT', `1')
> >> +
> >> +/* GRF registers
> >> + * r0 header
> >> + * r1~r4 constant buffer (reserved)
> >> + * r5 inline data
> >> + * r6~r11 reserved
> >> + * r12 write back of VME message
> >> + * r13 write back of Oword Block Write
> >> + */
> >> +/*
> >> + * GRF 0 -- header
> >> + */
> >> +define(`thread_id_ub', `r0.20<0,1,0>:UB') /* thread id in payload */
> >> +
> >> +/*
> >> + * GRF 1~4 -- Constant Buffer (reserved)
> >> + */
> >> +
> >> +/*
> >> + * GRF 5 -- inline data
> >> + */
> >> +define(`inline_reg0', `r5')
> >> +define(`w_in_mb_uw', `inline_reg0.2')
> >> +define(`orig_xy_ub', `inline_reg0.0')
> >> +define(`orig_x_ub', `inline_reg0.0') /* in macroblock */
> >> +define(`orig_y_ub', `inline_reg0.1')
> >> +define(`transform_8x8_ub', `inline_reg0.4')
> >> +define(`slice_edge_ub', `inline_reg0.4')
> >> +define(`num_macroblocks', `inline_reg0.6')
> >> +define(`input_mb_intra_ub', `inline_reg0.5')
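To make the inline data layout in r5 easier to follow, here is a small Python sketch that unpacks the fields defined above and mirrors the 8x8 check from the quoted intra_frame_haswell.asm (mov intra_flag, and.z on transform_8x8_ub, predicated mov of LUMA_INTRA_8x8_DISABLE). It is an illustration only. The function names are made up, sub-register numbers are read as byte offsets, and num_macroblocks is treated as a single byte, which the define itself does not state.

```python
import struct

LUMA_INTRA_8x8_DISABLE = 0x2  # value taken from vme75.inc


def parse_inline_data(r5: bytes) -> dict:
    """Unpack the first 7 bytes of the inline data register (hypothetical helper)."""
    # Bytes 0,1: orig_x_ub, orig_y_ub; bytes 2-3: w_in_mb_uw (little-endian UW);
    # byte 4: transform_8x8_ub; byte 5: input_mb_intra_ub; byte 6: num_macroblocks.
    x, y, w, t8x8, intra, n = struct.unpack_from("<BBHBBB", r5, 0)
    return {"orig_x": x, "orig_y": y, "w_in_mb": w,
            "transform_8x8": t8x8, "input_mb_intra": intra,
            "num_macroblocks": n}


def intra_part_mask(transform_8x8: int) -> int:
    """Mirror the and.z / predicated mov: disable 8x8 when bit 0 is clear."""
    return LUMA_INTRA_8x8_DISABLE if (transform_8x8 & 1) == 0 else 0
```

Note that slice_edge_ub is defined on the same byte (inline_reg0.4) as transform_8x8_ub, so the two names alias each other.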
> >> +
> >> +/*
> >> + * GRF 6~11 -- reserved
> >> + */
> >> +
> >> +/*
> >> + * GRF 12~22 -- write back for VME message
> >> + */
> >> +define(`vme_wb', `r12')
> >> +define(`vme_wb0', `r12')
> >> +define(`vme_wb1', `r13')
> >> +define(`vme_wb2', `r14')
> >> +define(`vme_wb3', `r15')
> >> +define(`vme_wb4', `r16')
> >> +define(`vme_wb5', `r17')
> >> +define(`vme_wb6', `r18')
> >> +define(`vme_ime_wb7', `r19')
> >> +define(`vme_ime_wb8', `r20')
> >> +define(`vme_ime_wb9', `r21')
> >> +define(`vme_ime_wb10', `r22')
> >> +
> >> +
> >> +/*
> >> + * GRF 24 -- write for VME output message
> >> + */
> >> +define(`obw_wb', `null<1>:W')
> >> +define(`obw_wb_length', `0')
> >> +
> >> +
> >> +/*
> >> + * GRF 28~30 -- Intra Neighbor Edge Pixels
> >> + */
> >> +define(`INEP_ROW', `r28')
> >> +define(`INEP_COL0', `r29')
> >> +define(`INEP_COL1', `r30')
> >> +
> >> +/*
> >> + * GRF 48~50 -- Chroma Neighbor Edge Pixels
> >> + */
> >> +define(`CHROMA_ROW', `r48')
> >> +define(`CHROMA_COL', `r49')
> >> +
> >> +/*
> >> + * temporary registers
> >> + */
> >> +define(`tmp_reg0', `r32')
> >> +define(`read0_header', `tmp_reg0')
> >> +define(`tmp_reg1', `r33')
> >> +define(`read1_header', `tmp_reg1')
> >> +define(`tmp_reg2', `r34')
> >> +define(`vme_m0', `tmp_reg2')
> >> +define(`tmp_reg3', `r35')
> >> +define(`vme_m1', `tmp_reg3')
> >> +define(`intra_flag', `vme_m1.28')
> >> +define(`intra_part_mask_ub', `vme_m1.28')
> >> +define(`mb_intra_struct_ub', `vme_m1.29')
> >> +define(`tmp_reg4', `r36')
> >> +define(`obw_m0', `tmp_reg4')
> >> +define(`tmp_reg5', `r37')
> >> +define(`obw_m1', `tmp_reg5')
> >> +define(`tmp_reg6', `r38')
> >> +define(`obw_m2', `tmp_reg6')
> >> +define(`tmp_reg7', `r39')
> >> +define(`obw_m3', `tmp_reg7')
> >> +define(`tmp_reg8', `r40')
> >> +define(`obw_m4', `tmp_reg8')
> >> +define(`tmp_reg9', `r41')
> >> +define(`tmp_x_w', `tmp_reg9.0')
> >> +define(`tmp_rega', `r42')
> >> +define(`tmp_ud0', `tmp_rega.0')
> >> +define(`tmp_ud1', `tmp_rega.4')
> >> +define(`tmp_ud2', `tmp_rega.8')
> >> +define(`tmp_ud3', `tmp_rega.12')
> >> +define(`tmp_uw0', `tmp_rega.0')
> >> +define(`tmp_uw1', `tmp_rega.2')
> >> +define(`tmp_uw2', `tmp_rega.4')
> >> +define(`tmp_uw3', `tmp_rega.6')
> >> +define(`tmp_uw4', `tmp_rega.8')
> >> +define(`tmp_uw5', `tmp_rega.10')
> >> +define(`tmp_uw6', `tmp_rega.12')
> >> +define(`tmp_uw7', `tmp_rega.14')
> >> +
> >> +define(`vme_m2', `r43')
> >> +/*
> >> + * MRF registers
> >> + */
> >> +
> >> +define(`msg_ind', `64')
> >> +define(`msg_reg0', `r64')
> >> +define(`msg_reg1', `r65')
> >> +define(`msg_reg2', `r66')
> >> +define(`msg_reg3', `r67')
> >> +define(`msg_reg4', `r68')
> >> +define(`msg_reg5', `r69')
> >> +define(`msg_reg6', `r70')
> >> +define(`msg_reg7', `r71')
> >> +define(`msg_reg8', `r72')
> >> +define(`msg_reg9', `r73')
> >> +
> >> +define(`ts_msg_ind', `112')
> >> +define(`ts_msg_reg0', `r112')
> >> +/*
> >> + * VME message payload
> >> + */
> >> +
> >> +define(`vme_intra_wb_length', `1')
> >> +define(`vme_wb_length', `7')
> >> +define(`sic_vme_msg_length', `7')
> >> +define(`fbr_vme_msg_length', `7')
> >> +define(`ime_vme_msg_length', `5')
> >> +
> >> +define(`vme_msg_ind', `msg_ind')
> >> +define(`vme_msg_0', `msg_reg0')
> >> +define(`vme_msg_1', `msg_reg1')
> >> +define(`vme_msg_2', `msg_reg2')
> >> +
> >> +define(`vme_msg_3', `msg_reg3')
> >> +define(`vme_msg_4', `msg_reg4')
> >> +
> >> +
> >> +define(`vme_msg_5', `msg_reg5')
> >> +define(`vme_msg_6', `msg_reg6')
> >> +define(`vme_msg_7', `msg_reg7')
> >> +define(`vme_msg_8', `msg_reg8')
> >> +define(`vme_msg_9', `msg_reg9')
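The defines above are layered (vme_msg_5 is msg_reg5, which is r69), so it can take a moment to see which register an operand really names. A minimal Python sketch of that lookup follows. It only approximates m4's rescanning by substituting whole words until nothing changes, and DEFINES holds just a handful of entries copied from the quoted vme75.inc. The names DEFINES and expand are made up for this note.

```python
import re

# A small subset of the defines in the quoted vme75.inc.
DEFINES = {
    "msg_reg1": "r65",
    "msg_reg5": "r69",
    "vme_msg_1": "msg_reg1",
    "vme_msg_5": "msg_reg5",
    "tmp_reg3": "r35",
    "vme_m1": "tmp_reg3",
    "intra_flag": "vme_m1.28",
    "intra_part_mask_ub": "vme_m1.28",
    "mb_intra_struct_ub": "vme_m1.29",
}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text: str) -> str:
    """Replace defined words repeatedly until the text stops changing."""
    while True:
        new = WORD.sub(lambda m: DEFINES.get(m.group(0), m.group(0)), text)
        if new == text:
            return new
        text = new
```

For example, expand("vme_msg_5.28<1>:UD") gives "r69.28<1>:UD". It also shows that intra_flag and intra_part_mask_ub resolve to the same location, r35.28.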
> >> +
> >> +define(`BIND_IDX_CBCR', `6')
> >> +
> >> +
> >> +define(`LUMA_CHROMA_MODE', `0x0')
> >> +define(`LUMA_INTRA_MODE', `0x1')
> >> +define(`LUMA_INTRA_DISABLE', `0x2')
> >> --
> >> 1.7.9.5
> >>
> >> _______________________________________________
> >> Libva mailing list
> >> Libva at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/libva
> >
> >
> >
>