[Beignet] [PATCH 2/3] Implement constant buffer based on constant cache.
Zhigang Gong
zhigang.gong at linux.intel.com
Tue Sep 3 20:17:25 PDT 2013
Ruiling,
The original use of constant buffer (curbe payload) does confuse us
sometime. I think it's a good time point to change it.
I'm ok with your suggestion using curbe directly.
-----Original Message-----
From: Song, Ruiling [mailto:ruiling.song at intel.com]
Sent: Wednesday, September 4, 2013 11:00 AM
To: Zhigang Gong
Cc: beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH 2/3] Implement constant buffer based on
constant cache.
For the comment part, I will refine it.
About function name, I also wanted to use intel_gpgpu_alloc_constant_buffer
as beignet use constant_buffer at many places which refer to the curbe
buffer. so, I hesitated to use alloc_constant_buffer. I also noticed the
function name is too long :(
What about changing existing " cl_gpgpu_upload_constants " "
intel_gpgpu_load_constant_buffer " into "cl_gpgpu_upload_curbes" "
intel_gpgpu_load_curbe_buffer "
If it is OK, I will change it, so that the symbols do not introduce
misunderstanding.
Thanks!
Ruiling
-----Original Message-----
From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
Sent: Wednesday, September 04, 2013 10:42 AM
To: Song, Ruiling
Cc: beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH 2/3] Implement constant buffer based on
constant cache.
Nice patch, some minor comments as below.
On Tue, Sep 03, 2013 at 03:42:36PM +0800, Ruiling Song wrote:
> Currently, simply allocate enough graphics memory as constant memory
space.
> And bind it to bti 2. Constant cache read are backed by dword scatter
read.
> Different from other data port messages, the address need to be dword
> aligned, and the addresses are in units of dword.
>
> The constant address space data are placed in order: first global
> constant, then the constant buffer kernel argument.
>
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
> backend/src/backend/context.cpp | 12 +----
> backend/src/backend/gen_insn_selection.cpp | 30 +++++++++++-
> backend/src/backend/gen_reg_allocation.cpp | 1 -
> backend/src/backend/program.h | 2 -
> backend/src/ir/profile.cpp | 2 -
> backend/src/ir/profile.hpp | 5 +-
> backend/src/llvm/llvm_gen_backend.cpp | 9 +---
> src/cl_command_queue.c | 18 --------
> src/cl_command_queue.h | 3 --
> src/cl_command_queue_gen7.c | 69
++++++++++++++++++++++++----
> src/cl_driver.h | 3 ++
> src/cl_driver_defs.c | 1 +
> src/cl_gt_device.h | 2 +-
> src/cl_kernel.c | 10 ----
> src/intel/intel_driver.c | 2 +-
> src/intel/intel_gpgpu.c | 42 ++++++++++++++++-
> 16 files changed, 138 insertions(+), 73 deletions(-)
>
> diff --git a/backend/src/backend/context.cpp
> b/backend/src/backend/context.cpp index 5484869..ac3a243 100644
> --- a/backend/src/backend/context.cpp
> +++ b/backend/src/backend/context.cpp
> @@ -458,15 +458,6 @@ namespace gbe
> }
> });
> #undef INSERT_REG
> - this->newCurbeEntry(GBE_CURBE_GLOBAL_CONSTANT_OFFSET, 0,
sizeof(int));
> - specialRegs.insert(ir::ocl::constoffst);
> -
> - // Insert serialized global constant arrays if used
> - const ir::ConstantSet& constantSet = unit.getConstantSet();
> - if (constantSet.getConstantNum()) {
> - size_t size = constantSet.getDataSize();
> - this->newCurbeEntry(GBE_CURBE_GLOBAL_CONSTANT_DATA, 0, size);
> - }
>
> // Insert the number of threads
> this->newCurbeEntry(GBE_CURBE_THREAD_NUM, 0, sizeof(uint32_t));
> @@ -640,8 +631,7 @@ namespace gbe
> reg == ir::ocl::goffset0 ||
> reg == ir::ocl::goffset1 ||
> reg == ir::ocl::goffset2 ||
> - reg == ir::ocl::workdim ||
> - reg == ir::ocl::constoffst)
> + reg == ir::ocl::workdim)
> return true;
> return false;
> }
> diff --git a/backend/src/backend/gen_insn_selection.cpp
> b/backend/src/backend/gen_insn_selection.cpp
> index 1e81dac..f6f28ff 100644
> --- a/backend/src/backend/gen_insn_selection.cpp
> +++ b/backend/src/backend/gen_insn_selection.cpp
> @@ -2101,6 +2101,23 @@ namespace gbe
> sel.UNTYPED_READ(addr, dst.data(), valueNum, bti);
> }
>
> + void emitDWordGather(Selection::Opaque &sel,
> + const ir::LoadInstruction &insn,
> + GenRegister addr,
> + uint32_t bti) const
> + {
> + using namespace ir;
> + const uint32_t valueNum = insn.getValueNum();
> + const uint32_t simdWidth = sel.ctx.getSimdWidth();
> + GBE_ASSERT(valueNum == 1);
> + GenRegister dst = GenRegister::retype(sel.selReg(insn.getValue(0)),
GEN_TYPE_F);
> + // get dword based address
> + GenRegister addrDW = GenRegister::udxgrf(simdWidth,
sel.reg(FAMILY_DWORD));
> + sel.SHR(addrDW, GenRegister::retype(addr, GEN_TYPE_UD),
> + GenRegister::immud(2));
> +
> + sel.DWORD_GATHER(dst, addrDW, bti);
> + }
> +
> void emitRead64(Selection::Opaque &sel,
> const ir::LoadInstruction &insn,
> GenRegister addr, @@ -2171,8 +2188,17 @@
> namespace gbe
> GBE_ASSERT(sel.ctx.isScalarReg(insn.getValue(0)) == false);
> const Type type = insn.getValueType();
> const uint32_t elemSize = getByteScatterGatherSize(type);
> - if (insn.getAddressSpace() == MEM_CONSTANT)
> - this->emitIndirectMove(sel, insn, address);
> + if (insn.getAddressSpace() == MEM_CONSTANT) {
> + // XXX read 64bit constant through constant cache
It's better to add TODO in the above comment. And if you can point out:
why for the short/char data type, it can't be accessed from constant cache,
it will be a little bit clearer.
> + if(insn.isAligned() == true && elemSize ==
GEN_BYTE_SCATTER_QWORD)
> + this->emitRead64(sel, insn, address, 0x2);
> + else if(insn.isAligned() == true && elemSize ==
GEN_BYTE_SCATTER_DWORD)
> + this->emitDWordGather(sel, insn, address, 0x2);
> + else {
> + const GenRegister value = sel.selReg(insn.getValue(0));
> + this->emitByteGather(sel, insn, elemSize, address, value, 0x2);
> + }
> + }
> else if (insn.isAligned() == true && elemSize ==
GEN_BYTE_SCATTER_QWORD)
> this->emitRead64(sel, insn, address, space == MEM_LOCAL ? 0xfe :
0x00);
> else if (insn.isAligned() == true && elemSize ==
> GEN_BYTE_SCATTER_DWORD) diff --git
> a/backend/src/backend/gen_reg_allocation.cpp
> b/backend/src/backend/gen_reg_allocation.cpp
> index 0bb75a2..2abfb12 100644
> --- a/backend/src/backend/gen_reg_allocation.cpp
> +++ b/backend/src/backend/gen_reg_allocation.cpp
> @@ -573,7 +573,6 @@ namespace gbe
> allocatePayloadReg(GBE_CURBE_GROUP_NUM_Z, ocl::numgroup2);
> allocatePayloadReg(GBE_CURBE_STACK_POINTER, ocl::stackptr);
> allocatePayloadReg(GBE_CURBE_THREAD_NUM, ocl::threadn);
> - allocatePayloadReg(GBE_CURBE_GLOBAL_CONSTANT_OFFSET,
ocl::constoffst);
>
> // Group and barrier IDs are always allocated by the hardware in r0
> RA.insert(std::make_pair(ocl::groupid0, 1*sizeof(float))); //
> r0.1 diff --git a/backend/src/backend/program.h
> b/backend/src/backend/program.h index d20e7af..ff4d157 100644
> --- a/backend/src/backend/program.h
> +++ b/backend/src/backend/program.h
> @@ -70,8 +70,6 @@ enum gbe_curbe_type {
> GBE_CURBE_GROUP_NUM_Y,
> GBE_CURBE_GROUP_NUM_Z,
> GBE_CURBE_WORK_DIM,
> - GBE_CURBE_GLOBAL_CONSTANT_OFFSET,
> - GBE_CURBE_GLOBAL_CONSTANT_DATA,
> GBE_CURBE_IMAGE_INFO,
> GBE_CURBE_STACK_POINTER,
> GBE_CURBE_KERNEL_ARGUMENT,
> diff --git a/backend/src/ir/profile.cpp b/backend/src/ir/profile.cpp
> index 675018a..927e43d 100644
> --- a/backend/src/ir/profile.cpp
> +++ b/backend/src/ir/profile.cpp
> @@ -40,7 +40,6 @@ namespace ir {
> "stack_pointer",
> "block_ip",
> "barrier_id", "thread_number",
> - "const_curbe_offset",
> "work_dimension",
> };
>
> @@ -76,7 +75,6 @@ namespace ir {
> DECL_NEW_REG(FAMILY_WORD, blockip);
> DECL_NEW_REG(FAMILY_DWORD, barrierid);
> DECL_NEW_REG(FAMILY_DWORD, threadn);
> - DECL_NEW_REG(FAMILY_DWORD, constoffst);
> DECL_NEW_REG(FAMILY_DWORD, workdim);
> }
> #undef DECL_NEW_REG
> diff --git a/backend/src/ir/profile.hpp b/backend/src/ir/profile.hpp
> index 4b0ef5e..c79bc3b 100644
> --- a/backend/src/ir/profile.hpp
> +++ b/backend/src/ir/profile.hpp
> @@ -63,9 +63,8 @@ namespace ir {
> static const Register blockip = Register(19); // blockip
> static const Register barrierid = Register(20);// barrierid
> static const Register threadn = Register(21); // number of threads
> - static const Register constoffst = Register(22); // offset of global
constant array's curbe
> - static const Register workdim = Register(23); // work dimention.
> - static const uint32_t regNum = 24; // number of special
registers
> + static const Register workdim = Register(22); // work dimention.
> + static const uint32_t regNum = 23; // number of special
registers
> extern const char *specialRegMean[]; // special register
name.
> } /* namespace ocl */
>
> diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> b/backend/src/llvm/llvm_gen_backend.cpp
> index 12d809d..e747d00 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -1243,12 +1243,7 @@ namespace gbe
> ir::Register reg = ctx.reg(ir::RegisterFamily::FAMILY_DWORD);
> ir::Constant &con = unit.getConstantSet().getConstant(j ++);
> con.setReg(reg.value());
> - if(con.getOffset() != 0) {
> - ctx.LOADI(ir::TYPE_S32, reg,
ctx.newIntegerImmediate(con.getOffset(), ir::TYPE_S32));
> - ctx.ADD(ir::TYPE_S32, reg, ir::ocl::constoffst, reg);
> - } else {
> - ctx.MOV(ir::TYPE_S32, reg, ir::ocl::constoffst);
> - }
> + ctx.LOADI(ir::TYPE_S32, reg,
> + ctx.newIntegerImmediate(con.getOffset(), ir::TYPE_S32));
> }
>
> // Visit all the instructions and emit the IR registers or the
> value to @@ -2407,7 +2402,7 @@ namespace gbe
> const ir::Type type = getType(ctx, elemType);
> const ir::RegisterFamily pointerFamily =
> ctx.getPointerFamily();
>
> - if (type == ir::TYPE_FLOAT || type == ir::TYPE_U32 || type ==
ir::TYPE_S32) {
> + if ((type == ir::TYPE_FLOAT || type == ir::TYPE_U32 || type ==
> + ir::TYPE_S32) && addrSpace != ir::MEM_CONSTANT) {
> // One message is enough here. Nothing special to do
> if (elemNum <= 4) {
> // Build the tuple data in the vector diff --git
> a/src/cl_command_queue.c b/src/cl_command_queue.c index
> 9606d6b..2454db6 100644
> --- a/src/cl_command_queue.c
> +++ b/src/cl_command_queue.c
> @@ -150,24 +150,6 @@ cl_command_queue_bind_surface(cl_command_queue queue,
cl_kernel k)
> return CL_SUCCESS;
> }
>
> -LOCAL cl_int cl_command_queue_upload_constant_buffer(cl_kernel k,
> - char * dst)
> -{
> - int i;
> - for(i = 0; i < k->arg_n; i++) {
> - enum gbe_arg_type arg_type = gbe_kernel_get_arg_type(k->opaque, i);
> -
> - if(arg_type == GBE_ARG_CONSTANT_PTR && k->args[i].mem) {
> - uint32_t offset = gbe_kernel_get_curbe_offset(k->opaque,
GBE_CURBE_EXTRA_ARGUMENT, i+GBE_CONSTANT_BUFFER);
> - cl_mem mem = k->args[i].mem;
> - cl_buffer_map(mem->bo, 1);
> - void * addr = cl_buffer_get_virtual(mem->bo);
> - memcpy(dst + offset, addr, mem->size);
> - cl_buffer_unmap(mem->bo);
> - }
> - }
> - return CL_SUCCESS;
> -}
>
> #if USE_FULSIM
> extern void drm_intel_bufmgr_gem_stop_aubfile(cl_buffer_mgr);
> diff --git a/src/cl_command_queue.h b/src/cl_command_queue.h index
> 135d659..9fe1dd1 100644
> --- a/src/cl_command_queue.h
> +++ b/src/cl_command_queue.h
> @@ -76,8 +76,5 @@ extern cl_int
> cl_command_queue_bind_surface(cl_command_queue, cl_kernel);
>
> /* Bind all the image surfaces in the GPGPU state */ extern cl_int
> cl_command_queue_bind_image(cl_command_queue, cl_kernel);
> -
> -/*update constant buffer to final curbe */ -extern cl_int
> cl_command_queue_upload_constant_buffer(cl_kernel k, char * dst);
> #endif /* __CL_COMMAND_QUEUE_H__ */
>
> diff --git a/src/cl_command_queue_gen7.c b/src/cl_command_queue_gen7.c
> index 1d415d4..53a4711 100644
> --- a/src/cl_command_queue_gen7.c
> +++ b/src/cl_command_queue_gen7.c
> @@ -95,6 +95,62 @@ error:
> return err;
> }
>
> +static void
> +cl_upload_constant_buffer(cl_command_queue queue, cl_kernel ker) {
> + /* calculate constant buffer size */
> + int32_t arg;
> + size_t offset;
> + gbe_program prog = ker->program->opaque;
> + const int32_t arg_n = gbe_kernel_get_arg_num(ker->opaque);
> + size_t global_const_size =
> +gbe_program_get_global_constant_size(prog);
> + uint32_t constant_buf_size = 0;
> + for (arg = 0; arg < arg_n; ++arg) {
> + const enum gbe_arg_type type = gbe_kernel_get_arg_type(ker->opaque,
arg);
> + if (type == GBE_ARG_CONSTANT_PTR && ker->args[arg].mem) {
> + cl_mem mem = ker->args[arg].mem;
> + constant_buf_size += ALIGN(mem->size, 4);
> + }
> + }
> + if(global_const_size == 0 && constant_buf_size == 0)
> + return;
> +
> + cl_buffer bo =
> + cl_gpgpu_alloc_constant_address_space_buffer(queue->gpgpu,
> + constant_buf_size + global_const_size + 4); cl_buffer_map(bo, 1);
> + char * cst_addr = cl_buffer_get_virtual(bo); offset = 0; if
> + (global_const_size > 0) {
> + /* Write the global constant arrays */
> + gbe_program_get_global_constant_data(prog,
> + (char*)(cst_addr+offset)); } offset += ALIGN(global_const_size,
> + 4);
> +
> + if(global_const_size == 0) {
> + /* reserve 4 bytes to get rid of 0 address */
> + offset += 4;
> + }
> +
> + /* upload constant buffer argument */ int32_t curbe_offset = 0;
> + for (arg = 0; arg < arg_n; ++arg) {
> + const enum gbe_arg_type type = gbe_kernel_get_arg_type(ker->opaque,
arg);
> + if (type == GBE_ARG_CONSTANT_PTR && ker->args[arg].mem) {
> + cl_mem mem = ker->args[arg].mem;
> +
> + curbe_offset = gbe_kernel_get_curbe_offset(ker->opaque,
GBE_CURBE_KERNEL_ARGUMENT, arg);
> + assert(curbe_offset >= 0);
> + *(uint32_t *) (ker->curbe + curbe_offset) = offset;
> +
> + cl_buffer_map(mem->bo, 1);
> + void * addr = cl_buffer_get_virtual(mem->bo);
> + memcpy(cst_addr + offset, addr, mem->size);
> + cl_buffer_unmap(mem->bo);
> + offset += ALIGN(mem->size, 4);
> + }
> + }
> + cl_buffer_unmap(bo);
> +}
> +
> /* Will return the total amount of slm used */ static int32_t
> cl_curbe_fill(cl_kernel ker, @@ -122,7 +178,6 @@
> cl_curbe_fill(cl_kernel ker,
> UPLOAD(GBE_CURBE_GROUP_NUM_Z, global_wk_sz[2]/local_wk_sz[2]);
> UPLOAD(GBE_CURBE_THREAD_NUM, thread_n);
> UPLOAD(GBE_CURBE_WORK_DIM, work_dim);
> - UPLOAD(GBE_CURBE_GLOBAL_CONSTANT_OFFSET,
> gbe_kernel_get_curbe_offset(ker->opaque,
> GBE_CURBE_GLOBAL_CONSTANT_DATA, 0) + 32); #undef UPLOAD
>
> /* Write identity for the stack pointer. This is required by the
> stack pointer @@ -134,14 +189,6 @@ cl_curbe_fill(cl_kernel ker,
> int32_t i;
> for (i = 0; i < (int32_t) simd_sz; ++i) stackptr[i] = i;
> }
> -
> - /* Write global constant arrays */
> - if ((offset = gbe_kernel_get_curbe_offset(ker->opaque,
GBE_CURBE_GLOBAL_CONSTANT_DATA, 0)) >= 0) {
> - /* Write the global constant arrays */
> - gbe_program prog = ker->program->opaque;
> - gbe_program_get_global_constant_data(prog, ker->curbe + offset);
> - }
> -
> /* Handle the various offsets to SLM */
> const int32_t arg_n = gbe_kernel_get_arg_num(ker->opaque);
> int32_t arg, slm_offset = 0;
> @@ -242,6 +289,9 @@ cl_command_queue_ND_range_gen7(cl_command_queue queue,
> cl_setup_scratch(gpgpu, ker);
> /* Bind a stack if needed */
> cl_bind_stack(gpgpu, ker);
> +
> + cl_upload_constant_buffer(queue, ker);
> +
> cl_gpgpu_states_setup(gpgpu, &kernel);
>
> /* Curbe step 2. Give the localID and upload it to video memory */
> @@ -250,7 +300,6 @@ cl_command_queue_ND_range_gen7(cl_command_queue queue,
> TRY_ALLOC (final_curbe, (char*) alloca(thread_n * cst_sz));
> for (i = 0; i < thread_n; ++i) {
> memcpy(final_curbe + cst_sz * i, ker->curbe, cst_sz);
> - cl_command_queue_upload_constant_buffer(ker, final_curbe + cst_sz
* i);
> }
> TRY (cl_set_varying_payload, ker, final_curbe, local_wk_sz, simd_sz,
cst_sz, thread_n);
> cl_gpgpu_upload_constants(gpgpu, final_curbe, thread_n*cst_sz);
> diff --git a/src/cl_driver.h b/src/cl_driver.h index 0ce03fe..0977bc3
> 100644
> --- a/src/cl_driver.h
> +++ b/src/cl_driver.h
> @@ -161,6 +161,9 @@ extern cl_gpgpu_set_perf_counters_cb
> *cl_gpgpu_set_perf_counters; typedef void
> (cl_gpgpu_upload_constants_cb)(cl_gpgpu, const void* data, uint32_t
> size); extern cl_gpgpu_upload_constants_cb
> *cl_gpgpu_upload_constants;
>
> +typedef cl_buffer
> +(cl_gpgpu_alloc_constant_address_space_buffer_cb)(cl_gpgpu, uint32_t
> +size); extern cl_gpgpu_alloc_constant_address_space_buffer_cb
> +*cl_gpgpu_alloc_constant_address_space_buffer;
> +
> /* Setup all indirect states */
> typedef void (cl_gpgpu_states_setup_cb)(cl_gpgpu, cl_gpgpu_kernel
> *kernel); extern cl_gpgpu_states_setup_cb *cl_gpgpu_states_setup;
> diff --git a/src/cl_driver_defs.c b/src/cl_driver_defs.c index
> 7c4c866..d69e4af 100644
> --- a/src/cl_driver_defs.c
> +++ b/src/cl_driver_defs.c
> @@ -54,6 +54,7 @@ LOCAL cl_gpgpu_set_stack_cb *cl_gpgpu_set_stack =
> NULL; LOCAL cl_gpgpu_set_scratch_cb *cl_gpgpu_set_scratch = NULL;
> LOCAL cl_gpgpu_bind_image_cb *cl_gpgpu_bind_image = NULL; LOCAL
> cl_gpgpu_state_init_cb *cl_gpgpu_state_init = NULL;
> +LOCAL cl_gpgpu_alloc_constant_address_space_buffer_cb *
> +cl_gpgpu_alloc_constant_address_space_buffer = NULL;
> LOCAL cl_gpgpu_set_perf_counters_cb *cl_gpgpu_set_perf_counters =
> NULL; LOCAL cl_gpgpu_upload_constants_cb *cl_gpgpu_upload_constants =
> NULL; LOCAL cl_gpgpu_states_setup_cb *cl_gpgpu_states_setup = NULL;
> diff --git a/src/cl_gt_device.h b/src/cl_gt_device.h index
> f58e1fd..db4daa3 100644
> --- a/src/cl_gt_device.h
> +++ b/src/cl_gt_device.h
> @@ -51,7 +51,7 @@
> .single_fp_config = 0, /* XXX */
> .global_mem_cache_type = CL_READ_WRITE_CACHE, .global_mem_size = 4,
> -.max_constant_buffer_size = 64 << 10,
> +.max_constant_buffer_size = 512 << 10,
> .max_constant_args = 8,
> .error_correction_support = CL_FALSE, .host_unified_memory =
> CL_FALSE, diff --git a/src/cl_kernel.c b/src/cl_kernel.c index
> 12a08c5..4ba1c11 100644
> --- a/src/cl_kernel.c
> +++ b/src/cl_kernel.c
> @@ -186,16 +186,6 @@ cl_kernel_set_arg(cl_kernel k, cl_uint index,
> size_t sz, const void *value)
>
> mem = *(cl_mem*) value;
>
> - if(arg_type == GBE_ARG_CONSTANT_PTR) {
> - int32_t cbOffset;
> - cbOffset = gbe_kernel_set_const_buffer_size(k->opaque, index,
mem->size);
> - //constant ptr's curbe offset changed, update it
> - if(cbOffset >= 0) {
> - offset = gbe_kernel_get_curbe_offset(k->opaque,
GBE_CURBE_KERNEL_ARGUMENT, index);
> - *((uint32_t *)(k->curbe + offset)) = cbOffset; //cb offset in
curbe
> - }
> - }
> -
> cl_mem_add_ref(mem);
> if (k->args[index].mem)
> cl_mem_delete(k->args[index].mem);
> diff --git a/src/intel/intel_driver.c b/src/intel/intel_driver.c index
> 9959447..ef6e6c3 100644
> --- a/src/intel/intel_driver.c
> +++ b/src/intel/intel_driver.c
> @@ -380,7 +380,7 @@ cl_intel_driver_new(cl_context_prop props)
> /* We use the first 2 slots(0,1) for all the bufs.
should modify the above comments. Now we use 3 slots, 0,1 for normal
buffers an 2 for constant buffer.
> * Notify the gbe this base index, thus gbe can avoid conflicts
> * when it allocates slots for images*/
> - gbe_set_image_base_index(2);
> + gbe_set_image_base_index(3);
> exit:
> return driver;
> error:
> diff --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c index
> 1301b66..eb15daa 100644
> --- a/src/intel/intel_gpgpu.c
> +++ b/src/intel/intel_gpgpu.c
> @@ -96,6 +96,7 @@ struct intel_gpgpu
> struct { drm_intel_bo *bo; } sampler_state_b;
> struct { drm_intel_bo *bo; } perf_b;
> struct { drm_intel_bo *bo; } scratch_b;
> + struct { drm_intel_bo *bo; } constant_b;
>
> uint32_t per_thread_scratch;
> struct {
> @@ -138,6 +139,9 @@ intel_gpgpu_delete(intel_gpgpu_t *gpgpu)
> if (gpgpu->scratch_b.bo)
> drm_intel_bo_unreference(gpgpu->scratch_b.bo);
>
> + if(gpgpu->constant_b.bo)
> + drm_intel_bo_unreference(gpgpu->constant_b.bo);
> +
> intel_batchbuffer_delete(gpgpu->batch);
> cl_free(gpgpu);
> }
> @@ -391,7 +395,7 @@ intel_gpgpu_state_init(intel_gpgpu_t *gpgpu,
> /* Binded buffers */
> gpgpu->binded_n = 0;
> gpgpu->img_bitmap = 0;
> - gpgpu->img_index_base = 2;
> + gpgpu->img_index_base = 3;
> gpgpu->sampler_bitmap = ~((1 << max_sampler_n) - 1);
>
> /* URB */
> @@ -404,7 +408,7 @@ intel_gpgpu_state_init(intel_gpgpu_t *gpgpu,
> dri_bo_unreference(gpgpu->curbe_b.bo);
> uint32_t size_cb = gpgpu->urb.num_cs_entries * gpgpu->urb.size_cs_entry
* 64;
> size_cb = ALIGN(size_cb, 4096);
> - bo = dri_bo_alloc(gpgpu->drv->bufmgr, "CONSTANT_BUFFER", size_cb,
> 64);
> + bo = dri_bo_alloc(gpgpu->drv->bufmgr, "CURBE_BUFFER", size_cb, 64);
> assert(bo);
> gpgpu->curbe_b.bo = bo;
>
> @@ -468,6 +472,39 @@ intel_gpgpu_set_buf_reloc_gen7(intel_gpgpu_t *gpgpu,
int32_t index, dri_bo* obj_
> obj_bo);
> }
>
> +static dri_bo*
> +intel_gpgpu_alloc_constant_address_space_buffer(intel_gpgpu_t *gpgpu,
> +uint32_t size)
How about just use intel_gpgpu_alloc_constant_buffer? The above function
name is a little bit long for me.
--Zhigang
> +{
> + uint32_t s = size - 1;
> + assert(size != 0);
> +
> + surface_heap_t *heap = gpgpu->surface_heap_b.bo->virtual;
> + gen7_surface_state_t *ss2 = (gen7_surface_state_t *)
> + heap->surface[2]; memset(ss2, 0, sizeof(gen7_surface_state_t));
> + ss2->ss0.surface_type = I965_SURFACE_BUFFER; ss0.surface_format =
> + ss2->I965_SURFACEFORMAT_RAW;
> + ss2->ss2.width = s & 0x7f; /* bits 6:0 of sz */
> + ss2->ss2.height = (s >> 7) & 0x3fff; /* bits 20:7 of sz */
> + ss2->ss3.depth = (s >> 21) & 0x3ff; /* bits 30:21 of sz */
> + ss2->ss5.cache_control = cc_llc_l3; heap->binding_table[2] =
> + offsetof(surface_heap_t, surface) + 2* sizeof(gen7_surface_state_t);
> +
> + if(gpgpu->constant_b.bo)
> + dri_bo_unreference(gpgpu->constant_b.bo);
> + gpgpu->constant_b.bo = drm_intel_bo_alloc(gpgpu->drv->bufmgr,
> +"CONSTANT_BUFFER", s, 64);
> + assert(gpgpu->constant_b.bo);
> + ss2->ss1.base_addr = gpgpu->constant_b.bo->offset;
> + dri_bo_emit_reloc(gpgpu->surface_heap_b.bo,
> + I915_GEM_DOMAIN_RENDER,
> + I915_GEM_DOMAIN_RENDER,
> + 0,
> + heap->binding_table[2] +
> + offsetof(gen7_surface_state_t, ss1),
> + gpgpu->constant_b.bo);
> + return gpgpu->constant_b.bo;
> +}
> +
> +
> /* Map address space with two 2GB surfaces. One surface for untyped
message and
> * one surface for byte scatters / gathers. Actually the HW does not
require two
> * surfaces but Fulsim complains
> @@ -926,6 +963,7 @@ intel_set_gpgpu_callbacks(void)
> cl_gpgpu_state_init = (cl_gpgpu_state_init_cb *)
intel_gpgpu_state_init;
> cl_gpgpu_set_perf_counters = (cl_gpgpu_set_perf_counters_cb *)
intel_gpgpu_set_perf_counters;
> cl_gpgpu_upload_constants = (cl_gpgpu_upload_constants_cb *)
> intel_gpgpu_upload_constants;
> + cl_gpgpu_alloc_constant_address_space_buffer =
> + (cl_gpgpu_alloc_constant_address_space_buffer_cb *)
> + intel_gpgpu_alloc_constant_address_space_buffer;
> cl_gpgpu_states_setup = (cl_gpgpu_states_setup_cb *)
intel_gpgpu_states_setup;
> cl_gpgpu_upload_samplers = (cl_gpgpu_upload_samplers_cb *)
intel_gpgpu_upload_samplers;
> cl_gpgpu_batch_reset = (cl_gpgpu_batch_reset_cb *)
> intel_gpgpu_batch_reset;
> --
> 1.7.9.5
>
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet
More information about the Beignet
mailing list