[Beignet] [PATCH] Fix a bug in stack calculation.

Ruiling Song ruiling.song at intel.com
Mon Aug 5 00:14:39 PDT 2013


1. The thread ID is located in bits 0-8 of r0.5, so mask r0.5 with 0x1ff
   instead of shifting it right by 10.
2. The stack does not need to be this large: max_compute_unit is already
   treated as #EU * max_thread_per_eu, so do not multiply by
   max_thread_per_unit again.

Signed-off-by: Ruiling Song <ruiling.song at intel.com>
---
 backend/src/backend/gen_context.cpp |    2 +-
 src/cl_command_queue_gen7.c         |    1 -
 2 files changed, 1 insertion(+), 2 deletions(-)
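
For illustration only, the two changes can be sketched in standalone C.
This is not driver code: the function names are hypothetical, the bit
layout (thread ID in bits 0-8 of r0.5) is taken from the description
above, and the parameters simply mirror the factors used in
cl_bind_stack().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the patch description: thread ID = bits 0-8 of r0.5. */
#define THREAD_ID_MASK 0x1ffu

/* Old code path: SHR by 10 drops bits 0-9, so the thread ID is lost. */
static uint32_t thread_id_old(uint32_t r0_5)
{
  return r0_5 >> 10;
}

/* New code path: AND with 0x1ff keeps exactly bits 0-8. */
static uint32_t thread_id_new(uint32_t r0_5)
{
  return r0_5 & THREAD_ID_MASK;
}

/* Old sizing: max_compute_unit already counts every hardware thread,
 * so the extra factor over-allocates by max_thread_per_unit times. */
static uint64_t stack_size_old(uint64_t per_lane, uint32_t simd_width,
                               uint32_t max_compute_unit,
                               uint32_t max_thread_per_unit)
{
  assert(simd_width > 0);
  return per_lane * simd_width * max_compute_unit * max_thread_per_unit;
}

/* New sizing: one per-lane stack for each SIMD lane of each thread. */
static uint64_t stack_size_new(uint64_t per_lane, uint32_t simd_width,
                               uint32_t max_compute_unit)
{
  assert(simd_width > 0);
  return per_lane * simd_width * max_compute_unit;
}
```

With an r0.5 value whose low nine bits hold the thread ID and whose
upper bits hold unrelated data, thread_id_old() returns only the
unrelated upper bits, while thread_id_new() returns the thread ID.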

diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
index e33d8da..12cc104 100644
--- a/backend/src/backend/gen_context.cpp
+++ b/backend/src/backend/gen_context.cpp
@@ -118,7 +118,7 @@ namespace gbe
     p->push();
       p->curr.execWidth = 1;
       p->curr.predicate = GEN_PREDICATE_NONE;
-      p->SHR(GenRegister::ud1grf(126,0), GenRegister::ud1grf(0,5), GenRegister::immud(10));
+      p->AND(GenRegister::ud1grf(126,0), GenRegister::ud1grf(0,5), GenRegister::immud(0x1ff));
       p->curr.execWidth = this->simdWidth;
       p->SHL(stackptr, stackptr, GenRegister::immud(perLaneShift));
       p->curr.execWidth = 1;
diff --git a/src/cl_command_queue_gen7.c b/src/cl_command_queue_gen7.c
index 048595c..8933213 100644
--- a/src/cl_command_queue_gen7.c
+++ b/src/cl_command_queue_gen7.c
@@ -180,7 +180,6 @@ cl_bind_stack(cl_gpgpu gpgpu, cl_kernel ker)
   assert(offset >= 0);
   stack_sz *= gbe_kernel_get_simd_width(ker->opaque);
   stack_sz *= device->max_compute_unit;
-  stack_sz *= device->max_thread_per_unit;
   cl_gpgpu_set_stack(gpgpu, offset, stack_sz, cc_llc_l3);
 }
 
-- 
1.7.9.5


