[Beignet] [PATCH] Fix a bug in stack calculation.
Ruiling Song
ruiling.song at intel.com
Mon Aug 5 00:14:39 PDT 2013
1. The thread_id is located in r0.5[0-8], so we need to mask out those
bits rather than shift the register right.
2. Also, we don't need so much stack: max_compute_unit has already
been defined as #EU * max_thread_per_eu.
Signed-off-by: Ruiling Song <ruiling.song at intel.com>
---
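A minimal host-side sketch of point 1, for illustration only. It assumes,
as the commit message states, that the thread id occupies bits 0..8 of
r0.5. The helper names and the sample register value are hypothetical and
are not part of Beignet.

```c
#include <stdint.h>

/* Assumption from the commit message: thread id lives in r0.5 bits 0..8. */
#define THREAD_ID_MASK 0x1ffu

/* Mirrors the patched AND instruction: keep only the low 9 bits. */
static uint32_t thread_id_from_r0_5(uint32_t r0_5)
{
    return r0_5 & THREAD_ID_MASK;
}

/* Mirrors the old SHR by 10: this throws away bits 0..9, so the
 * thread id bits never reach the result. */
static uint32_t old_shift_result(uint32_t r0_5)
{
    return r0_5 >> 10;
}
```

With a hypothetical r0.5 value of (0x2a << 10) | 5, the mask yields 5 (the
thread id), while the old shift yields 0x2a, which comes entirely from the
unrelated upper bits.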
backend/src/backend/gen_context.cpp | 2 +-
src/cl_command_queue_gen7.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
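A rough sketch of point 2, assuming stack_sz starts out as a per-lane size
before the multiplications in cl_bind_stack(). The struct, the function
name and the numbers below are hypothetical stand-ins, not the real
Beignet types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the device fields read by cl_bind_stack(). */
struct device_info {
    size_t max_compute_unit;    /* per the commit message: #EU * max_thread_per_eu */
    size_t max_thread_per_unit; /* already folded into max_compute_unit */
};

/* Total stack size after the patch: per-lane size, times SIMD width,
 * times the number of hardware threads. max_thread_per_unit is
 * deliberately not multiplied in a second time. */
static size_t total_stack_size(size_t per_lane_stack, size_t simd_width,
                               const struct device_info *dev)
{
    size_t sz = per_lane_stack;
    sz *= simd_width;
    sz *= dev->max_compute_unit;
    return sz;
}
```

For example, with 1024 bytes per lane, SIMD16 and a made-up device with
max_compute_unit = 128 (16 EUs * 8 threads), this gives 2 MiB. Multiplying
by max_thread_per_unit = 8 again, as the old code did, would have reserved
16 MiB.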
diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
index e33d8da..12cc104 100644
--- a/backend/src/backend/gen_context.cpp
+++ b/backend/src/backend/gen_context.cpp
@@ -118,7 +118,7 @@ namespace gbe
p->push();
p->curr.execWidth = 1;
p->curr.predicate = GEN_PREDICATE_NONE;
- p->SHR(GenRegister::ud1grf(126,0), GenRegister::ud1grf(0,5), GenRegister::immud(10));
+ p->AND(GenRegister::ud1grf(126,0), GenRegister::ud1grf(0,5), GenRegister::immud(0x1ff));
p->curr.execWidth = this->simdWidth;
p->SHL(stackptr, stackptr, GenRegister::immud(perLaneShift));
p->curr.execWidth = 1;
diff --git a/src/cl_command_queue_gen7.c b/src/cl_command_queue_gen7.c
index 048595c..8933213 100644
--- a/src/cl_command_queue_gen7.c
+++ b/src/cl_command_queue_gen7.c
@@ -180,7 +180,6 @@ cl_bind_stack(cl_gpgpu gpgpu, cl_kernel ker)
assert(offset >= 0);
stack_sz *= gbe_kernel_get_simd_width(ker->opaque);
stack_sz *= device->max_compute_unit;
- stack_sz *= device->max_thread_per_unit;
cl_gpgpu_set_stack(gpgpu, offset, stack_sz, cc_llc_l3);
}
--
1.7.9.5