Mesa (master): i965: Account for poor address calculations in Haswell CS scratch size.
Kenneth Graunke
kwg at kemper.freedesktop.org
Sun Jun 12 08:00:59 UTC 2016
Module: Mesa
Branch: master
Commit: a7d029d3dfac1da2701be75ff4d1589ac562e916
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=a7d029d3dfac1da2701be75ff4d1589ac562e916
Author: Kenneth Graunke <kenneth at whitecape.org>
Date: Thu Jun 9 16:11:46 2016 -0700
i965: Account for poor address calculations in Haswell CS scratch size.
Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere.
We were underallocating the scratch buffer by a factor of 128/70.
v2: Rename threads_per_subslice to scratch_ids_per_subslice
(suggested by Jordan Justen).
Cc: "12.0" <mesa-stable at lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
Reviewed-by: Francisco Jerez <currojerez at riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen at intel.com>
---
src/mesa/drivers/dri/i965/brw_cs.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
index c8598d6..22856b6 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -150,9 +150,28 @@ brw_codegen_cs_prog(struct brw_context *brw,
if (prog_data.base.total_scratch) {
const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
+
+ /* WaCSScratchSize:hsw
+ *
+ * Haswell's scratch space address calculation appears to be sparse
+ * rather than tightly packed. The Thread ID has bits indicating
+ * which subslice, EU within a subslice, and thread within an EU
+ * it is. There's a maximum of two slices and two subslices, so these
+ * can be stored with a single bit. Even though there are only 10 EUs
+ * per subslice, this is stored in 4 bits, so there's an effective
+ * maximum value of 16 EUs. Similarly, although there are only 7
+ * threads per EU, this is stored in a 3 bit number, giving an effective
+ * maximum value of 8 threads per EU.
+ *
+ * This means that we need to use 16 * 8 instead of 10 * 7 for the
+ * number of threads per subslice.
+ */
+ const unsigned scratch_ids_per_subslice =
+ brw->is_haswell ? 16 * 8 : brw->max_cs_threads;
+
brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
prog_data.base.total_scratch *
- brw->max_cs_threads * subslices);
+ scratch_ids_per_subslice * subslices);
}
if (unlikely(INTEL_DEBUG & DEBUG_CS))
More information about the mesa-commit
mailing list