[Mesa-dev] [PATCH v2 11/30] i965/fs: add shuffle_32bit_load_result_to_64bit_data helper
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Thu May 12 11:35:48 UTC 2016
From: Iago Toral Quiroga <itoral at igalia.com>
There will be a few places where we need to shuffle the result of a 32-bit
load into valid 64-bit data, so extract this logic into a separate helper
that we can reuse.
The shuffling needs to operate with WE_all set because we are changing the
layout of the data across the various channels. Otherwise we will run into
problems in non-uniform control-flow scenarios.
v2 (Curro):
- Use subscript() instead of stride()
- Assert on the input types rather than retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Do not use a temporary for the writes and drop force_writemask_all.
- Mark component_i as const.
- Make the function name lower case.
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 53 ++++++++++++++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_fs.h | 5 ++++
2 files changed, 58 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 15d5759..4827dea 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -212,6 +212,59 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
}
/**
+ * This helper takes the result of a load operation that reads 32-bit elements
+ * in this format:
+ *
+ * x x x x x x x x
+ * y y y y y y y y
+ * z z z z z z z z
+ * w w w w w w w w
+ *
+ * and shuffles the data to get this:
+ *
+ * x y x y x y x y
+ * x y x y x y x y
+ * z w z w z w z w
+ * z w z w z w z w
+ *
+ * Which is exactly what we want if the load is reading 64-bit components
+ * like doubles, where x represents the low 32-bit of the x double component
+ * and y represents the high 32-bit of the x double component (likewise with
+ * z and w for double component y). The parameter @components represents
+ * the number of 64-bit components present in @src. This would typically be
+ * 2 at most, since we can only fit 2 double elements in the result of a
+ * vec4 load.
+ *
+ * Notice that @dst and @src can be the same register.
+ */
+void
+fs_visitor::shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
+ const fs_reg dst,
+ const fs_reg src,
+ uint32_t components)
+{
+ assert(type_sz(src.type) == 4);
+ assert(type_sz(dst.type) == 8);
+
+ /* A temporary that we will use to shuffle the 32-bit data of each
+ * component in the vector into valid 64-bit data. We can't write directly
+ * to dst because dst can be (and would usually be) the same as src
+ * and in that case the first MOV in the loop below would overwrite the
+ * data read in the second MOV.
+ */
+ fs_reg tmp = bld.vgrf(dst.type);
+
+ for (unsigned i = 0; i < components; i++) {
+ fs_reg component_i = offset(src, bld, 2 * i);
+
+ bld.MOV(subscript(tmp, src.type, 0), component_i);
+ bld.MOV(subscript(tmp, src.type, 1), offset(component_i, bld, 1));
+
+ bld.MOV(offset(dst, bld, i), tmp);
+ }
+}
+
+/**
* A helper for MOV generation for fixing up broken hardware SEND dependency
* handling.
*/
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index b906f3d..1970ad0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -105,6 +105,11 @@ public:
uint32_t const_offset);
void DEP_RESOLVE_MOV(const brw::fs_builder &bld, int grf);
+ void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
+ const fs_reg dst,
+ const fs_reg src,
+ uint32_t components);
+
bool run_fs(bool do_rep_send);
bool run_vs(gl_clip_plane *clip_planes);
bool run_tcs_single_patch();
--
2.5.0
More information about the mesa-dev
mailing list