[Mesa-dev] [PATCH 5/6] i965: Implement nir_intrinsic_shader_clock
Connor Abbott
cwabbott0 at gmail.com
Wed Oct 7 16:36:43 PDT 2015
On Wed, Oct 7, 2015 at 7:51 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> Signed-off-by: Emil Velikov <emil.velikov at collabora.com>
> ---
> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 +++++++++
> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 03fe680..bcb8f38 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1317,6 +1317,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
> break;
> }
>
> + case nir_intrinsic_shader_clock: {
> + /* We cannot do anything if there is an event, so ignore it for now */
> + fs_reg shader_clock = get_timestamp(bld);
get_timestamp() isn't doing quite what you want. If you look at the
definition of fs_visitor::get_timestamp(), you see this comment:
/* The caller wants the low 32 bits of the timestamp. Since it's running
* at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
* which is plenty of time for our purposes. It is identical across the
* EUs, but since it's tracking GPU core speed it will increment at a
* varying rate as render P-states change.
*
* The caller could also check if render P-states have changed (or anything
* else that might disrupt timing) by setting smear to 2 and checking if
* that field is != 0.
*/
When you read the ARF register, you get three components set in your
SIMD8 register, plus some junk because we have to read in four
components at a time. That is, this code in get_timestamp:
fs_reg ts = fs_reg(retype(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE,
BRW_ARF_TIMESTAMP,
0),
BRW_REGISTER_TYPE_UD));
bld.group(4, 0).exec_all().MOV(dst, ts);
will create a register called dst that looks like this:
------------------------------
| tm0.0 | tm0.1 | tm0.2 | junk |
------------------------------
Then doing dst.set_smear(0) will create a source that looks like this:
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
Or, in SIMD16,
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
Whereas what you actually want is:
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
---------------------------------------------------------------
Or, in SIMD16,
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
---------------------------------------------------------------
---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
---------------------------------------------------------------
---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
---------------------------------------------------------------
You can use get_timestamp(), except you need to save the result and
emit two more MOV's to appropriately-sized registers allocated using
bld.vgrf(BRW_REGISTER_TYPE_UD) using the same smear() method that the
get_timestamp() code uses (note that although it did smear(0), you can
still call .smear(1) on the result to pick out tm0.1) to pick out the
right components, and then a LOAD_PAYLOAD to combine them into the
destination.
> +
> + bld.MOV(retype(dest, brw_type_for_base_type(glsl_type::uvec2_type)),
> + shader_clock);
> + break;
> + }
> +
> case nir_intrinsic_image_size: {
> /* Get the referenced image variable and type. */
> const nir_variable *var = instr->variables[0]->var;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..f1de8d4 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -806,6 +806,16 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
> break;
> }
>
> + case nir_intrinsic_shader_clock: {
> + /* We cannot do anything if there is an event, so ignore it for now */
> + src_reg shader_clock = get_timestamp();
> + enum brw_reg_type type = brw_type_for_base_type(glsl_type::uvec2_type);
> +
> + dest = get_nir_dest(instr->dest, type);
> + emit(MOV(dest, retype(shader_clock, type)));
> + break;
> + }
> +
> default:
> unreachable("Unknown intrinsic");
> }
> --
> 2.5.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list