[Mesa-dev] [PATCH 5/6] i965: Implement nir_intrinsic_shader_clock

Connor Abbott cwabbott0 at gmail.com
Wed Oct 7 16:36:43 PDT 2015


On Wed, Oct 7, 2015 at 7:51 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> Signed-off-by: Emil Velikov <emil.velikov at collabora.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  9 +++++++++
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 03fe680..bcb8f38 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1317,6 +1317,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>        break;
>     }
>
> +   case nir_intrinsic_shader_clock: {
> +      /* We cannot do anything if there is an event, so ignore it for now */
> +      fs_reg shader_clock = get_timestamp(bld);

get_timestamp() isn't doing quite what you want. If you look at the
definition of fs_visitor::get_timestamp(), you see this comment:

   /* The caller wants the low 32 bits of the timestamp.  Since it's running
    * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
    * which is plenty of time for our purposes.  It is identical across the
    * EUs, but since it's tracking GPU core speed it will increment at a
    * varying rate as render P-states change.
    *
    * The caller could also check if render P-states have changed (or anything
    * else that might disrupt timing) by setting smear to 2 and checking if
    * that field is != 0.
    */
When you read the ARF register, you get three components set in your
SIMD8 register, plus some junk because we have to read in four
components at a time. That is, this code in get_timestamp:

   fs_reg ts = fs_reg(retype(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE,
                                          BRW_ARF_TIMESTAMP,
                                          0),
                             BRW_REGISTER_TYPE_UD));

   bld.group(4, 0).exec_all().MOV(dst, ts);

will create a register called dst that looks like this:

 ------------------------------
| tm0.0 | tm0.1 | tm0.2 | junk |
 ------------------------------

Then doing dst.set_smear(0) will create a source that looks like this:

 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------

Or, in SIMD16,

 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------
 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------

Whereas what you actually want is:

 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------
 ---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
 ---------------------------------------------------------------

Or, in SIMD16,

 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------
 ---------------------------------------------------------------
| tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 | tm0.0 |
 ---------------------------------------------------------------
 ---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
 ---------------------------------------------------------------
 ---------------------------------------------------------------
| tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 | tm0.1 |
 ---------------------------------------------------------------


You can use get_timestamp(), except you need to save the result and
emit two more MOV's to appropriately-sized registers allocated using
bld.vgrf(BRW_REGISTER_TYPE_UD) using the same smear() method that the
get_timestamp() code uses (note that although it did smear(0), you can
still call .smear(1) on the result to pick out tm0.1) to pick out the
right components, and then a LOAD_PAYLOAD to combine them into the
destination.

> +
> +      bld.MOV(retype(dest, brw_type_for_base_type(glsl_type::uvec2_type)),
> +              shader_clock);
> +      break;
> +   }
> +
>     case nir_intrinsic_image_size: {
>        /* Get the referenced image variable and type. */
>        const nir_variable *var = instr->variables[0]->var;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..f1de8d4 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -806,6 +806,16 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
>        break;
>     }
>
> +   case nir_intrinsic_shader_clock: {
> +      /* We cannot do anything if there is an event, so ignore it for now */
> +      src_reg shader_clock = get_timestamp();
> +      enum brw_reg_type type = brw_type_for_base_type(glsl_type::uvec2_type);
> +
> +      dest = get_nir_dest(instr->dest, type);
> +      emit(MOV(dest, retype(shader_clock, type)));
> +      break;
> +   }
> +
>     default:
>        unreachable("Unknown intrinsic");
>     }
> --
> 2.5.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list