[PATCH i-g-t 3/3] tests/intel/xe_query: Clarify delta engine_cycles calculation
Kamil Konieczny
kamil.konieczny at linux.intel.com
Fri Oct 11 17:11:48 UTC 2024
Hi Lucas,
On 2024-10-10 at 22:05:07 -0500, Lucas De Marchi wrote:
> Let the compiler do the wrap around width instead of doing different
> calculations depending on if the second value is lower than the first.
>
> The only reason for ts2.engine_cycles to be lower than ts1.engine_cycles
> is if a wrap around ocurred. The wrap around should happen
> wrt the width reported by the kernel. However that is not really true
> and kernel simply reports the value "as is". With Xe2, the value is
> actually 64-bits and width is currently reported as 36 (that's being
> changed in the kernel side). So, if engine_cycles would have to wrap
> around 64 bits for that condition to hold true, and in that case we'd
> calculate the delta by shifting (1 << ts2.width), giving a wrong result.
>
> Luckly, wrapping 64 bits means at 19.2Mhz takes forever and is not a
> problem in real life.
>
> Anyway, fix the math which also removes the ugly check by b >= a.
> Example:
>
> #include <stdio.h>
> #include <inttypes.h>
>
> int main(void)
> {
> uint64_t mask = (1ull << 36) - 1;
> uint64_t a, b;
>
> a = mask - 1;
> b = mask;
> printf("%"PRIu64"\n", (b - a) & mask);
>
> a = mask;
> b = (mask + 1) & mask;
> printf("%"PRIu64"\n", (b - a) & mask);
>
> a = UINT64_MAX & mask;
> b = 0 & mask;
Just zero is enough.
> printf("%"PRIu64"\n", (b - a) & mask);
Instead of repetition here, why not call a function?
void delta(uint64_t a, uint64 b, uint64 mask) {
printf("%"PRIu64"\n", ((b & mask) - (a & mask)) & mask);
}
delta(mask - 1, mask, mask);
delta(mask, mask + 1, mask);
delta(UINT64_MAX, 0, mask);
>
> return 0;
> }
>
> which does the correct wrap around and prints 1 in all cases.
Do we need this example code in commit message?
With or without it
Reviewed-by: Kamil Konieczny <kamil.konieczny at linux.intel.com>
>
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
> tests/intel/xe_query.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/tests/intel/xe_query.c b/tests/intel/xe_query.c
> index 87ddb58bb..1566680e7 100644
> --- a/tests/intel/xe_query.c
> +++ b/tests/intel/xe_query.c
> @@ -15,6 +15,7 @@
> #include <string.h>
>
> #include "igt.h"
> +#include "linux_scaffold.h"
> #include "xe_drm.h"
> #include "xe/xe_ioctl.h"
> #include "xe/xe_query.h"
> @@ -738,6 +739,7 @@ __engine_cycles(int fd, struct drm_xe_engine_class_instance *hwe)
> #define NUM_SNAPSHOTS 10
> for (i = 0; i < NUM_SNAPSHOTS * ARRAY_SIZE(clock); i++) {
> int index = i / NUM_SNAPSHOTS;
> + uint64_t width_mask;
>
> ts1.eci = *hwe;
> ts1.clockid = clock[index].id;
> @@ -763,12 +765,12 @@ __engine_cycles(int fd, struct drm_xe_engine_class_instance *hwe)
>
> delta_cpu = ts2.cpu_timestamp - ts1.cpu_timestamp;
>
> - if (ts2.engine_cycles >= ts1.engine_cycles)
> - delta_cs = (ts2.engine_cycles - ts1.engine_cycles) *
> - NSEC_PER_SEC / eng_ref_clock;
> - else
> - delta_cs = (((1 << ts2.width) - ts2.engine_cycles) + ts1.engine_cycles) *
> - NSEC_PER_SEC / eng_ref_clock;
> + igt_assert_eq(ts1.width, ts2.width);
> + width_mask = GENMASK_ULL(ts1.width - 1, 0);
> + ts1.engine_cycles &= width_mask;
> + ts2.engine_cycles &= width_mask;
> + delta_cs = ((ts2.engine_cycles - ts1.engine_cycles) & width_mask) *
> + NSEC_PER_SEC / eng_ref_clock;
>
> calc_freq = (ts2.engine_cycles - ts1.engine_cycles) * NSEC_PER_SEC / delta_cpu;
>
> --
> 2.46.2
>
More information about the igt-dev
mailing list