[PATCH i-g-t] tests/intel/gem_exec_schedule: Update for MTL RCS/CCS w/a

Tue Jan 9 17:51:42 UTC 2024

On Tue, 09 Jan 2024 09:37:47 -0800, John Harrison wrote:
>
> On 1/8/2024 17:55, Dixit, Ashutosh wrote:
> > On Mon, 08 Jan 2024 16:12:00 -0800, John.C.Harrison at Intel.com wrote:
> > Hi John,
> >
> >> From: johnharr <johnharr at invalid-email.com>
> >>
> >> The RCS/CCS w/a means that if a spinners is sent to both RCS and CCS
> >> engines then the system is dead and requires a reset. Even
> >> pre-emptible spinners cannot be pre-empted due to the nature of the
> >> w/a. So do not get into that situation.
> >>
> >> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> >> ---
> >>   tests/intel/gem_exec_schedule.c | 14 ++++++++++++++
> >>   1 file changed, 14 insertions(+)
> >>
> >> diff --git a/tests/intel/gem_exec_schedule.c b/tests/intel/gem_exec_schedule.c
> >> index 183ee1a214bb..1b086733ad50 100644
> >> --- a/tests/intel/gem_exec_schedule.c
> >> +++ b/tests/intel/gem_exec_schedule.c
> >> @@ -443,6 +443,14 @@
> >>
> >>   IGT_TEST_DESCRIPTION("Check that we can control the order of execution");
> >>
> >> +/*
> >> + * On some platforms, there are hardware w/a's that are not compatible
> >> + * with these tests.
> > This is too general. I would basically add the commit description here in
> > the comment. Also, not sure if we add the WA HSD number in IGT's?
>
> Yeah, wasn't sure how much detail we should include here and was wanting to
> keep it general. I can add the text from the email description.

I think better to add some detail so that people will cleanly know the
reason why tests are skipped.

>
> >
> >> + */
> >> +static bool skip_bad_engine(int fd, const struct intel_execution_engine2 *e) {
> >> +	return IS_METEORLAKE(intel_get_drm_devid(fd)) && (e->class == I915_ENGINE_CLASS_COMPUTE);
> >> +}
> >> +
> >>   static unsigned int offset_in_page(void *addr)
> >>   {
> >>	return (uintptr_t)addr & 4095;
> >> @@ -2006,6 +2014,9 @@ static igt_spin_t *__noise(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
> >>	gem_context_set_priority(fd, ctx->id, prio);
> >>
> >>	for_each_ctx_engine(fd, ctx, e) {
> >> +		if (skip_bad_engine(fd, e))
> >> +			continue;
> >> +
> >>		if (spin == NULL) {
> >>			spin = __igt_spin_new(fd,
> >>					      .ahnd = ahnd,
> >> @@ -2313,6 +2324,9 @@ static void preempt_self(int fd, const intel_ctx_cfg_t *cfg,
> >>	n = 0;
> >>	gem_context_set_priority(fd, ctx[HI]->id, MIN_PRIO);
> >>	for_each_ctx_cfg_engine(fd, cfg, e) {
> >> +		if (skip_bad_engine(fd, e))
> >> +			continue;
> >> +
> >>		spin[n] = __igt_spin_new(fd,
> >>					 .ahnd = ahnd[NOISE],
> >>					 .ctx = ctx[NOISE],
> > Aren't there other places (even in this file) where spinners are being
> > scheduled on both RCS and CCS? We need this skip only in the two places above?
> The only pre-merge failures in CI for the w/a patch itself was these three
> tests - preempt-other, preempt-other-chain (same core path as
> preempt-other) and preempt-self. If I run the whole of the
> gem_exec_schedule test then I also get a failure on wide at rcs0 and wide at ccs0
> but I get those on a straight drm-tip tree without the w/a applied
> anyway. So yes, as far as I can tell, these are the only places where we
> have a problem.
>
> Note that simply running a spinner on all engines is not a problem. The
> problem is attempting to pre-empt that spinner. If the spinner is ended
> normally, then everything is fine.

I think this detail should also be added to the comment.

After taking care of the comment/commit message, this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit at intel.com>