[PATCH i-g-t] tests/intel/gem_exec_schedule: Update for MTL RCS/CCS w/a

John Harrison john.c.harrison at intel.com
Tue Jan 9 19:18:49 UTC 2024


On 1/9/2024 09:51, Dixit, Ashutosh wrote:
> On Tue, 09 Jan 2024 09:37:47 -0800, John Harrison wrote:
>> On 1/8/2024 17:55, Dixit, Ashutosh wrote:
>>> On Mon, 08 Jan 2024 16:12:00 -0800, John.C.Harrison at Intel.com wrote:
>>> Hi John,
>>>
>>>> From: johnharr <johnharr at invalid-email.com>
>>>>
>>>> The RCS/CCS w/a means that if a spinners is sent to both RCS and CCS
>>>> engines then the system is dead and requires a reset. Even
>>>> pre-emptible spinners cannot be pre-empted due to the nature of the
>>>> w/a. So do not get into that situation.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>> ---
>>>>    tests/intel/gem_exec_schedule.c | 14 ++++++++++++++
>>>>    1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/tests/intel/gem_exec_schedule.c b/tests/intel/gem_exec_schedule.c
>>>> index 183ee1a214bb..1b086733ad50 100644
>>>> --- a/tests/intel/gem_exec_schedule.c
>>>> +++ b/tests/intel/gem_exec_schedule.c
>>>> @@ -443,6 +443,14 @@
>>>>
>>>>    IGT_TEST_DESCRIPTION("Check that we can control the order of execution");
>>>>
>>>> +/*
>>>> + * On some platforms, there are hardware w/a's that are not compatible
>>>> + * with these tests.
>>> This is too general. I would basically add the commit description here in
>>> the comment. Also, not sure if we add the WA HSD number in IGT's?
>> Yeah, wasn't sure how much detail we should include here and was wanting to
>> keep it general. I can add the text from the email description.
> I think better to add some detail so that people will cleanly know the
> reason why tests are skipped.
How about:

/*
  * The RCS/CCS workaround on MTL means that if a spinners is sent to both RCS
  * and CCS engines concurrently then the system is dead and requires a reset of
  * both engines to recover. Even pre-emptible spinners cannot be pre-empted due
  * to the nature of the w/a. The requests must run to completion or be reset.
  * And in the case of an certain IGT spinners, there is no completion until
  * after the pre-emption. The result being that the tests get killed by the
  * heartbeat and fail.
  * So avoid that situation when running on MTL by skipping the compute engines.
  */

John.


>>>> + */
>>>> +static bool skip_bad_engine(int fd, const struct intel_execution_engine2 *e) {
>>>> +	return IS_METEORLAKE(intel_get_drm_devid(fd)) && (e->class == I915_ENGINE_CLASS_COMPUTE);
>>>> +}
>>>> +
>>>>    static unsigned int offset_in_page(void *addr)
>>>>    {
>>>> 	return (uintptr_t)addr & 4095;
>>>> @@ -2006,6 +2014,9 @@ static igt_spin_t *__noise(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
>>>> 	gem_context_set_priority(fd, ctx->id, prio);
>>>>
>>>> 	for_each_ctx_engine(fd, ctx, e) {
>>>> +		if (skip_bad_engine(fd, e))
>>>> +			continue;
>>>> +
>>>> 		if (spin == NULL) {
>>>> 			spin = __igt_spin_new(fd,
>>>> 					      .ahnd = ahnd,
>>>> @@ -2313,6 +2324,9 @@ static void preempt_self(int fd, const intel_ctx_cfg_t *cfg,
>>>> 	n = 0;
>>>> 	gem_context_set_priority(fd, ctx[HI]->id, MIN_PRIO);
>>>> 	for_each_ctx_cfg_engine(fd, cfg, e) {
>>>> +		if (skip_bad_engine(fd, e))
>>>> +			continue;
>>>> +
>>>> 		spin[n] = __igt_spin_new(fd,
>>>> 					 .ahnd = ahnd[NOISE],
>>>> 					 .ctx = ctx[NOISE],
>>> Aren't there other places (even in this file) where spinners are being
>>> scheduled on both RCS and CCS? We need this skip only in the two places above?
>> The only pre-merge failures in CI for the w/a patch itself was these three
>> tests - preempt-other, preempt-other-chain (same core path as
>> preempt-other) and preempt-self. If I run the whole of the
>> gem_exec_schedule test then I also get a failure on wide at rcs0 and wide at ccs0
>> but I get those on a straight drm-tip tree without the w/a applied
>> anyway. So yes, as far as I can tell, these are the only places where we
>> have a problem.
>>
>> Note that simply running a spinner on all engines is not a problem. The
>> problem is attempting to pre-empt that spinner. If the spinner is ended
>> normally, then everything is fine.
> I think this detail should also be added to the comment.
>
> After taking care of the comment/commit message, this is:
>
> Reviewed-by: Ashutosh Dixit <ashutosh.dixit at intel.com>



More information about the igt-dev mailing list