[igt-dev] [PATCH i-g-t v2 1/2] tests/gem_exec_fence: Check stored values only for valid workloads
Karolina Drobnik
karolina.drobnik at intel.com
Fri Jul 29 07:38:43 UTC 2022
Hi Janusz,
Thanks a lot for taking a look at the patch.
On 28.07.2022 18:56, Janusz Krzysztofik wrote:
> On Tuesday, 26 July 2022 12:13:11 CEST Karolina Drobnik wrote:
>> From: Chris Wilson <chris at chris-wilson.co.uk>
>>
>> test_fence_await verifies that a fence used to pipeline work signals
>> correctly. The await-hang and nb-await-hang test cases inject a GPU hang,
>> which causes an erroneous state, meaning the fence will be signaled
>> without execution. The error causes an instant reset of the command
>> streamer for the hanging workload. This revealed a problem with how
>> we verify the fence state and results. The test assumes that the
>> error notification happens after a hang is declared, which takes
>> longer than submitting the next set of fences, making the test pass
>> every time. With the immediate reset, this might not happen, so the
>> assertion fails, as there are no active fences in the GPU hang case.
>>
>> Move the check for an active fence to the non-hanging workload path,
>> and verify the results only in this scenario.
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Signed-off-by: Karolina Drobnik <karolina.drobnik at intel.com>
>> ---
>> tests/i915/gem_exec_fence.c | 14 ++++++++------
>> 1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
>> index d46914c2..260aa82c 100644
>> --- a/tests/i915/gem_exec_fence.c
>> +++ b/tests/i915/gem_exec_fence.c
>> @@ -350,18 +350,20 @@ static void test_fence_await(int fd, const intel_ctx_t *ctx,
>> 	/* Long, but not too long to anger preemption disable checks */
>> 	usleep(50 * 1000); /* 50 ms, typical preempt reset is 150+ms */
>>
>> -	/* Check for invalidly completing the task early */
>> -	igt_assert(fence_busy(spin->out_fence));
>> -	for (int n = 0; n < i; n++)
>> -		igt_assert_eq_u32(out[n], 0);
>> +	if ((flags & HANG) == 0) {
>> +		/* Check for invalidly completing the task early */
>> +		igt_assert(fence_busy(spin->out_fence));
>> +		for (int n = 0; n < i; n++)
>> +			igt_assert_eq_u32(out[n], 0);
>
> AFAICU, in the 'hang' variant of the scenario we skip the above asserts
> because the spin batch could have already hung, its out fence could have
> already signalled, and the store batches waiting for that signal could
> have already executed. If that's the case, how does this variant of the
> gem_exec_fence test assert that the fence actually worked as expected?
With this change, yes, we would skip them. Still, the store batches
wouldn't be executed, as we reset the CS on hang as part of the error
handling. For valid jobs, we expect to (1) see an active fence at the
beginning of the request, (2) get a signaled fence after the wait, and
(3) store out[i] == i. With a hang, (1) and (3) would be false.
In this particular loop, we could have garbage with a hang rather than
0s (although, from my tests, it looks like they are zeroed, but maybe I
got lucky).
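To make that concrete, here is a condensed sketch of the tail of
test_fence_await as patched, pieced together from the hunks in this
diff (fence_busy() is the test's local helper; spin, out, i and flags
are the function's existing locals), with steps (1)-(3) marked:

	if ((flags & HANG) == 0) {
		/* (1) The fence must still be active: no early completion. */
		igt_assert(fence_busy(spin->out_fence));
		for (int n = 0; n < i; n++)
			igt_assert_eq_u32(out[n], 0); /* stores still gated */

		igt_spin_end(spin); /* let the spinner complete normally */
	}

	igt_waitchildren();

	gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
	/* (2) After the wait, the fence must have signalled for everyone. */
	igt_assert(!fence_busy(spin->out_fence));
	/* (3) Only valid jobs must have landed their stores. */
	while ((flags & HANG) == 0 && i--)
		igt_assert_eq_u32(out[i], i);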
>>
>> -	if ((flags & HANG) == 0)
>> 		igt_spin_end(spin);
>> +	}
>>
>> 	igt_waitchildren();
>>
>> 	gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
>> -	while (i--)
>> +	igt_assert(!fence_busy(spin->out_fence));
>
> We only check that the out fence of the presumably hanged spin batch no longer
> blocks execution of store batches.
This check applies to all workloads; all of them should be done with
their work at this point.
>> +	while ((flags & HANG) == 0 && i--)
>
> Besides, why don't we at least assert successful results of the store
> batches? Do we expect them not to have done their job correctly when
> they complete after the hang of the spin batch has occurred?
We don't expect them to store anything meaningful, because we get a
reset. So, this check only applies to well-behaved jobs.
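In other words, the only invariant the hang path keeps is that the
fence must not stay busy forever; a minimal sketch of that expectation
(same locals as above):

	/* Hang case: the CS was reset, so the fence signals without
	 * execution and the out[] contents are undefined -- they are
	 * deliberately not checked here.
	 */
	igt_assert(!fence_busy(spin->out_fence));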
All the best,
Karolina
> Am I missing something?
>
> Thanks,
> Janusz
>
>
>> 		igt_assert_eq_u32(out[i], i);
>> 	munmap(out, 4096);
>>
>>