[igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Dec 1 12:14:27 UTC 2021


On 01/12/2021 11:46, Daniel Vetter wrote:
> On Wed, Dec 1, 2021 at 12:37 PM Radoslaw Szwichtenberg
> <radoslaw.szwichtenberg at intel.com> wrote:
>> On 01/12/2021 10:46, Tvrtko Ursulin wrote:
>>> On 30/11/2021 16:48, Matthew Brost wrote:
>>>>
>>>> IMO this fix is 100% correct as this is a known, tracked issue. It was
>>>> agreed upon (arch, i915, GuC team) that we just skip these tests with
>>>> GuC submission.
>> This does not look like a fix to me - you just disable the test to hide the
>> result. If this issue is recorded in a bug and tracked, why can't we
>> just let this test fail until we get the issue fixed?
> 
> This is correct in general, but sadly not for gem igts and selftests.
> The state of our validation suite is screwed up enough that
> unfortunately the safe starting point for failing tests is that the
> test is simply wrong, or too much just validating implementation
> details of the current platform/driver, while not actually validating
> stuff that should be tested for.
> 
>>> I915 team is here on upstream as well.
>>>
>>> Recording those acks publicly would be my ask. Unless some security by
>>> obscurity is happening here? Until then from me it is a soft nack to
>>> keep disabling tests which show genuine weaknesses in GuC mode. Soft
>>> until we get a public record of exactly what is broken and in what
>>> circumstances, acked by architects publicly as you say they acked it
>>> somewhere. A commit message devoid of detail is not good enough.
>> This should most probably be documented in the bug, right? Here we
>> should just keep the test as is till the issue is fixed. I don't see how
>> documenting an issue would enable us to just disable the test.
> 
> Sadly the situation is bad enough that I'm tempted to just drop a few
> thousand Acked-by: me tags in this thread for any case where a
> questionable testcase gets in the way. Unless someone can prove that
> it's a POR architectural requirement we're validating here.
> 
> I do agree though that really we should just delete such tests
> outright, not hide the mess on each platform individually.

One ack is enough, thousands shouldn't be needed. :) But, since the test 
by accident showed how GuC firmware can get apparently completely 
blocked and confused by innocent userspace operations, please apply that 
ack against something with a proper commit message.

Statements such as "it's just one more DoS", "agreed by the i915 team" 
(where?) and a commit message devoid of detail are not at the standard you
yourself are otherwise advocating.

And on the technical level I really would like to know why and how GuC 
ends up with a non-runnable item stuck at the top of its scheduling
queue. AKA being able to understand the issue fully.

Regards,

Tvrtko

