[PATCH i-g-t] igt-runner fact checking
Lucas De Marchi
lucas.demarchi at intel.com
Fri Nov 8 16:24:05 UTC 2024
On Fri, Nov 08, 2024 at 06:30:59AM +0100, Peter Senna Tschudin wrote:
>Hi Lucas,
>
>Thank you for your detailed message, and sorry for my broken word wrap in the
>previous messages.
>
>On 07.11.2024 20:29, Lucas De Marchi wrote:
>> On Thu, Nov 07, 2024 at 06:48:39PM +0100, Peter Senna Tschudin wrote:
>>> Hi Lucas,
>>>
>>> Thank you for your reply.
>>>
>>> On 07.11.2024 16:55, Lucas De Marchi wrote:
>>>> +Ryszard to know if the CI part would be doable.
>>>>
>>>> On Thu, Nov 07, 2024 at 08:18:52AM +0100, Peter Senna Tschudin wrote:
>>>>>>> + scan_kernel_loaded_kmods(list, last_test);
>>>>>>
>>>>>>
>>>>>> So... igt_facts() hardcodes what is the info collected. Maybe we
>>>>>> should rather plug this into the hooks framework as part of
>>>>>> "built-in hooks".
>>>>>
>>>>> I do not follow what you are trying to communicate here. Also, please see
>>>>> what Zbigniew said here*. Facts are not test requirements, but the line is
>>>>> blurry. In the context here a fact is something in the environment that can
>>>>> change but that should not. An example is the disappearing GPU.
>>>>
>>>> $ ./build/runner/igt_runner --help | grep -A1 hook
>>>> --hook HOOK_STR
>>>> Forward HOOK_STR to the --hook option of each test.
>>>> --help-hook
>>>> Show detailed usage information for --hook.
>>>>
>>>> So... today we already have "generic support" for running arbitrary
>>>> things before/after each test/subtest/dynamic-subtest.
>>>>
>>>> $ cat << EOF > testlist.txt
>>>> igt at xe_exec_basic@once-basic
>>>> EOF
>>>> $ chmod +x lspci.sh
>>>> $ sudo ./build/runner/igt_runner \
>>>> --hook 'post-subtest:lspci -vv -d 8086:*:03xx' \
>>>> --test-list testlist.txt build/tests/ results/
>>>> $ head -n4 results/0/out.txt
>>>> 03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A580] (rev
>>>> 08) (prog-if 00 [VGA controller])
>>>> Subsystem: Intel Corporation Device 1425
>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>>>> Stepping- SERR- FastB2B- DisINTx-
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>
>>>> So, instead of creating a separate thing called "igt_facts", what I'm
>>>> proposing is that we build this on top of the hooks and these "facts"
>>>> are simply built-in hooks that are commonly used. We may even keep them
>>>> as scripts rather than translate to C.
>>>
>>> I fail to understand why one would want to do that. It feels hackish and
>>> misses the point. I would support your suggestion if we were talking about a
>>> temporary debug tool.
>>>
>>> The igt-facts are not meant to be temporary, I dislike the overhead, and I
>>> dislike the hackish nature of this mechanism. Can you tell me what are the
>>> benefits of using these hooks? This is an honest question, I do not understand.
>>
>> Having it in C would be completely fine too if with that it feels less
>> hackish to you. Key thing here is how it's integrated in igt_runner and
>> tests. We already have a generic infra for that, yet we are trying to add
>> another one.
>
>
>In one hand having it in C can an improvement depending on how the hook calls
>the c function. On the other is worse as it will require us shipping dead code
>that requires a magic command line option to activate.
>
>You have potential to make it faster, but you kill one of the value propositions
>from the hook infra: the ability to add a hook without requiring changes to the
>code.
>
>>
>>>
>>>>
>>>> What I'd like is something liked this (with whatever name we decide):
>>>>
>>>> sudo ./build/runner/igt_runner --builtin-hooks lspci,drm-cards
>>>
>>> This is far from what I am proposing in purpose and in architecture.
>>>
>>>>
>>>> which is a shorthand or equivalent to:
>>>>
>>>> sudo ./build/runner/igt_runner \
>>>> --hook post-subtest:$PWD/scripts/built-in-hooks/lspci.sh \
>>>> --hook post-subtest:$PWD/scripts/built-in-hooks/drm-cards.sh \
>>>> --test-list testlist.txt build/tests/ results/
>>>
>>> For me this feels like a bad idea. It is probably an order of magnitude slower
>>> to run, and it is also granular in a way that I fail to understand. igt-facts
>>> should not be selected at runtime by the user because we do not know when the
>>> facts will change. Again, think of the disappearing GPU: it is a 1 in
>>> 1'000'000 kind of event, but is super valuable to know when it happens.
>>
>> So what? If we want it to run by default, we can: CI just have to add
>> it as argument that is passed by default. See all the additional options
>> CI already passes. It's also not like 1 in 1'000'000 that's happening.
>> For the next bug we are chasing, it's likely not about a card disappearing
>> and rather about something else that we will have to add another hook/fact.
>> I don't like the idea of infinitely adding these things without a quick
>> way to enable/disable.
>>
>> If you dislike the additional fork+exec, this would have the same effect
>> of what you did:
>>
>> sudo ./build/runner/igt_runner \
>> --hook post-subtest:builtin:scan_kernel_loaded_kmods \
>> ...
>>
>> Again, I'm more concerned about the integration than if this is in C or
>> in python or in bash.
>
>Judging by your suggestion the hooks are not adequate for the igt-facts. My take
>from "all the additional options CI already passes" is that we should fix that
>instead of making this problem measurably worse.
>
>The facts are supposed to be always enabled, and they should always run on CI
>and on the developer machine. It is ok to add new functionality when the
>existing hook infrastructure is meant to something different.
Ok. I still think they are actually providing a similar value, but it
seems both you and Ryszard think otherwise.
>
>From the architectural perspective, current hook infrastructure is meant for:
> - quick, as in not requiring a patch, changes at run time
> - quick, as in just run this now please, tools or debug that are temporary
>
>igt-facts are neither of these things.
>
>So no, no hooks from the command line for igt-facts.
>
>As for "It's also not like 1 in 1'000'000 that's happening.", I meant it
>literally. Count how many subtests CI runs between each time CI detects a
>disappearing GPU and I expect the number to be very large. For the V3 of my
>patch I counted 20'942 subtests on "IGTPW_12057" alone which is only a fraction
>of all subtests ran for a single patch.
We need to look at where it's reproducing... 2 tests: the xe_module_load
and the hotunplug and particuarly on discrete). Providing a way to
enable additional things on narrowed-down scenarios allows us to focus
on the specific issue.
Anyway, it seems you both think there's value in keeping this separate
and always enabled, so I will not insist.
Lucas De Marchi
>
>Peter
>
>[...]
>
>
More information about the igt-dev
mailing list