[RFC] How to test panic handlers, without crashing the kernel
Jocelyn Falempe
jfalempe at redhat.com
Tue Mar 5 16:52:40 UTC 2024
On 05/03/2024 17:23, Michael Kelley wrote:
> From: Guilherme G. Piccoli <gpiccoli at igalia.com> Sent: Monday, March 4, 2024 1:43 PM
>>
>> On 04/03/2024 18:12, John Ogness wrote:
>>> [...]
>>>> The second question is how to simulate a panic context in a
>>>> non-destructive way, so we can test the panic notifiers in CI, without
>>>> crashing the machine.
>>>
>>> I'm wondering if a "fake panic" can be implemented that quiesces all the
>>> other CPUs via NMI (similar to kdb) and then calls the panic
>>> notifiers. And finally releases everything back to normal. That might
>>> produce a fairly realistic panic situation and should be fairly
>>> non-destructive (depending on what the notifiers do and how long they
>>> take).
>>>
>>
>> Hi Jocelyn / John,
>>
>> one concern here is that the panic notifiers are kind of a no man's
>> land, so we can have very simple / safe ones, while others are
>> destructive in nature.
>>
>> An example of a good behaving notifier that is destructive is the
>> Hyper-V one, that destroys an essential host-guest interface (called
>> "vmbus connection"). What happens if we trigger this one just for
>> testing purposes in a debugfs interface? Likely the guest would die...
>>
>> [+CCing Michael Kelley here since he seems interested in panic and is
>> also expert in Hyper-V, just in case my example is bogus.]
>
> The Hyper-V example is valid. After hv_panic_vmbus_unload()
> is called, the VM won't be able to do any disk, network, or graphics
> frame buffer I/O. There's no recovery short of restarting the VM.
Thanks for the confirmation.
>
> Michael
>
> [I have retired from Microsoft. I'm still occasionally contributing
> to Linux kernel work with email mhklinux at outlook.com.]
>
>>
>> So, maybe the problem could be split in 2: the non-notifiers portion of
>> the panic path, and the the notifiers; maybe restricting the notifiers
>> you'd run is a way to circumvent the risks, like if you could pass a
>> list of the specific notifiers you aim to test, this could be
>> interesting. Let's see what the others think and thanks for your work in
>> the DRM panic notifier =)
Or maybe have two lists of panic notifiers, the safe and the destructive
list. So in case of fake panic, we can only call the safe notifiers.
>>
>> Cheers,
>>
>>
>> Guilherme
>
More information about the dri-devel
mailing list