[systemd-devel] Hardware watchdog support, slug speed.

Sébastien Luttringer seblu at seblu.net
Mon Mar 11 16:45:13 PDT 2013


On Mon, Mar 11, 2013 at 11:46 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Mon, 11.03.13 23:42, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
>
>>
>> On Mon, Mar 11, 2013 at 11:11:19PM +0100, Sébastien Luttringer wrote:
>> Hi Sébastien,
>> thank you for the great bug report.
Thanks. I tried to come up with technical facts :)

> Humm the ioctl() is supposed to be cheap. And it is on all hw I have
> tested it with. It appears that on the hw in question it is not so
> cheap, but that really sounds like a driver issue to me.

First of all, ioctl() is a syscall, and even if the hardware or driver
is buggy, making a syscall on every loop iteration (for nothing) is a
performance cost that should be avoided.
This is even more true on embedded devices, where power consumption
also matters. You can end up with more than 10 ioctls per second (10
context switches) in PID 1.
I don't think you can call that really cheap.

>> > I could provide some patches to fix this if you are interested.
>> Definitely. There are some places in systemd where unexpected failure
>> results in excessive usage of resources. Patches (preferably two,
>> for the two separate issues you describe) are always welcome.
OK, maybe I should wait for Lennart's final word before I start hacking.

>
> Well, the ioctl issue above just indicates that the driver sucks, but
> given how weakly the kernel iface is defined this is generally not a
> reason not to continue to ping the hw.
Why is there a setting that defines, in seconds, how often the watchdog
is pinged if you don't honor it?
Before blaming the driver/kernel developers, we can look at "our" own
interfaces and see why we don't respect them.

> To me this really appears as if the driver needs some updating, and we
> shouldn't attempt to tape over that by calling the ioctl less often.
Some hardware watchdogs expect a ping within a window that does not
start at 0. We cannot respect this with our hysterical behaviour, even
if the API were well defined.
So I guess 10 ioctls per second (on my iTCO_wdt) only consume power and
slow down my laptop for nothing!

# modinfo iTCO_wdt|grep timeout
parm:           heartbeat:Watchdog timeout in seconds. 5..76 (TCO v1) or 3..614 (TCO v2), default=30) (int)

Why not just call the ioctl every RuntimeWatchdogSec, as everyone
expects? This lets users choose how often they want to ping the
hardware, and even lets them adapt to buggy hardware with a higher
value.
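
A minimal sketch of that idea, in Python for brevity. This is not
systemd code: RateLimitedPinger, simulate() and the ping callable are
hypothetical names, and the clock is injected so the logic can be
exercised without a real watchdog device.

```python
import time


class RateLimitedPinger:
    """Ping a watchdog at most once per interval, however often the loop wakes up."""

    def __init__(self, interval_ms, ping, clock_ms=None):
        self.interval_ms = interval_ms
        self.ping = ping  # hypothetical callable wrapping the keepalive ioctl
        # Monotonic clock in milliseconds; injectable so the logic can be tested.
        self.clock_ms = clock_ms or (lambda: time.monotonic_ns() // 1_000_000)
        self.last_ping_ms = None

    def maybe_ping(self):
        """Call on every loop iteration; returns True only when a ping was sent."""
        now = self.clock_ms()
        if self.last_ping_ms is not None and now - self.last_ping_ms < self.interval_ms:
            return False  # too early: skip the syscall entirely
        self.ping()
        self.last_ping_ms = now
        return True


def simulate(iterations, step_ms, interval_ms):
    """Run a fake event loop and return the timestamps at which pings were sent."""
    now = [0]
    pings = []
    pinger = RateLimitedPinger(interval_ms, lambda: pings.append(now[0]), lambda: now[0])
    for _ in range(iterations):
        pinger.maybe_ping()
        now[0] += step_ms
    return pings
```

With a loop that wakes up 10 times per second and a 5 second interval,
simulate(100, 100, 5000) sends 2 pings instead of 100. A real event
loop would also need a timer so that it wakes up in time to ping when
it is otherwise idle.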

> I mean, the whole logic of a watchdog is to ping it when we are still
> alive and well, so that it gets triggered when we aren't. By pinging
> them in every loop we do this when we are awake anyway, so it's
> basically free...
With all the respect I owe you, Lennart, I disagree.
The logic is to ping at a fixed interval X, knowing that the watchdog
resets the hardware if it gets no sign of life within an interval Y.
This means that a process must be able to "write" to an fd within Y
seconds.
Fine-tuning the watchdog can avoid some overhead and buggy behaviour.
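
To make the X/Y relationship concrete, here is a small sketch in Python.
It assumes a policy of aiming for the middle of the window the hardware
accepts. That policy, the function name and the min_gap_s parameter are
all assumptions for illustration, not something an existing API
provides.

```python
def pick_ping_interval(timeout_s, min_gap_s=0.0):
    """Return a ping interval X inside the window the hardware accepts.

    timeout_s -- Y: the watchdog resets the machine after this long without a ping
    min_gap_s -- earliest time after a ping at which the next one is accepted
                 (0 for ordinary watchdogs; hypothetical parameter)
    """
    if timeout_s <= 0:
        raise ValueError("timeout must be positive")
    if not 0 <= min_gap_s < timeout_s:
        raise ValueError("min_gap_s must satisfy 0 <= min_gap_s < timeout_s")
    # Aim for the middle of the window to leave margin on both sides.
    return (min_gap_s + timeout_s) / 2
```

For a plain 30 second timeout this gives X = 15 seconds; for a watchdog
that only accepts pings after 10 seconds it gives X = 20 seconds.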

To sum up:
- It's not free: it consumes a lot of syscalls per second in the main
systemd event loop.
- It doesn't honor the user configuration.
- It doesn't respect the hardware minimum value (and not only because
there is no API for it).
- It affects all watchdog drivers, slowing down all systems.
- It continues even if the initial configuration has failed, making
things worse by trying to reopen /dev/watchdog on every loop iteration.
- We are not able to use an alternative watchdog device when we have
more than one (which is the case on my laptop).
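
The reopen problem could be avoided along these lines. This is a sketch
in Python, not systemd code: WatchdogOpener, open_device and the retry
delay are hypothetical, and the clock is injected for testing.

```python
class WatchdogOpener:
    """Retry a failed open only after a back-off delay, not on every loop turn."""

    def __init__(self, open_device, retry_delay_ms, clock_ms):
        self.open_device = open_device  # hypothetical callable, raises OSError on failure
        self.retry_delay_ms = retry_delay_ms
        self.clock_ms = clock_ms  # monotonic clock in milliseconds
        self.handle = None
        self.next_try_ms = 0
        self.attempts = 0

    def get(self):
        """Return the open handle, or None while the device is unavailable."""
        if self.handle is not None:
            return self.handle
        now = self.clock_ms()
        if now < self.next_try_ms:
            return None  # still backing off: no open() attempt this turn
        self.attempts += 1
        try:
            self.handle = self.open_device()
        except OSError:
            # Remember the failure and schedule the next attempt.
            self.next_try_ms = now + self.retry_delay_ms
        return self.handle
```

If the device cannot be opened, calling get() on every loop turn costs
one failed attempt per retry delay rather than one per iteration.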

Regards,

-- 
Sébastien "Seblu" Luttringer
https://www.seblu.net
GPG: 0x2072D77A