[systemd-devel] Run "ipmitool power cycle" after lib/systemd/system-shutdown scripts
Adam Nielsen
a.nielsen at shikadi.net
Fri Feb 11 11:27:51 UTC 2022
> Then these remote management cards should allow to be restarted
> separately. The BMC I had to deal with allow that.
They do under normal circumstances, but I recently had a Dell R720
where the remote access controller (iDRAC) partially crashed. It was
still running but would no longer connect to the network. In this case
not even power cycling the server would fix it, because the iDRAC stays
powered while the server is off (so you can use it to remotely power
the server back on again). The only solution was to unplug the power
from both of the server's power supplies, effectively power cycling the
iDRAC.

It is true I probably could've used the RACADM utility to reset it some
other way, but I was unable to run it because of various library
incompatibilities and missing kernel modules, as I'm not running one of
the tiny number of supported Linux distributions.
> Why should the firmware need more than one second? There is no reason
> for that. So, one more point to avoid such a device.
I agree the firmware should be fast. But a lot of these devices are so
complex they now boot whole embedded operating systems. SAS cards, and
even ordinary consumer SATA drives, run their own embedded operating
systems, and if you can find the debug UARTs on the PCBs you can read
the boot messages. Even LTO tape drives now have an Ethernet port on
the drive itself and provide a TCP stack and DHCP server so you can
connect a laptop directly to the drive to perform diagnostics.
All that complexity leads to longer boot times as the firmware has to
do more and more during startup. I'm not saying it's a good thing, and
as I sit there waiting for 10 minutes while a Dell server does whatever
it has to do before it even attempts to load the OS, I often wonder
whether all that complexity is really necessary.
> Hmm, that would be very strange. Luckily until now, a normal reboot
> was totally fine in my experience with Dell and Supermicro servers.
I recently reflashed a Dell H710 onboard SAS controller, to disable the
RAID firmware and convert it into IT mode, where it provides direct
access to the connected drives. This allows Linux to access them
directly, without any proprietary RAID algorithms interfering with what
actually ends up on the disk.
In order to do this, not only did I have to power cycle the server, but
I also had to remove the battery from the SAS card to ensure the last
remnants of the original firmware were wiped from the controller's
onboard memory. Otherwise they can persist even with the server's power
supplies unplugged. So alas, even Dell are not immune to this.
> Yes, that is why I asked, so I will never buy such crappy hardware.
> With Dell and Supermicro servers, rebooting the system was all I
> needed. (The BMC is not reset, but can be done separately while the
> server itself is still running.)
You are probably just lucky. A good firmware update will reset the
device without even requiring a reboot. I can't speak for Supermicro
but plenty of Dell hardware certainly doesn't reset completely when you
reboot the machine, and sometimes not even when you power cycle it!
Most of the Dell BIOS updates I've performed (outside of Linux, by
booting from a USB stick) have automatically power cycled the server
when they are done, so even they don't just do a hardware reset.
But, if it's done correctly so a power cycle isn't necessary, does it
even matter? I would argue that if the Dell hardware you've used
hasn't done "clean" hardware resets but has hidden that fact so well
you haven't noticed, then it doesn't really seem to be a problem.
Cheers,
Adam.