[systemd-devel] Antw: [EXT] Re: [systemd‑devel] Run "ipmitool power cycle" after lib/systemd/system‑shutdown scripts
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Feb 11 11:35:22 UTC 2022
>>> Adam Nielsen <a.nielsen at shikadi.net> wrote on 11.02.2022 at 12:27 in
message <20220211212751.14db2a63 at vorticon.teln.shikadi.net>:
>> Then these remote management cards should allow themselves to be
>> restarted separately. The BMCs I had to deal with allow that.
>
> They do under normal circumstances, but for example I had a Dell R720
> recently where the remote access controller (iDRAC) partially crashed.
> It was still running but would no longer connect to the network. In
> this case not even power cycling the server would fix it (because it
> stays on while the server is off, so you can use it to remotely power
> the server back on again). The only solution was to unplug the power
> to both the server's power supplies, effectively power cycling the DRAC.
>
> It is true I probably could've used the RACADM utility to reset it via
> some other method, but I was unable to run it because of various
> library incompatibilities and missing kernel modules, as I'm not
> running one of the tiny number of supported Linux distributions.
>
>> Why should the firmware need more than one second? There is no reason
>> for that. So, one more point to avoid such a device.
>
> I agree the firmware should be fast. But a lot of these devices are so
> complex they now boot whole embedded operating systems. Things like
> SAS cards, even normal consumer SATA drives, run their own embedded
> operating systems, and if you can find the debug UARTs on the PCBs you
> can even read the boot messages. Even LTO tape drives now have an
> Ethernet connection on the drives themselves and provide a TCP stack
> and DHCP server so you can connect a laptop directly to the drive to
> perform diagnostics.
>
> All that complexity leads to longer boot times as the firmware has to
> do more and more during startup. I'm not saying it's a good thing, and
> as I sit there waiting for 10 minutes while a Dell server does whatever
> it has to do before it even attempts to load the OS, I often wonder
> whether all that complexity is really necessary.
>
>> Hmm, that would be very strange. Luckily until now, a normal reboot
>> was totally fine in my experience with Dell and Supermicro servers.
>
> I recently reflashed a Dell H710 onboard SAS controller, to disable the
> RAID firmware and convert it into IT mode, where it provides direct
> access to the connected drives. This allows Linux to access them
> directly, without any proprietary RAID algorithms interfering with what
> actually ends up on the disk.
>
> In order to do this, not only did I have to power cycle the server, but
> I also had to remove the battery from the SAS card in order to ensure
> the last remnants of the original firmware were wiped from the
> controller's onboard memory, otherwise it can persist even with the
> server power supplies unplugged. So alas even Dell are not immune from
> this.
I think the real reason is that some RAID controllers "back up" the disk configuration in NVRAM, so if you want to reconfigure the disks completely, you'd probably have to erase the logical disks first, then upgrade the firmware, and then recreate the configuration.
>
>> Yes, that is why I asked, so I will never buy such crappy hardware.
>> With Dell and Supermicro servers, rebooting the system was all I
>> needed. (The BMC is not reset, but that can be done separately while
>> the server itself is still running.)
>
> You are probably just lucky. A good firmware update will reset the
> device without even requiring a reboot. I can't speak for Supermicro
> but plenty of Dell hardware certainly doesn't reset completely when you
> reboot the machine, and sometimes not even when you power cycle it!
Well, if the hardware had NVRAM...
> Most of the Dell BIOS updates I've performed (outside of Linux, by
> booting from a USB stick) have automatically power cycled the server
> when they are done, so even they don't just do a hardware reset.
Why don't you upgrade the firmware using the built-in iDRAC?
>
> But, if it's done correctly so a power cycle isn't necessary, does it
> even matter? I would argue that if the Dell hardware you've used
> hasn't done "clean" hardware resets but has hidden that fact so well
> you haven't noticed, then it doesn't really seem to be a problem.
>
> Cheers,
> Adam.