BUG Found and tracked down: @lymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.\n
olivier.lahaye1 at free.fr
olivier.lahaye1 at free.fr
Sat May 23 19:51:02 UTC 2020
Hi,
I'm trying to find where the problem is in the Plymouth crash I'm experiencing on CentOS-8 and openSuSE Leap 15.2 (it doesn't"t crash on CentOS-7 and CentOS-6)
I've reported the bug here: https://gitlab.freedesktop.org/plymouth/plymouth/-/issues/112 with full informations and intro needed to reproduce the problem.
In the meantime, I'm trying to find the bug by myself, but it's hard to debug in initrd environment, so I did analyste the source code and here is what I found:
The crash occurs when Plymouth tries to initialise DRM devices.
00:16:52.992 main.c:882:plymouth_should_show_default_splash : using default splash because kernel command line has option "splash"
00:16:52.994 ply-device-manager.c:886:create_devices_from_terminals : checking for consoles
00:16:52.994 ply-device-manager.c:664:add_consoles_from_file : opening /sys/class/tty/console/active
00:16:52.995 ply-device-manager.c:672:add_consoles_from_file : reading file
00:16:52.996 ply-device-manager.c:710:add_consoles_from_file : console /dev/tty7 found!
00:16:52.997 ply-device-manager.c:486:watch_for_udev_events : watching for udev graphics device add and remove events
00:16:52.998 ply-device-manager.c:346:create_devices_for_subsystem : creating objects for drm devices
@lymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.\n
in the strace output I can see:
access("/sys/devices/virtual/drm/ttm/uevent", F_OK) = 0
readlinkat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/subsystem", "../../../../class/drm", 99) = 21
openat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/uevent", O_RDONLY|O_CLOEXEC) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
read(10, "DEVTYPE=ttm\n", 4096) = 12
read(10, "", 4096) = 0
close(10) = 0
openat(AT_FDCWD, "/run/udev/data/+drm:ttm", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(2, "@lymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.\n", 101) = 101
Just after failing to look for /run/udev/data/+drm:ttm file,
Plymouth does abort because the timeout value seems to be a bad value?
HERE WE ARE.Thoses messages are misleading. The problem is far sooner (in fact, when reading Plymouth.defaults)
Looking at where the problem seems to occur, I suspected that the problem was here:
in src/ libply-splash-core/ply-device-manager.c at line 974 ply_event_loop_watch_for_timeout (in ply_device_manager_watch_devices() function))
Indeed, the console is correctly setup and this occurs at line 954 of that file, and there after, there is a specific ifdef HAVE_UDEV
According to the ABRT error, the device_timout variable which is a double is incorrect (not > 0.0). So either it is equal 0.0 (not initialized) or NaN or corrupted and seen as negative.
ply_device_manager_watch_devices() is called from main.c:1093 in load_devices(). The timeout value comes from state->device_timeout.
BUT at line 2241 in main(), state.device_timeout is set to NAN and at line 2278, the load_devices() is called with &state parameter!!!!
So, between line 2241 where state.device_timeout is set to NAN and line 2278 where it is used, the state.device_timout MUST be set to something > 0.0...
At line 2257, find_distribution_default_splash(&state) is called
in find_distribution_default_splash() line 428, load_settings is called
In my settings file initrd:/usr/share/plymouth/plymouthd.defaults I have the following content:
# SystemImager is the default theme
[Daemon]
Theme=systemimager
ShowDelay=0
# DeviceTimeout=5
in main.c:310, we have: timeout_string = ply_key_file_get_value (key_file, "Daemon", "DeviceTimeout");
Below, timeout_string is checked against NULL.
As the trace "Device timout is set to ...." is not in the log, This means that timeout _string is NULL.
And IT IS as ply_key_file_get_value() returns NULL if ply_key_file_get_raw_value() returns NULL, and ply_key_file_get_raw_value() returns NULL if group ("Daemon" here) is not found. (and it is not found as it is commented out)
THUS, state->device_timeout is still NAN value!!! BUG #1 (no default value AFAICT)
IMHO, there should be a else statement at line 317 that would set state->device_timeout to a default value (maybe 5.0) ?
BUG #2: at line 313 the result of atof() is not checked against value of 0.0 while the ply_event_loop_watch_for_timeout() require NON-NULL double.
WARNING #3: at line 313: strtod() should be used instead of atof() according to atof manual. More over, you don't take care of locale LC_NU MERIC! (use ply_key_file_get_double() which does instead)
BUG #4: at line 310: you should use state->device_timeout = ply_key_file_get_double (key_file, "Daemon", "DeviceTimeout", 5.0);
By that way, a default value would exists. Be careful, if strtod() fails in ply_key_file_get_double(), a value of 0.0 is returned. A test against this invalid value for DeviceTimeout MUST exists!
(also you can have NaN in config file that atof() know how to deal with. So MAKE TESTS!
BUG #5: Same problem for ShowDelay (though, I assume value of 0 is acceptable)
BUG #6: Same problem for DeviceScale (but there is no ply_key_file_get_unsigned_long() and surtout() provides 0 as default value)
Updating settings with this content allows Plymouthd to start:
# SystemImager is the default theme
[Daemon]
Theme=systemimager
ShowDelay=0.1
DeviceTimeout=2.0
IMHO, in order to fix things:
1 - replace lines 296 to 304 with:
state->splash_delay = ply_key_file_get_double (key_file, "Daemon", "DeviceTimeout", 5.0); /* replace 5.0 with define PLYMOUTH_DEFAULT_SPLASH_DELAY (hardcoded value is BAD) */
if (!(state->splash_delay > 0.0)) {
ply_trace ("Splash delay is invalid. defaulting to %d", PLYMOUTH_DEFAULT_SPLASH_DELAY);
state->splash_delay= PLYMOUTH_DEFAULT_SPLASH_DELAY;
}
2 - same operation for lines 308 to 317 regarding DeviceTimeout test must use ">= instead of > if zero is acceptable)
3 - Maybe check that what surtout() line 323 returns something that Plymouth can handle.
Please fix.
Best regards,
--
Olivier LAHAYE
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/plymouth/attachments/20200523/a8352ecd/attachment.htm>
More information about the plymouth
mailing list