What can a "freezed" X server status be, and a HowTo to get some kernel crash dumps
Bruno Kant
bkant at cloppy.net
Thu Feb 11 01:48:54 UTC 2016
Hello,
A few days ago, someone on the list wrote that he would configure
netconsole to get some messages or debug data. That will not work: when
the box freezes, its network is dead too.
I'm still busy with my video driver(s) and kernels. I can now reliably
and repeatedly "freeze" any of the 4.3.5 and 4.5.0rc2 kernels, as well
as the drm/intel tree (based on 4.5.0rc2) I cloned last week. I am also
able to get debug traces now. Reproducible issues and some traces can
help to debug code, so I am sharing my findings as a quick HowTo.
WARNING: "freezes" can kill your file systems. So can this procedure
(reboot without sync); use it at your own risk. So far, Fedora and the
kernels I'm using have recovered mine each time, after many "freezes"
or crashes and such reboots.
As a reminder, my hardware is a Shuttle XSV35V4 (Celeron, BIOS
XS35V400.400), and the video is:
# lspci -nnk | egrep -iA3 "VGA"
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom
Processor Z36xxx/Z37xxx Series Graphics & Display [8086:0f31] (rev 0e)
DeviceName: Onboard IGD
Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device [1297:4019]
Kernel driver in use: i915
I just boot these different kernels, with or without kernel debug
options. Then I open a gnome session and play Youtube videos with
Firefox. Sometimes, for quicker crashes and dumps, I also use Chrome
and Netflix, but Firefox and Youtube are enough to kill the box, within
hours and even within minutes. Play 3 video streams/channels/playlists
on two monitors, switch videos to full screen, then back to the
browser, and so on. This "freezes" the box, whichever of these standard
kernels is running. It is almost always reproducible and takes more or
less time: some minutes, or a couple of hours when videos just play,
with acceleration+DRI active. This is what I get with this Shuttle
XSV35V4.
I never noticed any message related to the freezes in my syslog/journal,
nor on the PC screen, nor over SSH, nor over my netconsole. But I can
now dump some nice debug data with the following tools and steps.
Set up SysRq and kdump, following the common procedures. Check that
you get your dumps. Here is a tutorial for Fedora:
https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
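To check the SysRq part of this setup, it helps to decode the
kernel.sysrq value. The sketch below is an illustration and not part of
the Fedora tutorial. The function name sysrq_allows is hypothetical, and
the bit values (4 for keyboard control, which covers the "r" key, and 8
for debugging dumps, which covers "c") are taken from the kernel's SysRq
documentation as I understand it.

```shell
# sysrq_allows VALUE BIT
# Prints "yes" if the kernel.sysrq VALUE permits the function group BIT.
#   VALUE 0  -> SysRq disabled
#   VALUE 1  -> all functions enabled
#   VALUE >1 -> bitmask of allowed function groups
sysrq_allows() {
    if [ "$1" -eq 1 ] || [ $(( $1 & $2 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

# Report the state of the running system, if the file is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    echo "kernel.sysrq = $current"
    echo "unraw 'r' allowed: $(sysrq_allows "$current" 4)"
    echo "crash 'c' allowed: $(sysrq_allows "$current" 8)"
fi
```

If either answer is "no", the key combinations used in the next steps
will be ignored. Running "sysctl -w kernel.sysrq=1" as root enables all
functions until the next reboot.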
For the next steps, use two keyboards. I'm using a common Cherry USB
keyboard and a Logitech wireless keyboard. I assume any pair of
keyboards will do the job.
Why two keyboards? One will interact with X, the other with your
kernel. Once in gnome, you will have two active keyboards. On the USB
keyboard, press Alt+SysRq+r (the "unraw" magic key). The USB keyboard is
now detached from X and interacts with the kernel (check this in your
/var/log/messages or with journalctl -f). Detach it immediately after
X/gnome and session startup: the keyboard cannot be detached later, once
the box is "frozen". Now keep the USB keyboard aside for later. Or test
that it works: press Alt+SysRq+c on the detached keyboard and check that
you get your core dump...
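The kdump side of this test can also be exercised without any keyboard,
by writing the magic key to /proc/sysrq-trigger as root. The wrapper
below is only a sketch: sysrq_trigger and the REALLY variable are
hypothetical names, and by default it prints the command instead of
running it, because the real thing crashes the box at once. Note that
this path does not go through the keyboard, so it checks that kdump
works, not that the detached keyboard does.

```shell
# sysrq_trigger KEY: write KEY to /proc/sysrq-trigger.
# Dry run by default; set REALLY=1 (as root) to actually do it.
sysrq_trigger() {
    if [ "${REALLY:-0}" = "1" ]; then
        echo "$1" > /proc/sysrq-trigger
    else
        echo "dry run: echo $1 > /proc/sysrq-trigger"
    fi
}

# "c" crashes the kernel so that kdump can capture a vmcore.
sysrq_trigger c
```

Run "sync" first if you care about your file systems, then call it with
REALLY=1 and check that a vmcore shows up after the reboot.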
Now use your second keyboard and mouse to interact with X. Play videos
with acceleration+DRI active in X. The box will die. Or maybe play your
favorite game and try the same process...
When the box is dead, take your previously detached USB keyboard and
press Alt+SysRq+c on it. X/gnome will shut down and the core gets
dumped ("c" is the magic key that triggers the crash).
Along with your dumped vmcore, you will get a text file with the dmesg
content (kernel messages, from boot until the core dump):
127.0.0.1-2016-02-10-19:20:47]# ls -al
total 109408
drwxr-xr-x 2 root root 4096 Feb 10 23:36 .
drwxr-xr-x. 15 root root 4096 Feb 10 23:37 ..
-rw------- 1 root root 111873588 Feb 10 19:20 vmcore
-rw-r--r-- 1 root root 73270 Feb 10 19:20 vmcore-dmesg.txt
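After several crashes the dump directories pile up. Here is a small
sketch for finding the most recent one and reading the end of its dmesg
file. It assumes the dumps land under /var/crash, which should be
checked against the "path" setting in your kdump.conf. latest_dump and
CRASH_DIR are hypothetical names.

```shell
# latest_dump DIR: print the newest subdirectory of DIR containing a vmcore.
latest_dump() {
    # ls -t sorts by modification time, newest first
    for d in $(ls -1dt "$1"/*/ 2>/dev/null); do
        if [ -f "${d}vmcore" ]; then
            echo "${d%/}"
            return 0
        fi
    done
    return 1
}

# /var/crash is an assumption; override it with CRASH_DIR if needed.
if dir=$(latest_dump "${CRASH_DIR:-/var/crash}"); then
    echo "latest dump: $dir"
    if [ -r "$dir/vmcore-dmesg.txt" ]; then
        # The last kernel messages before the dump are the interesting ones.
        tail -n 40 "$dir/vmcore-dmesg.txt"
    fi
else
    echo "no vmcore found"
fi
```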
If you used a debug kernel, you can open the vmcore and read its
content: the dmesg content, the processor run queues (runq), the ps
list, and more, as they were when the dump was triggered. To read the
vmcore data, you will need the path to the vmlinux built with your debug
kernel:
# crash /mnt/kernels/linux-4.3.5/vmlinux vmcore
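For repeated crashes it can be handy to pull the same data from every
vmcore without typing at the crash prompt. The following is a sketch
that assumes crash accepts a command file with -i and suppresses its
banner with -s. The command list and the crash-report.txt name are my
own choices, and the vmlinux path is simply the one from the command
above.

```shell
# Write a small command file for crash: kernel log, process list,
# run queues, backtraces of the active tasks, then the backtraces of
# tasks in uninterruptible sleep.
cmds=$(mktemp)
cat > "$cmds" <<'EOF'
log
ps
runq
bt -a
foreach UN bt
exit
EOF

# Path taken from the example above; adjust to your own build.
VMLINUX=/mnt/kernels/linux-4.3.5/vmlinux
if command -v crash >/dev/null 2>&1 && [ -r "$VMLINUX" ] && [ -r vmcore ]; then
    crash -s -i "$cmds" "$VMLINUX" vmcore > crash-report.txt
else
    echo "would run: crash -s -i $cmds $VMLINUX vmcore"
fi
```

Run it from the directory that holds the vmcore. The resulting text
file is easier to attach to a bug report than the vmcore itself.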
For one "freeze", I noticed that 3 CPUs were idle and that only
Firefox remained active on my box... The status I saw is below. Why
were almost all processes idle? According to the dumped ps list,
processes were still alive, but the CPU run queues were empty. I'll
investigate this further.
Best regards
crash> runq
CPU 0 RUNQUEUE: ffff88023fc16c80
CURRENT: PID: 2913 TASK: ffff8800b7bd0000 COMMAND: "firefox"
RT PRIO_ARRAY: ffff88023fc16e30
[no tasks queued]
CFS RB_ROOT: ffff88023fc16d20
[no tasks queued]
CPU 1 RUNQUEUE: ffff88023fc96c80
CURRENT: PID: 0 TASK: ffff880236270000 COMMAND: "swapper/1"
RT PRIO_ARRAY: ffff88023fc96e30
[no tasks queued]
CFS RB_ROOT: ffff88023fc96d20
[no tasks queued]
CPU 2 RUNQUEUE: ffff88023fd16c80
CURRENT: PID: 0 TASK: ffff880236271c00 COMMAND: "swapper/2"
RT PRIO_ARRAY: ffff88023fd16e30
[no tasks queued]
CFS RB_ROOT: ffff88023fd16d20
[no tasks queued]
CPU 3 RUNQUEUE: ffff88023fd96c80
CURRENT: PID: 0 TASK: ffff880236273800 COMMAND: "swapper/3"
RT PRIO_ARRAY: ffff88023fd96e30
[no tasks queued]
CFS RB_ROOT: ffff88023fd96d20
[no tasks queued]
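To compare several dumps, the idle CPUs can be picked out of a saved
runq listing mechanically. This is a sketch only: idle_cpus is a
hypothetical helper, it assumes the runq output was saved to a text
file, and it treats a CPU as idle when its CURRENT task is the
swapper/N idle task, as in the listing above.

```shell
# idle_cpus FILE: print the number of each CPU whose current task is
# the idle task (swapper/N) in a saved "runq" listing.
idle_cpus() {
    awk '/^ *CPU [0-9]+ RUNQUEUE/ { cpu = $2 }
         /CURRENT:/ && /COMMAND: "swapper\// { print cpu }' "$1"
}
```

For example, "idle_cpus runq.txt | wc -l" gives the number of idle CPUs
for one dump, where runq.txt is whatever file you saved the listing to.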