[Bug 103359] [BAT][CI][GLK only] igt@* - incomplete - timeout/system hang

bugzilla-daemon at freedesktop.org
Wed Jan 24 15:47:49 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=103359

Imre Deak <imre.deak at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tomi.p.sarvela at intel.com

--- Comment #77 from Imre Deak <imre.deak at intel.com> ---
(In reply to Imre Deak from comment #76)
> Marta, is this bug only about the failures with the
> 
> r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1
> 
> error? In that case I would suggest closing this as NOTOURBUG. Until the
> r8169 fix the network adapter bug, we could switch to using another adapter.

I have now gone through the incompletes in the last 20 shard runs on GLK and
found the following two issues:

1. On GLKB4 there is most likely a hard disk/filesystem corruption problem
causing the hang. From [1]:

<6>[  216.832883] PM: suspend exit
<7>[  216.873312] [drm:drm_mode_addfb2] [FB:101]
<7>[  216.972244] [drm:drm_mode_setcrtc] [CRTC:75:pipe C]
<7>[  216.972310] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<3>[  217.083301] atkbd serio0: Failed to enable keyboard on isa0060/serio0
<6>[  219.270898] r8169 0000:01:00.0 enp1s0: link up
<2>[  219.275742] EXT4-fs error (device sda2): __ext4_get_inode_loc:4611: inode #13108593: block 52428919: comm dhclient: unable to read itable block
<3>[  219.276555] Buffer I/O error on dev sda2, logical block 0, lost sync page write
<3>[  219.276589] Aborting journal on device sda2-8.
<3>[  219.276688] Buffer I/O error on dev sda2, logical block 58228736, lost sync page write
<3>[  219.276718] JBD2: Error -5 detected when updating journal superblock for sda2-8.
<0>[  219.276735] Kernel panic - not syncing: EXT4-fs (device sda2): panic forced after error
<0>[  219.276735] 
<4>[  219.276753] CPU: 2 PID: 1659 Comm: dhclient Tainted: G     U           4.15.0-rc8-CI-CI_DRM_3664+ #1
<4>[  219.276765] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  219.276776] Call Trace:
<4>[  219.276798]  dump_stack+0x5f/0x86
<4>[  219.276818]  panic+0xcf/0x20d
<4>[  219.276839]  ? ext4_handle_error.part.99+0x30/0xa0
<4>[  219.276870]  ext4_handle_error.part.99+0x91/0xa0
<4>[  219.276885]  __ext4_error_inode+0xbc/0x170
<4>[  219.276930]  __ext4_get_inode_loc+0x213/0x500
<4>[  219.276957]  ? ext4_dirty_inode+0x3e/0x60
<4>[  219.276972]  ext4_reserve_inode_write+0x47/0xb0
<4>[  219.276992]  ext4_mark_inode_dirty+0x63/0x270
<4>[  219.277006]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  219.277033]  ext4_dirty_inode+0x3e/0x60
<4>[  219.277050]  __mark_inode_dirty+0x67/0x690
<4>[  219.277074]  generic_update_time+0x73/0xc0
<4>[  219.277089]  ? current_time+0x2d/0x60
<4>[  219.277110]  touch_atime+0x88/0xa0
<4>[  219.277135]  ext4_file_mmap+0x4f/0x60
<4>[  219.277151]  mmap_region+0x385/0x5c0
<4>[  219.277184]  do_mmap+0x3ad/0x520
<4>[  219.277212]  vm_mmap_pgoff+0x94/0xd0
<4>[  219.277247]  SyS_mmap_pgoff+0x166/0x200
<4>[  219.277276]  entry_SYSCALL_64_fastpath+0x22/0x8f
<4>[  219.277289] RIP: 0033:0x7fc8d14bcbda
<4>[  219.277300] RSP: 002b:00007ffe6cfa4ac8 EFLAGS: 00000246

All the other suspend failures on GLKB4 end with:
<6>[  702.905785] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[  702.907826] ata1.00: configured for UDMA/133
<6>[  702.918738] ata1.00: Enabling discard_zeroes_data

which would be consistent with the same guess. The solution would be to
replace the hard disk or fix the filesystem corruption.

[1]
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3664/shard-glkb4/pstore6-1516440822_Panic_1.log
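
The check done by hand for item 1 could be partly scripted. Below is a minimal
sketch in Python, assuming log lines in the "<level>[ timestamp] message" form
shown in the excerpt above. The function name and the signature list are
hypothetical and were picked only from strings visible in that excerpt; this
is not an existing CI tool.

```python
import re

# Assumed line format: "<level>[  timestamp] message", as in the excerpt above.
LINE_RE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s?(.*)$")

# Hypothetical signature list, taken only from strings seen in the excerpt.
STORAGE_SIGNATURES = (
    "EXT4-fs error",
    "Buffer I/O error",
    "Aborting journal",
    "JBD2: Error",
)


def find_storage_errors(log_text):
    """Return (timestamp, message) pairs for lines matching a storage signature."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation lines or unrelated text
        _level, ts, msg = m.groups()
        if any(sig in msg for sig in STORAGE_SIGNATURES):
            hits.append((float(ts), msg))
    return hits


sample = """<6>[  219.270898] r8169 0000:01:00.0 enp1s0: link up
<3>[  219.276555] Buffer I/O error on dev sda2, logical block 0, lost sync page write
<3>[  219.276589] Aborting journal on device sda2-8.
"""

for ts, msg in find_storage_errors(sample):
    print(f"{ts:.6f} {msg}")
```

A log with no hits would not rule out a disk problem, since the machine may
hang before anything reaches pstore.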

2. The following R8169 errors on (at least) GLKB6:
r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).

The solution here would be to use another network adapter.
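
To see how often each machine hits this, the r8169 messages could be counted
per condition name. A rough sketch in Python follows; the regular expression
is inferred only from the two r8169 messages quoted in this thread, and the
helper name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two r8169 messages quoted in this thread;
# the "(loop: N, delay: M)" suffix is treated as optional.
COND_RE = re.compile(
    r"r8169 (?P<pci>\S+) (?P<iface>\S+): (?P<cond>rtl_\w+_cond) == (?P<val>\d+)"
    r"(?: \(loop: (?P<loop>\d+), delay: (?P<delay>\d+)\))?"
)


def count_r8169_conditions(log_text):
    """Count r8169 condition messages in log_text, keyed by condition name."""
    return Counter(m.group("cond") for m in COND_RE.finditer(log_text))


sample = (
    "r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1\n"
    "r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).\n"
)

for cond, n in sorted(count_r8169_conditions(sample).items()):
    print(cond, n)
```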

It's possible I'm wrong and the hang is due to some other reason, but I'd like
to see the above two issues fixed first.

Adding Tomi.
