[Bug 110246] [CI][SHARDS] Random tests - incomplete - empty stdout/stderr/dmesg - watchdog0: watchdog did not stop!

bugzilla-daemon at freedesktop.org
Mon Jul 8 12:04:13 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110246

--- Comment #41 from Arek Hiler <arkadiusz.hiler at intel.com> ---
(In reply to CI Bug Log from comment #40)
> The CI Bug Log issue associated to this bug has been updated.
> 
> ### New filters associated
> 
> * HSW: igt@perf_pmu@idle-vcs0 - timeout - watchdog: watchdog0: watchdog did
> not stop!
>   -
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-hsw4/igt@perf_pmu@idle-vcs0.html

OK, this failure is quite something.

1. In stdout/stderr:

Test requirement not met in function igt_device_set_master, file
../lib/igt_device.c:61:
Test requirement: __igt_device_set_master(fd) == 0
Can't become DRM master, please check if no other DRM client is running.
             command   pid dev master a   uid      magic
      kms_cursor_crc  6870   0   y    y     0          0
      kms_cursor_crc  6870   0   n    y     0          0
            perf_pmu  6999   0   n    y     0          0
            perf_pmu  6999   0   n    y     0          0
Received signal SIGQUIT.


This means that another IGT test (kms_cursor_crc, pid 6870) was still alive and
holding DRM master when perf_pmu started. The runner should have killed it, or
failed earlier if it could not be killed. I have to double-check child handling
in igt_runner.
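
To make the "kill them or fail earlier" expectation concrete, here is a minimal
sketch of that policy. It is not igt_runner code (the runner is written in C);
it is a Python illustration, and run_with_timeout and its parameters are
hypothetical names. It assumes the test is started as a process-group leader so
that the whole group can be signalled at once.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout_s, grace_s=2.0):
    """Run argv in its own process group; kill the group on timeout.

    Returns (returncode, timed_out). Raises RuntimeError if the child
    cannot be reaped even after SIGKILL.
    """
    # start_new_session=True makes the child a process-group leader, so its
    # pid doubles as the group id and killpg() reaches anything it forked.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        pass

    # Escalate: polite SIGTERM first, then SIGKILL.
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(proc.pid, sig)
        except ProcessLookupError:
            # The group vanished between the timeout and the signal.
            return proc.wait(), True
        try:
            # Only the group leader is waited for here; a fuller version
            # would also confirm that no other group member survived.
            return proc.wait(timeout=grace_s), True
        except subprocess.TimeoutExpired:
            continue

    # Not reaped even after SIGKILL (for example stuck in the kernel):
    # fail loudly instead of starting the next test on a dirty system.
    raise RuntimeError(f"pid {proc.pid} is not killable")
```

With something like this, a leftover test either disappears before the next
one starts or the run aborts at the point where the stuck process is first
noticed, instead of surfacing later as "Can't become DRM master".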

2. run.log shows that we got EBADF:

Failed to stop a watchdog: Bad file descriptor
Failed to stop a watchdog: Bad file descriptor
Timestamp 1562355594
/tmp/jenkins2496193922192201218.sh: line 118:  6978 Terminated             
stdbuf -o0 dmesg -rw > "$LOGDIR/$RUN/dmesg.log" 2>&1  (wd: /opt/igt)

So the runner was terminating just fine and even tried to stop the watchdog (we
can tell now thanks to the logging added in comment #36), but the attempt
failed with EBADF. I guess we need to start logging the fds when the watchdogs
are initialized and again when they are closed to see what is going on.

Anyway, it seems the system was in a completely broken state (hanging
processes, defunct watchdogs). Let's see whether similar issues show up with
other failures.
