[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: adding state checker for gamma lut values (rev6)

Tue Apr 23 10:59:16 UTC 2019

On Thu, Apr 18, 2019 at 10:31:32AM +0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: adding state checker for gamma lut values (rev6)
> URL   : https://patchwork.freedesktop.org/series/58039/
> State : failure

Hey,

This series is a rerun, which means that someone went to patchwork web
interface and clicked on "test this series again". This should be used
only when you thing there is something seriously wrong with CI results.

If you have something that looks like a false positive (e.g. a failure
in tests that your change should not affect see below).

> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_5952 -> Patchwork_12827
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_12827 absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_12827, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://patchwork.freedesktop.org/api/1.0/series/58039/revisions/6/mbox/

This is the relevant part about false positives. Just provide some
explanation why you think this is a false positive and CC:
Martin <martin.peres at linux.intel.com> and/or
Lakshmi <lakshminarayana.vudum at intel.com>

> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_12827:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt at runner@aborted:
>     - fi-ilk-650:         NOTRUN -> FAIL
>     - fi-pnv-d510:        NOTRUN -> FAIL
>     - fi-bdw-gvtdvm:      NOTRUN -> FAIL
>     - fi-hsw-peppy:       NOTRUN -> FAIL
>     - fi-gdg-551:         NOTRUN -> FAIL
>     - fi-snb-2520m:       NOTRUN -> FAIL
>     - fi-hsw-4770:        NOTRUN -> FAIL
>     - fi-bxt-j4205:       NOTRUN -> FAIL
>     - fi-whl-u:           NOTRUN -> FAIL
>     - fi-icl-u3:          NOTRUN -> FAIL
>     - fi-ivb-3770:        NOTRUN -> FAIL
>     - fi-bxt-dsi:         NOTRUN -> FAIL
>     - fi-byt-j1900:       NOTRUN -> FAIL
>     - fi-icl-y:           NOTRUN -> FAIL
>     - fi-blb-e6850:       NOTRUN -> FAIL
>     - fi-bsw-kefka:       NOTRUN -> FAIL
>     - fi-hsw-4770r:       NOTRUN -> FAIL
>     - fi-byt-clapper:     NOTRUN -> FAIL
>     - fi-bdw-5557u:       NOTRUN -> FAIL
>     - fi-bwr-2160:        NOTRUN -> FAIL
>     - fi-elk-e7500:       NOTRUN -> FAIL

This looks like a real issue though. Seems like igt_runner has aborted
execution on almost all the machines due to hitting a WARN.

Let's see the logs:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12822/fi-cfl-8700k/igt@runner@aborted.html
----------------------------------------------------------
Aborting.
Previous test: nothing
Next test: core_auth (basic-auth)

Kernel badly tainted (0x200) (check dmesg for details):
	(0x200) TAINT_WARN: WARN_ON has happened.
----------------------------------------------------------

(You can get them through patchwork -> see full logs -> red cell in the
igt at runner@aborted row, on a machcine that has not aborted prior to that.)

This tells us that the kernel was tainted in a way that may cause
unexpeted behavior if testing would coninued (TAINT_WARN), so we abort.

Here it happned before we even run a single tests ("Previous test:
nothing"), so we have to look for the result in the boot dmesg. Link to
boot log is on top of the page with the result (boot0).

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12822/fi-cfl-8700k/boot0.log
----------------------------------------------------------
<4>[    3.842910] ------------[ cut here ]------------
<4>[    3.842911] pipe state doesn't match!
<4>[    3.842952] WARNING: CPU: 3 PID: 224 at drivers/gpu/drm/i915/intel_display.c:12700 intel_atomic_commit_tail+0x127d/0x12e0 [i915]
<4>[    3.842953] Modules linked in: i915 x86_pkg_temp_thermal mei_hdcp coretemp snd_hda_intel crct10dif_pclmul snd_hda_codec snd_hwdep snd_hda_core crc32_pclmul e1000e snd_pcm mei_me ghash_clmulni_intel mei ptp pps_core prime_numbers
<4>[    3.842961] CPU: 3 PID: 224 Comm: kworker/u24:7 Not tainted 5.1.0-rc5-CI-Patchwork_12822+ #1
<4>[    3.842962] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.00 10/31/2017
<4>[    3.842964] Workqueue: events_unbound async_run_entry_fn
<4>[    3.842993] RIP: 0010:intel_atomic_commit_tail+0x127d/0x12e0 [i915]
<4>[    3.842995] Code: 45 0f b6 84 24 d2 07 00 00 e8 3f 36 3d e1 5e 5f e9 80 fa ff ff e8 d3 8d e2 e0 0f 0b 0f b6 0c 24 e9 42 fc ff ff e8 c3 8d e2 e0 <0f> 0b e9 d2 f3 ff ff e8 b7 8d e2 e0 0f 0b 49 8b 44 24 50 e9 ae fc
<4>[    3.842996] RSP: 0018:ffffc90001aa7998 EFLAGS: 00010282
<4>[    3.842997] RAX: 0000000000000000 RBX: ffff888253eb92a8 RCX: 0000000000000000
<4>[    3.842998] RDX: 0000000000000007 RSI: ffff88826274d9f8 RDI: 00000000ffffffff
<4>[    3.842998] RBP: ffff888255fe9158 R08: 0000000038571494 R09: 0000000000000000
<4>[    3.842999] R10: ffff888255fe9450 R11: 0000000000000000 R12: ffff8882524f5698
<4>[    3.843000] R13: ffff888252880758 R14: ffff888252880000 R15: ffff888252880750
<4>[    3.843001] FS:  0000000000000000(0000) GS:ffff8882668c0000(0000) knlGS:0000000000000000
<4>[    3.843002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    3.843003] CR2: 00007ffe816c3ff8 CR3: 0000000005214003 CR4: 00000000003606e0
<4>[    3.843004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[    3.843004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[    3.843005] Call Trace:
<4>[    3.843042]  intel_atomic_commit+0x240/0x2e0 [i915]
<4>[    3.843046]  restore_fbdev_mode_atomic+0x1da/0x1f0
<4>[    3.843054]  drm_fb_helper_restore_fbdev_mode_unlocked+0x42/0x90
<4>[    3.843056]  drm_fb_helper_set_par+0x24/0x50
<4>[    3.843087]  intel_fbdev_set_par+0x11/0x40 [i915]
<4>[    3.843089]  fbcon_init+0x45e/0x620
<4>[    3.843094]  visual_init+0xc9/0x130
<4>[    3.843096]  do_bind_con_driver+0x1f5/0x420
<4>[    3.843101]  do_take_over_console+0x71/0x190
<4>[    3.843104]  do_fbcon_takeover+0x53/0xb0
<4>[    3.843105]  notifier_call_chain+0x34/0x90
<4>[    3.843109]  blocking_notifier_call_chain+0x3f/0x60
<4>[    3.843110]  ? lock_fb_info+0x13/0x40
<4>[    3.843112]  register_framebuffer+0x2a5/0x360
<4>[    3.843119]  __drm_fb_helper_initial_config_and_unlock+0x2d4/0x560
<4>[    3.843152]  intel_fbdev_initial_config+0xf/0x20 [i915]
<4>[    3.843154]  async_run_entry_fn+0x34/0x160
<4>[    3.843156]  process_one_work+0x245/0x610
<4>[    3.843162]  worker_thread+0x37/0x380
<4>[    3.843164]  ? process_one_work+0x610/0x610
<4>[    3.843166]  kthread+0x119/0x130
<4>[    3.843168]  ? kthread_park+0x80/0x80
<4>[    3.843171]  ret_from_fork+0x3a/0x50
<4>[    3.843178] irq event stamp: 18642
<4>[    3.843181] hardirqs last  enabled at (18641): [<ffffffff811268c2>] vprintk_emit+0x302/0x320
<4>[    3.843183] hardirqs last disabled at (18642): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4>[    3.843184] softirqs last  enabled at (18480): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4>[    3.843186] softirqs last disabled at (18469): [<ffffffff810b5389>] irq_exit+0xa9/0xc0
<4>[    3.843216] WARNING: CPU: 3 PID: 224 at drivers/gpu/drm/i915/intel_display.c:12700 intel_atomic_commit_tail+0x127d/0x12e0 [i915]
<4>[    3.843217] ---[ end trace df64dd148cbc195e ]---
----------------------------------------------------------

So this is the reason why all the machines aborted so early.

I hope this helps,
Arek

PS. There is work being done to make those logs more easily accessible
tied to a test, but it's not a trivial problem to solve, so it will take
time.