[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request

Chris Wilson chris at chris-wilson.co.uk
Thu Nov 21 17:24:15 UTC 2019


Quoting Patchwork (2019-11-21 17:00:02)
> == Series Details ==
> 
> Series: series starting with [1/5] drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request
> URL   : https://patchwork.freedesktop.org/series/69834/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_7400 -> Patchwork_15378
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_15378 absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_15378, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15378/index.html
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_15378:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@i915_selftest@live_gtt:
>     - fi-kbl-8809g:       [PASS][1] -> [INCOMPLETE][2]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7400/fi-kbl-8809g/igt@i915_selftest@live_gtt.html
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15378/fi-kbl-8809g/igt@i915_selftest@live_gtt.html

Gah,

<0> [427.586298] kworker/-492     1.... 418613623us : intel_timeline_exit: intel_timeline_exit:374 GEM_BUG_ON(!atomic_read(&tl->active_count))
<0> [427.586302] ---------------------------------
<4> [427.586919] ------------[ cut here ]------------
<2> [427.586922] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:374!
<4> [427.586927] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [427.586929] CPU: 1 PID: 492 Comm: kworker/1:2 Tainted: G     U  W         5.4.0-rc8-CI-Patchwork_15378+ #1
<4> [427.586931] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0026.092720170729 09/27/2017
<4> [427.586987] Workqueue: events engine_retire [i915]
<4> [427.587029] RIP: 0010:intel_timeline_exit+0xd6/0x160 [i915]
<4> [427.587031] Code: 00 48 c7 c2 70 71 77 a0 48 c7 c7 fb e7 62 a0 e8 60 cb b6 e0 bf 01 00 00 00 e8 d6 9e b6 e0 31 f6 bf 09 00 00 00 e8 7a 12 a8 e0 <0f> 0b 48 81 c5 70 04 00 00 48 89 ef e8 49 b6 3b e1 f0 ff 8b 94 00
<4> [427.587033] RSP: 0018:ffffc9000068fdc0 EFLAGS: 00010297
<4> [427.587036] RAX: ffff888254740040 RBX: ffff888262d6e200 RCX: 0000000000000001
<4> [427.587037] RDX: 00000000000018c9 RSI: 0000000000000000 RDI: 0000000000000009
<4> [427.587039] RBP: ffff8882261cbc58 R08: 0000000000000000 R09: 0000000000000000
<4> [427.587040] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888225352068
<4> [427.587042] R13: ffff888225352000 R14: ffff88821d43fc40 R15: ffff8882253526d8
<4> [427.587059] FS:  0000000000000000(0000) GS:ffff88826b480000(0000) knlGS:0000000000000000
<4> [427.587060] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [427.587062] CR2: 00005651d791d390 CR3: 0000000002210002 CR4: 00000000003606e0
<4> [427.587063] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [427.587064] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [427.587066] Call Trace:
<4> [427.587098]  intel_context_exit_engine+0xe/0x70 [i915]
<4> [427.587141]  i915_request_retire+0x32b/0x920 [i915]
<4> [427.587234]  retire_requests+0x4d/0x60 [i915]
<4> [427.587371]  engine_retire+0x63/0xe0 [i915]

So we are now hitting timeline_exit in engine_retire before
timeline_enter in engine_park.

My expectation was that this would be serialised by the timelines->lock...

But no, intel_timeline_exit() takes a lockless fast path:
        GEM_BUG_ON(!atomic_read(&tl->active_count));
        if (atomic_add_unless(&tl->active_count, -1, 1))
                return;

        spin_lock(&timelines->lock);
        if (atomic_dec_and_test(&tl->active_count))
                list_del(&tl->link);
        spin_unlock(&timelines->lock);
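
A minimal userspace sketch of the same "decrement unless last, otherwise lock" pattern shows why the lock does not serialise the fast path. It uses C11 atomics and an atomic_flag spinlock in place of the kernel's atomic_t and spin_lock(). struct timeline, add_unless(), on_active_list and the lock helpers are hypothetical stand-ins, and timeline_enter() is assumed to mirror the exit side rather than copied from the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_timeline. */
struct timeline {
	atomic_int active_count;
	bool on_active_list;	/* models list_add/list_del on tl->link */
};

/* Hypothetical stand-in for timelines->lock (a spinlock in the kernel). */
static atomic_flag timelines_lock = ATOMIC_FLAG_INIT;

static void timelines_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&timelines_lock,
						 memory_order_acquire))
		; /* spin */
}

static void timelines_lock_release(void)
{
	atomic_flag_clear_explicit(&timelines_lock, memory_order_release);
}

/* Userspace analogue of atomic_add_unless(v, a, u): add a to *v unless
 * *v == u. Returns true if the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

/* Assumed mirror of the exit path: only the 0 -> 1 transition locks. */
static void timeline_enter(struct timeline *tl)
{
	if (add_unless(&tl->active_count, 1, 0))
		return;

	timelines_lock_acquire();
	if (atomic_fetch_add(&tl->active_count, 1) == 0)
		tl->on_active_list = true;
	timelines_lock_release();
}

static void timeline_exit(struct timeline *tl)
{
	/* Models GEM_BUG_ON(!atomic_read(&tl->active_count)). It is checked
	 * before the lock is even considered. */
	assert(atomic_load(&tl->active_count) != 0);

	/* Fast path: not the last user, so the lock is never taken. */
	if (add_unless(&tl->active_count, -1, 1))
		return;

	timelines_lock_acquire();
	if (atomic_fetch_sub(&tl->active_count, 1) == 1)
		tl->on_active_list = false;
	timelines_lock_release();
}
```

Only the 0 -> 1 and 1 -> 0 transitions touch timelines_lock. The assertion and the fast path both run before the lock is considered, so holding that lock elsewhere does nothing to order an exit from engine_retire against a later enter from engine_park.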

Hmm. Could we bias tl->active_count while we do engine_park?
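
One possible reading of the bias idea, sketched in a hypothetical userspace model: park takes its active_count reference under the lock *before* the request is handed over, so a retire racing in afterwards always finds a non-zero count. This is a guess at the shape, not the actual fix. park_submit_biased() and the other names are made up, and the request submission itself is elided.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the timeline and timelines->lock. */
struct timeline {
	atomic_int active_count;
	bool on_active_list;
};

static atomic_flag timelines_lock = ATOMIC_FLAG_INIT;

static void timelines_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&timelines_lock,
						 memory_order_acquire))
		; /* spin */
}

static void timelines_lock_release(void)
{
	atomic_flag_clear_explicit(&timelines_lock, memory_order_release);
}

/* Same shape as intel_timeline_exit(): lockless unless last user. */
static void timeline_exit(struct timeline *tl)
{
	int c = atomic_load(&tl->active_count);

	assert(c != 0); /* the GEM_BUG_ON that fired */
	while (c != 1)
		if (atomic_compare_exchange_weak(&tl->active_count, &c, c - 1))
			return;

	timelines_lock_acquire();
	if (atomic_fetch_sub(&tl->active_count, 1) == 1)
		tl->on_active_list = false;
	timelines_lock_release();
}

/* Hypothetical biased park: the reference is taken under the lock before
 * the request becomes visible to retirement. */
static void park_submit_biased(struct timeline *tl)
{
	timelines_lock_acquire();
	if (atomic_fetch_add(&tl->active_count, 1) == 0)
		tl->on_active_list = true;
	/* ... hand the request over here; retirement may start after this ... */
	timelines_lock_release();
}
```

With the count already raised when the request is queued, an early exit from the retire worker simply drops that reference instead of tripping the assertion.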
-Chris

