[Bug 109469] [CI][SHARDS] igt@gem_mmap_gtt@hang - fail - Failed assertion: !control->error

Wed Mar 6 18:42:36 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=109469

Martin Peres <martin.peres at free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #6 from Martin Peres <martin.peres at free.fr> ---
(In reply to Chris Wilson from comment #5)
> commit 2caffbf1176256cc4f8d4e5c3c524fc689cb9876
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Fri Feb 8 15:37:03 2019 +0000
> 
>     drm/i915: Revoke mmaps and prevent access to fence registers across reset
>     
>     Previously, we were able to rely on the recursive properties of
>     struct_mutex to allow us to serialise revoking mmaps and reacquiring the
>     FENCE registers with them being clobbered over a global device reset.
>     I then proceeded to throw out the baby with the bath water in order to
>     pursue a struct_mutex-less reset.
>     
>     Perusing LWN for alternative strategies, the dilemma on how to serialise
>     access to a global resource on one side was answered by
>     https://lwn.net/Articles/202847/ -- Sleepable RCU:
>     
>         int readside(void) {
>             int idx;
>             rcu_read_lock();
>             if (nomoresrcu) {
>                 rcu_read_unlock();
>                 return -EINVAL;
>             }
>             idx = srcu_read_lock(&ss);
>             rcu_read_unlock();
>             /* SRCU read-side critical section. */
>             srcu_read_unlock(&ss, idx);
>             return 0;
>         }
>
>         void cleanup(void)
>         {
>             nomoresrcu = 1;
>             synchronize_rcu();
>             synchronize_srcu(&ss);
>             cleanup_srcu_struct(&ss);
>         }
>     
>     No more worrying about stop_machine, just an uber-complex mutex,
>     optimised for reads, with the overhead pushed to the rare reset path.
>     
>     However, we do run the risk of a deadlock as we allocate underneath the
>     SRCU read lock, and the allocation may require a GPU reset, causing a
>     dependency cycle via the in-flight requests. We resolve that by declaring
>     the driver wedged and cancelling all in-flight rendering.
>     
>     v2: Use expedited rcu barriers to match our earlier timing
>     characteristics.
>     v3: Try to annotate locking contexts for sparse
>     v4: Reduce selftest lock duration to avoid a reset deadlock with fences
>     v5: s/srcu/reset_backoff_srcu/
>     v6: Remove more stale comments
>     
>     Testcase: igt/gem_mmap_gtt/hang
>     Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
>     Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>     Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk

It did the trick! The failure used to be seen on every run, but it has not
appeared for the past 3.5 weeks. Closing :)
