<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a>
</span> changed
<a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !"GPU hung""
href="https://bugs.freedesktop.org/show_bug.cgi?id=106680">bug 106680</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">i915 platform</td>
<td>
</td>
<td>CNL, CFL
</td>
</tr>
<tr>
<td style="text-align:right;">Resolution</td>
<td>FIXED
</td>
<td>---
</td>
</tr>
<tr>
<td style="text-align:right;">Status</td>
<td>RESOLVED
</td>
<td>REOPENED
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !"GPU hung""
href="https://bugs.freedesktop.org/show_bug.cgi?id=106680#c8">Comment # 8</a>
on <a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !"GPU hung""
href="https://bugs.freedesktop.org/show_bug.cgi?id=106680">bug 106680</a>
from <span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a>
</span></b>
<pre>(In reply to Chris Wilson from <a href="show_bug.cgi?id=106680#c7">comment #7</a>)
<span class="quote">> commit 11abf0c5a021af683b8fe12b0d30fb1226d60e0f
> Author: Chris Wilson <<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>>
> Date: Fri Sep 14 09:00:15 2018 +0100
>
> drm/i915: Limit the backpressure for i915_request allocation
>
> If we try and fail to allocate a i915_request, we apply some
> backpressure on the clients to throttle the memory allocations coming
> from i915.ko. Currently, we wait until completely idle, but this is far
> too heavy and leads to some situations where the only escape is to
> declare a client hung and reset the GPU. The intent is to only ratelimit
> the allocation requests and to allow ourselves to recycle requests and
> memory from any long queues built up by a client hog.
>
> Although the system memory is inherently a global resources, we don't
> want to overly penalize an unlucky client to pay the price of reaping a
> hog. To reduce the influence of one client on another, we can instead of
> waiting for the entire GPU to idle, impose a barrier on the local client.
> (One end goal for request allocation is for scalability to many
> concurrent allocators; simultaneous execbufs.)
>
> To prevent ourselves from getting caught out by long running requests
> (requests that may never finish without userspace intervention, whom we
> are blocking) we need to impose a finite timeout, ideally shorter than
> hangcheck. A long time ago Paul McKenney suggested that RCU users should
> ratelimit themselves using judicious use of cond_synchronize_rcu(). This
> gives us the opportunity to reduce our indefinite wait for the GPU to
> idle to a wait for the RCU grace period of the previous allocation along
> this timeline to expire, satisfying both the local and finite properties
> we desire for our ratelimiting.
>
> There are still a few global steps (reclaim not least amongst those!)
> when we exhaust the immediate slab pool, at least now the wait is itself
> decoupled from struct_mutex for our glorious highly parallel future!
>
> Bugzilla: <a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !"GPU hung""
href="show_bug.cgi?id=106680">https://bugs.freedesktop.org/show_bug.cgi?id=106680</a>
> Signed-off-by: Chris Wilson <<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>>
> Cc: Tvrtko Ursulin <<a href="mailto:tvrtko.ursulin@intel.com">tvrtko.ursulin@intel.com</a>>
> Cc: Joonas Lahtinen <<a href="mailto:joonas.lahtinen@linux.intel.com">joonas.lahtinen@linux.intel.com</a>>
> Cc: Daniel Vetter <<a href="mailto:daniel.vetter@ffwll.ch">daniel.vetter@ffwll.ch</a>>
> Reviewed-by: Tvrtko Ursulin <<a href="mailto:tvrtko.ursulin@intel.com">tvrtko.ursulin@intel.com</a>>
> Link:
> <a href="https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1">https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1</a>-
> <a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>
>
> Pretty confident!</span >
Unfortunately, this is still not fixed, as it is happening at the same rate as
before:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4948/shard-kbl2/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4948/shard-kbl2/igt@gem_exec_await@wide-contexts.html</a>
Starting subtest: wide-contexts
(gem_exec_await:1079) igt_aux-CRITICAL: Test assertion failure function
sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_await:1079) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-contexts failed.
And we also seen issues on WHL and CNL:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_124/fi-cnl-u/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_124/fi-cnl-u/igt@gem_exec_await@wide-contexts.html</a>
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_121/fi-whl-u/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_121/fi-whl-u/igt@gem_exec_await@wide-contexts.html</a></pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are the QA Contact for the bug.</li>
<li>You are on the CC list for the bug.</li>
<li>You are the assignee for the bug.</li>
</ul>
</body>
</html>