<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body><span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a> </span> changed <a class="bz_bug_link bz_status_REOPENED " title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !&quot;GPU hung&quot;" href="https://bugs.freedesktop.org/show_bug.cgi?id=106680">bug 106680</a> <br> <table border="1" cellspacing="0" cellpadding="8"> <tr> <th>What</th> <th>Removed</th> <th>Added</th> </tr> <tr> <td style="text-align:right;">i915 platform</td> <td> </td> <td>CNL, CFL </td> </tr> <tr> <td style="text-align:right;">Resolution</td> <td>FIXED </td> <td>--- </td> </tr> <tr> <td style="text-align:right;">Status</td> <td>RESOLVED </td> <td>REOPENED </td> </tr></table> <p> <div> <b><a class="bz_bug_link bz_status_REOPENED " title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !&quot;GPU hung&quot;" href="https://bugs.freedesktop.org/show_bug.cgi?id=106680#c8">Comment # 8</a> on <a class="bz_bug_link bz_status_REOPENED " title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !&quot;GPU hung&quot;" href="https://bugs.freedesktop.org/show_bug.cgi?id=106680">bug 106680</a> from <span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a> </span></b> <pre>(In reply to Chris Wilson from <a href="show_bug.cgi?id=106680#c7">comment #7</a>)
<span class="quote">> commit 11abf0c5a021af683b8fe12b0d30fb1226d60e0f
> Author: Chris Wilson &lt;<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>&gt;
> Date: Fri Sep 14 09:00:15 2018 +0100
>
> drm/i915: Limit the backpressure for i915_request allocation
>
> If we try and fail to allocate a i915_request, we apply some
> backpressure on the clients to throttle the memory allocations coming
> from i915.ko. Currently, we wait until completely idle, but this is far
> too heavy and leads to some situations where the only escape is to
> declare a client hung and reset the GPU. The intent is to only ratelimit
> the allocation requests and to allow ourselves to recycle requests and
> memory from any long queues built up by a client hog.
>
> Although the system memory is inherently a global resources, we don't
> want to overly penalize an unlucky client to pay the price of reaping a
> hog. To reduce the influence of one client on another, we can instead of
> waiting for the entire GPU to idle, impose a barrier on the local client.
> (One end goal for request allocation is for scalability to many
> concurrent allocators; simultaneous execbufs.)
>
> To prevent ourselves from getting caught out by long running requests
> (requests that may never finish without userspace intervention, whom we
> are blocking) we need to impose a finite timeout, ideally shorter than
> hangcheck. A long time ago Paul McKenney suggested that RCU users should
> ratelimit themselves using judicious use of cond_synchronize_rcu(). This
> gives us the opportunity to reduce our indefinite wait for the GPU to
> idle to a wait for the RCU grace period of the previous allocation along
> this timeline to expire, satisfying both the local and finite properties
> we desire for our ratelimiting.
>
> There are still a few global steps (reclaim not least amongst those!)
> when we exhaust the immediate slab pool, at least now the wait is itself
> decoupled from struct_mutex for our glorious highly parallel future!
>
> Bugzilla: <a class="bz_bug_link bz_status_REOPENED " title="REOPENED - [CI][SHARDS] igt@gem_exec_await@wide-contexts - fail/dmesg-fail - Failed assertion: !&quot;GPU hung&quot;" href="show_bug.cgi?id=106680">https://bugs.freedesktop.org/show_bug.cgi?id=106680</a>
> Signed-off-by: Chris Wilson &lt;<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>&gt;
> Cc: Tvrtko Ursulin &lt;<a href="mailto:tvrtko.ursulin@intel.com">tvrtko.ursulin@intel.com</a>&gt;
> Cc: Joonas Lahtinen &lt;<a href="mailto:joonas.lahtinen@linux.intel.com">joonas.lahtinen@linux.intel.com</a>&gt;
> Cc: Daniel Vetter &lt;<a href="mailto:daniel.vetter@ffwll.ch">daniel.vetter@ffwll.ch</a>&gt;
> Reviewed-by: Tvrtko Ursulin &lt;<a href="mailto:tvrtko.ursulin@intel.com">tvrtko.ursulin@intel.com</a>&gt;
> Link: <a href="https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk">https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk</a>
>
> Pretty confident!</span>

Unfortunately, this is still not fixed, as it is happening at the same rate as before:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4948/shard-kbl2/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4948/shard-kbl2/igt@gem_exec_await@wide-contexts.html</a>

Starting subtest: wide-contexts
(gem_exec_await:1079) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_await:1079) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-contexts failed.

We have also seen this subtest fail on WHL and CNL:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_124/fi-cnl-u/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_124/fi-cnl-u/igt@gem_exec_await@wide-contexts.html</a>
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_121/fi-whl-u/igt@gem_exec_await@wide-contexts.html">https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_121/fi-whl-u/igt@gem_exec_await@wide-contexts.html</a></pre> </div> </p> </body> </html>