[Bug 110848] Everything using GPU gets stuck after running+killing parallel Media loads (after running 3D benchmarks)

Fri Aug 2 23:13:41 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=110848

--- Comment #33 from Chris Wilson <chris at chris-wilson.co.uk> ---
Fwiw,

commit 515b8b7e935ef3f59c4efda04a3b05353ed6fbb7 (HEAD -> drm-intel-next-queued,
drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Aug 2 22:21:37 2019 +0100

    drm/i915: Flush the freed object list on file close

    As we increase the number of RCU objects, it becomes easier for us to
    have several hundred thousand objects in the deferred RCU free queues.
    An example is gem_ctx_create/files which continually creates active
    contexts, which are not immediately freed upon close as they are kept
    alive by outstanding requests. This lack of backpressure allows the
    context objects to persist until they overwhelm and starve the system.
    We can increase our backpressure by flushing the freed object queue upon
    closing the device fd which should then not impact other clients.

    Testcase: igt/gem_ctx_create/*files
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
    Reviewed-by: Matthew Auld <matthew.auld at intel.com>
    Link:
https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-2-chris@chris-wilson.co.uk

commit 1aff1903d0ff53f055088a77948ac8d8224d42db
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Aug 2 22:21:36 2019 +0100

    drm/i915: Hide unshrinkable context objects from the shrinker

    The shrinker cannot touch objects used by the contexts (logical state
    and ring). Currently we mark those as "pin_global" to let the shrinker
    skip over them, however, if we remove them from the shrinker lists
    entirely, we don't event have to include them in our shrink accounting.

    By keeping the unshrinkable objects in our shrinker tracking, we report
    a large number of objects available to be shrunk, and leave the shrinker
    deeply unsatisfied when we fail to reclaim those. The shrinker will
    persist in trying to reclaim the unavailable objects, forcing the system
    into a livelock (not even hitting the dread oomkiller).

    v2: Extend unshrinkable protection for perma-pinned scratch and guc
    allocations (Tvrtko)
    v3: Notice that we should be pinned when marking unshrinkable and so the
    link cannot be empty; merge duplicate paths.

    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
    Reviewed-by: Matthew Auld <matthew.auld at intel.com>
    Link:
https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk

It didn't seem to help earlier though -- but I still think are related.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx-bugs/attachments/20190802/854ea671/attachment.html>