[Intel-gfx] [PATCH] drm/i915: Revert shrinker changes from "Track unbound pages"

Chris Wilson chris at chris-wilson.co.uk
Thu Jan 10 17:58:30 CET 2013


On Thu, 10 Jan 2013 18:03:00 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> This partially reverts
> 
> commit 6c085a728cf000ac1865d66f8c9b52935558b328
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Mon Aug 20 11:40:46 2012 +0200
> 
>     drm/i915: Track unbound pages
> 
> Closer inspection of that patch revealed a bunch of unrelated changes
> in the shrinker:
> - The shrinker count is now in pages instead of objects.
> - For counting the shrinkable objects, the old code only looked at the
>   inactive list; the new code looks at all bound objects (including
>   pinned ones). That is obviously in addition to the new unbound list.
> - The shrinker count is no longer scaled with
>   sysctl_vfs_cache_pressure. Note, though, that with the default tuning
>   value of vfs_cache_pressure = 100 this doesn't affect the shrinker
>   behaviour.
> - When actually shrinking objects, the old code first dropped
>   purgeable objects, then normal (inactive) objects. Only then did it,
>   as a last-ditch effort, idle the gpu and evict everything. The new
>   code omits the intermediate step of evicting normal inactive
>   objects.
> 
> Save for the first change, which seems benign, and the shrinker count
> scaling, which is a bit of a different story, the end result of all these
> changes is that the shrinker is _much_ more likely to fall back to the
> last-ditch resort of idling the gpu and evicting everything. The old
> code could only do that if something else evicted lots of objects in
> the meantime (since without any other changes the nr_to_scan will be
> smaller than the object count).
> 
> Reverting the vfs_cache_pressure behaviour itself is a bit bogus: only
> dentry/inode object caches should scale their shrinker counts with
> vfs_cache_pressure. Originally I had reverted that change, too, but
> Chris Wilson insisted that it is too bogus and should not see the
> light of day again.
> 
> Hence revert all these other changes and restore the old shrinker
> behaviour, with the minor adjustment that we now first scan the
> unbound list, then the inactive list for each object category
> (purgeable or normal).
> 
> A similar patch has been tested by a few people affected by the gen4/5
> hangs that started to appear in 3.7 and that some people bisected to
> the "drm/i915: Track unbound pages" commit. But just disabling the
> unbound logic alone didn't change things at all.
> 
> Note that this patch doesn't fix the referenced bugs; it only hides
> the underlying bug(s) well enough to restore pre-3.7 behaviour. The
> key to achieving that is to massively reduce the likelihood of going
> into a full gpu stall and evicting everything.
> 
> v2: Reword commit message a bit, taking Chris Wilson's comment into
> account.
> 
> v3: At Chris Wilson's insistence, do not reinstate the rather bogus
> vfs_cache_pressure change.
> 
> Tested-by: Greg KH <gregkh at linuxfoundation.org>
> Tested-by: Dave Kleikamp <dave.kleikamp at oracle.com>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
> References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
> References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
> References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: stable at vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
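
To make the restored scan order from the commit message concrete, here is a minimal userspace model. It is a sketch under stated assumptions, not the i915 code: struct model_obj, model_shrink_pass() and model_shrink() are hypothetical names, only the unbound and inactive lists are modelled, and the last-ditch "idle the gpu and evict everything" step is approximated as releasing every remaining unpinned object. Each pass walks the unbound list before the inactive list, purgeable objects go first, and the with_inactive_step flag toggles the intermediate pass that the "Track unbound pages" commit dropped.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM buffer object (not the i915 struct). */
struct model_obj {
	long pages;     /* object size in pages */
	bool purgeable; /* backing storage may be discarded */
	bool bound;     /* true: on the inactive list; false: on the unbound list */
	bool pinned;    /* pinned objects are never reclaimed by this model */
	bool released;  /* set once this model has dropped the object's pages */
};

/*
 * One shrink pass: walk the unbound list first, then the inactive list,
 * releasing eligible objects until at least 'target' pages are freed.
 * With purgeable_only set, only purgeable objects are eligible.
 * Returns the number of pages freed by this pass.
 */
static long model_shrink_pass(struct model_obj *objs, size_t n, long target,
			      bool purgeable_only)
{
	long freed = 0;

	assert(objs != NULL || n == 0);
	for (int list = 0; list < 2; list++) { /* 0 = unbound, 1 = inactive */
		bool want_bound = (list == 1);

		for (size_t i = 0; i < n && freed < target; i++) {
			struct model_obj *o = &objs[i];

			if (o->released || o->pinned || o->bound != want_bound)
				continue;
			if (purgeable_only && !o->purgeable)
				continue;
			o->released = true;
			freed += o->pages;
		}
	}
	return freed;
}

/*
 * Shrink until nr_to_scan pages are freed. Pass 1 drops purgeable objects;
 * pass 2 (only if with_inactive_step) drops normal objects; any remaining
 * shortfall triggers the last-ditch step, modelled here as releasing every
 * remaining unpinned object. Returns true if the last-ditch step ran.
 */
static bool model_shrink(struct model_obj *objs, size_t n, long nr_to_scan,
			 bool with_inactive_step, long *freed_out)
{
	long freed = model_shrink_pass(objs, n, nr_to_scan, true);

	if (with_inactive_step && freed < nr_to_scan)
		freed += model_shrink_pass(objs, n, nr_to_scan - freed, false);

	bool last_ditch = freed < nr_to_scan;
	if (last_ditch)
		freed += model_shrink_pass(objs, n, LONG_MAX, false);

	if (freed_out != NULL)
		*freed_out = freed;
	return last_ditch;
}
```

For example, with two unpinned, non-purgeable 10-page objects on the inactive list and nr_to_scan = 8, model_shrink(objs, 2, 8, true, &freed) releases one object and never reaches the last-ditch step, while the same call with with_inactive_step = false finds nothing purgeable and falls straight through to evicting both. That mirrors the argument above: with the intermediate pass, a scan request smaller than the reclaimable total is normally satisfied before the last resort is needed.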

Acked-by: Chris Wilson <chris at chris-wilson.co.uk>

I just hope the clue bat descends soonest before we find another way of
triggering the spurious hangs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


