[Intel-gfx] [PATCH] drm/i915: Prefer random replacement before eviction search
Chris Wilson
chris at chris-wilson.co.uk
Wed Jan 11 10:02:14 UTC 2017
Performing an eviction search can be very, very slow especially for a
range restricted replacement. For example, a workload like
gem_concurrent_blit will populate the entire GTT and then cause aperture
thrashing. Since the GTT is a mix of active and inactive tiny objects,
we have to search through almost 400k objects before finding anything
inside the mappable region, and as this search is required before every
operation performance falls off a cliff.
Instead of performing the full search, we do a trial replacement of the
node at a random location fitting the specified restrictions. We lose
the strict LRU property of the GTT in exchange for avoiding the slow
search (several orders of runtime improvement for gem_concurrent_blit
4KiB-global-gtt, e.g. from 5000s to 20s). The loss of LRU replacement is
(later) mitigated firstly by only doing replacement if we find no
freespace and secondly by execbuf doing a PIN_NONBLOCK search first before
it starts thrashing (i.e. the random replacement will only occur from the
already inactive set of objects).
v2: Ascii-art, and check preconditions
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a3ea32f79d86..9aa53bdf5a48 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -24,6 +24,7 @@
*/
#include <linux/log2.h>
+#include <linux/random.h>
#include <linux/seq_file.h>
#include <linux/stop_machine.h>
@@ -3581,12 +3582,38 @@ int i915_gem_gtt_reserve(struct i915_address_space *vm,
return err;
}
+static u64 random_offset(u64 start, u64 end, u64 len, u64 align)
+{
+ u64 range, addr;
+
+ GEM_BUG_ON(range_overflows(start, len, end));
+ GEM_BUG_ON(round_up(start, align) > round_down(end - len, align));
+
+ range = round_down(end - len, align) - round_up(start, align);
+ if (range) {
+ if (sizeof(unsigned long) == sizeof(u64)) {
+ addr = get_random_long();
+ } else {
+ addr = get_random_int();
+ if (range > U32_MAX) {
+ addr <<= 32;
+ addr |= get_random_int();
+ }
+ }
+ div64_u64_rem(addr, range, &addr);
+ start += addr;
+ }
+
+ return round_up(start, align);
+}
+
int i915_gem_gtt_insert(struct i915_address_space *vm,
struct drm_mm_node *node,
u64 size, u64 alignment, unsigned long color,
u64 start, u64 end, unsigned int flags)
{
u32 search_flag, alloc_flag;
+ u64 offset;
int err;
lockdep_assert_held(&vm->i915->drm.struct_mutex);
@@ -3629,6 +3656,31 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
if (err != -ENOSPC)
return err;
+ /* No free space, pick a slot at random.
+ *
+ * There is a pathological case here using a GTT shared between
+ * mmap and GPU (i.e. ggtt/aliasing_ppgtt but not full-ppgtt):
+ *
+ * |<-- 256 MiB aperture -->||<-- 1792 MiB unmappable -->|
+ * (64k objects) (448k objects)
+ *
+ * Now imagine that the eviction LRU is ordered top-down (just because
+ * pathology meets real life), and that we need to evict an object to
+ * make room inside the aperture. The eviction scan then has to walk
+ * the list 448k before it finds one within range. And now imagine that
+ * it has to search for a new hole between every byte inside the memcpy,
+ * for several simultaneous clients.
+ *
+ * On a full-ppgtt system, if we have run out of available space, there
+ * will be lots and lots of objects in the eviction list!
+ */
+ offset = random_offset(start, end,
+ size, alignment ?: I915_GTT_MIN_ALIGNMENT);
+ err = i915_gem_gtt_reserve(vm, node, size, offset, color, flags);
+ if (err != -ENOSPC)
+ return err;
+
+ /* Randomly selected placement is pinned, do a search */
err = i915_gem_evict_something(vm, size, alignment, color,
start, end, flags);
if (err)
--
2.11.0
More information about the Intel-gfx
mailing list