[Intel-gfx] [PATCH v2 3/3] drm/i915/selftests: Exercise adding requests to a full GGTT

Wed Oct 11 12:45:05 UTC 2017

Quoting Tvrtko Ursulin (2017-10-11 13:33:28)
> 
> On 10/10/2017 22:38, Chris Wilson wrote:
> > A bug recently encountered involved the issue where are we were
> > submitting requests to different ppGTT, each would pin a segment of the
> > GGTT for its logical context and ring. However, this is invisible to
> > eviction as we do not tie the context/ring VMA to a request and so do
> > not automatically wait upon it them (instead they are marked as pinned,
> > prevent eviction entirely). Instead the eviction code must flush those
> 
> preventing?
> 
> > contexts by switching to the kernel context. This selftest tries to
> > fill the GGTT with contexts to exercise a path where the
> > switch-to-kernel-context failed to make forward progress and we fail
> > with ENOSPC.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/selftests/i915_gem_evict.c    | 121 +++++++++++++++++++++
> >   .../gpu/drm/i915/selftests/i915_live_selftests.h   |   1 +
> >   drivers/gpu/drm/i915/selftests/mock_context.c      |   6 +-
> >   3 files changed, 123 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > index 5ea373221f49..53df8926be7f 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > @@ -24,6 +24,8 @@
> >   
> >   #include "../i915_selftest.h"
> >   
> > +#include "mock_context.h"
> > +#include "mock_drm.h"
> >   #include "mock_gem_device.h"
> >   
> >   static int populate_ggtt(struct drm_i915_private *i915)
> > @@ -325,6 +327,116 @@ static int igt_evict_vm(void *arg)
> >       return err;
> >   }
> >   
> > +static int igt_evict_contexts(void *arg)
> > +{
> > +     struct drm_i915_private *i915 = arg;
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +     struct reserved {
> > +             struct drm_mm_node node;
> > +             struct reserved *next;
> > +     } *reserved = NULL;
> > +     unsigned long count;
> > +     int err = 0;
> > +
> > +     /* Make the GGTT appear small (but leave just enough to function) */
> 
> How does it do that?
> 
> > +     count = 0;
> > +     mutex_lock(&i915->drm.struct_mutex);
> > +     do {
> > +             struct reserved *r;
> > +
> > +             r = kcalloc(1, sizeof(*r), GFP_KERNEL);
> > +             if (!r) {
> > +                     err = -ENOMEM;
> > +                     goto out_locked;
> > +             }
> > +
> > +             if (i915_gem_gtt_insert(&i915->ggtt.base, &r->node,
> > +                                     1ul << 20, 0, I915_COLOR_UNEVICTABLE,
> > +                                     16ul << 20, i915->ggtt.base.total,
> > +                                     PIN_NOEVICT)) {
> > +                     kfree(r);
> > +                     break;
> > +             }
> > +
> > +             r->next = reserved;
> > +             reserved = r;
> > +
> > +             count++;
> > +     } while (1);
> > +     mutex_unlock(&i915->drm.struct_mutex);
> > +     pr_info("Filled GGTT with %lu 1MiB nodes\n", count);
> 
> Oh right, this helps. :)
> 
> > +
> > +     /* Overfill the GGTT with context objects and so try to evict one. */
> 
> Can GGTT be divisible by 1MiB in which case the above filling would fill 
> everything and make any eviction impossible? Do you need an assert that 
> there is a little bit of space left for below to work?

There's a subtle restriction that we only try to fill above 16MiB in the
GGTT. So the assumption is that leaves us enough space to fit at least
one request/context (and its preliminary setup, such as golden render
state). Failure to have enough GGTT space to issue that first request
leads to fun. Might have to make that a little more tolerant in future.

> > +     for_each_engine(engine, i915, id) {
> > +             struct i915_sw_fence *fence;
> > +             struct drm_file *file;
> > +             unsigned long count = 0;
> 
> No shadows variable warnings here?

W=1, brave man! My bad forgot to remove as I reused it for the counter
above.

> > +             mutex_lock(&i915->drm.struct_mutex);
> > +             do {
> > +                     struct drm_i915_gem_request *rq;
> > +                     struct i915_gem_context *ctx;
> > +
> > +                     ctx = live_context(i915, file);
> > +                     if (!ctx)
> > +                             break;
> > +
> > +                     rq = i915_gem_request_alloc(engine, ctx);
> > +                     if (IS_ERR(rq)) {
> > +                             if (PTR_ERR(rq) != -ENOMEM) {
> > +                                     pr_err("Unexpected error from request alloc (ctx hw id %u, on %s): %d\n",
> > +                                            ctx->hw_id, engine->name,
> > +                                            (int)PTR_ERR(rq));
> > +                                     err = PTR_ERR(rq);
> > +                             }
> > +                             break;
> 
> Comment on why it is OK to stop the test on ENOMEM would be good.

Because the first versions ran out of memory before filling the GGTT :)
Now that there is a reasonable cap on the GGTT size, we can just let it
ENOMEM and fixup later if required.
-Chris