[Intel-gfx] [PATCH] drm/i915/selftests: Exercise CS TLB invalidation

Thu Sep 19 13:39:59 UTC 2019

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2019-09-19 13:57:45)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Check that we are correctly invalidating the TLB at the start of a
>> > batch after updating the GTT.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 227 ++++++++++++++++++
>> >  1 file changed, 227 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> > index 598c18d10640..f8709b332bd7 100644
>> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> > @@ -25,13 +25,16 @@
>> >  #include <linux/list_sort.h>
>> >  #include <linux/prime_numbers.h>
>> >  
>> > +#include "gem/i915_gem_context.h"
>> >  #include "gem/selftests/mock_context.h"
>> > +#include "gt/intel_context.h"
>> >  
>> >  #include "i915_random.h"
>> >  #include "i915_selftest.h"
>> >  
>> >  #include "mock_drm.h"
>> >  #include "mock_gem_device.h"
>> > +#include "igt_flush_test.h"
>> >  
>> >  static void cleanup_freed_objects(struct drm_i915_private *i915)
>> >  {
>> > @@ -1705,6 +1708,229 @@ int i915_gem_gtt_mock_selftests(void)
>> >       return err;
>> >  }
>> >  
>> > +static int context_sync(struct intel_context *ce)
>> > +{
>> > +     struct i915_request *rq;
>> > +     long timeout;
>> > +
>> > +     rq = intel_context_create_request(ce);
>> > +     if (IS_ERR(rq))
>> > +             return PTR_ERR(rq);
>> > +
>> > +     i915_request_get(rq);
>> > +     i915_request_add(rq);
>> > +
>> > +     timeout = i915_request_wait(rq, 0, HZ / 5);
>> > +     i915_request_put(rq);
>> > +
>> > +     return timeout < 0 ? -EIO : 0;
>> > +}
>> > +
>> > +static int submit_batch(struct intel_context *ce, u64 addr)
>> > +{
>> > +     struct i915_request *rq;
>> > +     int err;
>> > +
>> > +     rq = intel_context_create_request(ce);
>> > +     if (IS_ERR(rq))
>> > +             return PTR_ERR(rq);
>> > +
>> > +     err = 0;
>> > +     if (rq->engine->emit_init_breadcrumb) /* detect a hang */
>> 
>> for seqno write?
>
> If we expect an initial breadcrumb, we use it during reset to identify
> if we are inside a payload (as opposed to being inside the semaphore
> wait). So we need to emit the breadcrumb if we may hang in our batch.
>
>> > +             err = rq->engine->emit_init_breadcrumb(rq);
>> > +     if (err == 0)
>> > +             err = rq->engine->emit_bb_start(rq, addr, 0, 0);
>> > +
>> 
>> In context_sync part we grabbed a reference. In here we
>> don't. I don't see how we can get away without it even
>> if we don't wait in here.
>
> Hmm, I suppose you can argue that we do now have a later deref in the
> spinner. That wasn't there before...
>
>> > +     vma = i915_vma_instance(out, vm, NULL);
>> > +     if (IS_ERR(vma)) {
>> > +             err = PTR_ERR(vma);
>> > +             goto out_batch;
>> > +     }
>> > +
>> > +     err = i915_vma_pin(vma, 0, 0,
>> > +                        PIN_USER |
>> > +                        PIN_OFFSET_FIXED |
>> > +                        (vm->total - PAGE_SIZE));
>> > +     if (err)
>> > +             goto out_out;
>> 
>> out_put?
>
> Joonas likes out: for normal exit paths that double for error handling.

Ok. Fine with me. I just like the last part to describe what the first part
of onion out does. Don't have to so much scroll back and forth
in editor.

>
>> Oh and we don't have to do anything with the instance on
>> error paths?
>
> No. The vma does not yet have an independent lifetime from the object
> (they are all owned by objects currently).
>
>> > +                     /* Replace the TLB with target batches */
>> > +                     for (i = 0; i < count; i++) {
>> > +                             u32 *cs = batch + i * 64 / sizeof(*cs);
>> > +                             u64 addr;
>> > +
>> > +                             vma->node.start = offset + i * PAGE_SIZE;
>> 
>> on previous loop, we have now primed the pte to tlb in here?
>
> We're using the same PTE as before, in the expectation that the TLB
> still contains that lookup.
>
>> > +                             vm->insert_entries(vm, vma, I915_CACHE_NONE, 0);
>> 
>> ..now changing in it here...
>> 
>> > +
>> > +                             addr = vma->node.start + i * 64;
>> > +                             cs[4] = MI_NOOP;
>> > +                             cs[6] = lower_32_bits(addr);
>> > +                             cs[7] = upper_32_bits(addr);
>> > +                             wmb();
>> > +
>> > +                             err = submit_batch(ce, addr);
>> 
>> in hope that with this submission hardware will see the old one and
>> completely miss the store+bb start?
>
> The hope is we see the right page of course! The test is to detect when
> it sees the old page instead.

Right, I guess it just depends who is hoping wrt to outcome =)

>  
>> Perhaps rewiring a more dummy emit only to prove this case
>> is pushing it.
>> 
>> But I am curious to know if you did try it out by removing
>> the invalidate on the emits and managed to bring
>> out the missing writes?
>
> We can demonstrate it on Tigerlake :)
>
> Indeed it detects the remove of MI_INVALIDATE_TLB elsewhere.

Neat addition to our triaging kit this will be.
-Mika