[igt-dev] [PATCH i-g-t 4/6] igt/gem_exec_nop: Drip feed nops
Katarzyna Dec
katarzyna.dec at intel.com
Wed Jun 20 10:49:51 UTC 2018
On Wed, Jun 20, 2018 at 10:12:58AM +0100, Chris Wilson wrote:
> Quoting Katarzyna Dec (2018-06-20 09:31:40)
> > On Tue, Jun 19, 2018 at 11:49:18AM +0100, Chris Wilson wrote:
> > Few questions below.
> >
> > > Wait until the previous nop batch is running before submitting the next.
> > > This prevents the kernel from batching up sequential requests into a
> > > ringfull, more strenuously exercising the "lite-restore" execution path.
> > >
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > ---
> > > tests/gem_exec_nop.c | 146 +++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 142 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
> > > index 50f0a3aad..0523b1c02 100644
> > > --- a/tests/gem_exec_nop.c
> > > +++ b/tests/gem_exec_nop.c
> > > @@ -104,6 +104,129 @@ static double nop_on_ring(int fd, uint32_t handle, unsigned ring_id,
> > > return elapsed(&start, &now);
> > > }
> > >
> > > +static void poll_ring(int fd, unsigned ring, const char *name, int timeout)
> > > +{
> > > + const int gen = intel_gen(intel_get_drm_devid(fd));
> > > + const uint32_t MI_ARB_CHK = 0x5 << 23;
> > > + struct drm_i915_gem_execbuffer2 execbuf;
> > > + struct drm_i915_gem_exec_object2 obj;
> > > + struct drm_i915_gem_relocation_entry reloc[4], *r;
> > > + uint32_t *bbe[2], *state, *batch;
> > > + unsigned engines[16], nengine, flags;
> > > + struct timespec tv = {};
> > > + unsigned long cycles;
> > > + uint64_t elapsed;
> > > +
> > > + flags = I915_EXEC_NO_RELOC;
> > Does this flag mean that we prepare the relocation table for the kernel?
>
> No. It means that the contents of the batch buffer match the
> reloc.presumed_offset + reloc.delta which matches obj.offset. Then if
> obj.offset matches the final location, the kernel knows it doesn't have
> to check the reloc[]. On the first pass, the kernel will have to patch
> things up but after that, we don't even have to check the 4 reloc entries
> on every pass.
>
> The goal is not to measure the reloc patching overhead, but how long it
> takes to do a series of "lite-restores".
Now I got that :)
>
> > > + if (gen == 4 || gen == 5)
> > > + flags |= I915_EXEC_SECURE;
> > > +
> > > + nengine = 0;
> > > + if (ring == ALL_ENGINES) {
> > > + for_each_physical_engine(fd, ring) {
> > > + if (!gem_can_store_dword(fd, ring))
> > > + continue;
> > > +
> > > + engines[nengine++] = ring;
> > > + }
> > > + } else {
> > > + gem_require_ring(fd, ring);
> > > + igt_require(gem_can_store_dword(fd, ring));
> > > + engines[nengine++] = ring;
> > > + }
> > > + igt_require(nengine);
> > > +
> > > + memset(&obj, 0, sizeof(obj));
> > > + obj.handle = gem_create(fd, 4096);
> > > + obj.relocs_ptr = to_user_pointer(reloc);
> > > + obj.relocation_count = ARRAY_SIZE(reloc);
> > > +
> > > + r = memset(reloc, 0, sizeof(reloc));
> > > + batch = gem_mmap__wc(fd, obj.handle, 0, 4096, PROT_WRITE);
> > > +
> > > + for (unsigned int start_offset = 0;
> > > + start_offset <= 128;
> > > + start_offset += 128) {
> > It looks like this loop will run only once. Why use such 'strange'
> > values, and why do we need a loop here?
>
> Twice.
Right. Math..
>
> > > + uint32_t *b = batch + start_offset / sizeof(*batch);
> > I am curious why we add batch when computing b, but subtract it below for r->offset?
>
> Just a generalised means of finding the byte offset from the start of
> the bo.
>
> > > +
> > > + r->target_handle = obj.handle;
> > > + r->offset = (b - batch + 1) * sizeof(uint32_t);
> >
> > > + r->delta = 4092;
> > > + r->read_domains = I915_GEM_DOMAIN_RENDER;
> > > +
> > > + *b = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > > + if (gen >= 8) {
> > > + *++b = r->delta;
> > > + *++b = 0;
> > > + } else if (gen >= 4) {
> > > + r->offset += sizeof(uint32_t);
> > > + *++b = 0;
> > > + *++b = r->delta;
> > > + } else {
> > > + *b -= 1;
> > > + *++b = r->delta;
> > > + }
> > > + *++b = start_offset != 0;
> > > + r++;
> > > +
> > Could you explain why we need such 'hacky' batch settings? ^^^
>
> We flip the value written between 1/0 so we can wait for each batch to
> start.
>
I missed that there are 2 batches :)
> > > + b = batch + (start_offset + 64) / sizeof(*batch);
> > > + bbe[start_offset != 0] = b;
> > > + *b++ = MI_ARB_CHK;
> > > +
> > > + r->target_handle = obj.handle;
> > > + r->offset = (b - batch + 1) * sizeof(uint32_t);
> > > + r->read_domains = I915_GEM_DOMAIN_COMMAND;
>
> > Why do we need to change domain from render to command?
>
> It's OR'ed in. It's also entirely irrelevant, as the kernel only stores a
> write bit.
>
> > > + r->delta = start_offset + 64;
> > > + if (gen >= 8) {
> > > + *b++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> > > + *b++ = r->delta;
> > > + *b++ = 0;
> > > + } else if (gen >= 6) {
> > > + *b++ = MI_BATCH_BUFFER_START | 1 << 8;
> > > + *b++ = r->delta;
> > > + } else {
> > > + *b++ = MI_BATCH_BUFFER_START | 2 << 6;
> > > + if (gen < 4)
> > > + r->delta |= 1;
> > > + *b++ = r->delta;
> > > + }
> > > + r++;
> > > + }
> > > + igt_assert(r == reloc + ARRAY_SIZE(reloc));
> > > + state = batch + 1023;
> > > +
> > > + memset(&execbuf, 0, sizeof(execbuf));
> > > + execbuf.buffers_ptr = to_user_pointer(&obj);
> > If I understand correctly, obj 'contains' the previously prepared batch, right?
>
> Obj is the pair of batches, plus the status dword.
> -Chris
Thanks for answers.
Reviewed-by: Katarzyna Dec <katarzyna.dec at intel.com>
Kasia :)