[igt-dev] [PATCH i-g-t 4/6] igt/gem_exec_nop: Drip feed nops
Katarzyna Dec
katarzyna.dec at intel.com
Wed Jun 20 10:49:51 UTC 2018
On Wed, Jun 20, 2018 at 10:12:58AM +0100, Chris Wilson wrote:
> Quoting Katarzyna Dec (2018-06-20 09:31:40)
> > On Tue, Jun 19, 2018 at 11:49:18AM +0100, Chris Wilson wrote:
> > Few questions below.
> >
> > > Wait until the previous nop batch is running before submitting the next.
> > > This prevents the kernel from batching up sequential requests into a
> > > ringfull, more strenuously exercising the "lite-restore" execution path.
> > >
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > ---
> > > tests/gem_exec_nop.c | 146 +++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 142 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
> > > index 50f0a3aad..0523b1c02 100644
> > > --- a/tests/gem_exec_nop.c
> > > +++ b/tests/gem_exec_nop.c
> > > @@ -104,6 +104,129 @@ static double nop_on_ring(int fd, uint32_t handle, unsigned ring_id,
> > > return elapsed(&start, &now);
> > > }
> > >
> > > +static void poll_ring(int fd, unsigned ring, const char *name, int timeout)
> > > +{
> > > + const int gen = intel_gen(intel_get_drm_devid(fd));
> > > + const uint32_t MI_ARB_CHK = 0x5 << 23;
> > > + struct drm_i915_gem_execbuffer2 execbuf;
> > > + struct drm_i915_gem_exec_object2 obj;
> > > + struct drm_i915_gem_relocation_entry reloc[4], *r;
> > > + uint32_t *bbe[2], *state, *batch;
> > > + unsigned engines[16], nengine, flags;
> > > + struct timespec tv = {};
> > > + unsigned long cycles;
> > > + uint64_t elapsed;
> > > +
> > > + flags = I915_EXEC_NO_RELOC;
> > Does this flag mean that we prepare the relocation table for the kernel?
>
> No. It means that the contents of the batch buffer match the
> reloc.presumed_offset + reloc.delta which matches obj.offset. Then if
> obj.offset matches the final location, the kernel knows it doesn't have
> to check the reloc[]. On the first pass, the kernel will have to patch
> things up but after that, we don't even have to check the 4 reloc entries
> on every pass.
>
> The goal is not to measure the reloc patching overhead, but how long it
> takes to do a series of "lite-restores".
Now I got that :)
>
> > > + if (gen == 4 || gen == 5)
> > > + flags |= I915_EXEC_SECURE;
> > > +
> > > + nengine = 0;
> > > + if (ring == ALL_ENGINES) {
> > > + for_each_physical_engine(fd, ring) {
> > > + if (!gem_can_store_dword(fd, ring))
> > > + continue;
> > > +
> > > + engines[nengine++] = ring;
> > > + }
> > > + } else {
> > > + gem_require_ring(fd, ring);
> > > + igt_require(gem_can_store_dword(fd, ring));
> > > + engines[nengine++] = ring;
> > > + }
> > > + igt_require(nengine);
> > > +
> > > + memset(&obj, 0, sizeof(obj));
> > > + obj.handle = gem_create(fd, 4096);
> > > + obj.relocs_ptr = to_user_pointer(reloc);
> > > + obj.relocation_count = ARRAY_SIZE(reloc);
> > > +
> > > + r = memset(reloc, 0, sizeof(reloc));
> > > + batch = gem_mmap__wc(fd, obj.handle, 0, 4096, PROT_WRITE);
> > > +
> > > + for (unsigned int start_offset = 0;
> > > + start_offset <= 128;
> > > + start_offset += 128) {
> > It looks like this loop will run only once. Why use such 'strange'
> > values, and why do we need a loop here?
>
> Twice.
Right. Math..
>
> > > + uint32_t *b = batch + start_offset / sizeof(*batch);
> > I am curious why we add batch when computing b, but subtract it below for r->offset?
>
> Just a generalised means of finding the byte offset from the start of
> the bo.
>
> > > +
> > > + r->target_handle = obj.handle;
> > > + r->offset = (b - batch + 1) * sizeof(uint32_t);
> >
> > > + r->delta = 4092;
> > > + r->read_domains = I915_GEM_DOMAIN_RENDER;
> > > +
> > > + *b = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > > + if (gen >= 8) {
> > > + *++b = r->delta;
> > > + *++b = 0;
> > > + } else if (gen >= 4) {
> > > + r->offset += sizeof(uint32_t);
> > > + *++b = 0;
> > > + *++b = r->delta;
> > > + } else {
> > > + *b -= 1;
> > > + *++b = r->delta;
> > > + }
> > > + *++b = start_offset != 0;
> > > + r++;
> > > +
> > Could you explain why we need such 'hacky' batch settings? ^^^
>
> We flip the value written between 1/0 so we can wait for each batch to
> start.
>
I missed that there are 2 batches :)
> > > + b = batch + (start_offset + 64) / sizeof(*batch);
> > > + bbe[start_offset != 0] = b;
> > > + *b++ = MI_ARB_CHK;
> > > +
> > > + r->target_handle = obj.handle;
> > > + r->offset = (b - batch + 1) * sizeof(uint32_t);
> > > + r->read_domains = I915_GEM_DOMAIN_COMMAND;
>
> > Why do we need to change domain from render to command?
>
> It's OR'ed in. It's also entirely irrelevant, as the kernel only stores a
> write bit.
>
> > > + r->delta = start_offset + 64;
> > > + if (gen >= 8) {
> > > + *b++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> > > + *b++ = r->delta;
> > > + *b++ = 0;
> > > + } else if (gen >= 6) {
> > > + *b++ = MI_BATCH_BUFFER_START | 1 << 8;
> > > + *b++ = r->delta;
> > > + } else {
> > > + *b++ = MI_BATCH_BUFFER_START | 2 << 6;
> > > + if (gen < 4)
> > > + r->delta |= 1;
> > > + *b++ = r->delta;
> > > + }
> > > + r++;
> > > + }
> > > + igt_assert(r == reloc + ARRAY_SIZE(reloc));
> > > + state = batch + 1023;
> > > +
> > > + memset(&execbuf, 0, sizeof(execbuf));
> > > + execbuf.buffers_ptr = to_user_pointer(&obj);
> > If I understand correctly, obj 'contains' the previously prepared batch, right?
>
> Obj is the pair of batches, plus the status dword.
> -Chris
Thanks for answers.
Reviewed-by: Katarzyna Dec <katarzyna.dec at intel.com>
Kasia :)