[igt-dev] [PATCH i-g-t 4/6] igt/gem_exec_nop: Drip feed nops
Chris Wilson
chris at chris-wilson.co.uk
Wed Jun 20 09:12:58 UTC 2018
Quoting Katarzyna Dec (2018-06-20 09:31:40)
> On Tue, Jun 19, 2018 at 11:49:18AM +0100, Chris Wilson wrote:
> Few questions below.
>
> > Wait until the previous nop batch is running before submitting the next.
> > This prevents the kernel from batching up sequential requests into a
> > ringfull, more strenuously exercising the "lite-restore" execution path.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > tests/gem_exec_nop.c | 146 +++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 142 insertions(+), 4 deletions(-)
> >
> > diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
> > index 50f0a3aad..0523b1c02 100644
> > --- a/tests/gem_exec_nop.c
> > +++ b/tests/gem_exec_nop.c
> > @@ -104,6 +104,129 @@ static double nop_on_ring(int fd, uint32_t handle, unsigned ring_id,
> > return elapsed(&start, &now);
> > }
> >
> > +static void poll_ring(int fd, unsigned ring, const char *name, int timeout)
> > +{
> > + const int gen = intel_gen(intel_get_drm_devid(fd));
> > + const uint32_t MI_ARB_CHK = 0x5 << 23;
> > + struct drm_i915_gem_execbuffer2 execbuf;
> > + struct drm_i915_gem_exec_object2 obj;
> > + struct drm_i915_gem_relocation_entry reloc[4], *r;
> > + uint32_t *bbe[2], *state, *batch;
> > + unsigned engines[16], nengine, flags;
> > + struct timespec tv = {};
> > + unsigned long cycles;
> > + uint64_t elapsed;
> > +
> > + flags = I915_EXEC_NO_RELOC;
> This flag means we will prepare relocations table for kernel?
No. It means that the contents of the batch buffer already match
reloc.presumed_offset + reloc.delta, where presumed_offset in turn matches
obj.offset. Then if obj.offset matches the final location, the kernel
knows it doesn't have to check the reloc[]. On the first pass, the kernel
will have to patch things up, but after that we don't even have to check
the 4 reloc entries on every pass.
The goal is not to measure the reloc patching overhead, but how long it
takes to do a series of "lite-restores".
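As a rough illustration of that contract, here is a toy model in plain C.
It is only a sketch of the reasoning, not the i915 implementation: the
struct and the toy_* helpers are made up for this example, and the real
kernel trusts the flag rather than inspecting the batch contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one relocation entry. */
struct toy_reloc {
	size_t dword;             /* index of the batch dword holding the address */
	uint64_t presumed_offset; /* where userspace assumed the target lives */
	uint32_t delta;           /* byte offset inside the target object */
};

/* Userspace side of the promise: every relocated dword already holds
 * presumed_offset + delta. */
static bool toy_batch_is_prepatched(const uint32_t *batch,
				    const struct toy_reloc *r, size_t n)
{
	assert(batch && r);
	for (size_t i = 0; i < n; i++)
		if (batch[r[i].dword] !=
		    (uint32_t)(r[i].presumed_offset + r[i].delta))
			return false;
	return true;
}

/* Modelled kernel decision: the reloc[] can be skipped only when the flag
 * is set and the object ended up where userspace said it would be. */
static bool toy_can_skip_relocs(bool no_reloc_flag, uint64_t obj_offset,
				uint64_t final_offset)
{
	return no_reloc_flag && obj_offset == final_offset;
}

/* Modelled first pass: the object moved, so patch the batch and remember
 * the new location for the next submission. */
static void toy_patch(uint32_t *batch, struct toy_reloc *r, size_t n,
		      uint64_t final_offset)
{
	for (size_t i = 0; i < n; i++) {
		batch[r[i].dword] = (uint32_t)(final_offset + r[i].delta);
		r[i].presumed_offset = final_offset;
	}
}
```

In this model only the first submission pays for toy_patch(); every later
one with an unchanged offset takes the skip path, which is the situation
the test wants to be in while it times the lite-restores.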
> > + if (gen == 4 || gen == 5)
> > + flags |= I915_EXEC_SECURE;
> > +
> > + nengine = 0;
> > + if (ring == ALL_ENGINES) {
> > + for_each_physical_engine(fd, ring) {
> > + if (!gem_can_store_dword(fd, ring))
> > + continue;
> > +
> > + engines[nengine++] = ring;
> > + }
> > + } else {
> > + gem_require_ring(fd, ring);
> > + igt_require(gem_can_store_dword(fd, ring));
> > + engines[nengine++] = ring;
> > + }
> > + igt_require(nengine);
> > +
> > + memset(&obj, 0, sizeof(obj));
> > + obj.handle = gem_create(fd, 4096);
> > + obj.relocs_ptr = to_user_pointer(reloc);
> > + obj.relocation_count = ARRAY_SIZE(reloc);
> > +
> > + r = memset(reloc, 0, sizeof(reloc));
> > + batch = gem_mmap__wc(fd, obj.handle, 0, 4096, PROT_WRITE);
> > +
> > + for (unsigned int start_offset = 0;
> > + start_offset <= 128;
> > + start_offset += 128) {
> It looks like this loop will run only once. Why use such 'strange'
> values, and why do we need a loop here?
Twice: start_offset is 0 and then 128 (note the <=), one pass for each
batch of the pair.
> > + uint32_t *b = batch + start_offset / sizeof(*batch);
> I am curious why we add batch when computing b, and then subtract it below in r->offset?
Just a generalised means of finding the byte offset from the start of
the bo.
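A small standalone sketch of that arithmetic, for illustration only. The
helper names are invented here, and the result is the reloc offset before
any per-generation adjustment the patch applies afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the dword at `b` from the start of the mapping `batch`. */
static size_t byte_offset(const uint32_t *batch, const uint32_t *b)
{
	assert(b >= batch);
	/* Pointer subtraction counts dwords; scale to bytes. */
	return (size_t)(b - batch) * sizeof(*batch);
}

/* Mirrors the shape of the patch: b points at the command header placed
 * at start_offset bytes, and the address to relocate is the dword after
 * it, hence the + 1. */
static size_t store_reloc_offset(const uint32_t *batch,
				 unsigned int start_offset)
{
	const uint32_t *b = batch + start_offset / sizeof(*batch);

	return byte_offset(batch, b + 1);
}
```

For the two passes of the loop this gives 4 and 132 bytes, i.e. one dword
past the start of each batch.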
> > +
> > + r->target_handle = obj.handle;
> > + r->offset = (b - batch + 1) * sizeof(uint32_t);
>
> > + r->delta = 4092;
> > + r->read_domains = I915_GEM_DOMAIN_RENDER;
> > +
> > + *b = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > + if (gen >= 8) {
> > + *++b = r->delta;
> > + *++b = 0;
> > + } else if (gen >= 4) {
> > + r->offset += sizeof(uint32_t);
> > + *++b = 0;
> > + *++b = r->delta;
> > + } else {
> > + *b -= 1;
> > + *++b = r->delta;
> > + }
> > + *++b = start_offset != 0;
> > + r++;
> > +
> Could you explain why we need such 'hacky' batch settings?^^^
We flip the value written between 1/0 so we can wait for each batch to
start.
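To make the idea concrete, a minimal CPU-only sketch follows. It assumes
the waiter simply compares the status dword against the marker of the
batch it just submitted; the toy_* functions are stand-ins invented for
this example and do not touch the GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the batch at start_offset stores: 0 for the first, 1 for the
 * second, matching `start_offset != 0` in the patch. */
static uint32_t marker_for(unsigned int start_offset)
{
	return start_offset != 0;
}

/* Stand-in for the GPU executing the store at the head of a batch. */
static void toy_batch_start(volatile uint32_t *state,
			    unsigned int start_offset)
{
	assert(state);
	*state = marker_for(start_offset);
}

/* What a waiter would poll: has the batch we submitted begun running? */
static bool toy_has_started(const volatile uint32_t *state,
			    unsigned int start_offset)
{
	return *state == marker_for(start_offset);
}
```

Because consecutive batches write different values, seeing the expected
marker cannot be confused with a leftover write from the previous batch.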
> > + b = batch + (start_offset + 64) / sizeof(*batch);
> > + bbe[start_offset != 0] = b;
> > + *b++ = MI_ARB_CHK;
> > +
> > + r->target_handle = obj.handle;
> > + r->offset = (b - batch + 1) * sizeof(uint32_t);
> > + r->read_domains = I915_GEM_DOMAIN_COMMAND;
> Why do we need to change domain from render to command?
It's ORed. It's also entirely irrelevant, as the kernel only stores a
write bit.
> > + r->delta = start_offset + 64;
> > + if (gen >= 8) {
> > + *b++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> > + *b++ = r->delta;
> > + *b++ = 0;
> > + } else if (gen >= 6) {
> > + *b++ = MI_BATCH_BUFFER_START | 1 << 8;
> > + *b++ = r->delta;
> > + } else {
> > + *b++ = MI_BATCH_BUFFER_START | 2 << 6;
> > + if (gen < 4)
> > + r->delta |= 1;
> > + *b++ = r->delta;
> > + }
> > + r++;
> > + }
> > + igt_assert(r == reloc + ARRAY_SIZE(reloc));
> > + state = batch + 1023;
> > +
> > + memset(&execbuf, 0, sizeof(execbuf));
> > + execbuf.buffers_ptr = to_user_pointer(&obj);
> If I understand correctly obj is 'containing' previously prepared batch, right?
Obj is the pair of batches, plus the status dword.
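For reference, the layout of that 4096-byte object as it can be read off
the patch, written as a small sketch. The enum and helper names are made
up for illustration; the numbers come from gem_create(fd, 4096), the
start_offset loop, r->delta = 4092 and state = batch + 1023.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	BO_SIZE = 4096,     /* size passed to gem_create() */
	BATCH_STRIDE = 128, /* second batch starts 128 bytes in */
	LOOP_OFFSET = 64,   /* MI_ARB_CHK + jump sit 64 bytes into a batch */
	STATE_DWORD = 1023  /* status dword index: batch + 1023 */
};

/* Byte offset of the status dword; equals the store's r->delta. */
static size_t state_byte_offset(void)
{
	return STATE_DWORD * sizeof(uint32_t);
}

/* Byte offset of the spin loop that a batch jumps back to; this is the
 * delta of that batch's MI_BATCH_BUFFER_START relocation. */
static size_t loop_byte_offset(unsigned int batch_index)
{
	assert(batch_index < 2);
	return (size_t)batch_index * BATCH_STRIDE + LOOP_OFFSET;
}
```

So the two batches occupy the first 256 bytes and the status dword is the
last dword of the page.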
-Chris