[Intel-gfx] [PATCH i-g-t v3] benchmarks/gem_wsim: Command submission workload simulator

Fri Apr 7 09:51:04 UTC 2017

On Fri, Apr 07, 2017 at 09:53:05AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/04/2017 09:55, Chris Wilson wrote:
> >On Thu, Apr 06, 2017 at 09:18:36AM +0100, Tvrtko Ursulin wrote:
> 
> [snip]
[snip]

> >>>>+		if (swap_vcs && engine == VCS1)
> >>>>+			engine = VCS2;
> >>>>+		else if (swap_vcs && engine == VCS2)
> >>>>+			engine = VCS1;
> >>>>+		w->eb.flags = eb_engine_map[engine];
> >>>>+		w->eb.flags |= I915_EXEC_HANDLE_LUT;
> >>>>+		if (!seqnos)
> >>>>+			w->eb.flags |= I915_EXEC_NO_RELOC;
> >>>
> >>>Doesn't look too hard to get the relocation right. Forcing relocations
> >>>between batches is probably a good one to check (just to say don't do
> >>>that)
> >>
> >>I am not following here. Are you saying don't do relocations at all?
> >>How do I make sure things stay fixed, and how do I even find out where
> >>they are in the first pass?
> >
> >Depending on the workload, it may be informative to also do comparisons
> >between NORELOC and always RELOC. Personally I would make sure we were
> >using NORELOC as this should be a simulator/example.
> 
> How do I use NORELOC? I mean, I have to know where the objects will
> be pinned, or be able to pin them first and know they will remain
> put. What am I not understanding here?

It will be assigned an address on first execution. Can I quote the spiel
I wrote for i915_gem_execbuffer.c and see if that answers how to use
NORELOC:

 * Reserving resources for the execbuf is the most complicated phase. We
 * neither want to have to migrate the object in the address space, nor do
 * we want to have to update any relocations pointing to this object. Ideally,
 * we want to leave the object where it is and for all the existing relocations
 * to match. If the object is given a new address, or if userspace thinks the
 * object is elsewhere, we have to parse all the relocation entries and update
 * the addresses. Userspace can set the I915_EXEC_NO_RELOC flag to hint that
 * all the target addresses in all of its objects match the value in the
 * relocation entries and that they all match the presumed offsets given by the
 * list of execbuffer objects. Using this knowledge, we know that if we haven't
 * moved any buffers, all the relocation entries are valid and we can skip
 * the update. (If userspace is wrong, the likely outcome is an impromptu GPU
 * hang.) The requirements for using I915_EXEC_NO_RELOC are:
 *
 *      The addresses written in the objects must match the corresponding
 *      reloc.presumed_offset which in turn must match the corresponding
 *      execobject.offset.
 *
 *      Any render targets written to in the batch must be flagged with
 *      EXEC_OBJECT_WRITE.
 *
 *      To avoid stalling, execobject.offset should match the current
 *      address of that object within the active context.
 *
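
A minimal sketch of meeting those requirements for a single relocation, assuming the target's address is already known from an earlier execbuf (the kernel reports it back in execobj.offset) and that the batch uses a 64-bit address slot, as on gen8+. The structs are hypothetical local stand-ins that only borrow field names from struct drm_i915_gem_exec_object2 and struct drm_i915_gem_relocation_entry; they are not the uapi layout, so real code should use i915_drm.h. `batch` here is assumed to be a CPU copy of the batch contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From include/uapi/drm/i915_drm.h: the batch writes to this object. */
#define EXEC_OBJECT_WRITE (1u << 2)

/* Hypothetical stand-ins borrowing field names from the uapi structs. */
struct sketch_exec_object {
	uint32_t handle;
	uint64_t offset;	/* address reported by the last execbuf */
	uint64_t flags;
};

struct sketch_reloc {
	uint32_t target_handle;	/* object index, as with I915_EXEC_HANDLE_LUT */
	uint32_t delta;		/* byte offset inside the target object */
	uint64_t offset;	/* byte offset of the address slot in the batch */
	uint64_t presumed_offset;
};

/*
 * Make one relocation obey the NO_RELOC rules: the address written into the
 * batch equals presumed_offset + delta, and presumed_offset equals the
 * target's execobj.offset. Objects the batch writes get EXEC_OBJECT_WRITE.
 */
static void prepare_noreloc(uint32_t *batch, struct sketch_reloc *reloc,
			    struct sketch_exec_object *target, int is_write)
{
	uint64_t addr;

	/* The kernel rejects relocations that are not 4-byte aligned. */
	assert((reloc->offset & 3) == 0);

	reloc->presumed_offset = target->offset;
	addr = reloc->presumed_offset + reloc->delta;
	/* memcpy stores in host byte order: little-endian on x86 hosts. */
	memcpy((char *)batch + reloc->offset, &addr, sizeof(addr));

	if (is_write)
		target->flags |= EXEC_OBJECT_WRITE;
}
```

The third requirement (avoiding a stall) is then just a matter of not inventing new offsets: keep passing in whatever the kernel last reported.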

Does that make sense? What questions remain unanswered?

Hmm, I usually sum it up as

	batch[reloc.offset] == reloc.presumed_offset + reloc.delta;

and

	execobj.offset == reloc.presumed_offset

must be true at the time of execbuf. Note that upon relocation, the kernel
updates batch[reloc.offset], reloc.presumed_offset and execobj.offset.
This is important to remember if you are prerecording the reloc/execobj
arrays and not feeding the results of execbuf back between phases.
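
As a rough illustration of those two invariants, here is a sketch that checks them for every relocation and, for the prerecorded case, resyncs a batch and reloc array from the offsets the last execbuf returned. It assumes 64-bit address slots (gen8+) and HANDLE_LUT-style indices in target_handle; the structs and the helper names (noreloc_valid, noreloc_resync) are hypothetical, not the uapi definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins borrowing field names from the i915 uapi structs. */
struct sketch_exec_object {
	uint32_t handle;
	uint64_t offset;	/* updated by the kernel on return from execbuf */
};

struct sketch_reloc {
	uint32_t target_handle;	/* index into the object array (HANDLE_LUT) */
	uint32_t delta;
	uint64_t offset;	/* byte offset of a 64-bit address in the batch */
	uint64_t presumed_offset;
};

/* Read the 64-bit address stored at byte offset `off` of the batch. */
static uint64_t batch_addr(const uint32_t *batch, uint64_t off)
{
	uint64_t v;

	memcpy(&v, (const char *)batch + off, sizeof(v));
	return v;
}

/* Both conditions above, checked for every relocation. */
static bool noreloc_valid(const uint32_t *batch,
			  const struct sketch_reloc *relocs, unsigned int count,
			  const struct sketch_exec_object *objs)
{
	for (unsigned int i = 0; i < count; i++) {
		const struct sketch_reloc *r = &relocs[i];

		if (batch_addr(batch, r->offset) != r->presumed_offset + r->delta)
			return false;
		if (objs[r->target_handle].offset != r->presumed_offset)
			return false;
	}
	return true;
}

/*
 * Feed the offsets returned by the last execbuf back into a prerecorded
 * batch and reloc array so the next phase can keep using NO_RELOC.
 */
static void noreloc_resync(uint32_t *batch, struct sketch_reloc *relocs,
			   unsigned int count,
			   const struct sketch_exec_object *objs)
{
	for (unsigned int i = 0; i < count; i++) {
		struct sketch_reloc *r = &relocs[i];
		uint64_t addr;

		assert((r->offset & 3) == 0);
		r->presumed_offset = objs[r->target_handle].offset;
		addr = r->presumed_offset + r->delta;
		memcpy((char *)batch + r->offset, &addr, sizeof(addr));
	}
}
```

Reusing the very arrays (and batch object) that were passed to execbuf gets the same effect for free, since those are what the kernel updates; a resync like this only matters when a separate, stale copy is replayed.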

> But in general is this correctly implementing your idea for queue
> depth estimation?

From my rough checklist:

	* writes engine->next_seqno++ after each op (in this case, at the end of each batch)
	* qlen[engine] = engine->next_seqno - *engine->current_seqno;
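
A minimal sketch of that estimate, assuming each batch ends by storing its seqno into a per-engine status dword the CPU can read (for example with an MI_STORE_DWORD_IMM into a mapped buffer). The struct and function names are hypothetical. Unlike a literal reading of `next_seqno++`, it hands out the pre-incremented value, so the counter always equals the last seqno issued and the subtraction counts exactly the batches not yet completed; with a post-increment and a zero-initialised status dword, the same formula would report one more than the number outstanding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-engine bookkeeping for the queue depth estimate. */
struct sketch_engine {
	uint32_t seqno;				/* last seqno handed to a batch */
	const volatile uint32_t *current_seqno;	/* written by the GPU */
};

/* Seqno the next batch must store at its end (pre-increment). */
static uint32_t sketch_emit_seqno(struct sketch_engine *e)
{
	return ++e->seqno;
}

/* Batches submitted to this engine that have not completed yet. */
static uint32_t sketch_qlen(const struct sketch_engine *e)
{
	/* Unsigned subtraction stays correct across 32-bit wraparound. */
	return e->seqno - *e->current_seqno;
}
```

With this convention a fresh engine, with both the counter and the status dword at zero, reports an empty queue, and the estimate needs no locking beyond a single read of the status dword.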

Design looks right. Implementation requires checking... I'll be back.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre