[Intel-gfx] [PATCH 2/2] drm/i915: allow sync points within batches
Jesse Barnes
jbarnes at virtuousgeek.org
Wed Sep 3 21:01:49 CEST 2014
On Wed, 3 Sep 2014 17:08:53 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Wed, Sep 03, 2014 at 08:41:06AM -0700, Jesse Barnes wrote:
> > On Wed, 3 Sep 2014 08:01:55 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > > These commands are illegal/invalid inside the object, only valid inside
> > > the ring.
> >
> > Hm, we ought to be able to write to non-privileged space with
> > STORE_DWORD, but that does mean moving to context-specific pages in
> > process space, or at least adding them to our existing scheme.
>
> The per-process context page also doesn't exist generically. I certainly
> hope that userspace can't overwrite the hws! Imagine if we were using
> that for interrupt status reads, or seqno tracking...
Yeah, I'm thinking of an additional hws that's per-context and
userspace-mappable. It could come in handy for userspace-only sync stuff.
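Something like the below is what I'm picturing on the client side.
Purely a sketch: the per-context page, its offset, and the helper are
all hypothetical, and I'm writing the gen4-7 4-dword MI_STORE_DWORD_IMM
encoding from memory (gen8+ uses the second dword for the upper address
bits instead of MBZ):

#include <stdint.h>

/* header | (dword length - 2); the address would resolve through the
 * context's ppgtt, so nothing privileged gets touched */
#define MI_STORE_DWORD_IMM_GEN4 ((0x20 << 23) | 2)

static uint32_t *
emit_sync_write(uint32_t *cs, uint32_t ctx_page_offset, uint32_t value)
{
        *cs++ = MI_STORE_DWORD_IMM_GEN4;
        *cs++ = 0;               /* MBZ on gen6/7, upper address bits on gen8+ */
        *cs++ = ctx_page_offset; /* dword slot in the hypothetical per-context page */
        *cs++ = value;           /* cookie userspace polls for later */
        return cs;
}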
>
> > I haven't tried MI_USER_INTERRUPT from a batch; if we can't do it from
> > a non-privileged batch, that nixes one of the other neat features we
> > could have (fine-grained intra-batch userspace synchronization).
>
> I don't understand how writing the operation into the batch is
> beneficial vs writing into the ring, unless you intended to use
> something more fine-grained than the batch seqno. You want to get
> interrupts from inside batches? Rather than continue the existing scheme
> of splitting up batches between fences?
Yeah, the whole idea here was to avoid flushing batches in order to
emit fences, both to avoid overhead and to give userspace more rope.
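Roughly, within a single batch it would look like this (again just a
hypothetical sketch, reusing the emit_sync_write() helper from the
earlier example; the actual 3D commands are elided):

#define MI_BATCH_BUFFER_END (0x0a << 23)

static uint32_t *
emit_two_pass_batch(uint32_t *cs, uint32_t ctx_page_offset)
{
        /* ... 3D commands for the first render pass ... */
        cs = emit_sync_write(cs, ctx_page_offset, 1); /* fence point 1 */

        /* ... 3D commands for the second render pass ... */
        cs = emit_sync_write(cs, ctx_page_offset, 2); /* fence point 2 */

        *cs++ = MI_BATCH_BUFFER_END;
        return cs;
}

No flush or batch split between the two fence points; the kernel only
ever sees one execbuf.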
>
> I definitely think we should think twice before allowing userspace to
> arbitrarily generate interrupts.
>
> > > > +        return 0;
> > > > +}
> > > >
> > > >  static int
> > > >  relocate_entry_cpu(struct drm_i915_gem_object *obj,
> > > > @@ -349,7 +411,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
> > > >  static int
> > > >  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> > > >                                     struct eb_vmas *eb,
> > > > -                                   struct drm_i915_gem_relocation_entry *reloc)
> > > > +                                   struct drm_i915_gem_relocation_entry *reloc,
> > > > +                                   struct intel_context *ctx)
> > >
> > > Hmm. That's a nuisance. But no, you only use it to automatically create
> > > a fence, not to patch the batch, so you can just use an object-flag.
> > >
> > > This fits neatly into requests.
> >
> > Most definitely. What do you think of the potential upside in the DDX
> > for this, assuming we get dword writes from batches working?
>
> Negative. You now have relocation overhead, you still need to split
> batches to keep the gpu busy, do ring switches, and switch contexts
> between clients, so I don't feel a need for fences from inside a batch.
>
> Getting a seqno and a hws in the client would be nice, but if it continues
> to require kernel polling, no thanks; I'll just stick to approximately
> tracking the active state of surfaces, using the heavier accurate queries
> sparingly.
>
> About the only thing I could see as being useful is that it would allow
> you to reuse a batch buffer multiple times, rather than overallocate a
> whole page and keep a pool of such pages.
>
> Am I missing something?
No, I think that's about right. The need for reloc processing is a
definite downside to this approach, but that could be solved with a new
interface, or by just allowing userspace to map/manage a hws. The
downside there is that the resulting fences wouldn't be shareable. But
requiring a flush for that is probably fine.
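With a mapped page the userspace-only wait would just be a polling
read, e.g. (sketch only, assuming the slot layout from the earlier
example; sharing across processes would still have to bounce through
the kernel):

#include <stdint.h>
#include <sched.h>

static void
wait_for_sync(volatile uint32_t *slot, uint32_t value)
{
        /* seqno-style comparison so wraparound is tolerated */
        while ((int32_t)(*slot - value) < 0)
                sched_yield(); /* busy-ish wait; ok for short intra-frame syncs */
}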
--
Jesse Barnes, Intel Open Source Technology Center