[Intel-gfx] [PATCH 2/2] drm/i915: allow sync points within batches

Wed Sep 3 18:08:53 CEST 2014

On Wed, Sep 03, 2014 at 08:41:06AM -0700, Jesse Barnes wrote:
> On Wed, 3 Sep 2014 08:01:55 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
> > These commands are illegal/invalid inside the object, only valid inside
> > the ring.
> 
> Hm, we ought to be able to write to non-privileged space with
> STORE_DWORD, but that does mean moving to context specific pages in
> process space, or at least adding them to our existing scheme.

The per-process context page also doesn't exist generically. I certainly
hope that userspace can't overwrite the hws! Imagine if we were using
that for interrupt status reads, or seqno tracking...

> I haven't tried MI_USER_INTERRUPT from a batch, if we can't do it from
> a non-privileged batch that nixes one of the other neat features we
> could have (fine grained intra-batch userspace synchronization).

I don't understand how writing the operation into the batch is
beneficial vs writing into the ring, unless you intended to use
something more fine grained than the batch seqno. You want to get
interrupts from inside batches, rather than continue the existing scheme
of splitting up batches between fences?

I definitely think we should think twice before allowing userspace to
arbitrarily generate interrupts.

> > > +	return 0;
> > > +}
> > >  
> > >  static int
> > >  relocate_entry_cpu(struct drm_i915_gem_object *obj,
> > > @@ -349,7 +411,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
> > >  static int
> > >  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> > >  				   struct eb_vmas *eb,
> > > -				   struct drm_i915_gem_relocation_entry *reloc)
> > > +				   struct drm_i915_gem_relocation_entry *reloc,
> > > +				   struct intel_context *ctx)
> > 
> > Hmm. That's a nuisance. But no, you only use it to automatically create
> > a fence not to patch the batch, so you can just use an object-flag.
> > 
> > This fits neatly into requests.
> 
> Most definitely.  What do you think of the potential upside in the DDX
> for this, assuming we get dword writes from batches working?

Negative. You now have relocation overhead, you still need to split
batches to keep the gpu busy and do ring switches, and context switching
between clients, so I don't feel a need for fences from inside a batch.

Getting seqno and a hws in the client would be nice, but if it continues
to require kernel polling, no thanks, I'll just stick to approximately
tracking the active state of surfaces, using the heavier accurate queries
sparingly.
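
To illustrate the kind of client-side check this would allow, here is a
minimal sketch, assuming the client could read the current seqno from a
status page mapped read-only into its address space (which does not exist
today). The struct and function names are hypothetical and not part of
the i915 API; only the wraparound-safe comparison mirrors what the kernel
already does for seqnos.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-surface record kept by the client (not an i915 type). */
struct surface_state {
	uint32_t last_seqno;	/* seqno of the last batch that used the surface */
};

/* Wraparound-safe test: has seqno a reached or passed seqno b? */
static inline bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * hws_seqno is the value the client read from the (assumed) client-visible
 * status page. No ioctl or kernel polling is involved in this check.
 */
static inline bool surface_idle(const struct surface_state *s,
				uint32_t hws_seqno)
{
	return seqno_passed(hws_seqno, s->last_seqno);
}
```

The check is only as fresh as the last read of the status page, so it
gives the approximate answer cheaply and leaves the accurate kernel query
for the cases where that is not good enough.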

About the only thing I could see as being useful is that it would allow
you to reuse a batch buffer multiple times, rather than overallocate a
whole page and keep a pool of such pages.

Am I missing something?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre