[Intel-gfx] [PATCH] drm/i915: s/seqno/request/ tracking inside objects
Chris Wilson
chris at chris-wilson.co.uk
Tue Jul 29 09:29:53 CEST 2014
On Mon, Jul 28, 2014 at 01:44:12PM -0700, Jesse Barnes wrote:
> > +static void i915_request_retire(struct i915_gem_request *rq)
> > {
> > - list_del(&request->list);
> > - i915_gem_request_remove_from_client(request);
> > + rq->completed = true;
> > +
> > + list_del(&rq->list);
> > + i915_gem_request_remove_from_client(rq);
> >
> > - if (request->ctx)
> > - i915_gem_context_unreference(request->ctx);
> > + if (rq->ctx) {
> > + i915_gem_context_unreference(rq->ctx);
> > + rq->ctx = NULL;
> > + }
> >
> > - kfree(request);
> > + i915_request_put(rq);
> > }
>
> I wonder if we can somehow always use this function instead of having
> both obj and request retire functions. Maybe if we moved obj retire
> into the rendering sync functions we'd have a little less pseudo
> duplication.
I kept it as-is because I was trying to avoid increasing the number of
calls to ring->get_seqno. Elsewhere, converting to a fence is good at
reducing the number of calls (as the completed:1 bit is likely to catch
most cases), but here I think we really do want to treat retirement en masse.
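For concreteness, the sort of check that makes the fence conversion cheap
looks roughly like this (a sketch, not lifted from the patch;
i915_request_complete() is an assumed helper name, built on the completed
bit and __i915_seqno_passed() quoted later in this mail):

static bool i915_request_complete(struct i915_gem_request *rq)
{
	/* The cached bit lets most callers avoid touching the HW seqno. */
	if (!rq->completed &&
	    __i915_seqno_passed(rq->ring->get_seqno(rq->ring, false),
				rq->seqno))
		rq->completed = true;

	return rq->completed;
}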
> request_retire does a put, where is the corresponding get so that we
> don't lose the object when the lock is dropped here or in the
> nonblocking wait? Or should request_retire not do a put?
Ownership flows from:
intel_ring_begin -> i915_add_request -> retire_request.
Everyone else has to grab their own references.
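So a caller that needs the request to survive a lock drop (e.g. the
nonblocking wait) does something along these lines - purely a sketch of
the refcounting, glossing over the exact locking rules of the wait;
i915_request_get() is assumed as the kref counterpart to the
i915_request_put() above, and the field name is invented:

	struct i915_gem_request *rq;

	rq = obj->last_read.request;	/* field name invented for the example */
	i915_request_get(rq);		/* our own reference */
	mutex_unlock(&dev->struct_mutex);

	/* rq may be retired (and the ring's reference dropped) here,
	 * but our reference keeps the struct alive.
	 */
	ret = i915_wait_request(rq);

	mutex_lock(&dev->struct_mutex);
	i915_request_put(rq);		/* balance the get */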
> > +static int
> > +i915_request_sync(struct i915_gem_request *rq,
> > + struct intel_engine_cs *to,
> > + struct drm_i915_gem_object *obj)
> > +{
> > + int ret, idx;
> > +
> > + if (to == NULL)
> > + return i915_wait_request(rq);
> > +
> > + /* XXX this is broken by VEBOX+ */
>
> What does this comment mean? How does VEBOX break this? Does it not
> have semaphore support or something?
It's a very old comment. In theory it should have been fixed by the recent
gen8 semaphore work.
> > @@ -3038,44 +3203,35 @@ out:
> > */
> > int
> > i915_gem_object_sync(struct drm_i915_gem_object *obj,
> > - struct intel_engine_cs *to)
> > + struct intel_engine_cs *to,
> > + bool readonly)
> > {
>
> It might be nice to have sync_read/sync_write functions instead, since
> looking up bool args or unnamed enums is a pain.
Not convinced, since it is used in a single location and two function
calls would look inelegant - but we could switch to PROT_READ |
PROT_WRITE throughout.
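i.e. something like this (hypothetical wrapper, not in the patch; PROT_*
from <linux/mman.h>), keeping the bool internally while making the call
site self-documenting:

static int
i915_gem_object_sync_prot(struct drm_i915_gem_object *obj,
			  struct intel_engine_cs *to,
			  unsigned int prot)
{
	/* readonly == the new user will not write the object, so we only
	 * have to wait for the last outstanding write.
	 */
	return i915_gem_object_sync(obj, to, (prot & PROT_WRITE) == 0);
}

	/* execbuffer, where the batch may write the object: */
	ret = i915_gem_object_sync_prot(obj, ring, PROT_READ | PROT_WRITE);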
> > static bool
> > ring_idle(struct intel_engine_cs *ring, u32 seqno)
> > {
> > return (list_empty(&ring->request_list) ||
> > - i915_seqno_passed(seqno, ring_last_seqno(ring)));
> > + __i915_seqno_passed(seqno, ring_last_seqno(ring)));
> > }
>
> Unrelated question: why do we have two checks here? Shouldn't the
> seqno check always be sufficient? Or is the list_empty check for the
> uninitialized case?
We can't test the seqno without first checking that there is a request
to read it from - that's the first check. And since we retire lazily,
the request_list can stay non-empty long after the ring goes idle, so we
cannot rely on the emptiness of the list by itself.
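For reference, ring_last_seqno() just dereferences the tail of the list,
roughly (modulo the struct rename in this series):

static u32 ring_last_seqno(struct intel_engine_cs *ring)
{
	/* Only valid while request_list is non-empty, hence the
	 * list_empty() check in ring_idle() before we get here.
	 */
	return list_entry(ring->request_list.prev,
			  struct i915_gem_request, list)->seqno;
}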
> > static int
> > -intel_ring_alloc_seqno(struct intel_engine_cs *ring)
> > +intel_ring_alloc_request(struct intel_engine_cs *ring)
> > {
> > - if (ring->outstanding_lazy_seqno)
> > - return 0;
> > + struct i915_gem_request *rq;
> > + int ret;
> >
> > - if (ring->preallocated_lazy_request == NULL) {
> > - struct drm_i915_gem_request *request;
> > + if (ring->preallocated_request)
> > + return 0;
> >
> > - request = kmalloc(sizeof(*request), GFP_KERNEL);
> > - if (request == NULL)
> > - return -ENOMEM;
> > + rq = kmalloc(sizeof(*rq), GFP_KERNEL);
> > + if (rq == NULL)
> > + return -ENOMEM;
> >
> > - ring->preallocated_lazy_request = request;
> > + ret = i915_gem_get_seqno(ring->dev, &rq->seqno);
> > + if (ret) {
> > + kfree(rq);
> > + return ret;
> > }
> >
> > -	return i915_gem_get_seqno(ring->dev,
> > -				  &ring->outstanding_lazy_seqno);
> > + kref_init(&rq->kref);
> > + rq->ring = ring;
> > + rq->completed = false;
> > +
> > + ring->preallocated_request = rq;
> > + return 0;
> > }
>
> Makes me wonder if we should have our own slab for the objs. Might
> save a bit of mem and/or perf? But then could reduce our cache hit
> rate, dunno.
It's just a case of whether we are in a general slab or a named slab.
Having a named slab is nice for debugging, and we definitely have a high
enough reuse rate to justify our own slab.
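Roughly, the switch would look like this (a sketch; the dev_priv->requests
cache name is invented):

	/* i915_gem_load(): one named cache for all requests */
	dev_priv->requests = kmem_cache_create("i915_gem_request",
					       sizeof(struct i915_gem_request),
					       0, SLAB_HWCACHE_ALIGN, NULL);

	/* intel_ring_alloc_request(): */
	rq = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);

	/* ...and the final i915_request_put() would free with: */
	kmem_cache_free(dev_priv->requests, rq);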
Elsewhere some useful comments deleted to save electrons :-)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre