[Intel-gfx] [PATCH] drm/i915: s/seqno/request/ tracking inside objects
Chris Wilson
chris at chris-wilson.co.uk
Tue Jul 29 09:29:53 CEST 2014
On Mon, Jul 28, 2014 at 01:44:12PM -0700, Jesse Barnes wrote:
> > +static void i915_request_retire(struct i915_gem_request *rq)
> > {
> > - list_del(&request->list);
> > - i915_gem_request_remove_from_client(request);
> > + rq->completed = true;
> > +
> > + list_del(&rq->list);
> > + i915_gem_request_remove_from_client(rq);
> >
> > - if (request->ctx)
> > - i915_gem_context_unreference(request->ctx);
> > + if (rq->ctx) {
> > + i915_gem_context_unreference(rq->ctx);
> > + rq->ctx = NULL;
> > + }
> >
> > - kfree(request);
> > + i915_request_put(rq);
> > }
>
> I wonder if we can somehow always use this function instead of having
> both obj and request retire functions. Maybe if we moved obj retire
> into the rendering sync functions we'd have a little less pseudo
> duplication.
I kept it as-is because I was trying to avoid increasing the number of
calls to ring->get_seqno. Elsewhere, converting to a fence is good at
reducing the number of calls (as the completed:1 bit is likely to catch
most cases), but here I think we really do want to treat retirement en masse.
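For concreteness, the sort of check that makes the fence conversion cheap
looks roughly like this (a sketch, not lifted from the patch;
i915_request_complete() is an assumed helper name, built on the completed
bit and __i915_seqno_passed() quoted later in this mail):

static bool i915_request_complete(struct i915_gem_request *rq)
{
	/* The cached bit lets most callers avoid touching the HW seqno. */
	if (!rq->completed &&
	    __i915_seqno_passed(rq->ring->get_seqno(rq->ring, false),
				rq->seqno))
		rq->completed = true;

	return rq->completed;
}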
> request_retire does a put, where is the corresponding get so that we
> don't lose the object when the lock is dropped here or in the
> nonblocking wait? Or should request_retire not do a put?
Ownership flows from:
intel_ring_begin -> i915_add_request -> retire_request.
Everyone else has to grab their own references.
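So a caller that needs the request to survive a lock drop (e.g. the
nonblocking wait) does something along these lines - purely a sketch of
the refcounting, glossing over the exact locking rules of the wait;
i915_request_get() is assumed as the kref counterpart to the
i915_request_put() above, and the field name is invented:

	struct i915_gem_request *rq;

	rq = obj->last_read.request;	/* field name invented for the example */
	i915_request_get(rq);		/* our own reference */
	mutex_unlock(&dev->struct_mutex);

	/* rq may be retired (and the ring's reference dropped) here,
	 * but our reference keeps the struct alive.
	 */
	ret = i915_wait_request(rq);

	mutex_lock(&dev->struct_mutex);
	i915_request_put(rq);		/* balance the get */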
> > +static int
> > +i915_request_sync(struct i915_gem_request *rq,
> > + struct intel_engine_cs *to,
> > + struct drm_i915_gem_object *obj)
> > +{
> > + int ret, idx;
> > +
> > + if (to == NULL)
> > + return i915_wait_request(rq);
> > +
> > + /* XXX this is broken by VEBOX+ */
>
> What does this comment mean? How does VEBOX break this? Does it not
> have semaphore support or something?
It's a very old comment. In theory it should have been fixed by the recent
gen8 semaphore work.
> > @@ -3038,44 +3203,35 @@ out:
> > */
> > int
> > i915_gem_object_sync(struct drm_i915_gem_object *obj,
> > - struct intel_engine_cs *to)
> > + struct intel_engine_cs *to,
> > + bool readonly)
> > {
>
> It might be nice to have sync_read/sync_write functions instead, since
> looking up bool args or unnamed enums is a pain.
Not convinced, since it is used in a single location and two function
calls would look inelegant - but we could switch to PROT_READ |
PROT_WRITE throughout.
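i.e. something like this (hypothetical wrapper, not in the patch; PROT_*
from <linux/mman.h>), keeping the bool internally while making the call
site self-documenting:

static int
i915_gem_object_sync_prot(struct drm_i915_gem_object *obj,
			  struct intel_engine_cs *to,
			  unsigned int prot)
{
	/* readonly == the new user will not write the object, so we only
	 * have to wait for the last outstanding write.
	 */
	return i915_gem_object_sync(obj, to, (prot & PROT_WRITE) == 0);
}

	/* execbuffer, where the batch may write the object: */
	ret = i915_gem_object_sync_prot(obj, ring, PROT_READ | PROT_WRITE);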
> > static bool
> > ring_idle(struct intel_engine_cs *ring, u32 seqno)
> > {
> > return (list_empty(&ring->request_list) ||
> > - i915_seqno_passed(seqno, ring_last_seqno(ring)));
> > + __i915_seqno_passed(seqno, ring_last_seqno(ring)));
> > }
>
> Unrelated question: why do we have two checks here? Shouldn't the
> seqno check always be sufficient? Or is the list_empty check for the
> uninitialized case?
We can't test the seqno without first checking that there is a request
to read it from - that's the first check. And since we retire lazily,
the request_list can stay non-empty long after the ring goes idle, so we
cannot rely on the emptiness of the list by itself.
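For reference, ring_last_seqno() just dereferences the tail of the list,
roughly (modulo the struct rename in this series):

static u32 ring_last_seqno(struct intel_engine_cs *ring)
{
	/* Only valid while request_list is non-empty, hence the
	 * list_empty() check in ring_idle() before we get here.
	 */
	return list_entry(ring->request_list.prev,
			  struct i915_gem_request, list)->seqno;
}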
> > static int
> > -intel_ring_alloc_seqno(struct intel_engine_cs *ring)
> > +intel_ring_alloc_request(struct intel_engine_cs *ring)
> > {
> > - if (ring->outstanding_lazy_seqno)
> > - return 0;
> > + struct i915_gem_request *rq;
> > + int ret;
> >
> > - if (ring->preallocated_lazy_request == NULL) {
> > - struct drm_i915_gem_request *request;
> > + if (ring->preallocated_request)
> > + return 0;
> >
> > - request = kmalloc(sizeof(*request), GFP_KERNEL);
> > - if (request == NULL)
> > - return -ENOMEM;
> > + rq = kmalloc(sizeof(*rq), GFP_KERNEL);
> > + if (rq == NULL)
> > + return -ENOMEM;
> >
> > - ring->preallocated_lazy_request = request;
> > + ret = i915_gem_get_seqno(ring->dev, &rq->seqno);
> > + if (ret) {
> > + kfree(rq);
> > + return ret;
> > }
> >
> > -	return i915_gem_get_seqno(ring->dev,
> > -				  &ring->outstanding_lazy_seqno);
> > + kref_init(&rq->kref);
> > + rq->ring = ring;
> > + rq->completed = false;
> > +
> > + ring->preallocated_request = rq;
> > + return 0;
> > }
>
> Makes me wonder if we should have our own slab for the objs. Might
> save a bit of mem and/or perf? But then could reduce our cache hit
> rate, dunno.
It's just a case of whether we are in a general slab or a named slab.
Having a named slab is nice for debugging, and we definitely have a high
enough reuse rate to justify our own slab.
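Roughly, the switch would look like this (a sketch; the dev_priv->requests
cache name is invented):

	/* i915_gem_load(): one named cache for all requests */
	dev_priv->requests = kmem_cache_create("i915_gem_request",
					       sizeof(struct i915_gem_request),
					       0, SLAB_HWCACHE_ALIGN, NULL);

	/* intel_ring_alloc_request(): */
	rq = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);

	/* ...and the final i915_request_put() would free with: */
	kmem_cache_free(dev_priv->requests, rq);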
Elsewhere some useful comments deleted to save electrons :-)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre