[Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

Tue Jul 7 00:42:11 PDT 2015

On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote:
> On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote:
> > Well.. I the meantime why duplicate it when
> > i915_gem_validate_context does i915_gem_context_get and deferred
> > create if needed. I don't see the downside. Snippet from above
> > becomes:
> > 
> >   ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];
> >   ctx = i915_gem_validate_context(dev, file, ring,
> > 				DFAULT_CONTEXT_HANDLE);
> >   ... handle error...
> >   /* Allocate a request for this operation nice and early. */
> >   ret = i915_gem_request_alloc(ring, ctx, &req);
> > 
> > Why would this code have to know about deferred create.
> 
> This one is irrelevant. Since the default_context is always allocated
> and available via the ring, we don't need to pretend we are inside
> userspace and do the idr lookup and validation, we can just use it
> directly.
> 
> > >>Why is this needed?
> > >
> > >Because it's a requirement of the op being used on those gen. Later gen
> > >can keep the fence.
> > >
> > >>Could it be called unconditionally and still work?
> > >>
> > >>At least I would recommend a comment explaining it.
> > 
> > It is ugly to sprinkle platform knowledge to the callers - I think I
> > saw a callsites which call i915_gem_object_put_fence unconditionally
> > so why would that not work here?
> 
> That's access via the GTT though. This is access via the GPU using GPU
> instructions, which sometimes use fences and sometimes don't. That
> knowledge is already baked into the choice of command.
> 
> What I would actually support would be to just use CPU GTT clearing.
But, we have verified earlier here that for large objects GPU clearing
is faster (here hoping that stolen region will be used mainly for
framebuffers). Is it ok to do this conditionally based on the size of
the objects? and GPU clearing is asynchronous.
> That will run at memory speeds, only stall for a small fraction of the
> second, and later if the workloads demand it, we can do GPU clearing.
Also how do you suggest to bring the workload in as a deciding factor?
may be checking the current running frequency or based on the number of
pending requests?
Or are you suggesting to use CPU GTT clearing completely?
> 
> It's much simpler, and I would say for any real workload the stolen
> objects will be cached to avoid reallocations.
> -Chris
> 


Thanks,
Ankit