[Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

Tue Jul 7 01:46:26 PDT 2015

On Tue, Jul 07, 2015 at 01:12:11PM +0530, Ankitprasad Sharma wrote:
> On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote:
> > On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote:
> > > Well.. I the meantime why duplicate it when
> > > i915_gem_validate_context does i915_gem_context_get and deferred
> > > create if needed. I don't see the downside. Snippet from above
> > > becomes:
> > > 
> > >   ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];
> > >   ctx = i915_gem_validate_context(dev, file, ring,
> > > 				DFAULT_CONTEXT_HANDLE);
> > >   ... handle error...
> > >   /* Allocate a request for this operation nice and early. */
> > >   ret = i915_gem_request_alloc(ring, ctx, &req);
> > > 
> > > Why would this code have to know about deferred create.
> > 
> > This one is irrelevant. Since the default_context is always allocated
> > and available via the ring, we don't need to pretend we are inside
> > userspace and do the idr lookup and validation, we can just use it
> > directly.
> > 
> > > >>Why is this needed?
> > > >
> > > >Because it's a requirement of the op being used on those gen. Later gen
> > > >can keep the fence.
> > > >
> > > >>Could it be called unconditionally and still work?
> > > >>
> > > >>At least I would recommend a comment explaining it.
> > > 
> > > It is ugly to sprinkle platform knowledge to the callers - I think I
> > > saw a callsites which call i915_gem_object_put_fence unconditionally
> > > so why would that not work here?
> > 
> > That's access via the GTT though. This is access via the GPU using GPU
> > instructions, which sometimes use fences and sometimes don't. That
> > knowledge is already baked into the choice of command.
> > 
> > What I would actually support would be to just use CPU GTT clearing.
> But, we have verified earlier here that for large objects GPU clearing
> is faster (here hoping that stolen region will be used mainly for
> framebuffers). Is it ok to do this conditionally based on the size of
> the objects? and GPU clearing is asynchronous.

Hmm, this will be the GTT overhead which we can't avoid since we are
banned from accessing stolen directly (on byt).

Honestly this is the ugliest patch in the series. If we can do without
it it would make accepting it much easier. And then you have a nice
performance argument for doing via the blitter. Though I would be
surprised if the userspace cache was doing such a bad job that frequent
reallocations from stolen were required.

> > That will run at memory speeds, only stall for a small fraction of the
> > second, and later if the workloads demand it, we can do GPU clearing.
> Also how do you suggest to bring the workload in as a deciding factor?
> may be checking the current running frequency or based on the number of
> pending requests?
> Or are you suggesting to use CPU GTT clearing completely?
> > 
> > It's much simpler, and I would say for any real workload the stolen
> > objects will be cached to avoid reallocations.

Yes. A quick patch to do an unconditional memset() of a pinned GTT
stolen object should be about 20 lines. For the benefit of getting
create2 upsteam with little fuss, I think that is worth it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre