[Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

Tue Jul 7 01:52:03 PDT 2015

On Tue, 2015-07-07 at 09:46 +0100, Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 01:12:11PM +0530, Ankitprasad Sharma wrote:
> > On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote:
> > > On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote:
> > > > Well.. I the meantime why duplicate it when
> > > > i915_gem_validate_context does i915_gem_context_get and deferred
> > > > create if needed. I don't see the downside. Snippet from above
> > > > becomes:
> > > > 
> > > >   ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];
> > > >   ctx = i915_gem_validate_context(dev, file, ring,
> > > > 				DFAULT_CONTEXT_HANDLE);
> > > >   ... handle error...
> > > >   /* Allocate a request for this operation nice and early. */
> > > >   ret = i915_gem_request_alloc(ring, ctx, &req);
> > > > 
> > > > Why would this code have to know about deferred create.
> > > 
> > > This one is irrelevant. Since the default_context is always allocated
> > > and available via the ring, we don't need to pretend we are inside
> > > userspace and do the idr lookup and validation, we can just use it
> > > directly.
> > > 
> > > > >>Why is this needed?
> > > > >
> > > > >Because it's a requirement of the op being used on those gen. Later gen
> > > > >can keep the fence.
> > > > >
> > > > >>Could it be called unconditionally and still work?
> > > > >>
> > > > >>At least I would recommend a comment explaining it.
> > > > 
> > > > It is ugly to sprinkle platform knowledge to the callers - I think I
> > > > saw a callsites which call i915_gem_object_put_fence unconditionally
> > > > so why would that not work here?
> > > 
> > > That's access via the GTT though. This is access via the GPU using GPU
> > > instructions, which sometimes use fences and sometimes don't. That
> > > knowledge is already baked into the choice of command.
> > > 
> > > What I would actually support would be to just use CPU GTT clearing.
> > But, we have verified earlier here that for large objects GPU clearing
> > is faster (here hoping that stolen region will be used mainly for
> > framebuffers). Is it ok to do this conditionally based on the size of
> > the objects? and GPU clearing is asynchronous.
> 
> Hmm, this will be the GTT overhead which we can't avoid since we are
> banned from accessing stolen directly (on byt).
> 
> Honestly this is the ugliest patch in the series. If we can do without
> it it would make accepting it much easier. And then you have a nice
> performance argument for doing via the blitter. Though I would be
> surprised if the userspace cache was doing such a bad job that frequent
> reallocations from stolen were required.
> 
> > > That will run at memory speeds, only stall for a small fraction of the
> > > second, and later if the workloads demand it, we can do GPU clearing.
> > Also how do you suggest to bring the workload in as a deciding factor?
> > may be checking the current running frequency or based on the number of
> > pending requests?
> > Or are you suggesting to use CPU GTT clearing completely?
> > > 
> > > It's much simpler, and I would say for any real workload the stolen
> > > objects will be cached to avoid reallocations.
> 
> Yes. A quick patch to do an unconditional memset() of a pinned GTT
> stolen object should be about 20 lines. For the benefit of getting
> create2 upsteam with little fuss, I think that is worth it.
So this ugly patch itself will go away :( and I will add a new function
in  i915_gem_stolen.c to do CPU (GTT) based clearing of the object in
stolen. 
>