[Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context backing objects to GGTT on-demand

Daniel, Thomas thomas.daniel at intel.com
Tue Nov 18 16:32:46 CET 2014


> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, November 18, 2014 3:11 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> On Tue, Nov 18, 2014 at 02:51:52PM +0000, Daniel, Thomas wrote:
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Tuesday, November 18, 2014 2:33 PM
> > > To: Daniel, Thomas
> > > Cc: Daniel Vetter; intel-gfx at lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > context backing objects to GGTT on-demand
> > >
> > > On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > > > > -----Original Message-----
> > > > > From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org]
> > > > > On Behalf Of Daniel, Thomas
> > > > > Sent: Tuesday, November 18, 2014 9:28 AM
> > > > > To: Daniel Vetter
> > > > > Cc: intel-gfx at lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > context backing objects to GGTT on-demand
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf
> > > > > > Of Daniel Vetter
> > > > > > Sent: Monday, November 17, 2014 6:09 PM
> > > > > > To: Daniel, Thomas
> > > > > > Cc: intel-gfx at lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > > context backing objects to GGTT on-demand
> > > > > >
> > > > > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d
> > > > > > > 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > > > > >  	struct {
> > > > > > >  		struct drm_i915_gem_object *state;
> > > > > > >  		struct intel_ringbuffer *ringbuf;
> > > > > > > +		int unpin_count;
> > > > > >
> > > > > > Pinning is already refcounted. Why this additional refcount?
> > > > >
> > > > > The vma.pin_count is only allocated 4 bits of storage, so it
> > > > > can track at most 15 concurrent pins.  If this restriction can
> > > > > be lifted then I can use that.
> > >
> > > Those 4 bits are good enough for legacy contexts, so I wonder a bit
> > > what's so massively different for execlist contexts.
> > With execlists, in order to dynamically unpin the LRC backing object
> > and the ring buffer object when they are not required, we take a
> > reference for each execlist request that uses them (remember that the
> > execlist request lifecycle is currently different from the execbuffer
> > request's).  This can be a lot of references, especially in some of
> > the less sane i-g-t tests.
> 
> Why?
> 
> Presuming the buffer object is properly pushed onto the active list, you
> only need to pin while doing the command submission, up to the point
> where you've committed the buffer object to the active list.
This is not currently the case.  Using the active list for context object management is one of the refactoring tasks, as we agreed.
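
For reference, my reading of that pattern (an untested sketch using the
existing GEM helpers; ctx_obj and ring stand in for the real locals):

  ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
  if (ret)
          return ret;

  /* ... emit the submission that references ctx_obj ... */

  /* Once the vma is on the active list, retirement keeps the object
   * bound until the GPU is done with it, so the explicit pin can be
   * dropped straight after submission. */
  i915_vma_move_to_active(i915_gem_obj_to_ggtt(ctx_obj), ring);
  i915_gem_object_ggtt_unpin(ctx_obj);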

> I know the documentation sucks for this stuff, since I have this
> discussion with roughly everyone who ever touches anything related to
> active buffers :( If you want a recent example, the cmd parser's shadow
> batch should serve well (including the entire evolution from a
> reinvented wheel to just using the active list, although the latest
> patches are only 90% there and still have 1-2 misplaced pieces).
> 
> > > > Actually I just tried to implement this; it causes a problem for
> > > > patch 4 of this set, as the unpin_count is also used for the
> > > > ringbuffer object, which has an ioremap as well as a GGTT pin.
> > >
> > > Yeah, ioremap needs to be redone every time we pin/unpin. But on
> > > sane archs it's almost no overhead really. And if this does start to
> > > matter (shudder for 32bit kernels on gen8) then we can fix it ...
> > Hm, so the CPU vaddr of the ring buffer will move around as more
> > requests reference it, which I suppose is not a problem.  We will use
> > a lot of address space (again, especially with the i-g-t stress tests,
> > which can submit tens of thousands of requests in a very short space
> > of time).  What would the fix be?  An extra reference count for the
> > ioremap?  Looks familiar :)
> 
> ioremap always gives you the same linear address on 64bit kernels. On 32bit
> it makes a new one, but if you ioremap for each request it'll fall over anyway.
Ah, I didn't know that ioremap behaved like that.
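
For the archive: the mapping in question is the ring buffer's CPU vaddr,
which today is set up along these lines (quoting from memory, so the
exact expression may differ), and which would get redone on every
pin/unpin cycle:

  ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
                                      i915_gem_obj_ggtt_offset(obj),
                                      ringbuf->size);
  if (ringbuf->virtual_start == NULL)
          return -EINVAL;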

> The solution would be to ioremap just the required pages using the atomic
> kmap stuff wrapped up into the io_mapping stuff.
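If I follow, each CPU access would then map just the page it touches,
something like this (untested sketch; pos, data and len are hypothetical
locals, the io_mapping is the existing dev_priv->gtt.mappable):

  /* Map only the GTT page we are about to touch, copy, unmap. */
  offset = i915_gem_obj_ggtt_offset(obj) + pos;
  vaddr = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
                                   offset & PAGE_MASK);
  memcpy_toio(vaddr + offset_in_page(offset), data, len);
  io_mapping_unmap_atomic(vaddr);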
> 
> > I still think it's best to keep the context unpin_count for execlists mode.
> 
> Well, that just means the todo-list to fix up execlists grows longer.
That's OK from my point of view; this may go away anyway with some of the refactoring.  The (strong) direction I'm getting from management is that they want these merged ASAP.
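
For reference, the per-request scheme in this patch boils down to the
below (simplified from the intel_lrc.c changes, error unwinding omitted;
the ringbuffer object and its ioremap get the same treatment):

  static int intel_lr_context_pin(struct intel_engine_cs *ring,
                                  struct intel_context *ctx)
  {
          struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
          int ret = 0;

          /* Only the first request to reference the context takes the
           * actual GGTT pin; later requests just bump the count. */
          if (ctx->engine[ring->id].unpin_count++ == 0)
                  ret = i915_gem_obj_ggtt_pin(ctx_obj,
                                              GEN8_LR_CONTEXT_ALIGN, 0);

          return ret;
  }

  static void intel_lr_context_unpin(struct intel_engine_cs *ring,
                                     struct intel_context *ctx)
  {
          struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;

          /* Drop the GGTT pin only when the last request is retired. */
          if (--ctx->engine[ring->id].unpin_count == 0)
                  i915_gem_object_ggtt_unpin(ctx_obj);
  }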

Cheers,
Thomas.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


