[Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

Fri Jan 10 10:26:52 CET 2014

On Fri, Jan 10, 2014 at 08:40:34AM +0000, Goel, Akash wrote:
> >> For your usecase, have you tried to simply reduce the stolen area as
> >>  much as possible? Our friendly competition on ARM SoCs seems to have
> >>  mostly moved away from gfx reserved chunks towards more flexible
> >>  approaches like CMA. Giving stolen back to the linux page allocator
> >>  should be possible, but I haven't really looked into that yet all that
> >>  much ...
> 
> This is the first option we explored, as it would have made our task
> also simple. There is a Hw bug on BYT, due to which either the
> allocation of Stolen area can be completely disabled or the next
> allocation has to be of 64 MB only.  But this limitation will not be
> present on upcoming platforms.
> 
> >> - object_pin is the wrong place to change the backing storage since
> >>  someone else could already have pinned the it (e.g. through dma-buf). We
> >>  need to do this earlier before calling down into ->get_pages.
> 
> We had an option to allocate the backing storage from stolen area, at
> the time when a GEM object is associated with a User created frame
> buffer (drm_mode_addfb2).  But we often saw that pool of Frame buffer
> objects were getting created, but not all them were getting used. So we
> thought that it will be more optimal, if we reserve the stolen space
> when the object actually starts to gets used, i.e. when it is being
> mapped to GTT.  In order to handle the dma-buf case (for User frame
> buffers) we can add a new check, so as not to consider using stolen area
> for such objects.

Correct purgeable semantics with shared objects is still an open for
shared buffers. An option would be to introduce the concept of weak/strong
userspace references to dma-buf/gem objects and allow a buffer to be
purged if there's no strong reference. A bit more work though.

> >> - If we allow opportunistical placement of objects into stolen memory I
> >>  think we should also fully support purgeable objects so that we can
> >>  dynamically scale to workloads. Or at least to be able to kick out stuff
> >>  in case the kernel needs the contiguous memory for something more
> >>  important. So if we couldn't find a suitable stolen block we'd scan
> >>  through all objects with stolen memory backing and check whether any
> >>   purgeable objects could be kicked out.
> 
> Actually we didn't expected much value-add on having the
> purging/truncate logic for Frame buffer objects also allocated from
> stolen area.  We saw that Frame buffer objects were being used as shared
> objects only & not as local objects. So the cacheing/purging logic in
> libdrm will not really apply to them, until unless the gem_madvise ioctl
> call is used to truncate the objects.  But on our UFO (OGL & Media)
> drivers side, currently the gem_madvise ioctl call is not being used.
> So until the frame buffer object itself is destroyed, it cannot be
> purged before that.  On Android side, as the 'swap' is disabled, the
> physical space of the GEM objects cannot be reclaimed by releasing the
> ref count on the underlying Physical pages (put_pages).  The purging
> from the GEM shrinker side, will be really effective in relinquishing
> the backing physical space, only when the objects are marked as
> purgeable.  We can try to add the support to purge/truncate logic for
> the stolen objects, in order to create room in stolen space for a new
> frame buffer. 

Well the caching/purging will be interesting once we have use stolen in
general where it's useful, e.g. for accelarated media decode using large
page ptes.

> >> For upstream I think changing the personality of buffer objects behind
> >>  userspace's back is a bit too frisky wrt breaking something. I prefer if
> >>  userspace opts-in explicitly by passing a flag to the create ioctl
> >>  stating that stolen memory is allowed/preferred for this allocation.
> >>  Chris Wilson posted an rfc patch a while back to add a create2 ioctl
> >>  (which at the same time also allows us to set the caching, tiling and
> >>  any other object parameters).
> 
> Yes, agree that is not cleanest of a solution, but we didn't had an
> option of introducing a new API/interface.  But the change is limited
> only to User created frame buffer GEM objects.  What new constraints we
> will be introducing if we go ahead with this design for Frame buffers.
> The mmap_gtt interface can still be used for these GEM objects.

It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that
uses this on X-tiled buffers and breaking them isn't an option for
upstream really.

One thing I don't understand is why you can't use a new ioctl. You've
emphasised this point a few times by now, but I don't understand the
reasons for it ... Please elaborate.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch