[Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

Mon Jan 13 10:17:50 CET 2014

>> It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that uses this on X-tiled buffers and breaking them isn't an option for upstream really.
>> One thing I don't understand is why you can't use a new ioctl. You've emphasised this point a few times by now, but I don't understand the reasons for it ... Please elaborate.

Since pwrite/pread interfaces are being handled completely by Driver, we can modify their implementation to add support for Stolen buffers also (as Driver can access the stolen contents through the aperture space).
It's the CPU mmaps which will still remain broken. But is User space code aware & capable of handling the bit17 swizzling (since currently driver handle that for pread/pwrite), even though they can handle the X-Tile to Linear conversion ?

We were reluctant to modify the User space Driver code, to introduce a new ioctl for using the stolen area as a backing store. 
Actually Stolen memory is acting as a dedicated graphics memory, which can be used by the Driver. 
Our understanding that since kernel mode Driver is acting as a Graphics memory manager, it shall have the prerogative to optimally decide when to use the System memory(shmem) or Graphics memory for the allocation of Graphics resources. We were apprehensive about exposing the stolen memory interface, as we thought that then Users will always prefer to use the dedicated memory for their allocations which may not be optimal.

Best Regards
Akash 

-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel Vetter
Sent: Friday, January 10, 2014 2:57 PM
To: Goel, Akash
Cc: Daniel Vetter; Vetter, Daniel; intel-gfx at lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

On Fri, Jan 10, 2014 at 08:40:34AM +0000, Goel, Akash wrote:
> >> For your usecase, have you tried to simply reduce the stolen area 
> >> as  much as possible? Our friendly competition on ARM SoCs seems to 
> >> have  mostly moved away from gfx reserved chunks towards more 
> >> flexible  approaches like CMA. Giving stolen back to the linux page 
> >> allocator  should be possible, but I haven't really looked into 
> >> that yet all that  much ...
> 
> This is the first option we explored, as it would have made our task 
> also simple. There is a Hw bug on BYT, due to which either the 
> allocation of Stolen area can be completely disabled or the next 
> allocation has to be of 64 MB only.  But this limitation will not be 
> present on upcoming platforms.
> 
> >> - object_pin is the wrong place to change the backing storage since  
> >> someone else could already have pinned the it (e.g. through 
> >> dma-buf). We  need to do this earlier before calling down into ->get_pages.
> 
> We had an option to allocate the backing storage from stolen area, at 
> the time when a GEM object is associated with a User created frame 
> buffer (drm_mode_addfb2).  But we often saw that pool of Frame buffer 
> objects were getting created, but not all them were getting used. So 
> we thought that it will be more optimal, if we reserve the stolen 
> space when the object actually starts to gets used, i.e. when it is 
> being mapped to GTT.  In order to handle the dma-buf case (for User 
> frame
> buffers) we can add a new check, so as not to consider using stolen 
> area for such objects.

Correct purgeable semantics with shared objects is still an open for shared buffers. An option would be to introduce the concept of weak/strong userspace references to dma-buf/gem objects and allow a buffer to be purged if there's no strong reference. A bit more work though.

> >> - If we allow opportunistical placement of objects into stolen 
> >> memory I  think we should also fully support purgeable objects so 
> >> that we can  dynamically scale to workloads. Or at least to be able 
> >> to kick out stuff  in case the kernel needs the contiguous memory 
> >> for something more  important. So if we couldn't find a suitable 
> >> stolen block we'd scan  through all objects with stolen memory backing and check whether any
> >>   purgeable objects could be kicked out.
> 
> Actually we didn't expected much value-add on having the 
> purging/truncate logic for Frame buffer objects also allocated from 
> stolen area.  We saw that Frame buffer objects were being used as 
> shared objects only & not as local objects. So the cacheing/purging 
> logic in libdrm will not really apply to them, until unless the 
> gem_madvise ioctl call is used to truncate the objects.  But on our 
> UFO (OGL & Media) drivers side, currently the gem_madvise ioctl call is not being used.
> So until the frame buffer object itself is destroyed, it cannot be 
> purged before that.  On Android side, as the 'swap' is disabled, the 
> physical space of the GEM objects cannot be reclaimed by releasing the 
> ref count on the underlying Physical pages (put_pages).  The purging 
> from the GEM shrinker side, will be really effective in relinquishing 
> the backing physical space, only when the objects are marked as 
> purgeable.  We can try to add the support to purge/truncate logic for 
> the stolen objects, in order to create room in stolen space for a new 
> frame buffer.

Well the caching/purging will be interesting once we have use stolen in general where it's useful, e.g. for accelarated media decode using large page ptes.

> >> For upstream I think changing the personality of buffer objects 
> >> behind  userspace's back is a bit too frisky wrt breaking 
> >> something. I prefer if  userspace opts-in explicitly by passing a 
> >> flag to the create ioctl  stating that stolen memory is allowed/preferred for this allocation.
> >>  Chris Wilson posted an rfc patch a while back to add a create2 
> >> ioctl  (which at the same time also allows us to set the caching, 
> >> tiling and  any other object parameters).
> 
> Yes, agree that is not cleanest of a solution, but we didn't had an 
> option of introducing a new API/interface.  But the change is limited 
> only to User created frame buffer GEM objects.  What new constraints 
> we will be introducing if we go ahead with this design for Frame buffers.
> The mmap_gtt interface can still be used for these GEM objects.

It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that uses this on X-tiled buffers and breaking them isn't an option for upstream really.

One thing I don't understand is why you can't use a new ioctl. You've emphasised this point a few times by now, but I don't understand the reasons for it ... Please elaborate.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch