[Intel-gfx] [PATCH v7 0/5] Support for creating/using Stolen memory backed objects

Wed Sep 23 09:14:25 PDT 2015

On Wed, Sep 23, 2015 at 06:03:55PM +0200, Daniel Vetter wrote:
> On Wed, Sep 23, 2015 at 04:21:18PM +0530, ankitprasad.r.sharma at intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma at intel.com>
> > 
> > This patch series adds support for creating/using Stolen memory backed
> > objects.
> > 
> > Despite being a unified memory architecture (UMA) some bits of memory
> > are more equal than others. In particular we have the thorny issue of
> > stolen memory, memory stolen from the system by the BIOS and reserved
> > for igfx use. Stolen memory is required for some functions of the GPU
> > and display engine, but in general it goes wasted. Whilst we cannot
> > return it back to the system, we need to find some other method for
> > utilising it. As we do not support direct access to the physical address
> > in the stolen region, it behaves like a different class of memory,
> > closer in kin to local GPU memory. This strongly suggests that we need a
> > placement model like TTM if we are to fully utilize these discrete
> > chunks of differing memory.
> > 
> > To add support for creating Stolen memory backed objects, we extend the
> > drm_i915_gem_create structure by adding a new flag through which the user
> > can specify a preference to allocate the object from stolen memory. If the
> > flag is set, an attempt will be made to allocate the object from stolen
> > memory, subject to the availability of free space in the stolen region.
> > 
> > This patch series also adds support for clearing buffer objects via CPU/GTT.
> > This is particularly useful for clearing out memory from the stolen
> > region, but can also be used for other shmem allocated objects. Currently
> > it is used for buffers allocated in the stolen region. The series also adds
> > support for stealing purgable stolen pages if we run out of stolen memory
> > when trying to allocate an object.
> > 
> > v2: Added support for read/write from/to objects not backed by
> > shmem using the pread/pwrite interface.
> > Also extended the current get_aperture ioctl to retrieve the
> > total and available size of the stolen region
> > 
> > v3: Removed the extended get_aperture ioctl patch 5 (to be submitted as
> > part of other patch series), addressed comments by Chris about pread/pwrite
> > for non shmem backed objects
> > 
> > v4: Rebased to the latest drm-intel-nightly
> > 
> > v5: Addressed comments, replaced patch 1/4 "Clearing buffers via blitter
> > engine" by "Clearing buffers via CPU/GTT"
> > 
> > v6: Rebased to the latest drm-intel-nightly, addressed comments, updated
> > stolen memory purging logic by maintaining a list for purgable stolen
> > memory objects, enabled pread/pwrite for all non-shmem backed objects
> > without tiling restrictions
> > 
> > v7: Addressed comments, compiler optimization, new patch added for correct
> > error code propagation to the userspace
> > 
> > This can be verified using IGT tests: igt/gem_stolen, igt/gem_create
> > 
> > Ankitprasad Sharma (4):
> >   drm/i915: Clearing buffer objects via CPU/GTT
> >   drm/i915: Support for creating Stolen memory backed objects
> >   drm/i915: Support for pread/pwrite from/to non shmem backed objects
> >   drm/i915: Propagating correct error codes to the userspace
> > 
> > Chris Wilson (1):
> >   drm/i915: Add support for stealing purgable stolen pages
> 
> Hm, where's the patch to evict stolen objects to sysmem over
> hibernate-to-disk that Chris raised? I guess we need this to avoid
> breaking generic linux distros (and atm that's the only open-source user
> afaics).

As far as users go, I'm dubious as to the merits of using stolen (and so
have not written patches for the ddx/mesa) simply because we do not have
CPU access to it, which excludes all of the fast paths and general
flexibility. There are also complications: if I allocate a buffer from
stolen, I need to migrate it if it is exported to a client over
DRI2/DRI3 (because I can't communicate that it is not first class). For
internal auxiliary buffers (of which there aren't that many, as they get
recycled quickly, like vertex/batch/instruction/temporary buffers), the
quandary is whether to save a few hundred KiB of memory or stick to fast
access along generic paths.

The lack of CPU access also severely limits what we can use stolen for
inside the kernel (or at least makes it much more complicated than need
be). Totally snafu.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre