[Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU

Daniel Vetter daniel at ffwll.ch
Thu Dec 10 00:52:01 PST 2015

On Mon, Dec 07, 2015 at 12:04:18PM +0000, Dave Gordon wrote:
> On 07/12/15 08:29, Daniel Vetter wrote:
> >On Fri, Dec 04, 2015 at 05:28:29PM +0000, Dave Gordon wrote:
> >>On 04/12/15 09:57, Daniel Vetter wrote:
> >>>On Tue, Dec 01, 2015 at 01:21:07PM +0000, Dave Gordon wrote:
> >>>>On 01/12/15 13:04, Chris Wilson wrote:
> >>>>>On Tue, Dec 01, 2015 at 12:42:02PM +0000, Dave Gordon wrote:
> >>>>>>In various places, one or more pages of a GEM object are mapped into CPU
> >>>>>>address space and updated. In each such case, the object should be
> >>>>>>marked dirty, to ensure that the modifications are not discarded if the
> >>>>>>object is evicted under memory pressure.
> >>>>>>
> >>>>>>This is similar to commit
> >>>>>>	commit 51bc140431e233284660b1d22c47dec9ecdb521e
> >>>>>>	Author: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>>>	Date:   Mon Aug 31 15:10:39 2015 +0100
> >>>>>>	drm/i915: Always mark the object as dirty when used by the GPU
> >>>>>>
> >>>>>>in which Chris ensured that updates by the GPU were not lost due to
> >>>>>>eviction, but this patch applies instead to the multiple places where
> >>>>>>object content is updated by the host CPU.
> >>>>>
> >>>>>Apart from that commit was to mask userspace bugs, here we are under
> >>>>>control of when the pages are marked and have chosen a different
> >>>>>per-page interface for CPU writes as opposed to per-object.
> >>>>>-Chris
> >>>>
> >>>>The pattern
> >>>>	get_pages();
> >>>>	kmap(get_page())
> >>>>	write
> >>>>	kunmap()
> >>>>occurs often enough that it might be worth providing a common function to do
> >>>>that and mark only the specific page dirty (other cases touch the whole
> >>>>object, so for those we can just set the obj->dirty flag and let put_pages()
> >>>>take care of propagating that to all the individual pages).
> >>>>
> >>>>But can we be sure that all the functions touched by this patch will operate
> >>>>only on regular (default) GEM objects (i.e. not phys, stolen, etc) 'cos some
> >>>>of those don't support per-page tracking. What about objects with no backing
> >>>>store -- can/should we mark those as dirty (which would prevent eviction)?
> >>>
> >>>I thought our special objects do clear obj->dirty on put_pages? Can you
> >>>please elaborate on your concern?
> >>>
> >>>While we discuss all this: A patch at the end to document dirty (maybe
> >>>even as a first stab at kerneldoc for i915_drm_gem_buffer_object) would be
> >>>awesome.
> >>>-Daniel
> >>
> >>In general, obj->dirty means that (some or) all the pages of the object
> >>(may) have been modified since last time the object was read from backing
> >>store, and that the modified data should be written back rather than
> >>discarded.
> >>
> >>Code that works only on default (gtt) GEM objects may be able to optimise
> >>writebacks by marking individual pages dirty, rather than the object as a
> >>whole. But not every GEM object has backing store, and even among those that
> >>do, some do not support per-page dirty tracking.
> >>
> >>These are the GEM objects we may want to consider:
> >>
> >>1. Default (gtt) object
> >>    * Discontiguous, lives in page cache while pinned during use
> >>    * Backed by shmfs (swap)
> >>    * put_pages() transfers dirty status from object to each page
> >>      before release
> >>    * shmfs ensures that dirty unpinned pages are written out
> >>      before deallocation
> >>    * Could optimise by marking individual pages at point of use,
> >>      rather than marking whole object and then pushing to all pages
> >>      during put_pages()
> >>
> >>2. Phys GEM object
> >>    * Lives in physically-contiguous system memory, pinned during use
> >>    * Backed by shmfs
> >>    * if obj->dirty, put_pages() *copies* all pages back to shmfs via
> >>      page cache RMW
> >>    * No per-page tracking, cannot optimise
> >>
> >>3. Stolen GEM object
> >>    * Lives in (physically-contiguous) stolen memory, always pinned
> >>    * No backing store!
> >>    * obj->dirty is irrelevant (ignored)
> >>    * put_pages() only called at end-of-life
> >>    * No per-page tracking (not meaningful anyway)
> >>
> >>4. Userptr GEM object
> >>    * Discontiguous, lives in page cache while pinned during use
> >>    * Backed by user process memory (which may then map to some
> >>      arbitrary file mapping?)
> >>    * put_pages() transfers dirty status from object to each page
> >>      before release
> >>    * dirty pages are still resident in user space, can be swapped
> >>      out when not pinned
> >>    * Could optimise by marking individual pages at point of use,
> >>      rather than marking whole object and then pushing to all pages
> >>      during put_pages()
> >>
> >>Are there any more?
> >>
> >>Given this diversity, it may be worth adding a dirty_page() vfunc, so that
> >>for those situations where a single page is dirtied AND the object type
> >>supports per-page tracking, we can take advantage of this to reduce copying.
> >>For objects that don't support per-page tracking, the implementation would
> >>just set obj->dirty.
> >>
> >>For example:
> >>     void (*dirty_page)(obj, pageno);
> >>possibly with the additional semantic that pageno == -1 means 'dirty the
> >>whole object'.
> >>
> >>A convenient further facility would be:
> >>     struct page *i915_gem_object_get_dirty_page(obj, pageno);
> >>which is just like i915_gem_object_get_page() but with the additional effect
> >>of marking the returned page dirty (by calling the above vfunc).
> >>[Aside: can we call set_page_dirty() on a non-shmfs-backed page?].
> >>
> >>This means that in all the places where I added 'obj->dirty = 1' after a
> >>kunmap() call, we would instead just change the earlier get_page() to
> >>get_dirty_page() instead, which provides better layering.
> >>
> >>Together these changes mean that obj->dirty would then be a purely private
> >>member for use by implementations of get_pages/put_pages().
> >>
> >>Opinions?
> >
> >Hm, I thought we've been careful with checking that an object is somehow
> >backed by struct pages, and only use the page-wise access if that's the
> >case. But looking at the execbuf relocate code we've probably already
> >screwed this up, or at least will when we expose stolen to userspace.
> >Userptr should still work (since ultimately it's struct page backed), and
> >phys gem object doesn't matter (if you but relocs into your cursor on
> >gen2-4.0 you get all the pieces). I think step one would be more nasty
> >test coverage, at least for the execbuf path.
> >
> >The other page-wise access path seem all internal, so I'm much less
> >worried about those.
> >-Daniel
> So does this mean that i915_pages_create_for_stolen() isn't really doing
> what it says? After that function has been called, obj->pages is filled in -
> but is it then valid to call i915_gem_object_get_page() ?
> That returns a pointer to the (preexisting) entry in the system page tables
> for the specified page, but isn't the anomalous thing about stolen memory
> the fact that the kernel doesn't know about it and doesn't include it in its
> page tables at all?

We use sg tables as a dual-purpose thing, both to look up struct pages and
to get at the device dma address. For stolen we only fill out one side of
this though, so it's not legal to call get_page on it.

> For kmap purposes, we don't really need the 'struct page' as we could use
> kmap_atomic_pfn() instead. So maybe to make stolen objects work in general
> without everything having to know they're different, we would need to move
> the kmap operation into the vfunc as well? That would mean the vfunc would
> be something like obj->kmap_page(obj, pageno, dirty) returning the vaddr of
> the mapped page.

Because of this awesome stuff hw engineers did to implement content
protection the cpu is forbidden from accessing stolen :( Would indeed make
things simpler if we could do that - once we even considered to just give
the entire stolen range back to the linux page allocator using memory
hotplug. Unfortunately Stolen Is Special and there's no way to avoid that.

> This looks like it will need a bit more study and design so perhaps we could
> just take the quick fix of marking whole objects dirty for now (which will
> at least give *correct* behaviour) and then work out how to avoid marking
> whole objects dirty where possible.

Tbh I've lost track of the patches. I do kinda like the idea of
get_page_dirty or adding a write flag to get_page. Since get_page should
be the official API to get at pinned struct pages it should cover most. If
we still have anything left doing a raw loop over the sg table for pages,
I guess we could wrap it up in an i915 macro which takes a write boolean,
Daniel Vetter
Software Engineer, Intel Corporation

More information about the Intel-gfx mailing list