[Intel-gfx] [PATCH v1] drm/i915/guc: Fix a fw content lost issue after it is evicted

Tue Nov 24 10:06:21 PST 2015

On Tue, Nov 24, 2015 at 05:47:08PM +0000, Dave Gordon wrote:
> On 24/11/15 16:04, Daniel Vetter wrote:
> >On Tue, Nov 24, 2015 at 03:47:02PM +0000, Dave Gordon wrote:
> >>On 10/11/15 23:27, yu.dai at intel.com wrote:
> >>>From: Alex Dai <yu.dai at intel.com>
> >>>
> >>>We keep a copy of GuC fw in a GEM obj. However its content is lost
> >>>if the GEM obj is swapped (igt/gem_evict_*). Therefore, the later
> >>>fw loading during GPU reset will fail. Mark the obj dirty after
> >>>copying data into the pages. So its content will be kept during
> >>>swapped out.
> >>>
> >>>Signed-off-by: Alex Dai <yu.dai at intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index f1e3fde..3b15167 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -5199,6 +5199,7 @@ i915_gem_object_create_from_data(struct drm_device *dev,
> >>>  	i915_gem_object_pin_pages(obj);
> >>>  	sg = obj->pages;
> >>>  	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
> >>>+	obj->dirty = 1;
> >>>  	i915_gem_object_unpin_pages(obj);
> >>>
> >>>  	if (WARN_ON(bytes != size)) {
> >>
> >>Hi,
> >>
> >>although I liked this at first, on further checking I've found that there
> >>are other cases where the CPU writes on a GEM object without marking it
> >>dirty, so in theory all of those could also be subject to the same issue.
> >>
> >>Functions that may suffer from this problem include copy_batch(),
> >>relocate_entry_cpu(), and render_state_setup(); hence, the objects that
> >>might be affected (CPU updates discarded) would in general be batch buffers.
> >>
> >>Obviously also i915_gem_object_create_from_data() currently used only for
> >>firmware objects; and populate_lr_context() also modifies a GEM object using
> >>the CPU, but it already explicitly marks the object dirty and so avoid any
> >>problems.
> >>
> >>At present, setting an object into the GTT domain with intent-to-write
> >>causes the object to be marked as dirty, whereas setting it into the CPU
> >>domain with intent-to-write doesn't. Since the obj->dirty flag doesn't mean
> >>"CPU and GPU views differ" but rather "In-memory version is newer than
> >>backing store", setting it should depend only on the intent-to-write flag,
> >>not on which domain it's being put into.
> >>
> >>So my preferred solution would be to set the obj->dirty flag inside
> >>i915_gem_object_set_to_cpu_domain(obj, true). Then objects that have been
> >>written by the CPU would be swapped out to backing store rather than
> >>discarded if the shrinker decides to reclaim the memory.
> >>
> >>To make this work properly, we also need to audit all the callers of
> >>i915_gem_object_set_to_cpu_domain(), because I think in several places the
> >>callers pass the second argument as true where it should be false; this
> >>would cause objects to unnecessarily be marked dirty, increasing swap
> >>requirements and impacting performance.
> >>
> >>I'll follow up with an actual patch once people have had an opportunity to
> >>check that the above analysis is accurate.
> >
> >set_to_cpu_domain won't cover everyone, since some callers like
> >shmem_pwrite/pread do clever tricks with inline flushing.
> >
> >Also there's a fundamental difference between writing through GTT (where
> >we always pin all the pages, even though now we do have partial views),
> >and CPU writes, which in most places access the bo page-by-page.
> >
> >Given that I'm still slightly leaning towards a "cpu writes need to mark
> >pages dirty themselves" world-view. At least teaching this to
> >set_to_cpu_domain won't be a fully effective magic bullet.
> >-Daniel
> 
> shmem_pwrite doesn't call set_to_cpu_domain, and it already explicitly marks
> the GEM object as dirty; shmem_pread obviously doesn't need to as it's a
> read operation and doesn't change the object contents.
> 
> There's no difference (as far as backing store is concerned) whether the
> object is written via a CPU mapping, a GTT mapping, or by the GPU; the end
> result is that the object in memory is different from the version on backing
> store, and the object should be marked dirty if we want the changes to
> persist across eviction.
> 
> There may indeed be additional functions that modify GEM objects in some way
> without first setting them into the CPU domain, in which case they will also
> be responsible for marking the objects dirty. But all the (correct) callers
> of i915_gem_object_set_to_cpu_domain(obj, true) follow it by writing on the
> object, and only one marks the object as dirty, so all others are potential
> victims of having the shrinker discard their uncommitted changes.
> 
> Fixing it here (i.e when mapping whole objects for CPU write access) means
> that object-level operations then don't have to worry about page-level
> details. Functions that want to be smarter about tracking updates at the
> level of individual pages could still do so, but I don't want to impose that
> requirement on code that just wants to manipulate objects.

It's still not entirely equivalent: With GTT writes it's impossible to
write through the GTT with get_pages having been called, which means
someone eventually will do the put_pages. But with CPU writes you can do a
set_to_cpu_domain just for flushing, and then write through some
completely different path. Just setting obj->dirty only works if you also
have the pages.

Which means moving the obj->dirty = true into set_to_cpu_domain won't be a
solution that works everywhere, but it will work mostly. That's a bit
fragile. For create_from_data it will work perfectly, and if we put the
set_to_cpu_domain after get_pages it's even obvious that this is so. But
since it's fundamentally fragile I don't like it that much.

But it's also not awesome that set_to_gtt_domain does this for callers.

For lack of clear solutions I'd go with sprinkling obj->dirty or
page_set_dirty over callers. Aside: relocate_entry_cpu probably gets away
because of the unconditional obj->dirty we do later on, and that we redo
all relocs if a fault happens. Still would be good to fix it, just for
safety.

> BTW, I note that the different implementations of put_pages() are not
> consistent about how they treat dirty pages within objects not marked as
> dirty. Does this matter?

Just checked them. All the struct page backed ones handle it
(shmem+userptr) all the others (dma-buf+stolen) don't. Seems consistent.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch