[Intel-gfx] [PATCH v1] drm/i915/guc: Fix a fw content lost issue after it is evicted

Tue Nov 24 15:01:25 PST 2015

On Tue, Nov 24, 2015 at 07:06:21PM +0100, Daniel Vetter wrote:
> Just setting obj->dirty only works if you also have the pages.

Exactly. CPU access has historically always been page-by-page. The
style here is more or less meant to emulate the CPU mmap.

> But it's also not awesome that set_to_gtt_domain does this for callers.

Hmm, do you have an example where we want set-to-gtt(write), but not
actually write through the backing storage? Internal use of set-to-gtt
has never been ideal (e.g. context) but we haven't yet come up with a
better semantic.

> For lack of clear solutions I'd go with sprinkling obj->dirty or
> page_set_dirty over callers. Aside: relocate_entry_cpu probably gets away
> because of the unconditional obj->dirty we do later on, and that we redo
> all relocs if a fault happens. Still would be good to fix it, just for
> safety.

[copy_batch() isn't a bug as the contents are invalidated after use
anyway]

relocate_entry_cpu() is a bug we never caught. Indeed, we've papered over
it to mask some other userspace issues, but just adding the required
set_page_dirty() isn't going to be a big hardship.

We have tons of swapthrash tests to check the persistence of GPU buffers,
but we never tried to thrash the batches themselves out to swap and then
reuse them.

I guess the reason we never had a report of the issue is that userspace
doesn't reuse batches. Hibernation would be a good exercise of this.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre