[Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages
Goel, Akash
akash.goel at intel.com
Sun Aug 9 03:53:01 PDT 2015
On 8/7/2015 1:37 PM, Daniel Vetter wrote:
> On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
>> We have for a long time been ultra-paranoid about the situation whereby
>> we hand back pages to the system that have been written to by the GPU
>> and potentially simultaneously by the user through a CPU mmapping. We
>> can relax this restriction when we know that the cache domain tracking
>> is true and there can be no stale cacheline invalidations. This is
>> true if the object has never been CPU mmapped, as all internal accesses
>> (i.e. kmap/iomap) are carefully flushed. For a CPU mmapping, one would
>> expect that the invalid cache lines are resolved on PTE/TLB shootdown
>> during munmap(), so the only situation we need to be paranoid about is
>
> That seems a pretty strong assumption given that x86 caches are physically
> indexed - I'd never expect flushing to happen on munmap.
x86 L1 caches are most probably virtually indexed/physically tagged,
and generally this is the prevalent and most efficient cache
configuration.
For virtually indexed/physically tagged caches, a cache flush is not
really required either on a context switch or on a page table update
(munmap, PTE/TLB shootdown).
So a few cache lines could still hold stale data (from a prior CPU
mmapping) for the object being purged/truncated.
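As a rough userspace illustration (a hypothetical sketch, not test
code from this series; bo_fd and size stand for a CPU mapping of a
GEM object obtained via the DRM mmap ioctl):

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* How a stale dirty cacheline can survive munmap() on a physically
 * tagged cache:
 */
static void stale_cacheline_example(int bo_fd, size_t size)
{
        void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, bo_fd, 0); /* CPU mmap the object */
        if (ptr == MAP_FAILED)
                return;

        memset(ptr, 0xaa, size); /* dirties cachelines for these pages */
        munmap(ptr, size);       /* PTE/TLB shootdown only; nothing here
                                  * writes the dirty lines back */

        /* If the GPU now writes the same pages through the GTT,
         * bypassing the CPU cache, a later eviction of those stale
         * dirty lines can clobber the GPU's data - the hazard that
         * put_pages has been guarding against.
         */
}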
>
>> when such a CPU mmapping exists at the time of put_pages. Given that we
>> need to treat put_pages carefully as we may return live data to the
>> system that we want to use again in the future (i.e. I915_MADV_WILLNEED
>> pages) we can simply treat a live CPU mmapping as a special case of
>> WILLNEED (which it is!). Any I915_MADV_DONTNEED pages and their
>> mmappings are shot down immediately following put_pages.
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: "Goel, Akash" <akash.goel at intel.com>
>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
>
> But it's still salvageable I think since we only care about coherency for
> the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
> hence the entire system except the gpu) we should never see inconsistency
> really - as soon as the gpu does a write to a cacheline it'll win, and
> before that nothing in the system can assume anything about the contents
> of these pages.
>
> So imo a simple
>
> /* For purgeable objects we don't care about object contents. */
>
> would be enough.
>
> Well, except that there are GL extensions which expose purgeable objects,
> so I guess we can't actually do this.
>
> I presume though you only want to avoid clflush when actually purging an
> object, so maybe we can keep this by purging the shmem backing node first
> and checking here for __I915_MADV_PURGED instead?
An object marked as I915_MADV_DONTNEED implies that it will be
purged/truncated right away after the call to the put_pages_gtt
function.
So doing it the other way round, purging first and then checking for
__I915_MADV_PURGED here, should be equivalent.
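For reference, a minimal sketch of that reordering (assuming the
existing i915_gem_object_truncate() helper, which marks the object
__I915_MADV_PURGED, and assuming the BUG_ON at the top of
put_pages_gtt is relaxed accordingly; this is not the actual patch):

        /* Drop the shmem backing store of DONTNEED objects first ... */
        if (obj->madv == I915_MADV_DONTNEED)
                i915_gem_object_truncate(obj); /* madv = __I915_MADV_PURGED */

        /* ... then only flush objects whose contents may be reused. */
        if (obj->madv != __I915_MADV_PURGED) {
                ret = i915_gem_object_set_to_cpu_domain(obj, true);
                if (ret) {
                        /* Abandon all caches and hope for the best. */
                        WARN_ON(ret != -EIO);
                        i915_gem_clflush_object(obj, true);
                }
        }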
Best regards
Akash
> Oh and some perf data would be good for this.
> -Daniel
>
>> ---
>> drivers/gpu/drm/i915/i915_gem.c | 49 ++++++++++++++++++++++++++++++-----------
>> 1 file changed, 36 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 2dfe707f11d3..24deace364a5 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2047,22 +2047,45 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
>>
>> BUG_ON(obj->madv == __I915_MADV_PURGED);
>>
>> - ret = i915_gem_object_set_to_cpu_domain(obj, true);
>> - if (ret) {
>> - /* In the event of a disaster, abandon all caches and
>> - * hope for the best.
>> - */
>> - WARN_ON(ret != -EIO);
>> - i915_gem_clflush_object(obj, true);
>> - obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
>> - }
>> -
>> i915_gem_gtt_finish_object(obj);
>>
>> - if (i915_gem_object_needs_bit17_swizzle(obj))
>> - i915_gem_object_save_bit_17_swizzle(obj);
>> + /* If we need to access the data in the future, we need to
>> + * be sure that the contents of the object are coherent with
>> + * the CPU prior to releasing the pages back to the system.
>> + * Once we unpin them, the mm is free to move them to different
>> + * zones or even swap them out to disk - all without our
>> + * intervention. (Though we could track such operations with our
>> + * own gemfs, if we ever write one.) As such if we want to keep
>> + * the data, set it to the CPU domain now just in case someone
>> + * else touches it.
>> + *
>> + * For a long time we have been paranoid about handing back
>> + * pages to the system with stale cacheline invalidation. For
>> + * all internal use (kmap/iomap), we know that the domain tracking is
>> + * accurate. However, the userspace API is lax and the user can CPU
>> + * mmap the object and invalidate cachelines without our accurate
>> + * tracking. We have been paranoid to be sure that we always flushed
>> + * the cachelines when we stopped using the pages. However, given
>> + * that the CPU PTE/TLB shootdown must have invalidated the cachelines
>> + * upon munmap(), we only need to be paranoid about a live CPU mmap
>> + * now. For this, we need only treat it as live data; see
>> + * discard_backing_storage().
>> + */
>> + if (obj->madv == I915_MADV_WILLNEED) {
>> + ret = i915_gem_object_set_to_cpu_domain(obj, true);
>> + if (ret) {
>> + /* In the event of a disaster, abandon all caches and
>> + * hope for the best.
>> + */
>> + WARN_ON(ret != -EIO);
>> + i915_gem_clflush_object(obj, true);
>> + obj->base.read_domains = I915_GEM_DOMAIN_CPU;
>> + obj->base.write_domain = I915_GEM_DOMAIN_CPU;
>> + }
>>
>> - if (obj->madv == I915_MADV_DONTNEED)
>> + if (i915_gem_object_needs_bit17_swizzle(obj))
>> + i915_gem_object_save_bit_17_swizzle(obj);
>> + } else
>> obj->dirty = 0;
>>
>> st_for_each_page(&iter, obj->pages) {
>> --
>> 2.5.0
>>
>