[Intel-gfx] [PATCH 02/18] drm/i915: Flush pages on acquisition
Matthew Auld
matthew.william.auld at gmail.com
Wed Mar 20 11:41:52 UTC 2019
On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> When we return pages to the system, we ensure that they are marked as
> being in the CPU domain since any external access is uncontrolled and we
> must assume the worst. This means that we need to always flush the pages
> on acquisition if we need to use them on the GPU, and from the beginning
> we have used set-domain for that. Set-domain is overkill for the purpose,
> as it is a general synchronisation barrier, but our intent is only to
> flush the pages being swapped in. If we move that flush into the
> pages-acquisition phase, we know that whenever we have obj->mm.pages they
> are coherent with the GPU, and we need only maintain that status without
> resorting to heavy-handed use of set-domain.
>
> The principal knock-on effect for userspace is through mmap-gtt
> pagefaulting. Our uAPI has always implied that the GTT mmap was async
> (especially as when any pagefault occurs is unpredictable to userspace)
> and so userspace had to apply explicit domain control itself
> (set-domain). However, swapping is transparent to the kernel, and so on
> first fault we need to acquire the pages and make them coherent for
> access through the GTT. Our use of set-domain here leaks into the uABI
> that the first pagefault was synchronous. This is unintentional and,
> barring a few igts, should go unnoticed; nevertheless we bump the uABI
> version for mmap-gtt to reflect the change in behaviour.
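
For illustration only, a minimal userspace sketch (not part of the
patch) of coping with the new behaviour: query the mmap-gtt version
and apply explicit domain control before touching the GTT mmap,
rather than relying on the first fault. fd and handle are assumed to
come from the caller; the ioctls and flags are the existing i915 uAPI:

    int gtt_version = 0;
    struct drm_i915_getparam gp = {
            .param = I915_PARAM_MMAP_GTT_VERSION,
            .value = &gtt_version,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);

    /* From version 3, the first GTT pagefault no longer implies a
     * set-domain(GTT), so serialise against the GPU explicitly. */
    struct drm_i915_gem_set_domain sd = {
            .handle = handle,
            .read_domains = I915_GEM_DOMAIN_GTT,
            .write_domain = I915_GEM_DOMAIN_GTT,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
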
>
> Another implication of the change is that gem_create() is presumed to
> create an object that is coherent with the CPU and is in the CPU write
> domain, so a set-domain(CPU) following a gem_create() would be a minor
> operation that merely checked whether we could allocate all pages for
> the object. With this change applied, a set-domain(CPU) causes a clflush
> as we acquire the pages. This will have a small impact on mesa, as on
> !llc we move the clflush from execbuf time to create time, but the
> performance impact should be minimal: the same clflush still happens,
> only earlier, and because of the cost of that clflush userspace already
> recycles bo and so should rarely allocate fresh objects.
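
For illustration only, the gem_create() + set-domain(CPU) sequence in
question, as a minimal userspace sketch (fd is an open DRM fd, error
handling omitted; the structs and ioctls are the existing i915 uAPI):

    struct drm_i915_gem_create create = { .size = 4096 };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);

    /* Previously little more than a check that backing pages could be
     * allocated; with this patch the pages are flushed as they are
     * acquired, so on !llc the clflush moves here from first execbuf. */
    struct drm_i915_gem_set_domain sd = {
            .handle = create.handle,
            .read_domains = I915_GEM_DOMAIN_CPU,
            .write_domain = I915_GEM_DOMAIN_CPU,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
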
>
> Internally, the presumption that objects are created in the CPU
> write-domain and remain so through writes to obj->mm.mapping is more
> prevalent than I expected, but such sites are easy enough to catch
> and fix with a manual flush.
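
For illustration only, the internal pattern the new flush helper is
meant to serve (a sketch, not a hunk from this patch; obj, batch and
batch_len are placeholders):

    void *vaddr;

    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
    if (IS_ERR(vaddr))
            return PTR_ERR(vaddr);

    /* CPU writes through obj->mm.mapping may no longer leave the
     * object in the CPU write domain, so flush the written range
     * by hand before handing it to the GPU. */
    memcpy(vaddr, batch, batch_len);
    __i915_gem_object_flush_map(obj, 0, batch_len);
    i915_gem_object_unpin_map(obj);
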
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.william.auld at gmail.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 8 +++
> drivers/gpu/drm/i915/i915_gem.c | 57 ++++++++++++-----
> drivers/gpu/drm/i915/i915_gem_dmabuf.c | 1 +
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 7 +--
> drivers/gpu/drm/i915/i915_gem_render_state.c | 2 +-
> drivers/gpu/drm/i915/i915_perf.c | 4 +-
> drivers/gpu/drm/i915/intel_engine_cs.c | 4 +-
> drivers/gpu/drm/i915/intel_lrc.c | 63 +++++++++----------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 62 +++++++-----------
> drivers/gpu/drm/i915/selftests/huge_pages.c | 5 +-
> .../gpu/drm/i915/selftests/i915_gem_context.c | 17 ++---
> .../gpu/drm/i915/selftests/i915_gem_dmabuf.c | 1 +
> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-
> drivers/gpu/drm/i915/selftests/i915_request.c | 14 ++---
> drivers/gpu/drm/i915/selftests/igt_spinner.c | 2 +-
> .../gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> drivers/gpu/drm/i915/selftests/intel_lrc.c | 5 +-
> .../drm/i915/selftests/intel_workarounds.c | 3 +
> 18 files changed, 127 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c65c2e6649df..395aa9d5ba02 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2959,6 +2959,14 @@ i915_coherent_map_type(struct drm_i915_private *i915)
> void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
> enum i915_map_type type);
>
> +void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
> + unsigned long offset,
> + unsigned long size);
> +static inline void i915_gem_object_flush_map(struct drm_i915_gem_object *obj)
> +{
> + __i915_gem_object_flush_map(obj, 0, obj->base.size);
> +}
> +
> /**
> * i915_gem_object_unpin_map - releases an earlier mapping
> * @obj: the object to unmap
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b7086c8d4726..41d96414ef18 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1713,6 +1713,9 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
> * 2 - Recognise WC as a separate cache domain so that we can flush the
> * delayed writes via GTT before performing direct access via WC.
> *
> + * 3 - Remove implicit set-domain(GTT) and synchronisation on initial
> + * pagefault; swapin remains transparent.
> + *
> * Restrictions:
> *
> * * snoopable objects cannot be accessed via the GTT. It can cause machine
> @@ -1740,7 +1743,7 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
> */
> int i915_gem_mmap_gtt_version(void)
> {
> - return 2;
> + return 3;
> }
>
> static inline struct i915_ggtt_view
> @@ -1808,17 +1811,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>
> trace_i915_gem_object_fault(obj, page_offset, true, write);
>
> - /* Try to flush the object off the GPU first without holding the lock.
> - * Upon acquiring the lock, we will perform our sanity checks and then
> - * repeat the flush holding the lock in the normal manner to catch cases
> - * where we are gazumped.
> - */
> - ret = i915_gem_object_wait(obj,
> - I915_WAIT_INTERRUPTIBLE,
> - MAX_SCHEDULE_TIMEOUT);
> - if (ret)
> - goto err;
> -
> ret = i915_gem_object_pin_pages(obj);
> if (ret)
> goto err;
> @@ -1874,10 +1866,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
> goto err_unlock;
> }
>
> - ret = i915_gem_object_set_to_gtt_domain(obj, write);
> - if (ret)
> - goto err_unpin;
> -
> ret = i915_vma_pin_fence(vma);
> if (ret)
> goto err_unpin;
> @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>
> lockdep_assert_held(&obj->mm.lock);
>
> + /* Make the pages coherent with the GPU (flushing any swapin). */
> + if (obj->cache_dirty) {
> + obj->write_domain = 0;
> + if (i915_gem_object_has_struct_page(obj))
> + drm_clflush_sg(pages);
> + obj->cache_dirty = false;
> + }
Is it worth adding some special casing here for volatile objects, so
that we avoid doing the clflush_sg every time we do set_pages for
!llc? Something like:

if (obj->cache_dirty && obj->mm.madv == I915_MADV_WILLNEED)

Or is that meh?
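
In the context of the hunk above, that would read roughly as follows
(just a sketch of the suggestion, assuming obj->mm.madv is already
settled by the time set_pages runs):

    /* Skip the flush for volatile (DONTNEED) objects, whose contents
     * are disposable anyway; cache_dirty stays set for them. */
    if (obj->cache_dirty && obj->mm.madv == I915_MADV_WILLNEED) {
            obj->write_domain = 0;
            if (i915_gem_object_has_struct_page(obj))
                    drm_clflush_sg(pages);
            obj->cache_dirty = false;
    }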