[Intel-gfx] [PATCH 02/18] drm/i915: Flush pages on acquisition
Matthew Auld
matthew.william.auld at gmail.com
Wed Mar 20 11:41:52 UTC 2019
On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> When we return pages to the system, we ensure that they are marked as
> being in the CPU domain since any external access is uncontrolled and we
> must assume the worst. This means that we need to always flush the pages
> on acquisition if we need to use them on the GPU, and from the beginning
> we have used set-domain for that. Set-domain is overkill for the purpose,
> as it is a general synchronisation barrier, but our intent is only to
> flush the pages being swapped in. If we move that flush into the
> pages-acquisition phase, we know that whenever we have obj->mm.pages they
> are coherent with the GPU, and we need only maintain that status without
> resorting to heavy-handed use of set-domain.
>
> The principal knock-on effect for userspace is through mmap-gtt
> pagefaulting. Our uAPI has always implied that the GTT mmap was async
> (especially as when any pagefault occurs is unpredictable to userspace)
> and so userspace had to apply explicit domain control itself
> (set-domain). However, swapping is transparent to the kernel, and so on
> first fault we need to acquire the pages and make them coherent for
> access through the GTT. Our use of set-domain here leaks into the uABI
> that the first pagefault was synchronous. This is unintentional and,
> barring a few igts, should go unnoticed; nevertheless we bump the uABI
> version for mmap-gtt to reflect the change in behaviour.
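
For illustration only, a minimal userspace sketch (not part of the
patch) of coping with the new behaviour: query the mmap-gtt version
and apply explicit domain control before touching the GTT mmap,
rather than relying on the first fault. fd and handle are assumed to
come from the caller; the ioctls and flags are the existing i915 uAPI:

    int gtt_version = 0;
    struct drm_i915_getparam gp = {
            .param = I915_PARAM_MMAP_GTT_VERSION,
            .value = &gtt_version,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);

    /* From version 3, the first GTT pagefault no longer implies a
     * set-domain(GTT), so serialise against the GPU explicitly. */
    struct drm_i915_gem_set_domain sd = {
            .handle = handle,
            .read_domains = I915_GEM_DOMAIN_GTT,
            .write_domain = I915_GEM_DOMAIN_GTT,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
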
>
> Another implication of the change is that gem_create() is presumed to
> create an object that is coherent with the CPU and is in the CPU write
> domain, so a set-domain(CPU) following a gem_create() would be a minor
> operation that merely checked whether we could allocate all pages for
> the object. With this change applied, a set-domain(CPU) causes a clflush
> as we acquire the pages. This will have a small impact on mesa, as on
> !llc we move the clflush from execbuf time to create time, but the
> performance impact should be minimal: the same clflush still happens,
> only earlier, and because of the cost of that clflush userspace already
> recycles bo and so should rarely allocate fresh objects.
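
For illustration only, the gem_create() + set-domain(CPU) sequence in
question, as a minimal userspace sketch (fd is an open DRM fd, error
handling omitted; the structs and ioctls are the existing i915 uAPI):

    struct drm_i915_gem_create create = { .size = 4096 };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);

    /* Previously little more than a check that backing pages could be
     * allocated; with this patch the pages are flushed as they are
     * acquired, so on !llc the clflush moves here from first execbuf. */
    struct drm_i915_gem_set_domain sd = {
            .handle = create.handle,
            .read_domains = I915_GEM_DOMAIN_CPU,
            .write_domain = I915_GEM_DOMAIN_CPU,
    };
    drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
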
>
> Internally, the presumption that objects are created in the CPU
> write-domain and remain so through writes to obj->mm.mapping is more
> prevalent than I expected, but such sites are easy enough to catch
> and fix with a manual flush.
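
For illustration only, the internal pattern the new flush helper is
meant to serve (a sketch, not a hunk from this patch; obj, batch and
batch_len are placeholders):

    void *vaddr;

    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
    if (IS_ERR(vaddr))
            return PTR_ERR(vaddr);

    /* CPU writes through obj->mm.mapping may no longer leave the
     * object in the CPU write domain, so flush the written range
     * by hand before handing it to the GPU. */
    memcpy(vaddr, batch, batch_len);
    __i915_gem_object_flush_map(obj, 0, batch_len);
    i915_gem_object_unpin_map(obj);
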
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.william.auld at gmail.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 8 +++
> drivers/gpu/drm/i915/i915_gem.c | 57 ++++++++++++-----
> drivers/gpu/drm/i915/i915_gem_dmabuf.c | 1 +
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 7 +--
> drivers/gpu/drm/i915/i915_gem_render_state.c | 2 +-
> drivers/gpu/drm/i915/i915_perf.c | 4 +-
> drivers/gpu/drm/i915/intel_engine_cs.c | 4 +-
> drivers/gpu/drm/i915/intel_lrc.c | 63 +++++++++----------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 62 +++++++-----------
> drivers/gpu/drm/i915/selftests/huge_pages.c | 5 +-
> .../gpu/drm/i915/selftests/i915_gem_context.c | 17 ++---
> .../gpu/drm/i915/selftests/i915_gem_dmabuf.c | 1 +
> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-
> drivers/gpu/drm/i915/selftests/i915_request.c | 14 ++---
> drivers/gpu/drm/i915/selftests/igt_spinner.c | 2 +-
> .../gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> drivers/gpu/drm/i915/selftests/intel_lrc.c | 5 +-
> .../drm/i915/selftests/intel_workarounds.c | 3 +
> 18 files changed, 127 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c65c2e6649df..395aa9d5ba02 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2959,6 +2959,14 @@ i915_coherent_map_type(struct drm_i915_private *i915)
> void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
> enum i915_map_type type);
>
> +void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
> + unsigned long offset,
> + unsigned long size);
> +static inline void i915_gem_object_flush_map(struct drm_i915_gem_object *obj)
> +{
> + __i915_gem_object_flush_map(obj, 0, obj->base.size);
> +}
> +
> /**
> * i915_gem_object_unpin_map - releases an earlier mapping
> * @obj: the object to unmap
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b7086c8d4726..41d96414ef18 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1713,6 +1713,9 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
> * 2 - Recognise WC as a separate cache domain so that we can flush the
> * delayed writes via GTT before performing direct access via WC.
> *
> + * 3 - Remove implicit set-domain(GTT) and synchronisation on initial
> + * pagefault; swapin remains transparent.
> + *
> * Restrictions:
> *
> * * snoopable objects cannot be accessed via the GTT. It can cause machine
> @@ -1740,7 +1743,7 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
> */
> int i915_gem_mmap_gtt_version(void)
> {
> - return 2;
> + return 3;
> }
>
> static inline struct i915_ggtt_view
> @@ -1808,17 +1811,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>
> trace_i915_gem_object_fault(obj, page_offset, true, write);
>
> - /* Try to flush the object off the GPU first without holding the lock.
> - * Upon acquiring the lock, we will perform our sanity checks and then
> - * repeat the flush holding the lock in the normal manner to catch cases
> - * where we are gazumped.
> - */
> - ret = i915_gem_object_wait(obj,
> - I915_WAIT_INTERRUPTIBLE,
> - MAX_SCHEDULE_TIMEOUT);
> - if (ret)
> - goto err;
> -
> ret = i915_gem_object_pin_pages(obj);
> if (ret)
> goto err;
> @@ -1874,10 +1866,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
> goto err_unlock;
> }
>
> - ret = i915_gem_object_set_to_gtt_domain(obj, write);
> - if (ret)
> - goto err_unpin;
> -
> ret = i915_vma_pin_fence(vma);
> if (ret)
> goto err_unpin;
> @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>
> lockdep_assert_held(&obj->mm.lock);
>
> + /* Make the pages coherent with the GPU (flushing any swapin). */
> + if (obj->cache_dirty) {
> + obj->write_domain = 0;
> + if (i915_gem_object_has_struct_page(obj))
> + drm_clflush_sg(pages);
> + obj->cache_dirty = false;
> + }
Is it worth adding some special casing here for volatile objects, so
that we avoid doing the clflush_sg every time we do set_pages for
!llc? Something like:

if (obj->cache_dirty && obj->mm.madv == I915_MADV_WILLNEED)

Or is that meh?
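
In the context of the hunk above, that would read roughly as follows
(just a sketch of the suggestion, assuming obj->mm.madv is already
settled by the time set_pages runs):

    /* Skip the flush for volatile (DONTNEED) objects, whose contents
     * are disposable anyway; cache_dirty stays set for them. */
    if (obj->cache_dirty && obj->mm.madv == I915_MADV_WILLNEED) {
            obj->write_domain = 0;
            if (i915_gem_object_has_struct_page(obj))
                    drm_clflush_sg(pages);
            obj->cache_dirty = false;
    }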