[Intel-gfx] [RFC PATCH v2 0/1] Replace shmem memory region and object backend
Matthew Auld
matthew.william.auld at gmail.com
Wed May 18 15:00:32 UTC 2022
On Tue, 17 May 2022 at 21:45, Adrian Larumbe
<adrian.larumbe at collabora.com> wrote:
>
> This patch is a second attempt at eliminating the old shmem memory region
> and GEM object backend, in favour of a TTM-based one that is able to manage
> objects placed on both system and local memory.
>
> Questions addressed since previous revision:
>
> * Creating an anonymous vfs mount for shmem files in TTM
> * Fixing LLC caching properties and bit 17 swizzling before setting a TTM
> bo's pages when calling get_pages
> * Added handling of phys backend from TTM functions
> * Added pread callback to TTM gem object backend
> * In shmem_create_from_object, ensuring a shmem object we just got a filp
> for has its pages marked dirty and accessed. Otherwise, the engine won't be
> able to read the initial state and a GPU hang will ensue
>
> However, one of the issues persists:
>
> Many GPU hangs on machines with GEN <= 5. My assumption is that this has
> something to do with a caching pitfall, but everywhere across the TTM
> backend code I've tried to handle object creation and page acquisition with
> the same set of caching and coherency properties as the old shmem backend.
Some thoughts in case it's helpful:
- We still look to be trampling the cache_level etc after object
creation. AFAICT i915_ttm_adjust_gem_after_move can be called in
various places after creation.
- The i915_ttm_pwrite hook won't play nice on non-llc platforms, since
it doesn't force a clflush or keep track of the writes with
cache_dirty. The existing ->shmem_pwrite hook only works because we
are guaranteed to have not yet populated the mm.pages, and on non-llc
platforms we always force a clflush in __set_pages(). In
i915_ttm_pwrite we are now just calling pin_pages() and then writing
through the page-cache without forcing a clflush, or ensuring that we
leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was
to avoid needing to populate the entire object like when calling
pin_pages(). Would it make sense to just fall back to using
i915_gem_shmem_pwrite, which should already take care of the required
flushing?
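A minimal sketch of that fallback, in case it clarifies the suggestion
(hypothetical, untested pseudocode; the actual hook signature and the
visibility of i915_gem_shmem_pwrite from the TTM backend are assumptions):

```
/*
 * Hypothetical sketch: instead of writing through the page-cache in
 * i915_ttm_pwrite without flushing, delegate to the existing shmem
 * pwrite path, which already forces a clflush where needed and keeps
 * cache_dirty tracking consistent on non-llc platforms.
 */
static int i915_ttm_pwrite(struct drm_i915_gem_object *obj,
			   const struct drm_i915_gem_pwrite *arg)
{
	/* the shmem path also avoids populating the entire object */
	return i915_gem_shmem_pwrite(obj, arg);
}
```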
For reference a common usage pattern is something like:
bb = gem_create() <-- assume non-llc so must be CACHE_NONE
gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally
execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush
>
> Adrian Larumbe (1):
> drm/i915: Replace shmem memory region and object backend with TTM
>
> drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 12 +-
> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 32 +-
> drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +-
> drivers/gpu/drm/i915/gem/i915_gem_phys.c | 32 +-
> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 390 +------------------
> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 267 ++++++++++++-
> drivers/gpu/drm/i915/gem/i915_gem_ttm.h | 3 +
> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 +-
> drivers/gpu/drm/i915/gt/shmem_utils.c | 64 ++-
> drivers/gpu/drm/i915/intel_memory_region.c | 7 +-
> 10 files changed, 398 insertions(+), 422 deletions(-)
>
> --
> 2.35.1
>