[RFC PATCH 0/3] Propose new struct drm_mem_region
Daniel Vetter
daniel at ffwll.ch
Tue Jul 30 10:45:09 UTC 2019
On Tue, Jul 30, 2019 at 12:24 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 30.07.19 um 11:38 schrieb Daniel Vetter:
> > On Tue, Jul 30, 2019 at 08:45:57AM +0000, Koenig, Christian wrote:
> >> Yeah, that looks like a good start. Just a couple of random design
> >> comments/requirements.
> >>
> >> First of all please restructure the changes so that you more or less
> >> have the following:
> >> 1. Adding of the new structures and functionality without any change to
> >> existing code.
> >> 2. Replacing the existing functionality in TTM and all of its drivers.
> >> 3. Replacing the existing functionality in i915.
> >>
> >> This should make it much easier to review the new functionality when it
> >> is not mixed with existing TTM stuff.
> >>
> >>
> >> Second please completely drop the concept of gpu_offset or start of the
> >> memory region like here:
> >>> drm_printf(p, " gpu_offset: 0x%08llX\n", man->region.start);
> >> At least on AMD hardware we have the following address spaces which are
> >> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus
> >> addresses and physical addresses.
> >>
> >> Pushing a concept of a general GPU address space into the memory
> >> management was a rather bad design mistake in TTM and we should not
> >> repeat that here.
> >>
> >> A region should only consist of a size in bytes and (internal to the
> >> region manager) allocations in that region.
> >>
> >>
> >> Third please don't use any CPU or architecture specific types in any
> >> data structures:
> >>> +struct drm_mem_region {
> >>> + resource_size_t start; /* within GPU physical address space */
> >>> + resource_size_t io_start; /* BAR address (CPU accessible) */
> >>> + resource_size_t size;
> >> I know that resource_size_t is mostly 64 bits on modern architectures, but
> >> dGPUs are completely separate from the architecture and we always need
> >> 64 bits here, at least for AMD hardware.
> >>
> >> So this should either be always uint64_t, or something like
> >> gpu_resource_size which depends on what the compiled in drivers require
> >> if we really need that.
> >>
> >> And by the way: please always use bytes for things like sizes and not
> >> numbers of pages, because page size is again CPU/architecture specific
> >> and GPU drivers don't necessarily care about that.
> >>
> >>
> >> And here also a few direct comments on the code:
> >>> + union {
> >>> + struct drm_mm *mm;
> >>> + /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> >>> + void *priv;
> >>> + };
> >> Maybe just always use void *mm here.
> >>
> >>> + spinlock_t move_lock;
> >>> + struct dma_fence *move;
> >> That is TTM specific and I'm not sure if we want it in the common memory
> >> management handling.
> >>
> >> If we want that here we should probably replace the lock with some rcu
> >> and atomic fence pointer exchange first.
> >>
> >>> +/*
> >>> + * Memory types for drm_mem_region
> >>> + */
> >> #define DRM_MEM_SWAP ?
> > btw what did you have in mind for this? Since we use shmem we kinda don't
> > know whether the BO is actually swapped out or not, at least on the i915
> > side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>
> Yeah, the problem is that not everybody can use shmem. For some use cases
> you have to use memory allocated through dma_alloc_coherent().
>
> So to be able to swap this out you need a separate domain to copy it
> from whatever is backing it currently to shmem.
>
> So we essentially have:
> DRM_MEM_SYS_SWAPABLE
> DRM_MEM_SYS_NOT_GPU_MAPPED
> DRM_MEM_SYS_GPU_MAPPED
>
> Or something like that.
Yeah, i915-gem is similar. We opportunistically keep the pages pinned
sometimes even if not currently mapped into the (what TTM calls) TT.
So I think these three for system memory make sense for us too. I
think that's similar (at least in spirit) to the dma_alloc cache you
have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
we could have something like PINNED or so. Although it's not
permanently pinned, so maybe that's confusing too.
> >> TTM was clearly missing that, resulting in a whole bunch of extra and
> >> rather complicated handling.
> >>
> >>> +#define DRM_MEM_SYSTEM 0
> >>> +#define DRM_MEM_STOLEN 1
> >> I think we need a better naming for that.
> >>
> >> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> >> least for TTM this is the system memory currently GPU accessible.
> > Yup, this is wrong: for i915 we use this as stolen, for TTM it's the GPU
> > translation table window into system memory. Not the same thing at all.
>
> Thought so. The closest I have in mind is GTT, but everything else works
> as well.
Would your GPU_MAPPED above work for TT? I think we'll also need
STOLEN; I'm even hearing noises that there's going to be stolen for
discrete vram for us ... Also, if we expand, I guess we need to teach
TTM to cope with more, or maybe treat the DRM one as some kind of
sub-flavour.
-Daniel
>
> Christian.
>
> > -Daniel
> >
> >>
> >> Thanks for looking into that,
> >> Christian.
> >>
> >> Am 30.07.19 um 02:32 schrieb Brian Welty:
> >>> [ By request, resending to include amd-gfx + intel-gfx. Since resending,
> >>> I fixed the nit with ordering of header includes that Sam noted. ]
> >>>
> >>> This RFC series is first implementation of some ideas expressed
> >>> earlier on dri-devel [1].
> >>>
> >>> Some of the goals (open for much debate) are:
> >>> - Create common base structure (subclass) for memory regions (patch #1)
> >>> - Create common memory region types (patch #2)
> >>> - Create common set of memory_region function callbacks (based on
> >>> ttm_mem_type_manager_funcs and intel_memory_regions_ops)
> >>> - Create common helpers that operate on drm_mem_region to be leveraged
> >>> by both TTM drivers and i915, reducing code duplication
> >>> - Above might start with refactoring ttm_bo_manager.c as these are
> >>> helpers for using drm_mm's range allocator and could be made to
> >>> operate on DRM structures instead of TTM ones.
> >>> - Larger goal might be to make LRU management of GEM objects common, and
> >>> migrate those fields into drm_mem_region and drm_gem_object structures.
> >>>
> >>> Patches 1-2 implement the proposed struct drm_mem_region and adds
> >>> associated common set of definitions for memory region type.
> >>>
> >>> Patch #3 is update to i915 and is based upon another series which is
> >>> in progress to add vram support to i915 [2].
> >>>
> >>> [1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224501.html
> >>> [2] https://lists.freedesktop.org/archives/intel-gfx/2019-June/203649.html
> >>>
> >>> Brian Welty (3):
> >>> drm: introduce new struct drm_mem_region
> >>> drm: Introduce DRM_MEM defines for specifying type of drm_mem_region
> >>> drm/i915: Update intel_memory_region to use nested drm_mem_region
> >>>
> >>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 +-
> >>> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +-
> >>> drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++---
> >>> drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
> >>> drivers/gpu/drm/i915/i915_query.c | 2 +-
> >>> drivers/gpu/drm/i915/intel_memory_region.c | 10 +++--
> >>> drivers/gpu/drm/i915/intel_memory_region.h | 19 +++------
> >>> drivers/gpu/drm/i915/intel_region_lmem.c | 26 ++++++-------
> >>> .../drm/i915/selftests/intel_memory_region.c | 8 ++--
> >>> drivers/gpu/drm/ttm/ttm_bo.c | 34 +++++++++-------
> >>> drivers/gpu/drm/ttm/ttm_bo_manager.c | 14 +++----
> >>> drivers/gpu/drm/ttm/ttm_bo_util.c | 11 +++---
> >>> drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 8 ++--
> >>> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 +-
> >>> include/drm/drm_mm.h | 39 ++++++++++++++++++-
> >>> include/drm/ttm/ttm_bo_api.h | 2 +-
> >>> include/drm/ttm/ttm_bo_driver.h | 16 ++++----
> >>> include/drm/ttm/ttm_placement.h | 8 ++--
> >>> 18 files changed, 124 insertions(+), 93 deletions(-)
> >>>
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch