Starting from DG2 we will have resizable BAR support for device local-memory, but in some cases the final BAR size might still be smaller than the total local-memory size. In such cases only part of local-memory will be CPU accessible, while the remainder is only accessible via the GPU. This series adds the basic enablers needed to ensure that the entire local-memory range is usable.
Needs to be applied on top of Arun' in-progress series[1].
[1] https://patchwork.freedesktop.org/series/99430/
v2: - Various improvements and fixes as suggested by Thomas. v3: - Add some more a-b and r-b. Also tweak patch 12 slightly, as suggested by Thomas.
With small LMEM-BAR we need to be able to differentiate between the total size of LMEM, and how much of it is CPU mappable. The end goal is to be able to utilize the entire range, even if part of is it not CPU accessible.
v2: also update intelfb_create
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 8 +++++--- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 6 +++++- drivers/gpu/drm/i915/intel_memory_region.c | 6 +++++- drivers/gpu/drm/i915/intel_memory_region.h | 2 ++ drivers/gpu/drm/i915/selftests/intel_memory_region.c | 8 ++++---- drivers/gpu/drm/i915/selftests/mock_region.c | 6 ++++-- drivers/gpu/drm/i915/selftests/mock_region.h | 3 ++- 11 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 41d279db2be6..244acf462065 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -248,7 +248,7 @@ static int intelfb_create(struct drm_fb_helper *helper, struct intel_memory_region *mem = obj->mm.region;
info->apertures->ranges[0].base = mem->io_start; - info->apertures->ranges[0].size = mem->total; + info->apertures->ranges[0].size = mem->io_size;
/* Use fbdev's framebuffer from lmem for discrete */ info->fix.smem_start = diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 6c57b0a79c8a..a9aca11cedbb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -696,7 +696,7 @@ struct intel_memory_region *i915_gem_shmem_setup(struct drm_i915_private *i915, { return intel_memory_region_create(i915, 0, totalram_pages() << PAGE_SHIFT, - PAGE_SIZE, 0, + PAGE_SIZE, 0, 0, type, instance, &shmem_region_ops); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 1de73a644965..b79f5f5e7df0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -491,6 +491,7 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
/* Exclude the reserved region from driver use */ mem->region.end = reserved_base - 1; + mem->io_size = resource_size(&mem->region);
/* It is possible for the reserved area to end before the end of stolen * memory, so just consider the start. */ @@ -747,7 +748,7 @@ static int init_stolen_lmem(struct intel_memory_region *mem)
if (!io_mapping_init_wc(&mem->iomap, mem->io_start, - resource_size(&mem->region))) + mem->io_size)) return -EIO;
/* @@ -802,7 +803,8 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, I915_GTT_PAGE_SIZE_4K;
mem = intel_memory_region_create(i915, lmem_base, lmem_size, - min_page_size, io_start, + min_page_size, + io_start, lmem_size, type, instance, &i915_region_stolen_lmem_ops); if (IS_ERR(mem)) @@ -833,7 +835,7 @@ i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type, mem = intel_memory_region_create(i915, intel_graphics_stolen_res.start, resource_size(&intel_graphics_stolen_res), - PAGE_SIZE, 0, type, instance, + PAGE_SIZE, 0, 0, type, instance, &i915_region_stolen_smem_ops); if (IS_ERR(mem)) return mem; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 1eb2fd81c5b6..e9399f7b3e67 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1101,7 +1101,7 @@ i915_gem_ttm_system_setup(struct drm_i915_private *i915,
mr = intel_memory_region_create(i915, 0, totalram_pages() << PAGE_SHIFT, - PAGE_SIZE, 0, + PAGE_SIZE, 0, 0, type, instance, &ttm_system_region_ops); if (IS_ERR(mr)) diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index f36191ebf964..42db9cd30978 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -499,7 +499,7 @@ static int igt_mock_memory_region_huge_pages(void *arg) int bit; int err = 0;
- mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0); + mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0, 0); if (IS_ERR(mem)) { pr_err("%s failed to create memory region\n", __func__); return PTR_ERR(mem); diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index cb3f66707b21..01838b8ce4c7 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -91,7 +91,7 @@ region_lmem_init(struct intel_memory_region *mem)
if (!io_mapping_init_wc(&mem->iomap, mem->io_start, - resource_size(&mem->region))) { + mem->io_size)) { ret = -EIO; goto out_no_io; } @@ -144,6 +144,7 @@ intel_gt_setup_fake_lmem(struct intel_gt *gt) mappable_end, PAGE_SIZE, io_start, + mappable_end, INTEL_MEMORY_LOCAL, 0, &intel_region_lmem_ops); @@ -220,6 +221,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) lmem_size, min_page_size, io_start, + lmem_size, INTEL_MEMORY_LOCAL, 0, &intel_region_lmem_ops); @@ -233,6 +235,8 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) drm_dbg(&i915->drm, "Local memory: %pR\n", &mem->region); drm_dbg(&i915->drm, "Local memory IO start: %pa\n", &mem->io_start); + drm_info(&i915->drm, "Local memory IO size: %pa\n", + &mem->io_size); drm_info(&i915->drm, "Local memory available: %pa\n", &lmem_size);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index c70d7e286a51..595e2489c23e 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -97,7 +97,7 @@ static int iomemtest(struct intel_memory_region *mem, bool test_all, const void *caller) { - resource_size_t last = resource_size(&mem->region) - PAGE_SIZE; + resource_size_t last = mem->io_size - PAGE_SIZE; resource_size_t page; int err;
@@ -205,6 +205,8 @@ static int intel_memory_region_memtest(struct intel_memory_region *mem, if (!mem->io_start) return 0;
+ WARN_ON_ONCE(!mem->io_size); + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) || i915->params.memtest) err = iomemtest(mem, i915->params.memtest, caller);
@@ -217,6 +219,7 @@ intel_memory_region_create(struct drm_i915_private *i915, resource_size_t size, resource_size_t min_page_size, resource_size_t io_start, + resource_size_t io_size, u16 type, u16 instance, const struct intel_memory_region_ops *ops) @@ -231,6 +234,7 @@ intel_memory_region_create(struct drm_i915_private *i915, mem->i915 = i915; mem->region = (struct resource)DEFINE_RES_MEM(start, size); mem->io_start = io_start; + mem->io_size = io_size; mem->min_page_size = min_page_size; mem->ops = ops; mem->total = size; diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 5625c9c38993..459051ce0c91 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -71,6 +71,7 @@ struct intel_memory_region { struct drm_mm_node fake_mappable;
resource_size_t io_start; + resource_size_t io_size; resource_size_t min_page_size; resource_size_t total; resource_size_t avail; @@ -103,6 +104,7 @@ intel_memory_region_create(struct drm_i915_private *i915, resource_size_t size, resource_size_t min_page_size, resource_size_t io_start, + resource_size_t io_size, u16 type, u16 instance, const struct intel_memory_region_ops *ops); diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 7acba1d2135e..247f65f02bbf 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -170,7 +170,7 @@ static int igt_mock_reserve(void *arg) if (!order) return 0;
- mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0); + mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0, 0); if (IS_ERR(mem)) { pr_err("failed to create memory region\n"); err = PTR_ERR(mem); @@ -383,7 +383,7 @@ static int igt_mock_splintered_region(void *arg) */
size = (SZ_4G - 1) & PAGE_MASK; - mem = mock_region_create(i915, 0, size, PAGE_SIZE, 0); + mem = mock_region_create(i915, 0, size, PAGE_SIZE, 0, 0); if (IS_ERR(mem)) return PTR_ERR(mem);
@@ -471,7 +471,7 @@ static int igt_mock_max_segment(void *arg) */
size = SZ_8G; - mem = mock_region_create(i915, 0, size, PAGE_SIZE, 0); + mem = mock_region_create(i915, 0, size, PAGE_SIZE, 0, 0); if (IS_ERR(mem)) return PTR_ERR(mem);
@@ -1188,7 +1188,7 @@ int intel_memory_region_mock_selftests(void) if (!i915) return -ENOMEM;
- mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0); + mem = mock_region_create(i915, 0, SZ_2G, I915_GTT_PAGE_SIZE_4K, 0, 0); if (IS_ERR(mem)) { pr_err("failed to create memory region\n"); err = PTR_ERR(mem); diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c index 19bff8afcaaa..467eeae6d5f0 100644 --- a/drivers/gpu/drm/i915/selftests/mock_region.c +++ b/drivers/gpu/drm/i915/selftests/mock_region.c @@ -107,7 +107,8 @@ mock_region_create(struct drm_i915_private *i915, resource_size_t start, resource_size_t size, resource_size_t min_page_size, - resource_size_t io_start) + resource_size_t io_start, + resource_size_t io_size) { int instance = ida_alloc_max(&i915->selftest.mock_region_instances, TTM_NUM_MEM_TYPES - TTM_PL_PRIV - 1, @@ -117,6 +118,7 @@ mock_region_create(struct drm_i915_private *i915, return ERR_PTR(instance);
return intel_memory_region_create(i915, start, size, min_page_size, - io_start, INTEL_MEMORY_MOCK, instance, + io_start, io_size, + INTEL_MEMORY_MOCK, instance, &mock_region_ops); } diff --git a/drivers/gpu/drm/i915/selftests/mock_region.h b/drivers/gpu/drm/i915/selftests/mock_region.h index 329bf74dfaca..e36c3a433551 100644 --- a/drivers/gpu/drm/i915/selftests/mock_region.h +++ b/drivers/gpu/drm/i915/selftests/mock_region.h @@ -16,6 +16,7 @@ mock_region_create(struct drm_i915_private *i915, resource_size_t start, resource_size_t size, resource_size_t min_page_size, - resource_size_t io_start); + resource_size_t io_start, + resource_size_t io_size);
#endif /* !__MOCK_REGION_H */
On devices with non-mappable LMEM ensure we always allocate the pages within the mappable portion. For now we assume that all LMEM buffers will require CPU access, which is also inline with pretty much all current kernel internal users. In the next patch we will introduce a new flag to override this behaviour.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++ drivers/gpu/drm/i915/intel_region_ttm.c | 5 +++++ 2 files changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index e9399f7b3e67..41e94d09e742 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -128,6 +128,10 @@ i915_ttm_place_from_region(const struct intel_memory_region *mr,
if (flags & I915_BO_ALLOC_CONTIGUOUS) place->flags = TTM_PL_FLAG_CONTIGUOUS; + if (mr->io_size && mr->io_size < mr->total) { + place->fpfn = 0; + place->lpfn = mr->io_size >> PAGE_SHIFT; + } }
static void diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c index f2b888c16958..4689192d5e8d 100644 --- a/drivers/gpu/drm/i915/intel_region_ttm.c +++ b/drivers/gpu/drm/i915/intel_region_ttm.c @@ -199,6 +199,11 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem, struct ttm_resource *res; int ret;
+ if (mem->io_size && mem->io_size < mem->total) { + place.fpfn = 0; + place.lpfn = mem->io_size >> PAGE_SHIFT; + } + mock_bo.base.size = size; place.flags = flags;
If the user doesn't require CPU access for the buffer, then ALLOC_GPU_ONLY should be used, in order to prioritise allocating in the non-mappable portion of LMEM, on devices with small BAR.
v2(Thomas): - The BO_ALLOC_TOPDOWN naming here is poor, since this is pure lies on systems that don't even have small BAR. A better name is GPU_ONLY, which is accurate regardless of the configuration.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 17 ++++++++++++----- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 3 +++ drivers/gpu/drm/i915/gem/i915_gem_region.c | 5 +++++ drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 ++++++++++--- drivers/gpu/drm/i915/gt/intel_gt.c | 4 +++- drivers/gpu/drm/i915/i915_vma.c | 3 +++ drivers/gpu/drm/i915/intel_region_ttm.c | 11 ++++++++--- drivers/gpu/drm/i915/selftests/mock_region.c | 7 +------ 8 files changed, 45 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 0098a32490f0..fd54eb8f4826 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -319,16 +319,23 @@ struct drm_i915_gem_object { #define I915_BO_ALLOC_PM_VOLATILE BIT(4) /* Object needs to be restored early using memcpy during resume */ #define I915_BO_ALLOC_PM_EARLY BIT(5) +/* + * Object is likely never accessed by the CPU. This will prioritise the BO to be + * allocated in the non-mappable portion of lmem. This is merely a hint, and if + * dealing with userspace objects the CPU fault handler is free to ignore this. + */ +#define I915_BO_ALLOC_GPU_ONLY BIT(6) #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \ I915_BO_ALLOC_VOLATILE | \ I915_BO_ALLOC_CPU_CLEAR | \ I915_BO_ALLOC_USER | \ I915_BO_ALLOC_PM_VOLATILE | \ - I915_BO_ALLOC_PM_EARLY) -#define I915_BO_READONLY BIT(6) -#define I915_TILING_QUIRK_BIT 7 /* unknown swizzling; do not release! */ -#define I915_BO_PROTECTED BIT(8) -#define I915_BO_WAS_BOUND_BIT 9 + I915_BO_ALLOC_PM_EARLY | \ + I915_BO_ALLOC_GPU_ONLY) +#define I915_BO_READONLY BIT(7) +#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */ +#define I915_BO_PROTECTED BIT(9) +#define I915_BO_WAS_BOUND_BIT 10 /** * @mem_flags - Mutable placement-related flags * diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 060fe29f5929..a4d8dc163691 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -356,6 +356,9 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, !i915_gem_object_has_iomem(obj)) return ERR_PTR(-ENXIO);
+ if (WARN_ON_ONCE(obj->flags & I915_BO_ALLOC_GPU_ONLY)) + return ERR_PTR(-EINVAL); + assert_object_held(obj);
pinned = !(type & I915_MAP_OVERRIDE); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index a4350227e9ae..873d52f872c5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -45,6 +45,11 @@ i915_gem_object_create_region(struct intel_memory_region *mem,
GEM_BUG_ON(flags & ~I915_BO_ALLOC_FLAGS);
+ if (WARN_ON_ONCE(flags & I915_BO_ALLOC_GPU_ONLY && + (flags & I915_BO_ALLOC_CPU_CLEAR || + flags & I915_BO_ALLOC_PM_EARLY))) + return ERR_PTR(-EINVAL); + if (!mem) return ERR_PTR(-ENODEV);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 41e94d09e742..36e77fcf2ef9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -127,10 +127,14 @@ i915_ttm_place_from_region(const struct intel_memory_region *mr, place->mem_type = intel_region_to_ttm_type(mr);
if (flags & I915_BO_ALLOC_CONTIGUOUS) - place->flags = TTM_PL_FLAG_CONTIGUOUS; + place->flags |= TTM_PL_FLAG_CONTIGUOUS; if (mr->io_size && mr->io_size < mr->total) { - place->fpfn = 0; - place->lpfn = mr->io_size >> PAGE_SHIFT; + if (flags & I915_BO_ALLOC_GPU_ONLY) { + place->flags |= TTM_PL_FLAG_TOPDOWN; + } else { + place->fpfn = 0; + place->lpfn = mr->io_size >> PAGE_SHIFT; + } } }
@@ -888,6 +892,9 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) if (!obj) return VM_FAULT_SIGBUS;
+ if (obj->flags & I915_BO_ALLOC_GPU_ONLY) + return -EINVAL; + /* Sanity check that we allow writing into this object */ if (unlikely(i915_gem_object_is_readonly(obj) && area->vm_flags & VM_WRITE)) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e6f6bf7c3926..4f35c51fdb52 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -457,7 +457,9 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size) struct i915_vma *vma; int ret;
- obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE); + obj = i915_gem_object_create_lmem(i915, size, + I915_BO_ALLOC_VOLATILE | + I915_BO_ALLOC_GPU_ONLY); if (IS_ERR(obj)) obj = i915_gem_object_create_stolen(i915, size); if (IS_ERR(obj)) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 845cd88f8313..3322a0651a17 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -540,6 +540,9 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) void __iomem *ptr; int err;
+ if (WARN_ON_ONCE(vma->obj->flags & I915_BO_ALLOC_GPU_ONLY)) + return IO_ERR_PTR(-EINVAL); + if (!i915_gem_object_is_lmem(vma->obj)) { if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { err = -ENODEV; diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c index 4689192d5e8d..c055029c7a70 100644 --- a/drivers/gpu/drm/i915/intel_region_ttm.c +++ b/drivers/gpu/drm/i915/intel_region_ttm.c @@ -199,13 +199,18 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem, struct ttm_resource *res; int ret;
+ if (flags & I915_BO_ALLOC_CONTIGUOUS) + place.flags |= TTM_PL_FLAG_CONTIGUOUS; if (mem->io_size && mem->io_size < mem->total) { - place.fpfn = 0; - place.lpfn = mem->io_size >> PAGE_SHIFT; + if (flags & I915_BO_ALLOC_GPU_ONLY) { + place.flags |= TTM_PL_FLAG_TOPDOWN; + } else { + place.fpfn = 0; + place.lpfn = mem->io_size >> PAGE_SHIFT; + } }
mock_bo.base.size = size; - place.flags = flags;
ret = man->func->alloc(man, &mock_bo, &place, &res); if (ret == -ENOSPC) diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c index 467eeae6d5f0..f64325491f35 100644 --- a/drivers/gpu/drm/i915/selftests/mock_region.c +++ b/drivers/gpu/drm/i915/selftests/mock_region.c @@ -22,17 +22,12 @@ static void mock_region_put_pages(struct drm_i915_gem_object *obj,
static int mock_region_get_pages(struct drm_i915_gem_object *obj) { - unsigned int flags; struct sg_table *pages; int err;
- flags = 0; - if (obj->flags & I915_BO_ALLOC_CONTIGUOUS) - flags |= TTM_PL_FLAG_CONTIGUOUS; - obj->mm.res = intel_region_ttm_resource_alloc(obj->mm.region, obj->base.size, - flags); + obj->flags); if (IS_ERR(obj->mm.res)) return PTR_ERR(obj->mm.res);
Track the total amount of available visible memory, and also track per-resource the amount of used visible memory. For now this is useful for our debug output, and deciding if it is even worth calling into the buddy allocator. In the future tracking the per-resource visible usage will be useful for when deciding if we should attempt to evict certain buffers.
v2: - s/place->lpfn/lpfn/, that way we can avoid scanning the list if the entire range is already mappable. - Move the end declaration inside the if block(Thomas). - Make sure to also account for reserved memory.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 68 ++++++++++++++++++- drivers/gpu/drm/i915/i915_ttm_buddy_manager.h | 8 ++- drivers/gpu/drm/i915/intel_region_ttm.c | 1 + 3 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c index 76d5211c25eb..e47a3d46c6ff 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c @@ -19,6 +19,9 @@ struct i915_ttm_buddy_manager { struct drm_buddy mm; struct list_head reserved; struct mutex lock; + unsigned long visible_size; + unsigned long visible_avail; + unsigned long visible_reserved; u64 default_page_size; };
@@ -87,6 +90,12 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man, n_pages = size >> ilog2(mm->chunk_size);
mutex_lock(&bman->lock); + if (lpfn <= bman->visible_size && n_pages > bman->visible_avail) { + mutex_unlock(&bman->lock); + err = -ENOSPC; + goto err_free_res; + } + err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT, (u64)lpfn << PAGE_SHIFT, (u64)n_pages << PAGE_SHIFT, @@ -107,6 +116,31 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man, mutex_unlock(&bman->lock); }
+ if (lpfn <= bman->visible_size) { + bman_res->used_visible_size = bman_res->base.num_pages; + } else { + struct drm_buddy_block *block; + + list_for_each_entry(block, &bman_res->blocks, link) { + unsigned long start = + drm_buddy_block_offset(block) >> PAGE_SHIFT; + + if (start < bman->visible_size) { + unsigned long end = start + + (drm_buddy_block_size(mm, block) >> PAGE_SHIFT); + + bman_res->used_visible_size += + min(end, bman->visible_size) - start; + } + } + } + + if (bman_res->used_visible_size) { + mutex_lock(&bman->lock); + bman->visible_avail -= bman_res->used_visible_size; + mutex_unlock(&bman->lock); + } + *res = &bman_res->base; return 0;
@@ -128,6 +162,7 @@ static void i915_ttm_buddy_man_free(struct ttm_resource_manager *man,
mutex_lock(&bman->lock); drm_buddy_free_list(&bman->mm, &bman_res->blocks); + bman->visible_avail += bman_res->used_visible_size; mutex_unlock(&bman->lock);
ttm_resource_fini(man, res); @@ -143,6 +178,12 @@ static void i915_ttm_buddy_man_debug(struct ttm_resource_manager *man, mutex_lock(&bman->lock); drm_printf(printer, "default_page_size: %lluKiB\n", bman->default_page_size >> 10); + drm_printf(printer, "visible_avail: %lluMiB\n", + (u64)bman->visible_avail << PAGE_SHIFT >> 20); + drm_printf(printer, "visible_size: %lluMiB\n", + (u64)bman->visible_size << PAGE_SHIFT >> 20); + drm_printf(printer, "visible_reserved: %lluMiB\n", + (u64)bman->visible_reserved << PAGE_SHIFT >> 20);
drm_buddy_print(&bman->mm, printer);
@@ -164,6 +205,7 @@ static const struct ttm_resource_manager_func i915_ttm_buddy_manager_func = { * @type: Memory type we want to manage * @use_tt: Set use_tt for the manager * @size: The size in bytes to manage + * @visible_size: The CPU visible size in bytes to manage * @default_page_size: The default minimum page size in bytes for allocations, * this must be at least as large as @chunk_size, and can be overridden by * setting the BO page_alignment, to be larger or smaller as needed. @@ -187,7 +229,7 @@ static const struct ttm_resource_manager_func i915_ttm_buddy_manager_func = { */ int i915_ttm_buddy_man_init(struct ttm_device *bdev, unsigned int type, bool use_tt, - u64 size, u64 default_page_size, + u64 size, u64 visible_size, u64 default_page_size, u64 chunk_size) { struct ttm_resource_manager *man; @@ -206,6 +248,8 @@ int i915_ttm_buddy_man_init(struct ttm_device *bdev, INIT_LIST_HEAD(&bman->reserved); GEM_BUG_ON(default_page_size < chunk_size); bman->default_page_size = default_page_size; + bman->visible_size = visible_size >> PAGE_SHIFT; + bman->visible_avail = bman->visible_size;
man = &bman->manager; man->use_tt = use_tt; @@ -250,6 +294,8 @@ int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type) mutex_lock(&bman->lock); drm_buddy_free_list(mm, &bman->reserved); drm_buddy_fini(mm); + bman->visible_avail += bman->visible_reserved; + WARN_ON_ONCE(bman->visible_avail != bman->visible_size); mutex_unlock(&bman->lock);
ttm_resource_manager_cleanup(man); @@ -273,6 +319,7 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man, { struct i915_ttm_buddy_manager *bman = to_buddy_manager(man); struct drm_buddy *mm = &bman->mm; + unsigned long fpfn = start >> PAGE_SHIFT; unsigned long flags = 0; int ret;
@@ -284,8 +331,27 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man, size, mm->chunk_size, &bman->reserved, flags); + + if (fpfn < bman->visible_size) { + unsigned long lpfn = fpfn + (size >> PAGE_SHIFT); + unsigned long visible = min(lpfn, bman->visible_size) - fpfn; + + bman->visible_reserved += visible; + bman->visible_avail -= visible; + } mutex_unlock(&bman->lock);
return ret; }
+/** + * i915_ttm_buddy_man_visible_size - Return the size of the CPU visible portion + * in pages. + * @man: The buddy allocator ttm manager + */ +u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man) +{ + struct i915_ttm_buddy_manager *bman = to_buddy_manager(man); + + return bman->visible_size; +} diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h index 72c90b432e87..35fe03a6a78c 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h @@ -21,6 +21,8 @@ struct drm_buddy; * @base: struct ttm_resource base class we extend * @blocks: the list of struct i915_buddy_block for this resource/allocation * @flags: DRM_BUDDY_*_ALLOCATION flags + * @used_visible_size: How much of this resource, if any, uses the CPU visible + * portion, in pages. * @mm: the struct i915_buddy_mm for this resource * * Extends the struct ttm_resource to manage an address space allocation with @@ -30,6 +32,7 @@ struct i915_ttm_buddy_resource { struct ttm_resource base; struct list_head blocks; unsigned long flags; + unsigned long used_visible_size; struct drm_buddy *mm; };
@@ -48,11 +51,14 @@ to_ttm_buddy_resource(struct ttm_resource *res)
int i915_ttm_buddy_man_init(struct ttm_device *bdev, unsigned type, bool use_tt, - u64 size, u64 default_page_size, u64 chunk_size); + u64 size, u64 visible_size, + u64 default_page_size, u64 chunk_size); int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type);
int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man, u64 start, u64 size);
+u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man); + #endif diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c index c055029c7a70..96a0bd82c1f3 100644 --- a/drivers/gpu/drm/i915/intel_region_ttm.c +++ b/drivers/gpu/drm/i915/intel_region_ttm.c @@ -87,6 +87,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
ret = i915_ttm_buddy_man_init(bdev, mem_type, false, resource_size(&mem->region), + mem->io_size, mem->min_page_size, PAGE_SIZE); if (ret) return ret;
Differentiate between mappable vs non-mappable resources, also if this is an actual range allocation ensure we set res->start as the starting pfn. Later when we need to do non-mappable -> mappable moves then we want TTM to see that the current placement is not compatible, which should result in an actual move, instead of being turned into a noop.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c index e47a3d46c6ff..0ac6b2463fd5 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c @@ -141,6 +141,13 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man, mutex_unlock(&bman->lock); }
+ if (place->lpfn - place->fpfn == n_pages) + bman_res->base.start = place->fpfn; + else if (lpfn <= bman->visible_size) + bman_res->base.start = 0; + else + bman_res->base.start = bman->visible_size; + *res = &bman_res->base; return 0;
Otherwise we get -EINVAL, instead of the more useful -E2BIG if the allocation doesn't fit within the pfn range, like with mappable lmem. The hugepages selftest, for example, needs this to know if a smaller size is needed.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c index 0ac6b2463fd5..92d49a3c378c 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c @@ -82,7 +82,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man, lpfn = pages; }
- if (size > mm->size) { + if (size > lpfn << PAGE_SHIFT) { err = -E2BIG; goto err_free_res; }
Check that mappable vs non-mappable matches our expectations.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../drm/i915/selftests/intel_memory_region.c | 143 ++++++++++++++++++ 1 file changed, 143 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 247f65f02bbf..56dec9723601 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -17,6 +17,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" +#include "gem/i915_gem_ttm.h" #include "gem/selftests/igt_gem_utils.h" #include "gem/selftests/mock_context.h" #include "gt/intel_engine_pm.h" @@ -512,6 +513,147 @@ static int igt_mock_max_segment(void *arg) return err; }
+static u64 igt_object_mappable_total(struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = obj->mm.region; + struct i915_ttm_buddy_resource *bman_res = + to_ttm_buddy_resource(obj->mm.res); + struct drm_buddy *mm = bman_res->mm; + struct drm_buddy_block *block; + u64 total; + + total = 0; + list_for_each_entry(block, &bman_res->blocks, link) { + u64 start = drm_buddy_block_offset(block); + u64 end = start + drm_buddy_block_size(mm, block); + + if (start < mr->io_size) + total += min_t(u64, end, mr->io_size) - start; + } + + return total; +} + +static int igt_mock_io_size(void *arg) +{ + struct intel_memory_region *mr = arg; + struct drm_i915_private *i915 = mr->i915; + struct drm_i915_gem_object *obj; + u64 mappable_theft_total; + u64 io_size; + u64 total; + u64 ps; + u64 rem; + u64 size; + I915_RND_STATE(prng); + LIST_HEAD(objects); + int err = 0; + + ps = SZ_4K; + if (i915_prandom_u64_state(&prng) & 1) + ps = SZ_64K; /* For something like DG2 */ + + div64_u64_rem(i915_prandom_u64_state(&prng), SZ_8G, &total); + total = round_down(total, ps); + total = max_t(u64, total, SZ_1G); + + div64_u64_rem(i915_prandom_u64_state(&prng), total - ps, &io_size); + io_size = round_down(io_size, ps); + io_size = max_t(u64, io_size, SZ_256M); /* 256M seems to be the common lower limit */ + + pr_info("%s with ps=%llx, io_size=%llx, total=%llx\n", + __func__, ps, io_size, total); + + mr = mock_region_create(i915, 0, total, ps, 0, io_size); + if (IS_ERR(mr)) { + err = PTR_ERR(mr); + goto out_err; + } + + mappable_theft_total = 0; + rem = total - io_size; + do { + div64_u64_rem(i915_prandom_u64_state(&prng), rem, &size); + size = round_down(size, ps); + size = max(size, ps); + + obj = igt_object_create(mr, &objects, size, + I915_BO_ALLOC_GPU_ONLY); + if (IS_ERR(obj)) { + pr_err("%s TOPDOWN failed with rem=%llx, size=%llx\n", + __func__, rem, size); + err = PTR_ERR(obj); + goto out_close; + } + + mappable_theft_total += igt_object_mappable_total(obj); + rem -= size; + } while (rem); + + pr_info("%s mappable theft=(%lluMiB/%lluMiB), total=%lluMiB\n", + __func__, + (u64)mappable_theft_total >> 20, + (u64)io_size >> 20, + (u64)total >> 20); + + /* + * Even if we allocate all of the non-mappable portion, we should still + * be able to dip into the mappable portion. + */ + obj = igt_object_create(mr, &objects, io_size, + I915_BO_ALLOC_GPU_ONLY); + if (IS_ERR(obj)) { + pr_err("%s allocation unexpectedly failed\n", __func__); + err = PTR_ERR(obj); + goto out_close; + } + + close_objects(mr, &objects); + + rem = io_size; + do { + div64_u64_rem(i915_prandom_u64_state(&prng), rem, &size); + size = round_down(size, ps); + size = max(size, ps); + + obj = igt_object_create(mr, &objects, size, 0); + if (IS_ERR(obj)) { + pr_err("%s MAPPABLE failed with rem=%llx, size=%llx\n", + __func__, rem, size); + err = PTR_ERR(obj); + goto out_close; + } + + if (igt_object_mappable_total(obj) != size) { + pr_err("%s allocation is not mappable(size=%llx)\n", + __func__, size); + err = -EINVAL; + goto out_close; + } + rem -= size; + } while (rem); + + /* + * We assume CPU access is required by default, which should result in a + * failure here, even though the non-mappable portion is free. + */ + obj = igt_object_create(mr, &objects, ps, 0); + if (!IS_ERR(obj)) { + pr_err("%s allocation unexpectedly succeeded\n", __func__); + err = -EINVAL; + goto out_close; + } + +out_close: + close_objects(mr, &objects); + intel_memory_region_destroy(mr); +out_err: + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_gpu_write_dw(struct intel_context *ce, struct i915_vma *vma, u32 dword, @@ -1179,6 +1321,7 @@ int intel_memory_region_mock_selftests(void) SUBTEST(igt_mock_contiguous), SUBTEST(igt_mock_splintered_region), SUBTEST(igt_mock_max_segment), + SUBTEST(igt_mock_io_size), }; struct intel_memory_region *mem; struct drm_i915_private *i915;
If we need to make room for some mappable object, then we should only victimize objects that have one or pages that occupy the visible portion of LMEM. Let's also create a new priority hint for objects that are placed in mappable memory, where we know that CPU access was requested, that way we hopefully victimize these last.
v2(Thomas): s/TTM_PL_PRIV/I915_PL_LMEM0/
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 65 ++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 36e77fcf2ef9..3790e569d6a9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -5,8 +5,10 @@
#include <drm/ttm/ttm_bo_driver.h> #include <drm/ttm/ttm_placement.h> +#include <drm/drm_buddy.h>
#include "i915_drv.h" +#include "i915_ttm_buddy_manager.h" #include "intel_memory_region.h" #include "intel_region_ttm.h"
@@ -20,6 +22,7 @@ #define I915_TTM_PRIO_PURGE 0 #define I915_TTM_PRIO_NO_PAGES 1 #define I915_TTM_PRIO_HAS_PAGES 2 +#define I915_TTM_PRIO_NEEDS_CPU_ACCESS 3
/* * Size of struct ttm_place vector in on-stack struct ttm_placement allocs @@ -337,6 +340,7 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo, const struct ttm_place *place) { struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + struct ttm_resource *res = bo->resource;
if (!obj) return false; @@ -350,7 +354,48 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo, return false;
/* Will do for now. Our pinned objects are still on TTM's LRU lists */ - return i915_gem_object_evictable(obj); + if (!i915_gem_object_evictable(obj)) + return false; + + switch (res->mem_type) { + case I915_PL_LMEM0: { + struct ttm_resource_manager *man = + ttm_manager_type(bo->bdev, res->mem_type); + struct i915_ttm_buddy_resource *bman_res = + to_ttm_buddy_resource(res); + struct drm_buddy *mm = bman_res->mm; + struct drm_buddy_block *block; + + if (!place->fpfn && !place->lpfn) + return true; + + GEM_BUG_ON(!place->lpfn); + + /* + * If we just want something mappable then we can quickly check + * if the current victim resource is using any of the CPU + * visible portion. + */ + if (!place->fpfn && + place->lpfn == i915_ttm_buddy_man_visible_size(man)) + return bman_res->used_visible_size > 0; + + /* Real range allocation */ + list_for_each_entry(block, &bman_res->blocks, link) { + unsigned long fpfn = + drm_buddy_block_offset(block) >> PAGE_SHIFT; + unsigned long lpfn = fpfn + + (drm_buddy_block_size(mm, block) >> PAGE_SHIFT); + + if (place->fpfn < lpfn && place->lpfn > fpfn) + return true; + } + return false; + } default: + break; + } + + return true; }
static void i915_ttm_evict_flags(struct ttm_buffer_object *bo, @@ -850,7 +895,23 @@ void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj) } else if (!i915_gem_object_has_pages(obj)) { bo->priority = I915_TTM_PRIO_NO_PAGES; } else { - bo->priority = I915_TTM_PRIO_HAS_PAGES; + struct ttm_resource_manager *man = + ttm_manager_type(bo->bdev, bo->resource->mem_type); + + /* + * If we need to place an LMEM resource which doesn't need CPU + * access then we should try not to victimize mappable objects + * first, since we likely end up stealing more of the mappable + * portion. And likewise when we try to find space for a mappble + * object, we know not to ever victimize objects that don't + * occupy any mappable pages. + */ + if (i915_ttm_cpu_maps_iomem(bo->resource) && + i915_ttm_buddy_man_visible_size(man) < man->size && + !(obj->flags & I915_BO_ALLOC_GPU_ONLY)) + bo->priority = I915_TTM_PRIO_NEEDS_CPU_ACCESS; + else + bo->priority = I915_TTM_PRIO_HAS_PAGES; }
ttm_bo_move_to_lru_tail(bo, bo->resource, NULL);
The end goal is to have userspace tell the kernel what buffers will require CPU access, however if we ever reach the CPU fault handler, and the current resource is not mappable, then we should attempt to migrate the buffer to the mappable portion of LMEM, or even system memory, if the allowable placements permit it.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 54 ++++++++++++++++++++++--- 1 file changed, 48 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 3790e569d6a9..7940dfec1d56 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -636,11 +636,24 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo) i915_ttm_purge(obj); }
+static bool i915_ttm_resource_mappable(struct ttm_resource *res) +{ + struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res); + + if (!i915_ttm_cpu_maps_iomem(res)) + return true; + + return bman_res->used_visible_size == bman_res->base.num_pages; +} + static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem) { if (!i915_ttm_cpu_maps_iomem(mem)) return 0;
+ if (!i915_ttm_resource_mappable(mem)) + return -EINVAL; + mem->bus.caching = ttm_write_combined; mem->bus.is_iomem = true;
@@ -779,14 +792,15 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj) * Gem forced migration using the i915_ttm_migrate() op, is allowed even * to regions that are not in the object's list of allowable placements. */ -static int i915_ttm_migrate(struct drm_i915_gem_object *obj, - struct intel_memory_region *mr) +static int __i915_ttm_migrate(struct drm_i915_gem_object *obj, + struct intel_memory_region *mr, + unsigned int flags) { struct ttm_place requested; struct ttm_placement placement; int ret;
- i915_ttm_place_from_region(mr, &requested, obj->flags); + i915_ttm_place_from_region(mr, &requested, flags); placement.num_placement = 1; placement.num_busy_placement = 1; placement.placement = &requested; @@ -809,6 +823,12 @@ static int i915_ttm_migrate(struct drm_i915_gem_object *obj, return 0; }
+static int i915_ttm_migrate(struct drm_i915_gem_object *obj, + struct intel_memory_region *mr) +{ + return __i915_ttm_migrate(obj, mr, obj->flags); +} + static void i915_ttm_put_pages(struct drm_i915_gem_object *obj, struct sg_table *st) { @@ -953,9 +973,6 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) if (!obj) return VM_FAULT_SIGBUS;
- if (obj->flags & I915_BO_ALLOC_GPU_ONLY) - return -EINVAL; - /* Sanity check that we allow writing into this object */ if (unlikely(i915_gem_object_is_readonly(obj) && area->vm_flags & VM_WRITE)) @@ -970,6 +987,31 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) return VM_FAULT_SIGBUS; }
+ if (!i915_ttm_resource_mappable(bo->resource)) { + int err = -ENODEV; + int i; + + for (i = 0; i < obj->mm.n_placements; i++) { + struct intel_memory_region *mr = obj->mm.placements[i]; + unsigned int flags; + + if (!mr->io_size && mr->type != INTEL_MEMORY_SYSTEM) + continue; + + flags = obj->flags; + flags &= ~I915_BO_ALLOC_GPU_ONLY; + err = __i915_ttm_migrate(obj, mr, flags); + if (!err) + break; + } + + if (err) { + drm_dbg(dev, "Unable to make resource CPU accessible\n"); + dma_resv_unlock(bo->base.resv); + return VM_FAULT_SIGBUS; + } + } + if (drm_dev_enter(dev, &idx)) { ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot, TTM_BO_VM_NUM_PREFAULT);
Exercise each of the migration scenarios, verifying that the final placement and buffer contents match our expectations.
v2(Thomas): Replace for_i915_gem_ww() block with simpler object_lock()
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../drm/i915/gem/selftests/i915_gem_mman.c | 304 ++++++++++++++++++ 1 file changed, 304 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index ba29767348be..7a73a0b015b9 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -10,6 +10,7 @@ #include "gt/intel_gpu_commands.h" #include "gt/intel_gt.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_migrate.h" #include "gem/i915_gem_region.h" #include "huge_gem_object.h" #include "i915_selftest.h" @@ -999,6 +1000,308 @@ static int igt_mmap(void *arg) return 0; }
+static void igt_close_objects(struct drm_i915_private *i915, + struct list_head *objects) +{ + struct drm_i915_gem_object *obj, *on; + + list_for_each_entry_safe(obj, on, objects, st_link) { + i915_gem_object_lock(obj, NULL); + if (i915_gem_object_has_pinned_pages(obj)) + i915_gem_object_unpin_pages(obj); + /* No polluting the memory region between tests */ + __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); + list_del(&obj->st_link); + i915_gem_object_put(obj); + } + + cond_resched(); + + i915_gem_drain_freed_objects(i915); +} + +static void igt_make_evictable(struct list_head *objects) +{ + struct drm_i915_gem_object *obj; + + list_for_each_entry(obj, objects, st_link) { + i915_gem_object_lock(obj, NULL); + if (i915_gem_object_has_pinned_pages(obj)) + i915_gem_object_unpin_pages(obj); + i915_gem_object_unlock(obj); + } + + cond_resched(); +} + +static int igt_fill_mappable(struct intel_memory_region *mr, + struct list_head *objects) +{ + u64 size, total; + int err; + + total = 0; + size = mr->io_size; + do { + struct drm_i915_gem_object *obj; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) { + err = PTR_ERR(obj); + goto err_close; + } + + list_add(&obj->st_link, objects); + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) { + if (err != -ENXIO && err != -ENOMEM) + goto err_close; + + if (size == mr->min_page_size) { + err = 0; + break; + } + + size >>= 1; + continue; + } + + total += obj->base.size; + } while (1); + + pr_info("%s filled=%lluMiB\n", __func__, total >> 20); + return 0; + +err_close: + igt_close_objects(mr->i915, objects); + return err; +} + +static int ___igt_mmap_migrate(struct drm_i915_private *i915, + struct drm_i915_gem_object *obj, + unsigned long addr, + bool unfaultable) +{ + struct vm_area_struct *area; + int err = 0, i; + + pr_info("igt_mmap(%s, %d) @ %lx\n", + obj->mm.region->name, I915_MMAP_TYPE_FIXED, addr); + + mmap_read_lock(current->mm); + area = vma_lookup(current->mm, addr); + mmap_read_unlock(current->mm); + if (!area) { + pr_err("%s: Did not create a vm_area_struct for the mmap\n", + obj->mm.region->name); + err = -EINVAL; + goto out_unmap; + } + + for (i = 0; i < obj->base.size / sizeof(u32); i++) { + u32 __user *ux = u64_to_user_ptr((u64)(addr + i * sizeof(*ux))); + u32 x; + + if (get_user(x, ux)) { + err = -EFAULT; + if (!unfaultable) { + pr_err("%s: Unable to read from mmap, offset:%zd\n", + obj->mm.region->name, i * sizeof(x)); + goto out_unmap; + } + + continue; + } + + if (unfaultable) { + pr_err("%s: Faulted unmappable memory\n", + obj->mm.region->name); + err = -EINVAL; + goto out_unmap; + } + + if (x != expand32(POISON_INUSE)) { + pr_err("%s: Read incorrect value from mmap, offset:%zd, found:%x, expected:%x\n", + obj->mm.region->name, + i * sizeof(x), x, expand32(POISON_INUSE)); + err = -EINVAL; + goto out_unmap; + } + + x = expand32(POISON_FREE); + if (put_user(x, ux)) { + pr_err("%s: Unable to write to mmap, offset:%zd\n", + obj->mm.region->name, i * sizeof(x)); + err = -EFAULT; + goto out_unmap; + } + } + + if (unfaultable) { + if (err == -EFAULT) + err = 0; + } else { + obj->flags &= ~I915_BO_ALLOC_GPU_ONLY; + err = wc_check(obj); + } +out_unmap: + vm_munmap(addr, obj->base.size); + return err; +} + +#define IGT_MMAP_MIGRATE_TOPDOWN (1 << 0) +#define IGT_MMAP_MIGRATE_FILL (1 << 1) +#define IGT_MMAP_MIGRATE_EVICTABLE (1 << 2) +#define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3) +static int __igt_mmap_migrate(struct intel_memory_region **placements, + int n_placements, + struct intel_memory_region *expected_mr, + unsigned int flags) +{ + struct drm_i915_private *i915 = placements[0]->i915; + struct drm_i915_gem_object *obj; + struct i915_request *rq = NULL; + unsigned long addr; + LIST_HEAD(objects); + u64 offset; + int err; + + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, + placements, + n_placements); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + if (flags & IGT_MMAP_MIGRATE_TOPDOWN) + obj->flags |= I915_BO_ALLOC_GPU_ONLY; + + err = __assign_mmap_offset(obj, I915_MMAP_TYPE_FIXED, &offset, NULL); + if (err) + goto out_put; + + /* + * This will eventually create a GEM context, due to opening dummy drm + * file, which needs a tiny amount of mappable device memory for the top + * level paging structures(and perhaps scratch), so make sure we + * allocate early, to avoid tears. + */ + addr = igt_mmap_offset(i915, offset, obj->base.size, + PROT_WRITE, MAP_SHARED); + if (IS_ERR_VALUE(addr)) { + err = addr; + goto out_put; + } + + if (flags & IGT_MMAP_MIGRATE_FILL) { + err = igt_fill_mappable(placements[0], &objects); + if (err) + goto out_put; + } + + err = i915_gem_object_lock(obj, NULL); + if (err) + goto out_put; + + err = i915_gem_object_pin_pages(obj); + if (err) { + i915_gem_object_unlock(obj); + goto out_put; + } + + err = intel_context_migrate_clear(to_gt(i915)->migrate.context, NULL, + obj->mm.pages->sgl, obj->cache_level, + i915_gem_object_is_lmem(obj), + expand32(POISON_INUSE), &rq); + i915_gem_object_unpin_pages(obj); + if (rq) { + dma_resv_add_excl_fence(obj->base.resv, &rq->fence); + i915_gem_object_set_moving_fence(obj, &rq->fence); + i915_request_put(rq); + } + i915_gem_object_unlock(obj); + if (err) + goto out_put; + + if (flags & IGT_MMAP_MIGRATE_EVICTABLE) + igt_make_evictable(&objects); + + err = ___igt_mmap_migrate(i915, obj, addr, + flags & IGT_MMAP_MIGRATE_UNFAULTABLE); + if (!err && obj->mm.region != expected_mr) { + pr_err("%s region mismatch %s\n", __func__, expected_mr->name); + err = -EINVAL; + } + +out_put: + i915_gem_object_put(obj); + igt_close_objects(i915, &objects); + return err; +} + +static int igt_mmap_migrate(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_memory_region *system = i915->mm.regions[INTEL_REGION_SMEM]; + struct intel_memory_region *mr; + enum intel_region_id id; + + for_each_memory_region(mr, i915, id) { + struct intel_memory_region *mixed[] = { mr, system }; + struct intel_memory_region *single[] = { mr }; + int err; + + if (mr->private) + continue; + + if (!mr->io_size || mr->io_size == mr->total) + continue; + + /* + * Allocate in the mappable portion, should be no suprises here. + */ + err = __igt_mmap_migrate(mixed, ARRAY_SIZE(mixed), mr, 0); + if (err) + return err; + + /* + * Allocate in the non-mappable portion, but force migrating to + * the mappable portion on fault (LMEM -> LMEM) + */ + err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr, + IGT_MMAP_MIGRATE_TOPDOWN | + IGT_MMAP_MIGRATE_FILL | + IGT_MMAP_MIGRATE_EVICTABLE); + if (err) + return err; + + /* + * Allocate in the non-mappable portion, but force spilling into + * system memory on fault (LMEM -> SMEM) + */ + err = __igt_mmap_migrate(mixed, ARRAY_SIZE(mixed), system, + IGT_MMAP_MIGRATE_TOPDOWN | + IGT_MMAP_MIGRATE_FILL); + if (err) + return err; + + /* + * Allocate in the non-mappable portion, but since the mappable + * portion is already full, and we can't spill to system memory, + * then we should expect the fault to fail. + */ + err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr, + IGT_MMAP_MIGRATE_TOPDOWN | + IGT_MMAP_MIGRATE_FILL | + IGT_MMAP_MIGRATE_UNFAULTABLE); + if (err) + return err; + } + + return 0; +} + static const char *repr_mmap_type(enum i915_mmap_type type) { switch (type) { @@ -1424,6 +1727,7 @@ int i915_gem_mman_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_smoke_tiling), SUBTEST(igt_mmap_offset_exhaustion), SUBTEST(igt_mmap), + SUBTEST(igt_mmap_migrate), SUBTEST(igt_mmap_access), SUBTEST(igt_mmap_revoke), SUBTEST(igt_mmap_gpu),
If we have to contend with non-mappable LMEM, then we need to ensure the object fits within the mappable portion, like in the selftests, where we later try to CPU access the pages. However if it can't then we need to gracefully handle this, without throwing an error.
Also it looks like TTM will return -ENOMEM, in ttm_bo_mem_space() after exhausting all possible placements.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- drivers/gpu/drm/i915/selftests/intel_memory_region.c | 8 +++++++- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 42db9cd30978..3caa178bbd07 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1344,7 +1344,7 @@ static int igt_ppgtt_smoke_huge(void *arg)
err = i915_gem_object_pin_pages_unlocked(obj); if (err) { - if (err == -ENXIO || err == -E2BIG) { + if (err == -ENXIO || err == -E2BIG || err == -ENOMEM) { i915_gem_object_put(obj); size >>= 1; goto try_again; diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 56dec9723601..ba32893e0873 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -822,8 +822,14 @@ static int igt_lmem_create_with_ps(void *arg)
i915_gem_object_lock(obj, NULL); err = i915_gem_object_pin_pages(obj); - if (err) + if (err) { + if (err == -ENXIO || err == -E2BIG || err == -ENOMEM) { + pr_info("%s not enough lmem for ps(%u) err=%d\n", + __func__, ps, err); + err = 0; + } goto out_put; + }
daddr = i915_gem_object_get_dma_address(obj, 0); if (!IS_ALIGNED(daddr, ps)) {
Starting from DG2+, when dealing with LMEM, we assume that by default all userspace allocations should be placed in the non-mappable portion of LMEM. Note that dumb buffers are not included here, since these are not "GPU accelerated" and likely need CPU access. We choose to just always set GPU_ONLY, and let the backend figure out if that should be ignored on discrete devices.
In a later patch userspace will be able to provide a hint if CPU access to the buffer is needed.
v2(Thomas) - Apply GPU_ONLY on all discrete devices, but only if the BO can be placed in LMEM. Down in the depths this should be turned into a noop, where required, and as an annotation it still make some sense. If we apply it regardless of the placements then we end up needing to check the placements during exec capture. Also it's slightly inconsistent since the NEEDS_CPU_ACCESS can only be applied on objects that can be placed in LMEM. The other annoyance would be gem_create_ext vs plain gem_create, if we were to always apply GPU_ONLY.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 9402d4bf4ffc..ecb8c2feec46 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -424,6 +424,14 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; }
+ /* + * TODO: add a userspace hint to force CPU_ACCESS for the object, which + * can override this. + */ + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, ext_data.n_placements,
On 2/11/22 12:34, Matthew Auld wrote:
Starting from DG2+, when dealing with LMEM, we assume that by default all userspace allocations should be placed in the non-mappable portion of LMEM. Note that dumb buffers are not included here, since these are not "GPU accelerated" and likely need CPU access. We choose to just always set GPU_ONLY, and let the backend figure out if that should be ignored on discrete devices.
In a later patch userspace will be able to provide a hint if CPU access to the buffer is needed.
v2(Thomas)
- Apply GPU_ONLY on all discrete devices, but only if the BO can be placed in LMEM. Down in the depths this should be turned into a noop, where required, and as an annotation it still make some sense. If we apply it regardless of the placements then we end up needing to check the placements during exec capture. Also it's slightly inconsistent since the NEEDS_CPU_ACCESS can only be applied on objects that can be placed in LMEM. The other annoyance would be gem_create_ext vs plain gem_create, if we were to always apply GPU_ONLY.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
drivers/gpu/drm/i915/gem/i915_gem_create.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 9402d4bf4ffc..ecb8c2feec46 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -424,6 +424,14 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; }
- /*
* TODO: add a userspace hint to force CPU_ACCESS for the object, which
* can override this.
*/
- if (ext_data.n_placements > 1 ||
ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
- obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, ext_data.n_placements,
If set, force the allocation to be placed in the mappable portion of LMEM. One big restriction here is that system memory must be given as a potential placement for the object, that way we can always spill the object into system memory if we can't make space.
XXX: Still needs IGTs. Including now just for the sake of having more complete picture.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++++++++++++------ include/uapi/drm/i915_drm.h | 31 +++++++++++++++++++++- 2 files changed, 48 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index ecb8c2feec46..6d4a637ec654 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -238,6 +238,7 @@ struct create_ext { struct drm_i915_private *i915; struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; unsigned int n_placements; + unsigned int placement_mask; unsigned long flags; };
@@ -334,6 +335,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, for (i = 0; i < args->num_regions; i++) ext_data->placements[i] = placements[i];
+ ext_data->placement_mask = mask; return 0;
out_dump: @@ -408,7 +410,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_object *obj; int ret;
- if (args->flags) + if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) return -EINVAL;
ret = i915_user_extensions(u64_to_user_ptr(args->extensions), @@ -424,13 +426,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; }
- /* - * TODO: add a userspace hint to force CPU_ACCESS for the object, which - * can override this. - */ - if (ext_data.n_placements > 1 || - ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) - ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) { + if (ext_data.n_placements == 1) + return -EINVAL; + + /* + * We always need to be able to spill to system memory, if we + * can't place in the mappable part of LMEM. + */ + if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM))) + return -EINVAL; + } else { + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + }
obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 914ebd9290e5..925685bd261e 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3157,7 +3157,36 @@ struct drm_i915_gem_create_ext { * Object handles are nonzero. */ __u32 handle; - /** @flags: MBZ */ + /** + * @flags: Optional flags. + * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and + * only strictly required on platforms where only some of the device + * memory is directly visible or mappable through the CPU, like on DG2+. + * + * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to + * ensure we can always spill the allocation to system memory, if we + * can't place the object in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE, + * will need to enable this hint, if the object can also be placed in + * I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will + * throw an error otherwise. This also means that such objects will need + * I915_MEMORY_CLASS_SYSTEM set as a possible placement. + * + * Without this hint, the kernel will assume that non-mappable + * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the + * kernel can still migrate the object to the mappable part, as a last + * resort, if userspace ever CPU faults this object, but this might be + * expensive, and so ideally should be avoided. + */ +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0) __u32 flags; /** * @extensions: The chain of extensions to apply to this object.
On platforms where there might be non-mappable LMEM, force userspace to mark the buffers with the correct hint. When dumping the BO contents during capture we need CPU access. Note this only applies to buffers that can be placed in LMEM, and also doesn't impact DG1.
v2(Reported-by: kernel test robot lkp@intel.com): - Also update the function signature on !CONFIG_DRM_I915_CAPTURE_ERROR builds.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 498b458fd784..0166a37d252f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1965,7 +1965,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb) #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
/* Stage with GFP_KERNEL allocations before we enter the signaling critical path */ -static void eb_capture_stage(struct i915_execbuffer *eb) +static int eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; @@ -1978,6 +1978,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue;
+ if (!IS_DG1(eb->i915) && + vma->obj->flags & I915_BO_ALLOC_GPU_ONLY) + return -EINVAL; + for_each_batch_create_order(eb, j) { struct i915_capture_list *capture;
@@ -1990,6 +1994,8 @@ static void eb_capture_stage(struct i915_execbuffer *eb) eb->capture_lists[j] = capture; } } + + return 0; }
/* Commit once we're in the critical path */ @@ -2031,7 +2037,7 @@ static void eb_capture_list_clear(struct i915_execbuffer *eb)
#else
-static void eb_capture_stage(struct i915_execbuffer *eb) +static int eb_capture_stage(struct i915_execbuffer *eb) { }
@@ -3418,7 +3424,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, }
ww_acquire_done(&eb.ww.ctx); - eb_capture_stage(&eb); + err = eb_capture_stage(&eb); + if (err) + goto err_vma;
out_fence = eb_requests_create(&eb, in_fence, out_fence_fd); if (IS_ERR(out_fence)) {
On 2/11/22 12:34, Matthew Auld wrote:
On platforms where there might be non-mappable LMEM, force userspace to mark the buffers with the correct hint. When dumping the BO contents during capture we need CPU access. Note this only applies to buffers that can be placed in LMEM, and also doesn't impact DG1.
v2(Reported-by: kernel test robot lkp@intel.com):
- Also update the function signature on !CONFIG_DRM_I915_CAPTURE_ERROR builds.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Just pass along the probed io_size. The backend should be able to utilize the entire range here, even if some of it is non-mappable.
It does leave open with what to do with stolen local-memory.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 01838b8ce4c7..ad3cf348b4a8 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -201,6 +201,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) struct intel_memory_region *mem; resource_size_t min_page_size; resource_size_t io_start; + resource_size_t io_size; resource_size_t lmem_size; int err;
@@ -211,7 +212,8 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
io_start = pci_resource_start(pdev, 2); - if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2))) + io_size = min(pci_resource_len(pdev, 2), lmem_size); + if (!io_size) return ERR_PTR(-ENODEV);
min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K : @@ -221,7 +223,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) lmem_size, min_page_size, io_start, - lmem_size, + io_size, INTEL_MEMORY_LOCAL, 0, &intel_region_lmem_ops);
dri-devel@lists.freedesktop.org