[PATCH v3 07/14] drm/panthor: Add the MMU/VM logical block

Mon Jan 15 11:04:15 UTC 2024

Hi Steve,

On Fri, 8 Dec 2023 14:28:05 +0000
Steven Price <steven.price at arm.com> wrote:

> > +/**
> > + * alloc_pt() - Custom page table allocator
> > + * @cookie: Cookie passed at page table allocation time.
> > + * @size: Size of the page table. This size should be fixed,
> > + * and determined at creation time based on the granule size.
> > + * @gfp: GFP flags.
> > + *
> > + * We want a custom allocator so we can use a cache for page table
> > + * allocations and amortize the cost of the over-reservation that's
> > + * done to allow asynchronous VM operations.
> > + *
> > + * Return: non-NULL on success, NULL if the allocation failed for any
> > + * reason.
> > + */
> > +static void *alloc_pt(void *cookie, size_t size, gfp_t gfp)
> > +{
> > +	struct panthor_vm *vm = cookie;
> > +	void *page;
> > +
> > +	/* Allocation of the root page table happening during init. */
> > +	if (unlikely(!vm->pgtbl_ops)) {  
> 
> I'm not that keen on using pgtbl_ops as the proxy for this. Can we use
> root_page_table instead?

Definitely, I actually intended to test ->root_page_table when I
introduced this field, but somehow forgot to update this part of the
code.

> 
> At the moment if the IOMMU code ever did multiple allocations during
> alloc_io_pgtable_ops() then we'd overwrite root_page_table and screw up
> on the free path.
> 
> If we use root_page_table == NULL as the check then things will
> 'cleanly' fail by falling through to the non-root case in that case.
> 
> Of course this really looks like we should have had a different
> allocator for the root table but I'm not (re)opening that can of worms! ;)
> 
> And of course it doesn't make any sense for the IOMMU code to do
> multiple allocations so this is all rather academic - but maybe one day
> there will be a different page table structure (16K pages maybe?).
> 
> > +		struct page *p;
> > +
> > +		drm_WARN_ON(&vm->ptdev->base, vm->op_ctx);
> > +		p = alloc_pages_node(dev_to_node(vm->ptdev->base.dev),
> > +				     gfp | __GFP_ZERO, get_order(size));
> > +		page = p ? page_address(p) : NULL;
> > +		vm->root_page_table = page;
> > +		return page;
> > +	}
> > +
> > +	/* We're not supposed to have anything bigger than 4k here, because we picked a
> > +	 * 4k granule size at init time.
> > +	 */
> > +	if (drm_WARN_ON(&vm->ptdev->base, size != SZ_4K))
> > +		return NULL;
> > +
> > +	/* We must have some op_ctx attached to the VM and it must have at least one
> > +	 * free page.
> > +	 */
> > +	if (drm_WARN_ON(&vm->ptdev->base, !vm->op_ctx) ||
> > +	    drm_WARN_ON(&vm->ptdev->base,
> > +			vm->op_ctx->rsvd_page_tables.ptr >= vm->op_ctx->rsvd_page_tables.count))
> > +		return NULL;
> > +
> > +	page = vm->op_ctx->rsvd_page_tables.pages[vm->op_ctx->rsvd_page_tables.ptr++];
> > +	memset(page, 0, SZ_4K);
> > +
> > +	/* Page table entries don't use virtual addresses, which trips out
> > +	 * kmemleak. kmemleak_alloc_phys() might work, but physical addresses
> > +	 * are mixed with other fields, and I fear kmemleak won't detect that
> > +	 * either.
> > +	 *
> > +	 * Let's just ignore memory passed to the page-table driver for now.
> > +	 */
> > +	kmemleak_ignore(page);
> > +	return page;
> > +}
> > +
> > +/**
> > + * @free_pt() - Custom page table free function
> > + * @cookie: Cookie passed at page table allocation time.
> > + * @data: Page table to free.
> > + * @size: Size of the page table. This size should be fixed,
> > + * and determined at creation time based on the granule size.
> > + */
> > +static void free_pt(void *cookie, void *data, size_t size)
> > +{
> > +	struct panthor_vm *vm = cookie;
> > +
> > +	if (unlikely(vm->root_page_table == data)) {
> > +		free_pages((unsigned long)data, get_order(size));  
> 
> Maybe add "vm->root_page_table = NULL;"?

Sure.

> 
> > +		return;
> > +	}

[...]

> > +/**
> > + * panthor_vm_alloc_va() - Allocate a region in the auto-va space
> > + * @VM: VM to allocate a region on.
> > + * @size: Size of the region.  
> 
> kerneldoc needs updating for the new arguments.

Will fix.

> 
> > + *
> > + * Some GPU objects, like heap chunks, are fully managed by the kernel and
> > + * need to be mapped to the userspace VM, in the region reserved for kernel
> > + * objects.
> > + *
> > + * This function takes care of allocating a region in this reserved space.
> > + *
> > + * Return: A valid pointer on success, and ERR_PTR() otherwise.  
> 
> Returns an error code not a pointer.

And that too.

> 
> > + */
> > +int
> > +panthor_vm_alloc_va(struct panthor_vm *vm, u64 va, u64 size,
> > +		    struct drm_mm_node *va_node)
> > +{
> > +	int ret;
> > +
> > +	if (!size || (size & ~PAGE_MASK))
> > +		return -EINVAL;
> > +
> > +	if (va != PANTHOR_VM_KERNEL_AUTO_VA && (va & ~PAGE_MASK))
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&vm->mm_lock);
> > +	if (va != PANTHOR_VM_KERNEL_AUTO_VA) {
> > +		memset(va_node, 0, sizeof(*va_node));  
> 
> This memset() seems redundant.

If we assume the va_node is initialized to zero, it's indeed redundant.
I'll update the doc to make this a caller requirement.

> I certainly can't see why it's only
> required on this path.

drm_mm_insert_node_in_range() seems to assign all fields explicitly,
while, according to the doc [1], drm_mm_reserve_node() wants the caller
to make sure the struct is zero-initialized, except for the start and
size fields.

> 
> > +		va_node->start = va;
> > +		va_node->size = size;
> > +		ret = drm_mm_reserve_node(&vm->mm, va_node);
> > +	} else {
> > +		ret = drm_mm_insert_node_in_range(&vm->mm, va_node, size,
> > +						  size >= SZ_2M ? SZ_2M : SZ_4K,
> > +						  0, vm->kernel_auto_va.start,
> > +						  vm->kernel_auto_va.end,
> > +						  DRM_MM_INSERT_BEST);
> > +	}
> > +	mutex_unlock(&vm->mm_lock);
> > +
> > +	return ret;
> > +}

[...]

> > +/*
> > + * Only 32 VMs per open file. If that becomes a limiting factor, we can
> > + * increase this number.
> > + */
> > +#define PANTHOR_MAX_VMS_PER_FILE	32
> > +
> > +/**
> > + * panthor_vm_pool_create_vm() - Create a VM
> > + * @pool: The VM to create this VM on.
> > + * @kernel_va_start: Start of the region reserved for kernel objects.
> > + * @kernel_va_range: Size of the region reserved for kernel objects.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.  
> 
> Actually returns the (positive) id on success.

Will fix.

> 
> > + */
> > +int panthor_vm_pool_create_vm(struct panthor_device *ptdev,
> > +			      struct panthor_vm_pool *pool,
> > +			      struct drm_panthor_vm_create *args)
> > +{
> > +	u64 kernel_va_start, kernel_va_range;
> > +	struct panthor_vm *vm;
> > +	int ret;
> > +	u32 id;
> > +
> > +	ret = panthor_vm_create_check_args(ptdev, args, &kernel_va_start, &kernel_va_range);
> > +	if (ret)
> > +		return ret;
> > +
> > +	vm = panthor_vm_create(ptdev, false, kernel_va_start, kernel_va_range,
> > +			       kernel_va_start, kernel_va_range);
> > +	if (IS_ERR(vm))
> > +		return PTR_ERR(vm);
> > +
> > +	ret = xa_alloc(&pool->xa, &id, vm,
> > +		       XA_LIMIT(1, PANTHOR_MAX_VMS_PER_FILE), GFP_KERNEL);
> > +
> > +	if (ret) {
> > +		panthor_vm_put(vm);
> > +		return ret;
> > +	}
> > +
> > +	args->user_va_range = kernel_va_start;
> > +	return id;
> > +}

[...]

> > +/**
> > + * panthor_vm_put() - Release a reference on a VM
> > + * @vm: VM to release the reference on. Can be NULL.
> > + */
> > +void panthor_vm_put(struct panthor_vm *vm)
> > +{
> > +	static_assert(offsetof(struct panthor_vm, base) == 0);  
> 
> Yuk! ;)
> 
> I'd prefer:
> 
>   drm_gpuvm_put(vm ? &vm->base : NULL);
> 
> which my compiler turns into the same thing rather than relying on the
> type punning. You can keep the static_assert if you like, but I don't
> like relying on it for correct code generation. Although I'll admit I
> couldn't actually get the compiler to produce incorrect code when I tried.

Sure, I'll pick your suggestion here.

> 
> > +	drm_gpuvm_put(&vm->base);
> > +}

[...]

> > +
> > +/**
> > + * panthor_vm_prepare_mapped_bos_resvs() - Prepare resvs on VM BOs.
> > + * @exec: Locking/preparation context.
> > + * @vm: VM targeted by the GPU job.
> > + * @slot_count: Number of slots to reserve.
> > + *
> > + * GPU jobs assume all BOs bound to the VM at the time the job is submitted
> > + * are available when the job is executed. In order to guarantee that, we
> > + * need to reserve a slot on all BOs mapped to a VM and update this slot with
> > + * the job fence after its submission.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.
> > + */
> > +int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm *vm,
> > +					u32 slot_count)
> > +{
> > +	int ret;
> > +
> > +	/* Acquire the VM lock an reserve a slot for this GPU job. */
> > +	ret = drm_gpuvm_prepare_vm(&vm->base, exec, slot_count);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* VM operations are not protected by the VM resv-lock. We need to
> > +	 * take the op_lock to make sure the shared_bos list is not updated
> > +	 * while we're walking it.
> > +	 */  
> 
> Is the above comment stale? AFAIK the shared_bos list doesn't exist
> anymore and this doesn't appear to relate to anything here.

Oops, indeed. That predates to transition to drm_gpuvm for the VM <-> BO
association.

Thanks for the review!

[1]https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_mm.c#L441