[PATCH v2 08/15] drm/panthor: Add the MMU/VM logical block
Boris Brezillon
boris.brezillon at collabora.com
Tue Aug 29 15:33:17 UTC 2023
On Mon, 14 Aug 2023 16:53:09 +0100
Steven Price <steven.price at arm.com> wrote:
> > +
> > +/**
> > + * struct panthor_vm_op_ctx - VM operation context
> > + *
> > + * With VM operations potentially taking place in a dma-signaling path, we
> > + * need to make sure everything that might require resource allocation is
> > + * pre-allocated upfront. This is what this operation context is for.
> > + *
> > + * We also collect resources that have been freed, so we can release them
> > + * asynchronously, and let the VM_BIND scheduler process the next VM_BIND
> > + * request.
> > + */
> > +struct panthor_vm_op_ctx {
> > + /** @rsvd_page_tables: Pages reserved for the MMU page table update. */
> > + struct {
> > + /** @count: Number of pages reserved. */
> > + u32 count;
> > +
> > + /** @ptr: Point to the first unused page in the @pages table. */
> > + u32 ptr;
> > +
> > + /**
> > + * @page: Array of pages that can be used for an MMU page table update.
> > + *
> > + * After an VM operation, there might be free pages left in this array.
> > + * They should be returned to the pt_cache as part of the op_ctx cleanup.
> > + */
> > + void **pages;
> > + } rsvd_page_tables;
>
> Two questions:
>
> 1) Would a mempool simplify the implementation? It looks like a
> reasonable match.
Not sure what you mean by mempool, but I'm using a kmem_cache here for
all page table allocations. The pages that are passed to
panthor_vm_op_ctx::rsvd_page_tables::pages are allocated from this
pool. It's just that for each VM operation we pre-allocate page tables,
and release those that were not used when the operation is done (we
over-allocate for the worst-case scenario).
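Roughly, it looks like this (sketch only, the cache/helper names below are
illustrative, not the exact ones from the patch):

	/* Driver-wide slab cache backing all 4k page-table pages. */
	static struct kmem_cache *pt_cache;

	/* Worst-case pre-allocation, done before we enter the dma-signaling
	 * path. Anything left unused in @pages is handed back to pt_cache
	 * when the op_ctx is cleaned up.
	 */
	static int panthor_vm_op_ctx_prealloc_pt(struct panthor_vm_op_ctx *op_ctx,
						 u32 count)
	{
		u32 i;

		op_ctx->rsvd_page_tables.pages = kcalloc(count, sizeof(void *),
							 GFP_KERNEL);
		if (!op_ctx->rsvd_page_tables.pages)
			return -ENOMEM;

		for (i = 0; i < count; i++) {
			void *page = kmem_cache_zalloc(pt_cache, GFP_KERNEL);

			if (!page)
				return -ENOMEM;

			op_ctx->rsvd_page_tables.pages[i] = page;
			op_ctx->rsvd_page_tables.count++;
		}

		return 0;
	}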
>
> 2) Does it really make sense to have a separate pool of memory for every
> operation? Instead of having a separate pool for each operation, it
> would be possible to just keep track of the total number needed for all
> outstanding operations. Then a single (per device or maybe per-VM if
> necessary) mempool could be resized to ensure it has the right amount of
> space.
The pool is per-driver (see the global pt_cache). rsvd_page_tables just
holds pages needed for a specific VM operation. To be more specific, it
holds pages for the worst case (page table tree is empty, except for the
root page table).
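For the curious, a back-of-the-envelope version of that worst-case bound
looks like this (sketch only, assuming a 4-level, 4k-granule AArch64 page
table; not the exact computation used in the patch):

	/* Upper bound on the number of page-table pages needed to map
	 * [va, va + size) when only the root (level 0) table exists: one
	 * table per 512-entry slot the range spans at each lower level.
	 */
	static u32 worst_case_pt_count(u64 va, u64 size)
	{
		u64 end = va + size - 1;
		u32 count = 0;
		int shift;

		for (shift = 39; shift >= 21; shift -= 9)
			count += (u32)((end >> shift) - (va >> shift)) + 1;

		return count;
	}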
>
> I'm also a little wary that the VM_BIND infrastructure could potentially
> be abused to trigger a large amount of kernel allocation as it allocates
> up-front for the worst case but those pages are not charged to the
> process (AFAICT). But I haven't fully got my head round that yet.
Yep, that's problematic, indeed. I considered allocating page tables
as GEM objects, but the overhead of a GEM object is quite big
(hundreds of bytes of meta-data) compared to the size of a page table
(4k), and kmem_cache was just super convenient for this page table
cache :-).
>
> > +
> > + /** @flags: Combination of drm_panthor_vm_bind_op_flags. */
> > + u32 flags;
> > +
> > + /** @va: Virtual range targeted by the VM operation. */
> > + struct {
> > + /** @addr: Start address. */
> > + u64 addr;
> > +
> > + /** @range: Range size. */
> > + u64 range;
> > + } va;
> > +
> > + /**
> > + * @returned_vmas: List of panthor_vma objects returned after a VM operation.
> > + *
> > + * For unmap operations, this will contain all VMAs that were covered by the
> > + * specified VA range.
> > + *
> > + * For map operations, this will contain all VMAs that previously mapped to
> > + * the specified VA range.
> > + *
> > + * Those VMAs, and the resources they point to will be released as part of
> > + * the op_ctx cleanup operation.
> > + */
> > + struct list_head returned_vmas;
> > +
> > + /** @map: Fields specific to a map operation. */
> > + struct {
> > + /** @gem: GEM object information. */
> > + struct {
> > + /** @obj: GEM object to map. */
> > + struct drm_gem_object *obj;
> > +
> > + /** @offset: Offset in the GEM object. */
> > + u64 offset;
> > + } gem;
> > +
> > + /**
> > + * @sgt: sg-table pointing to pages backing the GEM object.
> > + *
> > + * This is gathered at job creation time, such that we don't have
> > + * to allocate in ::run_job().
> > + */
> > + struct sg_table *sgt;
> > +
> > + /**
> > + * @prev_vma: Pre-allocated VMA object to deal with a remap situation.
> > + *
> > + * If the map request covers a region that's inside another VMA, the
> > + * previous VMA will be split, requiring instantiation of a maximum of
> > + * two new VMA objects.
> > + */
> > + struct panthor_vma *prev_vma;
> > +
> > + /**
> > + * @new_vma: The new VMA object that will be inserted to the VA tree.
> > + */
> > + struct panthor_vma *new_vma;
> > +
> > + /**
> > + * @next_vma: Pre-allocated VMA object to deal with a remap situation.
> > + *
> > + * See @prev_vma.
> > + */
> > + struct panthor_vma *next_vma;
>
> It's probably premature optimization, but it feels like having a cache
> of these VMA structures might be an idea.
If it's needed, I'll probably go for a kmem_cache, but I need to
check if it's worth it first (i.e. whether the closest kmalloc cache is
significantly bigger than the struct size).
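Something like this would be enough to check (assuming
kmalloc_size_roundup() is available on the target kernel):

	/* Slack left by the generic kmalloc bucket for a panthor_vma
	 * allocation; a dedicated cache is only worth it if this is
	 * significant.
	 */
	size_t slack = kmalloc_size_roundup(sizeof(struct panthor_vma)) -
		       sizeof(struct panthor_vma);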
> I'm also struggling to
> understand how both a new prev and new next VMA are needed - but I
> haven't dug into the GPU VA manager.
prev/next are for mapping splits: an object is already mapped, and a new
object is mapped in the middle of this pre-existing mapping. In that
case, we need two VMA objects for the preceding and succeeding mappings,
since the old mapping object will be released.
new_vma is for the new mapping.
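To picture it (illustration only):

	/*
	 * before:   |------------------ old VMA -------------------|
	 * map req:                 |------ new_vma ------|
	 * after:    |-- prev_vma --|------ new_vma ------|-- next_vma --|
	 */

The old VMA object gets released, prev_vma/next_vma re-describe the two
untouched ends of the old mapping, and new_vma describes the freshly
mapped region.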
>
> > + } map;
> > +};
> > +
[...]
> > +/**
> > + * panthor_vm_active() - Flag a VM as active
> > + * @VM: VM to flag as active.
> > + *
> > + * Assigns an address space to a VM so it can be used by the GPU/MCU.
> > + *
> > + * Return: 0 on success, a negative error code otherwise.
> > + */
> > +int panthor_vm_active(struct panthor_vm *vm)
> > +{
> > + struct panthor_device *ptdev = vm->ptdev;
> > + struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg;
> > + int ret = 0, as, cookie;
> > + u64 transtab, transcfg;
> > +
> > + if (!drm_dev_enter(&ptdev->base, &cookie))
> > + return -ENODEV;
> > +
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > + as = vm->as.id;
> > + if (as >= 0) {
> > + u32 mask = panthor_mmu_as_fault_mask(ptdev, as);
> > +
> > + if (ptdev->mmu->as.faulty_mask & mask) {
> > + /* Unhandled pagefault on this AS, the MMU was
> > + * disabled. We need to re-enable the MMU after
> > + * clearing+unmasking the AS interrupts.
> > + */
> > + gpu_write(ptdev, MMU_INT_CLEAR, mask);
> > + ptdev->mmu->as.faulty_mask &= ~mask;
> > + gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask);
> > + goto out_enable_as;
> > + }
> > +
> > + goto out_unlock;
> > + }
> > +
> > + /* Check for a free AS */
> > + if (vm->for_mcu) {
> > + drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0));
> > + as = 0;
> > + } else {
> > + as = ffz(ptdev->mmu->as.alloc_mask | BIT(0));
> > + }
> > +
> > + if (!(BIT(as) & ptdev->gpu_info.as_present)) {
> > + struct panthor_vm *lru_vm;
> > +
> > + lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
> > + struct panthor_vm,
> > + as.lru_node);
> > + if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
> > + ret = -EBUSY;
> > + goto out_unlock;
> > + }
> > +
> > + list_del_init(&lru_vm->as.lru_node);
> > + as = lru_vm->as.id;
>
> Should this not set lru_vm->as.id = -1, so that the code knows the VM no
> longer has an address space?
Good catch!
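Something like this then (sketch of the fix):

		list_del_init(&lru_vm->as.lru_node);
		as = lru_vm->as.id;

		/* The evicted VM no longer owns an AS slot. */
		lru_vm->as.id = -1;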
>
> > + } else {
> > + set_bit(as, &ptdev->mmu->as.alloc_mask);
> > + }
> > +
> > + /* Assign the free or reclaimed AS to the FD */
> > + vm->as.id = as;
> > + ptdev->mmu->as.slots[as].vm = vm;
> > +
> > +out_enable_as:
> > + transtab = cfg->arm_lpae_s1_cfg.ttbr;
> > + transcfg = AS_TRANSCFG_PTW_MEMATTR_WB |
> > + AS_TRANSCFG_PTW_RA |
> > + AS_TRANSCFG_ADRMODE_AARCH64_4K;
> > + if (ptdev->coherent)
> > + transcfg |= AS_TRANSCFG_PTW_SH_OS;
> > +
> > + ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > +
> > +out_unlock:
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > + drm_dev_exit(cookie);
> > + return ret;
> > +}
> > +
[...]
> > +
> > +static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > +{
> > + status = panthor_mmu_fault_mask(ptdev, status);
> > + while (status) {
> > + u32 as = ffs(status | (status >> 16)) - 1;
> > + u32 mask = panthor_mmu_as_fault_mask(ptdev, as);
> > + u32 new_int_mask;
> > + u64 addr;
> > + u32 fault_status;
> > + u32 exception_type;
> > + u32 access_type;
> > + u32 source_id;
> > +
> > + fault_status = gpu_read(ptdev, AS_FAULTSTATUS(as));
> > + addr = gpu_read(ptdev, AS_FAULTADDRESS_LO(as));
> > + addr |= (u64)gpu_read(ptdev, AS_FAULTADDRESS_HI(as)) << 32;
> > +
> > + /* decode the fault status */
> > + exception_type = fault_status & 0xFF;
> > + access_type = (fault_status >> 8) & 0x3;
> > + source_id = (fault_status >> 16);
> > +
> > + /* Page fault only */
>
> This comment makes no sense - it looks like it's copied over from panfrost.
Uh, it made sense before I dropped map/alloc-on-fault :-).
>
> If I understand correctly we don't (currently) support growing on page
> fault - and it's not really needed now the MCU can handle the tiler heaps.
Exactly. Map/alloc on fault is a bit challenging because of the whole
'we have to guarantee that a job is done in finite time, and we must
make sure fence signaling is not blocked on allocation'. Given
drm_gem_get_pages() doesn't do non-blocking allocations, I thought it'd
be preferable to postpone map-on-fault until we actually decide we need
it. Note that i915 seems to have some sort of non-blocking page
allocator in shmem_sg_alloc_table()[1].
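To be clear about the constraint (illustration only, not code from this
series): in a fault path that sits in the fence-signaling critical
section, any allocation would have to be non-blocking, e.g.

	struct page *page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);

	if (!page) {
		/* Can't wait for reclaim here; the fault has to be
		 * reported instead of growing the mapping.
		 */
	}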
>
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > +
> > + new_int_mask =
> > + panthor_mmu_fault_mask(ptdev, ~ptdev->mmu->as.faulty_mask);
> > +
> > + /* terminal fault, print info about the fault */
> > + drm_err(&ptdev->base,
> > + "Unhandled Page fault in AS%d at VA 0x%016llX\n"
> > + "raw fault status: 0x%X\n"
> > + "decoded fault status: %s\n"
> > + "exception type 0x%X: %s\n"
> > + "access type 0x%X: %s\n"
> > + "source id 0x%X\n",
> > + as, addr,
> > + fault_status,
> > + (fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"),
> > + exception_type, panthor_exception_name(ptdev, exception_type),
> > + access_type, access_type_name(ptdev, fault_status),
> > + source_id);
> > +
> > + /* Ignore MMU interrupts on this AS until it's been
> > + * re-enabled.
> > + */
> > + ptdev->mmu->irq.mask = new_int_mask;
> > + gpu_write(ptdev, MMU_INT_MASK, new_int_mask);
> > +
> > + /* Disable the MMU to kill jobs on this AS. */
> > + panthor_mmu_as_disable(ptdev, as);
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +
> > + status &= ~mask;
> > + }
> > +}
> > +PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler);
> > +
[...]
> > +
> > +/**
> > + * panthor_mmu_unplug() - Unplug the MMU logic
> > + * @ptdev: Device.
> > + *
> > + * No access to the MMU regs should be done after this function is called.
> > + * We suspend the IRQ and disable all VMs to guarantee that.
> > + */
> > +void panthor_mmu_unplug(struct panthor_device *ptdev)
> > +{
> > + if (ptdev->mmu->irq.irq > 0)
>
> In what situation is this not true? AFAICT the driver probe will fail if
> the IRQ can't be obtained.
Right, I'll drop this test.
[1]https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/gem/i915_gem_shmem.c#L63