[Nouveau] [PATCH 4/6] drm: enable big page mapping for small pages when IOMMU is available

Alexandre Courbot gnurou at gmail.com
Thu Apr 16 23:26:54 PDT 2015


On Thu, Apr 16, 2015 at 8:06 PM, Vince Hsu <vinceh at nvidia.com> wrote:
> Some platforms have an IOMMU that can map non-contiguous physical memory
> into a contiguous GPU virtual address range. We can use this feature to
> enable big page mapping of scattered small pages. Achieving that also
> requires changes in subdev/mmu.
>
> Signed-off-by: Vince Hsu <vinceh at nvidia.com>
> ---
>  drm/nouveau/nouveau_bo.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drm/nouveau/nouveau_bo.c b/drm/nouveau/nouveau_bo.c
> index 77326e344dad..da76ee1121e4 100644
> --- a/drm/nouveau/nouveau_bo.c
> +++ b/drm/nouveau/nouveau_bo.c
> @@ -221,6 +221,11 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
>         if (drm->client.vm) {
>                 if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
>                         nvbo->page_shift = drm->client.vm->mmu->lpg_shift;
> +
> +               if ((flags & TTM_PL_FLAG_TT) &&
> +                               drm->client.vm->mmu->iommu_capable &&
> +                               (size % (1 << drm->client.vm->mmu->lpg_shift)) == 0)
> +                       nvbo->page_shift = drm->client.vm->mmu->lpg_shift;

I wonder if we should not just use the same size heuristics as for VRAM above?

Here, unless your buffer size is an exact multiple of the big page
size (128K for GK20A), you will not use big pages at all. In effect,
many buffers will miss out on big pages for this reason. A behavior
like "if the buffer size is more than 256KB, round the size up to the
next multiple of 128K and use big pages" would probably yield better
results.
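
Roughly, as an untested sketch on top of your iommu_capable flag (I am
reusing the kernel's ALIGN() macro; the threshold and the exact
placement relative to nouveau_bo_fixup_align() are just illustrative):

	if (drm->client.vm) {
		struct nvkm_mmu *mmu = drm->client.vm->mmu;

		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
			nvbo->page_shift = mmu->lpg_shift;

		/* With an IOMMU backing TT, round the size up to the
		 * next big page boundary instead of requiring an exact
		 * multiple, mirroring the VRAM heuristic above. */
		if ((flags & TTM_PL_FLAG_TT) && mmu->iommu_capable &&
		    size > 256 * 1024) {
			size = ALIGN(size, 1 << mmu->lpg_shift);
			nvbo->page_shift = mmu->lpg_shift;
		}
	}

That way a 300KB buffer would be padded to 384KB and still be mapped
with big pages instead of falling back to small pages.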

>         }
>
>         nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> @@ -1641,6 +1646,10 @@ nouveau_bo_vma_add(struct nouveau_bo *nvbo, struct nvkm_vm *vm,
>             (nvbo->bo.mem.mem_type == TTM_PL_VRAM ||
>              nvbo->page_shift != vma->vm->mmu->lpg_shift))
>                 nvkm_vm_map(vma, nvbo->bo.mem.mm_node);
> +       else if (nvbo->bo.mem.mem_type == TTM_PL_TT &&
> +               vma->vm->mmu->iommu_capable &&
> +               nvbo->page_shift == vma->vm->mmu->lpg_shift)
> +               nvkm_vm_map(vma, nvbo->bo.mem.mm_node);

Sorry, I don't understand why this is needed. Could you explain?
