[PATCH] drm/ttm: fix ttm tt init fail when size exceeds kmalloc limit

Wang, Yang(Kevin) KevinYang.Wang at amd.com
Wed Apr 20 12:39:52 UTC 2022


[AMD Official Use Only]

Hi Chirs,

yes, right, the amdgpu drive rwill use amdgpu_bo_validate_size() function to verify bo size,
but when driver try to allocate VRAM domain bo fail, the amdgpu driver will fall back to allocate domain = (GTT | VRAM)  bo.
please check following code, it will cause the 2nd time to allocate bo fail during allocate 256Mb buffer to store dma address (via kvmalloc()).

        initial_domain = (u32)(0xffffffff & args->in.domains);
retry:
        r = amdgpu_gem_object_create(adev, size, args->in.alignment,
                                     initial_domain,
                                     flags, ttm_bo_type_device, resv, &gobj);
        if (r && r != -ERESTARTSYS) {
                if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
                        flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
                        goto retry;
                }

                if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
                        initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
                        goto retry;
                }
                DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
                                size, initial_domain, args->in.alignment, r);
        }

Best Regards,
Kevin

________________________________
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Wednesday, April 20, 2022 7:55 PM
To: Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: fix ttm tt init fail when size exceeds kmalloc limit

Hi Kevin,

no, the test case should already fail in amdgpu_bo_validate_size().

If we have a system with 2TiB of memory where the test case could succeed then we should increase the requested size to something larger.

And if the underlying core Linux kernel functions don't allow allocations as large as the requested one we should *NEVER* ever add workarounds like this.

It is perfectly expected that this test case is not able to fulfill the desired allocation. That it fails during kvmalloc is unfortunate, but not a show stopper.

Regards,
Christian.


Am 20.04.22 um 13:32 schrieb Wang, Yang(Kevin):

[AMD Official Use Only]

Hi Chris,

you misunderstood background about this case.

although we expect this test case to fail, it should fail at the location where the Bo actual memory is actually allocated. now the code logic will cause the failure to allocate memory to store DMA address.

e.g: the case is failed in 2TB system ram machine, it should be allocated successful, but it is failed.

allocate 1TB BO, the ttm should allocate 1TB/4k * 8 buffer to store allocate result (page address), this should not be failed usually.

There is a similar fix in upstream kernel 5.18, before this fix entered the TTM code, this problem existed in TTM.

kernel/git/torvalds/linux.git - Linux kernel source tree<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.18-rc3%26id%3Da421ef303008b0ceee2cfc625c3246fa7654b0ca&data=05%7C01%7CKevinYang.Wang%40amd.com%7C2e9fd86a27d64a39432508da22c4b1f1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860525454702938%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=CiP9x3grwQ2aXFZPjpsAtwLCpBVeJufbGngy%2BtXLFbM%3D&reserved=0>
mm: allow !GFP_KERNEL allocations for kvmalloc

Best Regards,
Kevin

________________________________
From: Koenig, Christian <Christian.Koenig at amd.com><mailto:Christian.Koenig at amd.com>
Sent: Wednesday, April 20, 2022 6:53 PM
To: Wang, Yang(Kevin) <KevinYang.Wang at amd.com><mailto:KevinYang.Wang at amd.com>; dri-devel at lists.freedesktop.org<mailto:dri-devel at lists.freedesktop.org> <dri-devel at lists.freedesktop.org><mailto:dri-devel at lists.freedesktop.org>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org><mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: fix ttm tt init fail when size exceeds kmalloc limit

Am 20.04.22 um 11:07 schrieb Wang, Yang(Kevin):

[AMD Official Use Only]


________________________________
From: Koenig, Christian <Christian.Koenig at amd.com><mailto:Christian.Koenig at amd.com>
Sent: Wednesday, April 20, 2022 5:00 PM
To: Wang, Yang(Kevin) <KevinYang.Wang at amd.com><mailto:KevinYang.Wang at amd.com>; dri-devel at lists.freedesktop.org<mailto:dri-devel at lists.freedesktop.org> <dri-devel at lists.freedesktop.org><mailto:dri-devel at lists.freedesktop.org>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org><mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/ttm: fix ttm tt init fail when size exceeds kmalloc limit

Am 20.04.22 um 10:56 schrieb Yang Wang:
> if the __GFP_ZERO is set, the kvmalloc() can't fallback to use vmalloc()

Hui what? Why should kvmalloc() not be able to fallback to vmalloc()
when __GFP_ZERO is set?

And even that is really the case then that sounds like a bug in kvmalloc().

Regards,
Christian.

[kevin]:
it is really test case from libdrm amdgpu test, which try to allocate a big BO which will cause ttm tt init fail.


LOL! Guys, this test case is intended to fail!

The test consists of allocating a buffer so ridiculous large that it should never succeed and be rejected by the kernel driver.

This patch here is a really clear NAK.

Regards,
Christian.

it may be a kvmalloc() bug, and this patch can as a workaround in ttm before this issue fix.

void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
...
        if ((flags & GFP_KERNEL) != GFP_KERNEL)
                return kmalloc_node(size, flags, node);
...
}

Best Regards,
Kevin

> to allocate memory, when request size is exceeds kmalloc limit, it will
> cause allocate memory fail.
>
> e.g: when ttm want to create a BO with 1TB size, it maybe fail.
>
> Signed-off-by: Yang Wang <KevinYang.Wang at amd.com><mailto:KevinYang.Wang at amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_tt.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 79c870a3bef8..9f2f3e576b8d 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -97,9 +97,12 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
>   static int ttm_tt_alloc_page_directory(struct ttm_tt *ttm)
>   {
>        ttm->pages = kvmalloc_array(ttm->num_pages, sizeof(void*),
> -                     GFP_KERNEL | __GFP_ZERO);
> +                                 GFP_KERNEL);
>        if (!ttm->pages)
>                return -ENOMEM;
> +
> +     memset(ttm->pages, 0, ttm->num_pages * sizeof(void *));
> +
>        return 0;
>   }
>
> @@ -108,10 +111,12 @@ static int ttm_dma_tt_alloc_page_directory(struct ttm_tt *ttm)
>        ttm->pages = kvmalloc_array(ttm->num_pages,
>                                    sizeof(*ttm->pages) +
>                                    sizeof(*ttm->dma_address),
> -                                 GFP_KERNEL | __GFP_ZERO);
> +                                 GFP_KERNEL);
>        if (!ttm->pages)
>                return -ENOMEM;
>
> +     memset(ttm->pages, 0, ttm->num_pages * (sizeof(*ttm->pages) + sizeof(*ttm->dma_address)));
> +
>        ttm->dma_address = (void *)(ttm->pages + ttm->num_pages);
>        return 0;
>   }
> @@ -120,9 +125,12 @@ static int ttm_sg_tt_alloc_page_directory(struct ttm_tt *ttm)
>   {
>        ttm->dma_address = kvmalloc_array(ttm->num_pages,
>                                          sizeof(*ttm->dma_address),
> -                                       GFP_KERNEL | __GFP_ZERO);
> +                                       GFP_KERNEL);
>        if (!ttm->dma_address)
>                return -ENOMEM;
> +
> +     memset(ttm->dma_address, 0, ttm->num_pages * sizeof(*ttm->dma_address));
> +
>        return 0;
>   }
>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20220420/06d6e0ac/attachment-0001.htm>


More information about the amd-gfx mailing list