[PATCH] drm/ttm: fix ttm tt init fail when size exceeds kmalloc limit
Lazar, Lijo
lijo.lazar at amd.com
Wed Apr 20 13:23:15 UTC 2022
On 4/20/2022 6:26 PM, Christian König wrote:
> Am 20.04.22 um 14:54 schrieb Wang, Yang(Kevin):
>>
>> [AMD Official Use Only]
>>
>>
>> Hi Chris,
>>
>> 1) Change the test case to use something larger than 1TiB.
>> sure, we can increase the size of BO and make test pass,
>> but if user really want to allocate 1TB GTT BO, we have no reason to
>> let it fail? right?
>
> No, the reason is the underlying core kernel doesn't allow kvmalloc
> allocations with GFP_ZERO which are large enough to hold the array of
> allocated pages for this.
>
> We are working on top of the core Linux kernel and should *NEVER* ever
> add workarounds like what was suggested here. >
AFAIU, for the purpose of ttm use, fallback to vmalloc is fine.
* Please note that any use of gfp flags outside of GFP_KERNEL is
careful to not
* fall back to vmalloc.
*
Actually the current implementation documents the behavior, but it is
deep inside the implementation to be noticeable - at least not obvious
while using kvmalloc_array.
Thanks,
Lijo
> Regards,
> Christian.
>
>> the system availed memory about 2T, but it will still fail.
>>
>> 2) Change kvmalloc to allow GFP_ZERO allocations even in the vmalloc
>> fallback path.
>> the 5.18 kernel will add this patch to fix this issue .
>>
>> Best Regards,
>> Kevin
>> ------------------------------------------------------------------------
>> *From:* Koenig, Christian <Christian.Koenig at amd.com>
>> *Sent:* Wednesday, April 20, 2022 8:42 PM
>> *To:* Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Christian König
>> <ckoenig.leichtzumerken at gmail.com>; dri-devel at lists.freedesktop.org
>> <dri-devel at lists.freedesktop.org>; amd-gfx at lists.freedesktop.org
>> <amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH] drm/ttm: fix ttm tt init fail when size exceeds
>> kmalloc limit
>> Hi Kevin,
>>
>> yes and that is perfectly valid and expected behavior. There is
>> absolutely no need to change anything in TTM here.
>>
>> What we could do is:
>> 1) Change the test case to use something larger than 1TiB.
>> 2) Change kvmalloc to allow GFP_ZERO allocations even in the vmalloc
>> fallback path.
>>
>> Regards,
>> Christian.
>>
>> Am 20.04.22 um 14:39 schrieb Wang, Yang(Kevin):
>>>
>>> [AMD Official Use Only]
>>>
>>>
>>> Hi Chirs,
>>>
>>> yes, right, the amdgpu drive rwill use amdgpu_bo_validate_size()
>>> function to verify bo size,
>>> but when driver try to allocate VRAM domain bo fail, the amdgpu
>>> driver will fall back to allocate domain = (GTT | VRAM) bo.
>>> please check following code, it will cause the 2nd time to allocate
>>> bo fail during allocate 256Mb buffer to store dma address (via
>>> kvmalloc()).
>>>
>>> initial_domain = (u32)(0xffffffff & args->in.domains);
>>> retry:
>>> r = amdgpu_gem_object_create(adev, size, args->in.alignment,
>>> initial_domain,
>>> flags, ttm_bo_type_device, resv, &gobj);
>>> if (r && r != -ERESTARTSYS) {
>>> if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>>> flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>>> goto retry;
>>> }
>>>
>>> if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>>> goto retry;
>>> }
>>> DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>>> size, initial_domain, args->in.alignment, r);
>>> }
>>>
>>> Best Regards,
>>> Kevin
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Christian König <ckoenig.leichtzumerken at gmail.com>
>>> <mailto:ckoenig.leichtzumerken at gmail.com>
>>> *Sent:* Wednesday, April 20, 2022 7:55 PM
>>> *To:* Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
>>> <mailto:KevinYang.Wang at amd.com>; Koenig, Christian
>>> <Christian.Koenig at amd.com> <mailto:Christian.Koenig at amd.com>;
>>> dri-devel at lists.freedesktop.org
>>> <mailto:dri-devel at lists.freedesktop.org>
>>> <dri-devel at lists.freedesktop.org>
>>> <mailto:dri-devel at lists.freedesktop.org>;
>>> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>>> <amd-gfx at lists.freedesktop.org> <mailto:amd-gfx at lists.freedesktop.org>
>>> *Subject:* Re: [PATCH] drm/ttm: fix ttm tt init fail when size
>>> exceeds kmalloc limit
>>> Hi Kevin,
>>>
>>> no, the test case should already fail in amdgpu_bo_validate_size().
>>>
>>> If we have a system with 2TiB of memory where the test case could
>>> succeed then we should increase the requested size to something larger.
>>>
>>> And if the underlying core Linux kernel functions don't allow
>>> allocations as large as the requested one we should *NEVER* ever add
>>> workarounds like this.
>>>
>>> It is perfectly expected that this test case is not able to fulfill
>>> the desired allocation. That it fails during kvmalloc is unfortunate,
>>> but not a show stopper.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> Am 20.04.22 um 13:32 schrieb Wang, Yang(Kevin):
>>>>
>>>> [AMD Official Use Only]
>>>>
>>>>
>>>> Hi Chris,
>>>>
>>>> you misunderstood background about this case.
>>>>
>>>> although we expect this test case to fail, it should fail at the
>>>> location where the Bo actual memory is actually allocated. now the
>>>> code logic will cause the failure to allocate memory to store DMA
>>>> address.
>>>>
>>>> e.g: the case is failed in 2TB system ram machine, it should be
>>>> allocated successful, but it is failed.
>>>>
>>>> allocate 1TB BO, the ttm should allocate 1TB/4k * 8 buffer to store
>>>> allocate result (page address), this should not be failed usually.
>>>>
>>>> There is a similar fix in upstream kernel 5.18, before this fix
>>>> entered the TTM code, this problem existed in TTM.
>>>>
>>>> kernel/git/torvalds/linux.git - Linux kernel source tree
>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.18-rc3%26id%3Da421ef303008b0ceee2cfc625c3246fa7654b0ca&data=05%7C01%7Clijo.lazar%40amd.com%7C908fad4b756248e06e5e08da22cd4463%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860562263637656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=34hU%2FcxcRgBiiZ2jdNxTVmU7t4JF7TR27hx1b119U9g%3D&reserved=0>
>>>> mm: allow !GFP_KERNEL allocations for kvmalloc
>>>>
>>>> Best Regards,
>>>> Kevin
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Koenig, Christian <Christian.Koenig at amd.com>
>>>> <mailto:Christian.Koenig at amd.com>
>>>> *Sent:* Wednesday, April 20, 2022 6:53 PM
>>>> *To:* Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
>>>> <mailto:KevinYang.Wang at amd.com>; dri-devel at lists.freedesktop.org
>>>> <mailto:dri-devel at lists.freedesktop.org>
>>>> <dri-devel at lists.freedesktop.org>
>>>> <mailto:dri-devel at lists.freedesktop.org>;
>>>> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>>>> <amd-gfx at lists.freedesktop.org> <mailto:amd-gfx at lists.freedesktop.org>
>>>> *Subject:* Re: [PATCH] drm/ttm: fix ttm tt init fail when size
>>>> exceeds kmalloc limit
>>>> Am 20.04.22 um 11:07 schrieb Wang, Yang(Kevin):
>>>>>
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> *From:* Koenig, Christian <Christian.Koenig at amd.com>
>>>>> <mailto:Christian.Koenig at amd.com>
>>>>> *Sent:* Wednesday, April 20, 2022 5:00 PM
>>>>> *To:* Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
>>>>> <mailto:KevinYang.Wang at amd.com>; dri-devel at lists.freedesktop.org
>>>>> <mailto:dri-devel at lists.freedesktop.org>
>>>>> <dri-devel at lists.freedesktop.org>
>>>>> <mailto:dri-devel at lists.freedesktop.org>;
>>>>> amd-gfx at lists.freedesktop.org
>>>>> <mailto:amd-gfx at lists.freedesktop.org>
>>>>> <amd-gfx at lists.freedesktop.org> <mailto:amd-gfx at lists.freedesktop.org>
>>>>> *Subject:* Re: [PATCH] drm/ttm: fix ttm tt init fail when size
>>>>> exceeds kmalloc limit
>>>>> Am 20.04.22 um 10:56 schrieb Yang Wang:
>>>>> > if the __GFP_ZERO is set, the kvmalloc() can't fallback to use
>>>>> vmalloc()
>>>>>
>>>>> Hui what? Why should kvmalloc() not be able to fallback to vmalloc()
>>>>> when __GFP_ZERO is set?
>>>>>
>>>>> And even that is really the case then that sounds like a bug in
>>>>> kvmalloc().
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> [kevin]:
>>>>> it is really test case from libdrm amdgpu test, which try to
>>>>> allocate a big BO which will cause ttm tt init fail.
>>>>
>>>>
>>>> LOL! Guys, this test case is intended to fail!
>>>> *
>>>> *The test consists of allocating a buffer so ridiculous large that
>>>> it should never succeed and be rejected by the kernel driver.
>>>>
>>>> This patch here is a really clear NAK.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> it may be a kvmalloc() bug, and this patch can as a workaround
>>>>> in ttm before this issue fix.
>>>>>
>>>>> void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>>>> {
>>>>> ...
>>>>> if ((flags & GFP_KERNEL) != GFP_KERNEL)
>>>>> return kmalloc_node(size, flags, node);
>>>>> ...
>>>>> }
>>>>>
>>>>> Best Regards,
>>>>> Kevin
>>>>>
>>>>> > to allocate memory, when request size is exceeds kmalloc limit,
>>>>> it will
>>>>> > cause allocate memory fail.
>>>>> >
>>>>> > e.g: when ttm want to create a BO with 1TB size, it maybe fail.
>>>>> >
>>>>> > Signed-off-by: Yang Wang <KevinYang.Wang at amd.com>
>>>>> <mailto:KevinYang.Wang at amd.com>
>>>>> > ---
>>>>> > drivers/gpu/drm/ttm/ttm_tt.c | 14 +++++++++++---
>>>>> > 1 file changed, 11 insertions(+), 3 deletions(-)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> > index 79c870a3bef8..9f2f3e576b8d 100644
>>>>> > --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> > @@ -97,9 +97,12 @@ int ttm_tt_create(struct ttm_buffer_object
>>>>> *bo, bool zero_alloc)
>>>>> > static int ttm_tt_alloc_page_directory(struct ttm_tt *ttm)
>>>>> > {
>>>>> > ttm->pages = kvmalloc_array(ttm->num_pages, sizeof(void*),
>>>>> > - GFP_KERNEL | __GFP_ZERO);
>>>>> > + GFP_KERNEL);
>>>>> > if (!ttm->pages)
>>>>> > return -ENOMEM;
>>>>> > +
>>>>> > + memset(ttm->pages, 0, ttm->num_pages * sizeof(void *));
>>>>> > +
>>>>> > return 0;
>>>>> > }
>>>>> >
>>>>> > @@ -108,10 +111,12 @@ static int
>>>>> ttm_dma_tt_alloc_page_directory(struct ttm_tt *ttm)
>>>>> > ttm->pages = kvmalloc_array(ttm->num_pages,
>>>>> > sizeof(*ttm->pages) +
>>>>> > sizeof(*ttm->dma_address),
>>>>> > - GFP_KERNEL | __GFP_ZERO);
>>>>> > + GFP_KERNEL);
>>>>> > if (!ttm->pages)
>>>>> > return -ENOMEM;
>>>>> >
>>>>> > + memset(ttm->pages, 0, ttm->num_pages * (sizeof(*ttm->pages)
>>>>> + sizeof(*ttm->dma_address)));
>>>>> > +
>>>>> > ttm->dma_address = (void *)(ttm->pages + ttm->num_pages);
>>>>> > return 0;
>>>>> > }
>>>>> > @@ -120,9 +125,12 @@ static int
>>>>> ttm_sg_tt_alloc_page_directory(struct ttm_tt *ttm)
>>>>> > {
>>>>> > ttm->dma_address = kvmalloc_array(ttm->num_pages,
>>>>> > sizeof(*ttm->dma_address),
>>>>> > - GFP_KERNEL | __GFP_ZERO);
>>>>> > + GFP_KERNEL);
>>>>> > if (!ttm->dma_address)
>>>>> > return -ENOMEM;
>>>>> > +
>>>>> > + memset(ttm->dma_address, 0, ttm->num_pages *
>>>>> sizeof(*ttm->dma_address));
>>>>> > +
>>>>> > return 0;
>>>>> > }
>>>>> >
>>>>>
>>>>
>>>
>>
>
More information about the amd-gfx
mailing list