[PATCH 1/2] drm/amdgpu: optimize VRAM allocation when using drm buddy

Wed Feb 15 02:51:02 UTC 2023

For public review

> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Wednesday, February 15, 2023 3:02 AM
> To: Xiao, Shane <shane.xiao at amd.com>; Paneer Selvam, Arunpravin
> <Arunpravin.PaneerSelvam at amd.com>
> Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation when using
> drm buddy
> 
> Am 14.02.23 um 15:53 schrieb Xiao, Shane:
> >> -----Original Message-----
> >> From: Koenig, Christian <Christian.Koenig at amd.com>
> >> Sent: Tuesday, February 14, 2023 8:41 PM
> >> To: Xiao, Shane <shane.xiao at amd.com>; brahma_sw_dev
> >> <brahma_sw_dev at amd.com>
> >> Cc: Paneer Selvam, Arunpravin <Arunpravin.PaneerSelvam at amd.com>
> >> Subject: Re: [PATCH 1/2] drm/amdgpu: optimize VRAM allocation when
> >> using drm buddy
> >>
> >> Am 14.02.23 um 12:18 schrieb Shane Xiao:
> >>> Since the VRAM manager changed from drm mm to drm buddy. It's not
> >>> necessary to allocate 2MB aligned VRAM for more than 2MB unaligned
> >>> size, and then do trim. This method improves the allocation
> >>> efficiency and reduces memory fragmentation.
> >> Well that is a trade off.
> >>
> >> Allocating the BO as one contiguous chunk and then trimming is
> >> beneficial because if we then later need it contiguous we don't need
> >> to re-allocate and copy. This can be needed to display something for
> example.

Hi Christian,

This case means that you allocate BO that is unnecessary to be continuous at first time,
and latter the BO should be continuous. I'm not familiar with display. Could you give me 
a few more specific examples ?

> > Yes, I agree that one contiguous chunk may get beneficial sometimes.
> > But as far as I know, you cannot guarantee that amdgpu_vram_mgr_new
> can get one contiguous chunk  if you don't set TTM_PL_FLAG_CONTIGUOUS
> flags.
> > For example, if you want to allocate 4M+4K BO, it will allocate one 4M block
> + one 2M block which is unnecessary to be continuous, then 2M block will be
> trimmed.
> 
> Oh, that's indeed not something which should happen. Sounds more like a
> bug fix then.

Yes, I think this case should not be happened. 
Actually, I'm not sure that why the allocated BO should be aligned with pages_per_block, which is set to 2MB by default.
Does this help improve performance when allocating 2M or above BO?
From my point of view, the TLB may be one of reason of this. But I'm not sure about this.

Best Regards,
Shane 

> 
> >
> >> On the other hand I completely agree allocating big and then trimming
> >> creates more fragmentation than necessary.
> >>
> >> Do you have some test case which can show the difference?
> > I have use rocrtst to show the difference.
> > The attachment is shown that after applying this patch, the order < 9 total
> vram size decrease from 99MB to 43MB.
> > And the latter has more higher order block memory.
> 
> Arun can you take a look? That problem here sounds important.
> 
> Thanks,
> Christian.
> 
> >
> >> BTW: No need to discuss that on the internal mailing list, please use
> >> the public one instead.
> >>
> > I will send it to public. Thank you for your remind.
> >
> > Best Regards,
> > Shane
> >
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Shane Xiao <shane.xiao at amd.com>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> >>> index 75c80c557b6e..3fea58f9427c 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> >>> @@ -453,7 +453,7 @@ static int amdgpu_vram_mgr_new(struct
> >> ttm_resource_manager *man,
> >>>    		/* Limit maximum size to 2GiB due to SG table limitations */
> >>>    		size = min(remaining_size, 2ULL << 30);
> >>>
> >>> -		if (size >= (u64)pages_per_block << PAGE_SHIFT)
> >>> +		if (!(size % ((u64)pages_per_block << PAGE_SHIFT)))
> >>>    			min_block_size = (u64)pages_per_block <<
> >> PAGE_SHIFT;
> >>>    		cur_size = size;