[Intel-xe] [PATCH v4 4/6] drm/xe/buddy: remove the virtualized start

Mon Mar 20 09:03:32 UTC 2023

On 20/03/2023 08:44, Gwan-gyeong Mun wrote:
> 
> 
> On 3/17/23 3:02 PM, Matthew Auld wrote:
>> On 17/03/2023 11:52, Gwan-gyeong Mun wrote:
>>>
>>>
>>> On 3/14/23 10:58 AM, Matthew Auld wrote:
>>>> Hopefully not needed anymore. We can add a .compatible() hook once we
>>>> need to differentiate between mappable and non-mappable vram. If the
>>>> allocation is not contiguous then the start value is kind of
>>>> meaningless, so rather just mark as invalid.
>>>>
>>>> In upstream, TTM wants to eventually remove the ttm_resource.start
>>>> usage.
>>> Hi Matt,
>>> If there is a related rfc or discussion, could you please share a 
>>> reference?
>>
>> I think it was maybe discussed off list. But there have been patches 
>> that try to move in that general direction like: 544432703b2f 
>> ("drm/ttm: Add new callbacks to ttm res mgr"). Which gets rid of the 
>> main user in ttm for base.start (drivers still make use of it though). 
>> Better is to keep it at the resource manager level. The "virtualized" 
>> start here is basically trying to workaround the fact that [res.start, 
>> res.size] can't really represent a non-contiguous allocations.
>>
>> For example, if you look at that patch, ttm was using [res.start, 
>> res.size] to help determine if the existing resource was compatible 
>> with the new placement. Basically if you want to move a resource from 
>> A -> B, where both are the same VRAM, without the above it would 
>> always turn the move into a noop, since A & B are the same resource 
>> type, and are therefore "compatible".
>>
>> However with small-bar, you also have mappable vs non-mappable VRAM, 
>> and you might want to move between them, so the "virtualized" start 
>> ensures that [res.start, res.end] <= mappable size, even for 
>> non-contiguous allocations, if the entire resource is mappable, and 
>> vice versa. With the .compatible() hook that is all pushed into the 
>> resource manager, so we don't need such tricks. See also the i915 
>> equivalent: da6198afb01d ("drm/i915/ttm: remove the virtualized start 
>> hack").
>>
> It seems that the part that is difficult to fully understand with this 
> patch alone is explained well. Thank you for a detailed description.
> 
> And referring to the explanation, the commit where the use of 
> ttm_resource.start in ttm_resource_compatible() was deleted was 
> confirmed in the commit below.
> commit 6d3c900c12d72667341bcff338c252e22728b942 ("drm/ttm: Switch to 
> using the new res callback")
> 
> Not directly related to this patch, contiguous allocation (using 
> TTM_PL_FLAG_CONTIGUOUS flag) allocates resources from one place?

TTM_PL_FLAG_CONTIGUOUS tells us that the underlying memory for the 
resource is one contiguous physical area. It could still be multiple 
nodes underneath, but in that case they should always be correctly 
ordered by smallest offset.

In the case of the buddy, it just does a roundup_power_of_two() and then 
trims the size down, to avoid any wastage, since the buddy only accepts 
a power-of-two size for block sizes (and that's kind of what it's 
optimised for). Also if the allocation happens to be contiguous by fluke 
(caller didn't specify), then we also annotate the resource as such, 
with the hope that the upper layers can do something useful with that.

> 
> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>

Thanks.

>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/xe/xe_bo.c           |  6 ++++++
>>>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 29 
>>>> ++++++++++++++--------------
>>>>   2 files changed, 20 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>> index 73a7f2cd4ad8..346573f406c2 100644
>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>> @@ -662,6 +662,12 @@ static int xe_bo_move(struct ttm_buffer_object 
>>>> *ttm_bo, bool evict,
>>>>                   void *new_addr = gt->mem.vram.mapping +
>>>>                       (new_mem->start << PAGE_SHIFT);
>>>> +                if (XE_WARN_ON(new_mem->start == 
>>>> XE_BO_INVALID_OFFSET)) {
>>> Are there really any use cases for moving to memory allocated with 
>>> xe_ttm_gtt_mgr_func()?
>>
>> The above is only for pinned VRAM objects, which are always 
>> contiguous, and therefore have a valid start value. I just added the 
>> above sanity check to double check that, and make the error more 
>> obvious if we ever trigger it.
>> yes, that makes sense.
>>>
>>> Br,
>>> G.G.
>>>> +                    ret = -EINVAL;
>>>> +                    xe_device_mem_access_put(xe);
>>>> +                    goto out;
>>>> +                }
>>>> +
>>>>                   XE_BUG_ON(new_mem->start !=
>>>>                         bo->placements->fpfn);
>>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c 
>>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>>> index 159ca7105df1..bafcadaed6b0 100644
>>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>>> @@ -54,7 +54,6 @@ static int xe_ttm_vram_mgr_new(struct 
>>>> ttm_resource_manager *man,
>>>>       struct xe_ttm_vram_mgr_resource *vres;
>>>>       u64 size, remaining_size, lpfn, fpfn;
>>>>       struct drm_buddy *mm = &mgr->mm;
>>>> -    struct drm_buddy_block *block;
>>>>       unsigned long pages_per_block;
>>>>       int r;
>>>> @@ -186,24 +185,24 @@ static int xe_ttm_vram_mgr_new(struct 
>>>> ttm_resource_manager *man,
>>>>               list_splice_tail(trim_list, &vres->blocks);
>>>>       }
>>>> -    vres->base.start = 0;
>>>> -    list_for_each_entry(block, &vres->blocks, link) {
>>>> -        unsigned long start;
>>>> +    if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>>>> +        xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>>>> +        vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>>>> -        start = drm_buddy_block_offset(block) +
>>>> -            drm_buddy_block_size(mm, block);
>>>> -        start >>= PAGE_SHIFT;
>>>> +    /*
>>>> +     * For some kernel objects we still rely on the start when io 
>>>> mapping
>>>> +     * the object.
>>>> +     */
>>>> +    if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
>>>> +        struct drm_buddy_block *block = 
>>>> list_first_entry(&vres->blocks,
>>>> +                                 typeof(*block),
>>>> +                                 link);
>>>> -        if (start > PFN_UP(vres->base.size))
>>>> -            start -= PFN_UP(vres->base.size);
>>>> -        else
>>>> -            start = 0;
>>>> -        vres->base.start = max(vres->base.start, start);
>>>> +        vres->base.start = drm_buddy_block_offset(block) >> 
>>>> PAGE_SHIFT;
>>>> +    } else {
>>>> +        vres->base.start = XE_BO_INVALID_OFFSET;
>>>>       }
>>>> -    if (xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>>>> -        vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>>>> -
>>>>       *res = &vres->base;
>>>>       return 0;