[Intel-xe] [PATCH v4 4/6] drm/xe/buddy: remove the virtualized start

Fri Mar 17 13:02:43 UTC 2023

On 17/03/2023 11:52, Gwan-gyeong Mun wrote:
> 
> 
> On 3/14/23 10:58 AM, Matthew Auld wrote:
>> Hopefully not needed anymore. We can add a .compatible() hook once we
>> need to differentiate between mappable and non-mappable vram. If the
>> allocation is not contiguous then the start value is kind of
>> meaningless, so rather just mark as invalid.
>>
>> In upstream, TTM wants to eventually remove the ttm_resource.start
>> usage.
> Hi Matt,
> If there is a related rfc or discussion, could you please share a 
> reference?

I think it was maybe discussed off list. But there have been patches 
that try to move in that general direction like: 544432703b2f ("drm/ttm: 
Add new callbacks to ttm res mgr"). Which gets rid of the main user in 
ttm for base.start (drivers still make use of it though). Better is to 
keep it at the resource manager level. The "virtualized" start here is 
basically trying to workaround the fact that [res.start, res.size] can't 
really represent a non-contiguous allocations.

For example, if you look at that patch, ttm was using [res.start, 
res.size] to help determine if the existing resource was compatible with 
the new placement. Basically if you want to move a resource from A -> B, 
where both are the same VRAM, without the above it would always turn the 
move into a noop, since A & B are the same resource type, and are 
therefore "compatible".

However with small-bar, you also have mappable vs non-mappable VRAM, and 
you might want to move between them, so the "virtualized" start ensures 
that [res.start, res.end] <= mappable size, even for non-contiguous 
allocations, if the entire resource is mappable, and vice versa. With 
the .compatible() hook that is all pushed into the resource manager, so 
we don't need such tricks. See also the i915 equivalent: da6198afb01d 
("drm/i915/ttm: remove the virtualized start hack").

>>
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c           |  6 ++++++
>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 29 ++++++++++++++--------------
>>   2 files changed, 20 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 73a7f2cd4ad8..346573f406c2 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -662,6 +662,12 @@ static int xe_bo_move(struct ttm_buffer_object 
>> *ttm_bo, bool evict,
>>                   void *new_addr = gt->mem.vram.mapping +
>>                       (new_mem->start << PAGE_SHIFT);
>> +                if (XE_WARN_ON(new_mem->start == 
>> XE_BO_INVALID_OFFSET)) {
> Are there really any use cases for moving to memory allocated with 
> xe_ttm_gtt_mgr_func()?

The above is only for pinned VRAM objects, which are always contiguous, 
and therefore have a valid start value. I just added the above sanity 
check to double check that, and make the error more obvious if we ever 
trigger it.

> 
> Br,
> G.G.
>> +                    ret = -EINVAL;
>> +                    xe_device_mem_access_put(xe);
>> +                    goto out;
>> +                }
>> +
>>                   XE_BUG_ON(new_mem->start !=
>>                         bo->placements->fpfn);
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c 
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> index 159ca7105df1..bafcadaed6b0 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> @@ -54,7 +54,6 @@ static int xe_ttm_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>       struct xe_ttm_vram_mgr_resource *vres;
>>       u64 size, remaining_size, lpfn, fpfn;
>>       struct drm_buddy *mm = &mgr->mm;
>> -    struct drm_buddy_block *block;
>>       unsigned long pages_per_block;
>>       int r;
>> @@ -186,24 +185,24 @@ static int xe_ttm_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>               list_splice_tail(trim_list, &vres->blocks);
>>       }
>> -    vres->base.start = 0;
>> -    list_for_each_entry(block, &vres->blocks, link) {
>> -        unsigned long start;
>> +    if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>> +        xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>> +        vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>> -        start = drm_buddy_block_offset(block) +
>> -            drm_buddy_block_size(mm, block);
>> -        start >>= PAGE_SHIFT;
>> +    /*
>> +     * For some kernel objects we still rely on the start when io 
>> mapping
>> +     * the object.
>> +     */
>> +    if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
>> +        struct drm_buddy_block *block = list_first_entry(&vres->blocks,
>> +                                 typeof(*block),
>> +                                 link);
>> -        if (start > PFN_UP(vres->base.size))
>> -            start -= PFN_UP(vres->base.size);
>> -        else
>> -            start = 0;
>> -        vres->base.start = max(vres->base.start, start);
>> +        vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
>> +    } else {
>> +        vres->base.start = XE_BO_INVALID_OFFSET;
>>       }
>> -    if (xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>> -        vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>> -
>>       *res = &vres->base;
>>       return 0;