[Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage

Matthew Auld matthew.auld at intel.com
Wed Aug 31 14:53:28 UTC 2022


On 31/08/2022 14:34, Christian König wrote:
> Am 31.08.22 um 14:50 schrieb Matthew Auld:
>> On 31/08/2022 13:35, Christian König wrote:
>>> Am 31.08.22 um 14:06 schrieb Matthew Auld:
>>>> On 31/08/2022 12:03, Christian König wrote:
>>>>> Am 31.08.22 um 12:37 schrieb Matthew Auld:
>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping 
>>>>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty 
>>>>>>>> placements to force the pipeline-gutting path, which importantly 
>>>>>>>> unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not 
>>>>>>>> exported it seems). But AFAICT it looks like that will now also 
>>>>>>>> nuke the bo->resource, instead of just leaving it in system 
>>>>>>>> memory. My assumption is that when later calling 
>>>>>>>> ttm_bo_validate(), it will just do the bo_move_null() in 
>>>>>>>> i915_ttm_move(), instead of re-populating the ttm_tt and then 
>>>>>>>> potentially copying it back to local-memory?
>>>>>>>
>>>>>>> Well you do ttm_bo_validate() with something like GTT domain, 
>>>>>>> don't you? This should result in re-populating the tt object, but 
>>>>>>> I'm not 100% sure if that really works as expected.
>>>>>>
>>>>>> AFAIK for domains we either have system memory (which uses ttm_tt 
>>>>>> and might be shmem underneath) or local-memory. But perhaps i915 
>>>>>> is doing something wrong here, or abusing TTM in some way. I'm not 
>>>>>> sure tbh.
>>>>>>
>>>>>> Anyway, I think we have two cases here:
>>>>>>
>>>>>> - We have some system memory only object. After doing 
>>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call 
>>>>>> ttm_bo_validate() at some later point, but here we don't need to 
>>>>>> copy anything, but it also looks like ttm_bo_handle_move_mem() 
>>>>>> won't populate the ttm_tt or us either, since mem_type == 
>>>>>> TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of 
>>>>>> this, but now we just call ttm_bo_move_null().
>>>>>>
>>>>>> - We have a local-memory only object, which was evicted to shmem, 
>>>>>> and then swapped out by the shrinker like above. The bo->resource 
>>>>>> is NULL. However this time when calling ttm_bo_validate() we need 
>>>>>> to actually do a copy in i915_ttm_move(), as well as re-populate 
>>>>>> the ttm_tt. i915_ttm_move() was taking care of this, but now we 
>>>>>> just call ttm_bo_move_null().
>>>>>>
>>>>>> Perhaps i915 is doing something wrong in the above two cases?
>>>>>
>>>>> Mhm, as far as I can see that should still work.
>>>>>
>>>>> See previously you should got a transition from SYSTEM->GTT in 
>>>>> i915_ttm_move() to re-create your backing store. Not you get 
>>>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then 
>>>>> SYSTEM->GTT.
>>>>
>>>> What is GTT here in TTM world? Also I'm not seeing where there is 
>>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 
>>>> is only calling ttm_bo_validate() once when acquiring the pages, and 
>>>> we don't call it again, unless it was evicted (and potentially 
>>>> swapped out).
>>>
>>> Well GTT means TTM_PL_TT.
>>>
>>> And calling it only once is perfectly fine, TTM will internally see 
>>> that we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM 
>>> transition and then SYSTEM->TT.
>>
>> Ah interesting, so that's what the multi-hop thing does. But AFAICT 
>> i915 is not using either TTM_PL_TT or -EMULTIHOP.
> 
> Mhm, it could be that we then have a problem and the i915 driver only 
> sees NULL->TT directly. But I really don't know the i915 driver code 
> good enough to judge that.
> 
> Can you take a look at this and test it maybe?

I'll grab a machine and try to see what is going on here.

> 
>>
>> Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When 
>> should you use one over the other?
> 
> TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has 
> the control over the backing store and can swapout/swapin as it wants it.
> 
> TTM_PL_TT means that the device is accessing the data (TT stands for 
> translation table) and so TTM can't swap the backing store in/out.
> 
> TTM_PL_VRAM well that one is obvious.

Thanks for the explanation. So it looks like i915 is using TTM_PL_SYSTEM 
even for device access it seems.

> 
> Thanks,
> Christian.
> 
>>
>>>
>>> As far as I can see that should work like it did before.
>>>
>>> Christian.
>>>
>>>>
>>>>>
>>>>> If you just validated to SYSTEM memory before I think the tt object 
>>>>> wouldn't have been populated either.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've been considering to replacing the ttm_bo_type with a bunch 
>>>>>>>>> of behavior flags for a bo. I'm hoping that this will clean 
>>>>>>>>> things up a bit.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>       caching = i915_ttm_select_tt_caching(obj);
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>>>>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>>>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct 
>>>>>>>>>>>> ttm_buffer_object *bo, bool evict,
>>>>>>>>>>>>       bool clear;
>>>>>>>>>>>>       int ret;
>>>>>>>>>>>> -    if (GEM_WARN_ON(!obj)) {
>>>>>>>>>>>> +    if (GEM_WARN_ON(!obj) || !bo->resource) {
>>>>>>>>>>>>           ttm_bo_move_null(bo, dst_mem);
>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
> 


More information about the Intel-gfx mailing list