[PATCH 1/2] drm/ttm: cleanup and add TTM_PL_FLAG_TEMPORARY

Mon May 31 12:09:59 UTC 2021

On 5/31/21 2:02 PM, Christian König wrote:
> Am 31.05.21 um 13:19 schrieb Thomas Hellström (Intel):
>>
>> On 5/31/21 12:56 PM, Christian König wrote:
>>> Am 31.05.21 um 12:46 schrieb Thomas Hellström (Intel):
>>>>
>>>> On 5/31/21 12:32 PM, Christian König wrote:
>>>>> Am 31.05.21 um 11:52 schrieb Thomas Hellström (Intel):
>>>>>> Hi, Lang,
>>>>>>
>>>>>> On 5/31/21 10:19 AM, Yu, Lang wrote:
>>>>>>> [Public]
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> On 5/27/21 3:30 AM, Lang Yu wrote:
>>>>>>>>> Make TTM_PL_FLAG_* start from zero and add
>>>>>>>>> TTM_PL_FLAG_TEMPORARY flag for temporary
>>>>>>>>> GTT allocation use.
>>>>>>>> GTT is a driver private acronym, right? And it doesn't look like
>>>>>>>> TTM_PL_FLAG_TEMPORARY will be used in core TTM, so should we 
>>>>>>>> instead set
>>>>>>>> aside a mask in the PL flag for driver-private use?
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> Thanks for your comments and advice, GTT means Graphics 
>>>>>>> Translation Table here, seems
>>>>>>> a general acronym. TTM_PL_FLAG_TEMPORARY may also be used by 
>>>>>>> other drives.
>>>>>>> I have made other patches for this. Please help review.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lang
>>>>>>>
>>>>>> My point was really that the flag naming and documentation should 
>>>>>> reflect what core ttm is doing with that flag. If there is no 
>>>>>> specific core TTM usage, IMO we should move it to a driver 
>>>>>> specific flag to avoid future confusion. In particular a writer 
>>>>>> of a generic TTM resource manager should be able to know without 
>>>>>> looking at an old commit message what the placement flag is 
>>>>>> intended for.
>>>>>>
>>>>>> So here we add a flag where core TTM forces a bo move on validate 
>>>>>> and that's it. And that appears to be how it's used when an 
>>>>>> amdgpu bo is evicted to GTT, (btw should it be accounted in this 
>>>>>> situation?)
>>>>>>
>>>>>> The other use is to force the amdgpu driver to temporarily accept 
>>>>>> it into GTT when there is a lack of space, and IMHO that's a 
>>>>>> driver specific use and we should allocate a mask for driver 
>>>>>> specific flags for that.
>>>>>>
>>>>>> So shouldn't this be two flags, really?
>>>>>
>>>>> Well one flag makes sense for the use case at hand that drivers 
>>>>> want to signal to TTM that an allocation is only temporary and not 
>>>>> considered valid.
>>>>>
>>>>> That we then use this flag to implement temporary GTT allocations 
>>>>> to avoid problems during eviction is just extending that use case.
>>>>>
>>>> OK, but it looked like there were actually two use-cases. One for 
>>>> possibly long-term VRAM evictions to GTT, the other for the 
>>>> temporary use-case where the hop resource allocations sometimes 
>>>> failed. Or did I misunderstand the code?
>>>
>>> Ok sounds like we need more documentation here. That's really one 
>>> use case.
>>>
>>> Key point is we need temporary allocation during multihop which 
>>> should be handled differently to normal allocations.
>>
>> Yes, that part is clear from the patches. The part that I can't fit 
>> into that pattern is why the evict flags when evicting from visible 
>> VRAM to GTT or ordinary VRAM is marked with TTM_PL_FLAG_TEMPORARY. 
>> Wouldn't those remain evicted in that placement until re-validated to 
>> visible VRAM at an unknown future time?
>
> Not necessarily.
>
> The situation we ran into was the following:
> 1. OOM on VRAM, we try to evict.
>
> 2. GTT space is used up as well, ok evict directly to SYSTEM.
>
> 3. For VRAM->SYSTEM eviction we use a temporary bounce buffer.
>
> 4. Waiting for the bounce buffer to become idle is interrupted by a 
> signal so BO is still backed by bounce buffer.
>
> 5. Next CS, BO is validated with VRAM|GTT. TTM sees that it is in GTT 
> and doesn't move the buffer.
>
> 6. CS goes into nirvana because bounce buffers are not meant to be 
> used for CS (we can ignore alignment and accounting for them).
>
Yes, makes sense to me.

/Thomas