[PATCH 1/2] drm/ttm: Don't evict SG BOs
Christian König
ckoenig.leichtzumerken at gmail.com
Wed Apr 28 09:05:39 UTC 2021
On 28.04.21 at 09:49, Felix Kuehling wrote:
> On 2021-04-28 at 3:04 a.m., Christian König wrote:
>> On 28.04.21 at 07:33, Felix Kuehling wrote:
>>> SG BOs do not occupy space that is managed by TTM. So do not evict them.
>>>
>>> This fixes unexpected evictions of KFD's userptr BOs. KFD only expects
>>> userptr "evictions" in the form of MMU notifiers.
>> NAK, SG BOs also account for the memory the GPU can currently access.
>>
>> We can ignore them for the allocated memory, but not for the GTT domain.
> Hmm, the only reason I found this problem is that I am now testing with
> IOMMU enabled. Evicting the userptr BO destroys the DMA mapping. Without
> IOMMU-enforced device isolation I was blissfully unaware that the
> userptr BOs were being evicted. The GPUVM mappings were unaffected and
> just worked without problems. Having to evict these BOs is crippling
> KFD's ability to map system memory for GPU access, once again.
>
> I think this affects not only userptr BOs but also DMABuf imports for
> BOs shared between multiple GPUs.
Correct, yes.
> The GTT size limitation is entirely artificial. And the only reason I
> know of for keeping it limited to the VRAM size is to work around some
> OOM issues with GTT BOs. Applying this to userptrs and DMABuf imports
> makes no sense. But I understand that the way TTM manages the GTT domain
> there is no easy fix for this. Maybe we'd have to create a new domain
> for validating SG BOs that's separate from GTT, so that TTM would not
> try to allocate GTT space for them.
Well, that contradicts what the GTT domain is all about.
It should limit the amount of system memory the GPU can access at the
same time. This includes imported DMA-bufs as well as userptrs.
That the GPUVM mappings are still there is certainly a bug we should
look into, but in general if we don't want that limitation we need to
increase the GTT size and not work around it.
But increasing the GTT size in turn has a huge negative impact on OOM
situations up to the point that the OOM killer can't work any more.
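To illustrate the accounting argument above, here is a minimal toy model. It is not TTM code: `struct toy_gtt`, `toy_gtt_charge()` and `toy_gtt_uncharge()` are hypothetical names made up for this sketch. It only shows the idea that everything the GPU can access at once is charged against one limit, whether or not the memory was allocated by the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not TTM code: one budget for all system
 * memory the GPU can access at the same time. */
struct toy_gtt {
    size_t limit; /* artificial GTT size limit, in bytes */
    size_t used;  /* bytes currently mapped for GPU access */
};

/* Charge `size` bytes against the budget. In this model userptrs and
 * DMA-buf imports are charged like any other BO, even though no
 * memory is allocated for them. */
static bool toy_gtt_charge(struct toy_gtt *gtt, size_t size)
{
    if (size > gtt->limit - gtt->used)
        return false; /* the caller would have to evict something first */
    gtt->used += size;
    return true;
}

/* Give `size` bytes back to the budget, e.g. after an eviction. */
static void toy_gtt_uncharge(struct toy_gtt *gtt, size_t size)
{
    gtt->used -= size;
}
```

In this model a larger `limit` simply lets more system memory be pinned for the GPU at once, which is the trade-off against OOM handling described above.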
> Failing that, I'd probably have to abandon userptr BOs altogether and
> switch system memory mappings over to using the new SVM API on systems
> where it is available.
Well as long as that provides the necessary functionality through HMM it
would be an option.
Regards,
Christian.
>
> Regards,
> Felix
>
>
>> Christian.
>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index de1ec838cf8b..0b953654fdbf 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>>  		list_for_each_entry(bo, &man->lru[i], lru) {
>>>  			bool busy;
>>>
>>> +			/* Don't evict SG BOs */
>>> +			if (bo->ttm && bo->ttm->sg)
>>> +				continue;
>>> +
>>>  			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>>  							    &busy)) {
>>>  				if (busy && !busy_bo && ticket !=
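For readers without the TTM source at hand, the effect of the quoted hunk can be sketched as a standalone toy. This is not the kernel code: `struct toy_bo` and `toy_pick_eviction_victim()` are hypothetical names, `has_sg` stands in for the `bo->ttm && bo->ttm->sg` check, and `busy` loosely stands in for a BO that cannot be evicted right now.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object on an LRU list. */
struct toy_bo {
    bool has_sg; /* models bo->ttm && bo->ttm->sg */
    bool busy;   /* models a BO that cannot be evicted right now */
};

/* Walk the list in LRU order and return the index of the first BO
 * that may be evicted, or -1 if there is none. */
static int toy_pick_eviction_victim(const struct toy_bo *lru, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* The proposed change: never pick SG BOs as victims. */
        if (lru[i].has_sg)
            continue;

        if (lru[i].busy)
            continue;

        return (int)i;
    }
    return -1;
}
```

Note that with such a check, a list holding only SG BOs yields no victim at all, so nothing can be freed from that domain by eviction.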