[PATCH] drm/amdgpu: don't limit gtt size on apus
Christian König
christian.koenig at amd.com
Thu Jan 7 16:10:16 UTC 2021
On 06.01.21 at 18:04, Joshua Ashton wrote:
>
>
> On 1/6/21 2:59 PM, Christian König wrote:
>> On 06.01.21 at 15:18, Joshua Ashton wrote:
>>> [SNIP]
>>>>>>> For Vulkan we (both RADV and AMDVLK) use GTT as the total size.
>>>>>>> Usage in modern games is essentially "bindless" so there is no
>>>>>>> way to track at a per-submission level what memory needs to be
>>>>>>> resident. (and even with tracking applications are allowed to
>>>>>>> use all the memory in a single draw call, which would be
>>>>>>> unsplittable anyway ...)
>>>>>>
>>>>>> Yeah, that is a really good point.
>>>>>>
>>>>>> The issue is that we need some limitation since 3/4 of system
>>>>>> memory is way too much and the max texture size test in piglit can
>>>>>> cause a system crash.
>>>>>>
>>>>>> The alternative is better OOM handling, so that an application
>>>>>> which uses too much system memory through the driver stack is
>>>>>> more likely to get killed. Because currently the process that gets
>>>>>> killed is either X or Wayland :(
>>>>>>
>>>>>> Christian.
>>>>>
>>>>> As I understand it, what is being exposed right now is essentially
>>>>> max(vram size, 3GiB) limited by 3/4ths of system memory. Previously,
>>>>> before the revert, what was being used was just max(3GiB, 3/4ths).
>>>>>
>>>>> If you had < 3GiB of system memory, that seems like a bit of an
>>>>> issue that could easily lead to OOM to me?
>>>>
>>>> Not really; as I said, GTT is only the amount of memory the GPU can
>>>> lock at the same time. It is perfectly possible to have that larger
>>>> than the available system memory.
>>>>
>>>> In other words, this is *not* to prevent using too much system
>>>> memory; for that we have an additional limit inside TTM. Instead,
>>>> it is a reasonable limit so that applications don't use too much
>>>> memory at the same time.
>>>>
>>>
>>> Worth noting that this GTT size here also affects the memory
>>> reporting and budgeting for applications. If the user has 1GiB of
>>> total system memory and 3GiB set here, then 3GiB will be the budget
>>> and size exposed to applications too...
>>
>> Yeah, that's indeed problematic.
>>
>>>
>>> (On APUs,) we really don't want to expose more GTT than system
>>> memory. Apps will eat into it and end up swapping or running into
>>> OOM *very* quickly. (I imagine this is likely what was being run
>>> into before the revert.)
>>
>> No, the issue is that some applications try to allocate textures way
>> above some reasonable limit.
>>
>>> Alternatively, in RADV and other user space drivers like AMDVLK, we
>>> could limit this to the system memory size or 3/4ths ourselves.
>>> Although that's kinda gross and I don't think that's the correct
>>> path...
>>
>> Ok, let me explain from the other side: We have this limitation
>> because otherwise some tests, like the maximum texture size test for
>> OpenGL, crash the system. And this is independent of your system
>> configuration.
>>
>> We could of course add another limit for the texture size in
>> OpenGL/RADV/AMDVLK, but I agree that this is rather awkward.
>>
>>>>>
>>>>> Are you hitting something smaller than 3/4ths right now? I
>>>>> remember the source commit mentioned they only had 1GiB of system
>>>>> memory available, so that could be possible if you had a carveout
>>>>> of < 768MiB...
>>>>
>>>> What do you mean by that? I don't have a test system at hand for
>>>> this, if that's what you are asking for.
>>>
>>> This was mainly a question to whoever did the revert, to find out
>>> some extra info about what they were using at the time.
>>
>> You don't need a specific system configuration for this, just try to
>> run the max texture size test in piglit.
>>
>> Regards,
>> Christian.
>
> I see... I have not managed to reproduce a hang as described in the
> revert commit, but I have seen a soft crash, with a delay before the
> OOM killer ended X.org, when GTT > system memory.
>
> I tested max-texture-size on both Renoir and Picasso under the
> following conditions:
> 16 GiB RAM + 12 GiB GTT -> test works fine
> 16 GiB RAM + 64 GiB GTT -> OOM killer kills X.org after a little bit of
> waiting (piglit died with it)
> 2 GiB RAM + 1.5 GiB GTT -> test works fine
>
> I also tested on my Radeon VII and it worked fine regardless of the
> GTT size there, although that card has more than enough video memory
> anyway for this to be a non-issue there 🐸.
> Limiting my system memory to 2GiB, the card's memory and visible
> memory to 1GiB and the GTT to 1.75GiB, the test works fine.
>
> The only time I ever had problems with a crash or pseudo-hang (waiting
> for the OOM killer while the system was locked up) was whenever GTT
> was > system memory (i.e. with the reverted commit).
>
> If I edited my commit to universally use 3/4ths of the system memory
> for GTT on all hardware, would that be considered for merging?
Well maybe 1/2 and only on APUs. And you need to find somebody with
another Raven to test that. Maybe Nirmoy has time for this.
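For reference, a minimal C sketch of the three GTT sizing rules mentioned in this thread: the current max(VRAM size, 3GiB) capped at 3/4 of system memory, the reverted max(3GiB, 3/4 of system memory), and a 1/2-of-system-memory rule on APUs only. The function names and the DEFAULT_GTT_SIZE constant are hypothetical; this is not the actual amdgpu_ttm code, just an illustration working on plain byte counts.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical constant for the 3GiB default discussed in the thread. */
#define DEFAULT_GTT_SIZE (3ULL * GiB)

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* 3/4 of system memory; divide first so the multiply cannot overflow. */
static uint64_t three_quarters(uint64_t sysmem) { return sysmem / 4 * 3; }

/* Current rule: max(VRAM size, 3GiB), capped at 3/4 of system memory. */
static uint64_t gtt_size_current(uint64_t vram, uint64_t sysmem)
{
    return min_u64(max_u64(vram, DEFAULT_GTT_SIZE), three_quarters(sysmem));
}

/* Reverted rule: max(3GiB, 3/4 of system memory). This exceeds system
 * memory on machines with less than 3GiB of RAM. */
static uint64_t gtt_size_reverted(uint64_t sysmem)
{
    return max_u64(DEFAULT_GTT_SIZE, three_quarters(sysmem));
}

/* Hypothetical proposal: 1/2 of system memory on APUs only,
 * the current rule everywhere else. */
static uint64_t gtt_size_proposed(bool is_apu, uint64_t vram, uint64_t sysmem)
{
    return is_apu ? sysmem / 2 : gtt_size_current(vram, sysmem);
}
```

For example, on an APU with 2GiB of RAM and a 512MiB carveout, the reverted rule gives 3GiB of GTT (more than system memory), the current rule gives 1.5GiB, and the APU-only 1/2 rule gives 1GiB.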
Regards,
Christian.
>
> Thanks!
> - Joshie 🐸✨
>
>>
>>>
>>> - Joshie 🐸✨
>>