Random short freezes due to TTM buffer migrations

Wed Aug 17 23:39:09 UTC 2016

Actually, I was wrong.

The buffers in that app are pretty small. The largest one is 86 MB and
the others are 52 MB. I must have misread that as 520 MB.

At one point, a ttm_bo_validate call for a 32 MB buffer moved 971 MB in total.

Maybe it's just a VRAM fragmentation issue (i.e. a lack of contiguous free
memory).
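
To illustrate what I mean by fragmentation, here is a toy sketch in Python.
It is not TTM code: the block size, the function names and the layout are
all made up for illustration. It models VRAM as a list of 4 MB blocks and
compares a first-fit allocator that needs one contiguous range against one
that may take any free blocks (roughly the chunked approach discussed below).

```python
# Toy model only; nothing here corresponds to real TTM or amdgpu interfaces.
MB_PER_BLOCK = 4  # hypothetical block size, same as the 4 MB chunks in the thread


def make_fragmented_vram(total_blocks=64):
    """Toy 256 MB VRAM in which every other block is in use (50% free)."""
    return ["other" if i % 2 == 0 else None for i in range(total_blocks)]


def largest_free_run(vram):
    """Length, in blocks, of the longest run of free (None) blocks."""
    best = run = 0
    for owner in vram:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def alloc_contiguous(vram, nblocks, owner):
    """First-fit: claim a free run of nblocks and return its start, else None."""
    run = 0
    for i, cur in enumerate(vram):
        run = run + 1 if cur is None else 0
        if run == nblocks:
            start = i - nblocks + 1
            vram[start:i + 1] = [owner] * nblocks
            return start
    return None


def alloc_chunked(vram, nblocks, owner):
    """Claim any nblocks free blocks and return their indices, else None."""
    free = [i for i, cur in enumerate(vram) if cur is None]
    if len(free) < nblocks:
        return None
    for i in free[:nblocks]:
        vram[i] = owner
    return free[:nblocks]


if __name__ == "__main__":
    vram = make_fragmented_vram()
    print("free MB:", vram.count(None) * MB_PER_BLOCK)
    print("largest contiguous free MB:", largest_free_run(vram) * MB_PER_BLOCK)
    # A 32 MB request is 8 blocks.
    print("contiguous 32 MB:", alloc_contiguous(vram, 8, "bo"))
    print("chunked 32 MB:", alloc_chunked(vram, 8, "bo"))
```

In this layout half of the toy VRAM is free, yet the contiguous allocator
cannot place a 32 MB buffer without evicting something first, while the
chunked one can. A real allocator is of course more involved than this.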

Marek

On Wed, Aug 17, 2016 at 9:19 PM, Christian König <deathsimple at vodafone.de>
wrote:

> Sharing buffers between applications is handled by the DRM layer and
> transparent to the driver.
>
> E.g. the driver is not even informed whether sharing is done via DMA-buf or
> GEM flink; it's just another reference to the BO.
>
> So there isn't any change to that at all.
>
> Regards,
> Christian.
>
>
> On 17.08.2016 at 21:03, Felix Kuehling wrote:
>
>> I think the scatter-gather tables only support system memory. As I
>> understand it, a buffer in VRAM has to be migrated to system memory before
>> it can be shared with another driver.
>>
>> I'm more concerned about sharing with the same driver. There is a
>> special code path for that, where we simply add another reference to the
>> same BO, instead of looking at a scatter-gather table. We use that for
>> OpenGL-OpenCL interop, and are also planning to use it for IPC buffer
>> sharing in HSA. As long as a split VRAM buffer is still a single
>> amdgpu_bo, and becomes a single dmabuf when exporting it, I think that
>> should work.
>>
>> Regards,
>>    Felix
>>
>>
>> On 16-08-17 02:58 AM, Christian König wrote:
>>
>>>> One question: Will it be possible to share these split BOs as dmabufs?
>>>
>>> In theory yes, in practice I'm not sure.
>>>
>>> DMA-bufs are designed around scatter-gather tables, which fortunately
>>> support buffers split over the whole address space.
>>>
>>> The problem is that the importing device needs to be able to handle
>>> that as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 16.08.2016 at 20:33, Felix Kuehling wrote:
>>>
>>>> Very nice. I'm looking forward to this for KFD as well.
>>>>
>>>> One question: Will it be possible to share these split BOs as dmabufs?
>>>>
>>>> Regards,
>>>>     Felix
>>>>
>>>>
>>>> On 16-08-16 11:27 AM, Christian König wrote:
>>>>
>>>>> Hi Marek,
>>>>>
>>>>> I'm already working on this.
>>>>>
>>>>> My current approach is to use a custom BO manager for VRAM with TTM
>>>>> and thereby split allocations into chunks of 4MB.
>>>>>
>>>>> Large BOs are still swapped out as one, but it makes it much more
>>>>> likely that you can allocate 1/2 of VRAM as one buffer.
>>>>>
>>>>> Give me till the end of the week to finish this and then we can test
>>>>> if that's sufficient or if we need to do more.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 16.08.2016 at 16:33, Marek Olšák wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm seeing random temporary freezes (up to 2 seconds) under memory
>>>>>> pressure. Before I describe the exact circumstances, I'd like to say
>>>>>> that this is a serious issue affecting playability of certain AAA
>>>>>> Linux games.
>>>>>>
>>>>>> In order to reproduce this, an application should:
>>>>>> - allocate a few very large buffers (256-512 MB per buffer)
>>>>>> - allocate more memory than there is available VRAM. The issue also
>>>>>> occurs (but at a lower frequency) if the app needs only 80% of VRAM.
>>>>>>
>>>>>> Example: ttm_bo_validate needs to migrate a 512 MB buffer. The total
>>>>>> size of moved memory for that call can be as high as 1.5 GB. This is
>>>>>> always followed by a big temporary drop in VRAM usage.
>>>>>>
>>>>>> The game I'm testing needs 3.4 GB of VRAM.
>>>>>>
>>>>>> Setups:
>>>>>> Tonga - 2 GB: It's nearly unplayable, because freezes occur too often.
>>>>>> Fiji - 4 GB: There is one freeze at the beginning (which is annoying
>>>>>> too), after that it's smooth.
>>>>>>
>>>>>> So even 4 GB is not enough.
>>>>>>
>>>>>> Workarounds:
>>>>>> - Split buffers into smaller pieces in the kernel. It's not necessary
>>>>>> to manage memory at page granularity (64KB). Splitting buffers into
>>>>>> 16MB-large pieces might not be optimal but it would be a significant
>>>>>> improvement.
>>>>>> - Or do the same in Mesa. This would prevent inter-process and
>>>>>> inter-API buffer sharing for split buffers (DRI, OpenCL), but we would
>>>>>> at least verify how much the situation improves.
>>>>>>
>>>>>> Other issues sharing the same cause:
>>>>>> - Allocations requesting 1/3 or more VRAM have a high chance of
>>>>>> failing. It's generally not possible to allocate 1/2 or more VRAM as
>>>>>> one buffer.
>>>>>>
>>>>>> Comments welcome,
>>>>>>
>>>>>> Marek
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx