Reverted another change to fix buffer move hangs (was Re: [PATCH] drm/ttm: partial revert "cleanup ttm_tt_(unbind|destroy)" v2)

Felix Kuehling felix.kuehling at amd.com
Tue Aug 16 18:45:03 UTC 2016


Thank you. Sorry, I already pushed it with Alex's R-B, without yours.


On 16-08-16 03:53 AM, Christian König wrote:
> Am 15.08.2016 um 23:03 schrieb Alex Deucher:
>> On Mon, Aug 15, 2016 at 3:06 PM, Felix Kuehling
>> <felix.kuehling at amd.com> wrote:
>>> Patch against current amd-staging-4.6 is attached.
>> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>
> Reviewed-by: Christian König <christian.koenig at amd.com>.
>
>>
>>> Regards,
>>>    Felix
>>>
>>>
>>> On 16-08-13 05:25 AM, Christian König wrote:
>>>> Am 13.08.2016 um 01:22 schrieb Felix Kuehling:
>>>>> [CC Kent FYI]
>>>>>
>>>>> On 16-08-11 04:31 PM, Deucher, Alexander wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>>>>>> Behalf
>>>>>>> Of Felix Kuehling
>>>>>>> Sent: Thursday, August 11, 2016 3:52 PM
>>>>>>> To: Michel Dänzer; Christian König
>>>>>>> Cc: amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Reverted another change to fix buffer move hangs (was Re:
>>>>>>> [PATCH] drm/ttm: partial revert "cleanup
>>>>>>> ttm_tt_(unbind|destroy)" v2)
>>>>>>>
>>>>>>> We had to revert another change on the KFD branch to fix a
>>>>>>> buffer move
>>>>>>> problem: 8b6b79f43801f00ddcdc10a4d5719eba4b2e32aa (drm/amdgpu:
>>>>>>> group BOs
>>>>>>> by log2 of the size on the LRU v2
>>>>>> That makes sense.  I think you may want a different LRU scheme for
>>>>>> KFD or at least special handling for KFD buffers.
>>>>> [FK] But I think the patch shouldn't cause hangs, regardless.
>>>>>
>>>>> I eventually found what the problem was. The "group BOs by log2 of
>>>>> the
>>>>> size on the LRU v2" patch exposed a latent bug related to the GART
>>>>> size.
>>>>> On our KFD branch, we calculate the GART size differently, and it can
>>>>> easily go above 4GB. I think on amd-staging-4.6 the GART size can
>>>>> also
>>>>> go above 4GB on cards with lots of VRAM.
>>>>>
>>>>> However, the offset parameter in amdgpu_gart_bind and unbind is only
>>>>> 32-bit. With the patch our test ended up using GART offsets beyond
>>>>> 4GB
>>>>> for the first time. Changing the offset parameter to uint64_t
>>>>> fixes the
>>>>> problem.
>>>> Nice catch, please provide a patch to fix this.
>>>>
>>>>> Our test also demonstrates a potential flaw in the log2 grouping
>>>>> patch:
>>>>> When a buffer of a previously unused size is added to the LRU, it
>>>>> gets
>>>>> added to the front of the list, rather than the tail. So an
>>>>> application
>>>>> that allocates a very large buffer after a bunch of smaller
>>>>> buffers, is
>>>>> very likely to have that buffer evicted over and over again before
>>>>> any
>>>>> smaller buffers are considered for eviction. I believe, this can
>>>>> result
>>>>> in thrashing of large buffers.
>>>>>
>>>>> Some other observations: When the last BO of a given size is removed
>>>>> from the LRU list, the LRU tail for that size is left "floating"
>>>>> in the
>>>>> middle of the LRU list. So the next BO of that size that is added,
>>>>> will
>>>>> be added at an arbitrary position in the list. It may even end up
>>>>> in the
>>>>> middle of a block of pages of a different size. So a log2 grouping
>>>>> may
>>>>> end up being split.
>>>> Yeah, those are more or less known issues.
>>>>
>>>> Keep in mind that we only added the grouping by log2 of the size to
>>>> have a justification to push the TTM changes upstream for the coming
>>>> KFD fences.
>>>>
>>>> E.g. so that we are able to have this upstream before we try to push
>>>> on the fence code.
>>>>
>>>> I will take a look at fixing those issues when I have time, shouldn't
>>>> be to complicated to set the entries to zero when they aren't used or
>>>> adjust other entries as well when some are added.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>>     Felix
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>



More information about the amd-gfx mailing list