[PATCH 0/9] Replace use of radeon_sa with a new sub allocator

Wed Dec 31 23:46:06 PST 2014


On 12/31/2014 07:07 PM, Christian König wrote:
>> The long-term solution
> That was the part that I missed in the description. Please note somewhere that
> we still need to improve this.
OK, I'll add it to the commit message of the relevant patch (and to the cover letter).
>
> Apart from that the patches look fine to me, but I need more time to review them
> in detail.
Thanks! I hope we can push it to 3.20
>
> Regards,
> Christian.
>
> Am 31.12.2014 um 15:06 schrieb Oded Gabbay:
>>
>> On 12/31/2014 03:49 PM, Christian König wrote:
>>> Am 31.12.2014 um 14:39 schrieb Oded Gabbay:
>>>> Background:
>>>>
>>>> amdkfd needs GART memory for several things, such as runlist packets,
>>>> MQDs, HPDs and more. Unfortunately, all of this memory must be always
>>>> pinned (due to several reasons which were discussed during the
>>>> initial review of amdkfd).
>>> In general this seems to be a good idea, but so far I still haven't seen a
>>> good explanation of why all that memory must be pinned. So please summarize
>>> that once more.
>>>
>>> Regards,
>>> Christian.
>>>
>> ok, once more :)
>>
>> The bulk of the allocations in the GART is for MQDs. MQDs represent active
>> user-mode queues, which are on the current runlist. It is important to
>> remember that an active queue is not necessarily a scheduled/running
>> queue, especially if queues are over-subscribed or there is more than a
>> single HSA process.
>>
>> Because the scheduling of the user-mode queues is done by the CP firmware,
>> amdkfd has no indication of whether a queue is scheduled or not. If the
>> CP tries to schedule a queue whose MQD is not present, the CP will
>> probably get stuck permanently, as it will load garbage from the GART
>> (the address of the MQD is given to the CP inside the runlist packet).
>>
>> In addition, there are a couple of small allocations which also should
>> always be pinned: runlist packets (2 packets) and HPDs. Runlist packets can
>> be quite large, depending on the number of processes and queues.
>>
>> A few solutions were proposed, but in the end Jerome agreed there is no harm
>> in limiting the total memory consumption to around 4MB.
>>
>> The long-term solution, which I will be working on, hopefully soon, is to
>> create a mechanism through which radeon/ttm can ask amdkfd to clear
>> GART/VRAM memory due to memory pressure. Then, amdkfd will preempt the
>> running queues and wait until the memory pressure is over. Then it will
>> reschedule the queues. But I'm getting ahead of myself. I hope to send an
>> RFC about that in the next couple of weeks.
>>
>>     Oded
>>
>>
>>
>>>> Current Solution:
>>>>
>>>> The current (short/mid-term) solution, proposed by Jerome.G, is
>>>> to limit the amount of memory to a small size, roughly 4MB, and allocate
>>>> this buffer at the start of the GART. To accommodate this, amdkfd has
>>>> two kernel module parameters, maximum number of HSA processes and
>>>> maximum number of queues per process, which require under 4MB of GART
>>>> memory when using their defaults, 32 and 128 respectively.
>>>>
>>>> Until now, amdkfd used the radeon sub-allocator module (radeon_sa)
>>>> to handle the sub-allocation of memory from this large buffer to
>>>> different modules inside amdkfd.
>>>>
>>>> However, while running the OpenCL conformance test suite, we found that
>>>> the radeon_sa module is not suitable for this kind of task, due to its
>>>> design:
>>>> 1. Every allocation increments its internal pointer so the next
>>>> allocation is *always* done ahead of the previous allocation. This
>>>> causes the internal pointer to wrap around when it reaches the end of
>>>> the buffer.
>>>>
>>>> 2. When encountering an area that is already allocated, the module
>>>> waits for that area to be freed. If it is not freed in a timely manner
>>>> (or has no fence), the allocation fails. Simply put, it can't "skip"
>>>> the allocated area.
>>>>
>>>> Now, this is most probably good for graphics, but for amdkfd's needs,
>>>> the combination of the two behaviors mentioned above eventually causes
>>>> a denial-of-service. This is because some memory allocations
>>>> are *always* present and *never* freed (such as HPDs).
>>>> Therefore, given enough time and workload, radeon_sa eventually
>>>> wraps around, encounters an already allocated area and gets stuck.
>>>>
>>>> Proposed new solution:
>>>>
>>>> To solve this, I have written a simple sub-allocator module inside
>>>> amdkfd. It allocates fixed-size contiguous chunks (1 or more) and uses
>>>> a bitmap to manage the allocations. Every allocation searches
>>>> from the start of the GART buffer, and the module
>>>> knows how to skip allocated chunks.
>>>>
>>>> Because most allocations are MQDs, and MQDs are 512 Bytes in size, I
>>>> set the default chunk size to be 512 Bytes.
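For readers who want a picture of the scheme described above, here is a minimal sketch of a first-fit bitmap allocator over 512-byte chunks. It is not the code from the patches: `toy_pool`, `toy_alloc`, `toy_free` and the 64-chunk pool size are hypothetical, and the sketch has no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE  512u   /* one MQD fits in one chunk, per the text */
#define POOL_CHUNKS 64u    /* hypothetical pool size, chosen to fit one word */

struct toy_pool {
    uint64_t bitmap;       /* bit i set means chunk i is allocated */
};

/* First-fit search from chunk 0 for enough contiguous free chunks.
 * Returns the byte offset inside the pool, or -1 if nothing fits. */
long toy_alloc(struct toy_pool *p, size_t size)
{
    if (size == 0 || size > (size_t)POOL_CHUNKS * CHUNK_SIZE)
        return -1;

    unsigned need = (unsigned)((size + CHUNK_SIZE - 1) / CHUNK_SIZE);
    unsigned run = 0;      /* length of the current run of free chunks */

    for (unsigned i = 0; i < POOL_CHUNKS; i++) {
        /* An allocated chunk resets the run; this is the "skip" step. */
        run = ((p->bitmap >> i) & 1u) ? 0 : run + 1;
        if (run == need) {
            unsigned first = i + 1 - need;
            for (unsigned j = first; j <= i; j++)
                p->bitmap |= (uint64_t)1 << j;
            return (long)first * CHUNK_SIZE;
        }
    }
    return -1;
}

/* Clears the chunks covered by a previous toy_alloc(size) at offset. */
void toy_free(struct toy_pool *p, long offset, size_t size)
{
    assert(offset >= 0 && offset % CHUNK_SIZE == 0);
    unsigned first = (unsigned)(offset / CHUNK_SIZE);
    unsigned need = (unsigned)((size + CHUNK_SIZE - 1) / CHUNK_SIZE);
    assert(first + need <= POOL_CHUNKS);

    for (unsigned j = first; j < first + need; j++)
        p->bitmap &= ~((uint64_t)1 << j);
}
```

Because the search always restarts at chunk 0, a freed hole between two pinned allocations is reused by the next request that fits, and pinned chunks are simply stepped over. For scale: with the defaults quoted earlier (32 processes, 128 queues each), the MQDs alone would take at most 32 * 128 * 512 bytes = 2MB.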
>>>>
>>>> The basic GART memory allocation is still done in the
>>>> amdkfd <--> radeon interface, and it still occupies less than 4MB.
>>>>
>>>> I have chosen to implement a new allocator instead of changing
>>>> radeon_sa because the behavior of radeon_sa is very appropriate for
>>>> graphics, where allocations do not stay forever. Also, amdkfd doesn't
>>>> actually need the flexibility and features radeon_sa provides.
>>>>
>>>>      Oded
>>>>
>>>> Oded Gabbay (9):
>>>>     drm/amd: Add new kfd-->kgd interface for gart usage
>>>>     drm/radeon: Impl. new gtt allocate/free functions
>>>>     drm/amdkfd: Add gtt sa related data to kfd_dev struct
>>>>     drm/amdkfd: Add kfd gtt sub-allocator functions
>>>>     drm/amdkfd: Fixed calculation of gart buffer size
>>>>     drm/amdkfd: Allocate gart memory using new interface
>>>>     drm/amdkfd: Using new gtt sa in amdkfd
>>>>     drm/radeon: Remove old radeon_sa usage from kfd-->kgd interface
>>>>     drm/amd: Remove old radeon_sa funcs from kfd-->kgd interface
>>>>
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 217 ++++++++++++++++++++-
>>>>    .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  23 +--
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |  41 ++--
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |  16 +-
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  10 +-
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  28 ++-
>>>>    drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  23 +--
>>>>    drivers/gpu/drm/radeon/radeon_kfd.c                | 128 ++++++------
>>>>    8 files changed, 329 insertions(+), 157 deletions(-)
>>>>
>