[PATCH 0/9] Replace use of radeon_sa with a new sub allocator

Wed Dec 31 05:39:18 PST 2014

Background:

amdkfd needs GART memory for several things, such as runlist packets,
MQDs, HPDs and more. Unfortunately, all of this memory must always be
pinned (for several reasons that were discussed during the initial
review of amdkfd).

Current Solution: 

The current (short/mid-term) solution, proposed by Jerome.G, is to
limit the amount of memory to a small size, roughly 4MB, and to allocate
this buffer at the start of the GART. To accommodate this, amdkfd has
two kernel module parameters, maximum number of HSA processes and
maximum number of queues per process, which require under 4MB of GART
memory when using their defaults, 32 and 128 respectively.

Until now, amdkfd used the radeon sub-allocator module (radeon_sa)
to handle the sub-allocation of memory from this large buffer to the
different modules inside amdkfd.

However, while running the OpenCL conformance test suite, we found that
the radeon_sa module is not suitable for this kind of task, due to its
design:
1. Every allocation advances its internal pointer, so the next
allocation is *always* done ahead of the previous allocation. This
causes the internal pointer to wrap around when it reaches the end of
the buffer.

2. When encountering an area that is already allocated, the module
waits for that area to be freed. If it is not freed in a timely manner
(or has no fence), the allocation fails. Simply put, it can't "skip"
the allocated area.

Now, this is most probably good for graphics, but for amdkfd's needs,
the combination of the two behaviors mentioned above eventually causes
a denial-of-service. This is because some memory allocations
are *always* present and *never* freed (such as HPDs).
Therefore, given enough time and workload, radeon_sa eventually
wraps around, encounters an already allocated area and gets stuck.
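
To illustrate the failure mode, here is a toy model of the two behaviors
as described above. It is not radeon_sa code: the names (ring_sa,
ring_alloc, ring_free) and the 8-slot size are made up for the example,
and the fence wait is reduced to an immediate failure.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8u  /* hypothetical pool size, chosen only for the example */

/* Toy allocator: one slot per allocation, pointer only moves forward. */
struct ring_sa {
	int used[SLOTS];
	unsigned int next;  /* slot the next allocation must use */
};

/* Returns the allocated slot, or -1 if the slot at 'next' is occupied. */
static int ring_alloc(struct ring_sa *sa)
{
	unsigned int slot = sa->next;

	assert(slot < SLOTS);
	if (sa->used[slot])
		return -1;  /* cannot skip; a real allocator would wait here */

	sa->used[slot] = 1;
	sa->next = (slot + 1) % SLOTS;  /* always advance, wrap at the end */
	return (int)slot;
}

static void ring_free(struct ring_sa *sa, int slot)
{
	assert(slot >= 0 && (unsigned int)slot < SLOTS);
	sa->used[slot] = 0;
}
```

If slot 0 is allocated once and never freed (like an HPD), and seven
short-lived allocations follow, the pointer wraps back to slot 0 and the
next ring_alloc() fails although seven of the eight slots are free.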

Proposed new solution:

To solve this, I have written a simple sub-allocator module inside
amdkfd. It allocates fixed-size contiguous chunks (1 or more) and uses
a bitmap to manage the allocations. Each allocation searches for free
chunks from the start of the GART buffer, and the module knows how to
skip allocated chunks.
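
As a rough user-space sketch of that idea (not the code in this series),
a first-fit bitmap allocator could look like the following. The names
(gtt_sa, gtt_sa_alloc, gtt_sa_free) and the 64-chunk pool are
assumptions made for the example; only the 512-byte chunk size comes
from this letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 512u  /* bytes per chunk */
#define NUM_CHUNKS 64u   /* hypothetical pool size for the sketch */

struct gtt_sa {
	uint8_t used[NUM_CHUNKS / 8];  /* one bit per chunk, 1 = allocated */
};

static void gtt_sa_init(struct gtt_sa *sa)
{
	memset(sa->used, 0, sizeof(sa->used));
}

static int chunk_is_used(const struct gtt_sa *sa, size_t i)
{
	return (sa->used[i / 8] >> (i % 8)) & 1u;
}

static void chunk_set(struct gtt_sa *sa, size_t i, int used)
{
	if (used)
		sa->used[i / 8] |= (uint8_t)(1u << (i % 8));
	else
		sa->used[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First fit from the start of the pool; returns byte offset or -1. */
static long gtt_sa_alloc(struct gtt_sa *sa, size_t size)
{
	size_t need, start = 0, run = 0, i, j;

	assert(size > 0);
	need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;  /* round up to chunks */

	for (i = 0; i < NUM_CHUNKS; i++) {
		if (chunk_is_used(sa, i)) {
			run = 0;  /* skip the allocated chunk, restart the run */
			continue;
		}
		if (run == 0)
			start = i;
		if (++run == need) {
			for (j = start; j <= i; j++)
				chunk_set(sa, j, 1);
			return (long)(start * CHUNK_SIZE);
		}
	}
	return -1;  /* no contiguous run of 'need' free chunks */
}

static void gtt_sa_free(struct gtt_sa *sa, long offset, size_t size)
{
	size_t first = (size_t)offset / CHUNK_SIZE;
	size_t count = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
	size_t i;

	assert(offset >= 0 && first + count <= NUM_CHUNKS);
	for (i = first; i < first + count; i++)
		chunk_set(sa, i, 0);
}
```

A permanently allocated chunk only costs its own space here: later
searches step over it and continue with the next free run.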

Because most allocations are MQDs, and MQDs are 512 bytes in size, I
set the default chunk size to 512 bytes.

The basic GART memory allocation is still done in the
amdkfd <--> radeon interface, and it still occupies less than 4MB.

I have chosen to implement a new allocator instead of changing 
radeon_sa because the behavior of radeon_sa is very appropriate for 
graphics, where allocations do not stay forever. Also, amdkfd doesn't 
actually need the flexibility and features radeon_sa provides.

	Oded

Oded Gabbay (9):
  drm/amd: Add new kfd-->kgd interface for gart usage
  drm/radeon: Impl. new gtt allocate/free functions
  drm/amdkfd: Add gtt sa related data to kfd_dev struct
  drm/amdkfd: Add kfd gtt sub-allocator functions
  drm/amdkfd: Fixed calculation of gart buffer size
  drm/amdkfd: Allocate gart memory using new interface
  drm/amdkfd: Using new gtt sa in amdkfd
  drm/radeon: Remove old radeon_sa usage from kfd-->kgd interface
  drm/amd: Remove old radeon_sa funcs from kfd-->kgd interface

 drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 217 ++++++++++++++++++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  23 +--
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |  41 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  28 ++-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  23 +--
 drivers/gpu/drm/radeon/radeon_kfd.c                | 128 ++++++------
 8 files changed, 329 insertions(+), 157 deletions(-)

-- 
2.1.0