[Intel-gfx] [PATCH v5 00/19] 48-bit PPGTT

Tue Jul 28 05:18:58 PDT 2015

On Thu, Jul 16, 2015 at 10:33:12AM +0100, Michel Thierry wrote:
> This clean-up version delays the 48-bit work to later patches and includes
> other review comments from Akash and Chris Wilson. The first 4 patches
> prepare the dynamic page allocation code to handle independent pdps, but
> no specific code for 48-bit mode is added before the 5th patch.
> 
> In order to expand the GPU address space, a 4th level translation is added,
> the Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E),
> PML4[0-511], each pointing to a PDP. All the existing "dynamic alloc
> ppgtt" functions are used, only adding the 4th level changes. I also
> updated some remaining variables that were 32b only.
> 
> There are 2 hardware workarounds needed to allow correct operation with
> 48b addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset).
> This new patchset version includes the comments and suggestions from Chris
> Wilson. A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given
> object can be allocated outside the first 4 PDPs; if not, the end range is
> forced to 4GB. Also, more objects now use the DRM_MM_CREATE_TOP flag. To
> maintain compatibility, in libdrm I added a new drm_intel_bo_emit_reloc_48bit
> function that will flag these objects, while the existing drm_intel_bo_emit_reloc
> clears it.
> 
> Finally, this feature is only available in BDW and Gen9, requires LRC
> submission mode (execlists) and it can be detected by i915.enable_ppgtt=3.
> 
> Also note that this expanded address space is only available for full
> PPGTT, aliasing PPGTT and Global GTT remain 32-bit.
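The 4-level layout described in the cover letter can be sketched in C as follows. This is a minimal illustration, not the i915 code. It assumes 4 KiB pages and 512 entries per table at every level (9 index bits per level). split_va(), placement_end() and SKETCH_SUPPORTS_48B are hypothetical names. SKETCH_SUPPORTS_48B is only a stand-in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS and deliberately does not claim that flag's real uapi value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed gen8-style layout: 4 KiB pages, 512 (2^9) entries per table. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

struct va_indices {
	unsigned pml4e;  /* bits 47:39 - which PDP */
	unsigned pdpe;   /* bits 38:30 - which page directory */
	unsigned pde;    /* bits 29:21 - which page table */
	unsigned pte;    /* bits 20:12 - which 4 KiB page */
	unsigned offset; /* bits 11:0  - byte within the page */
};

/* Split a 48-bit PPGTT address into its per-level table indices. */
static struct va_indices split_va(uint64_t va)
{
	struct va_indices idx;

	assert(va < (1ull << 48)); /* full 48-bit address space */
	idx.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
	idx.pte    = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
	idx.pde    = (unsigned)((va >> (PAGE_SHIFT + 1 * LEVEL_BITS)) & LEVEL_MASK);
	idx.pdpe   = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
	idx.pml4e  = (unsigned)((va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK);
	return idx;
}

/* Hypothetical stand-in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS. */
#define SKETCH_SUPPORTS_48B (1u << 0)

/*
 * End of the range the allocator may search for an object. Objects that
 * do not carry the flag stay in the low 4 GiB (PML4E 0, PDPEs 0-3).
 */
static uint64_t placement_end(uint32_t flags)
{
	return (flags & SKETCH_SUPPORTS_48B) ? (1ull << 48) : (1ull << 32);
}
```

For example, the last byte below 4 GiB decomposes to PML4E 0, PDPE 3, which is why forcing the end of the range to 4 GiB keeps unflagged objects within reach of 32-bit offsets.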

A test I just thought of is to extend gem_evict_alignment to iterate over

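/* 1ull: a plain int 1 << 48 would overflow */
for (uint64_t align = 1ull << 12; align < 1ull << 48; align <<= 1)
    exec(obj.alignment = align); /* one execbuf per power-of-two alignment */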

i.e. basically force the kernel to place the object in every
power-of-two zone. The idea is to exercise, and allocate, as much of
the 4-level page table handling code as is trivially possible. (To work
on extent tracking as well, you could leave each level in place; at that
point this starts to feel more like a gem_ppgtt test.) With softpin we
would have more control over exercising every boundary in the code, at
the cost of requiring softpin.
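To see which parts of the page-table walk such a sweep can reach, here is a small C sketch. It assumes 4 KiB pages and 9 index bits per level, and it assumes the object lands at offset == align, which the kernel does not guarantee (offset 0 is aligned to everything). level_for_alignment() and alignments_reaching() are hypothetical helpers, not IGT code.

```c
#include <assert.h>
#include <stdint.h>

/* Levels of the 4-level PPGTT walk, leaf first. */
enum pt_level { LVL_PTE, LVL_PDE, LVL_PDPE, LVL_PML4E, NUM_LEVELS };

/*
 * Assumes 4 KiB pages (12 offset bits) and 9 index bits per level.
 * An offset equal to a power-of-two 'align' has exactly one bit set, so
 * exactly one level index is non-zero; return that level.
 */
static enum pt_level level_for_alignment(uint64_t align)
{
	int shift = 0;

	assert(align >= (1ull << 12) && align < (1ull << 48));
	assert((align & (align - 1)) == 0); /* power of two */
	while ((1ull << shift) != align)
		shift++;
	return (enum pt_level)((shift - 12) / 9);
}

/*
 * Count how many alignments of the proposed sweep (4 KiB up to 2^47)
 * would touch the given level, if each object were placed at offset == align.
 */
static unsigned alignments_reaching(enum pt_level lvl)
{
	unsigned n = 0;

	for (uint64_t align = 1ull << 12; align < 1ull << 48; align <<= 1)
		if (level_for_alignment(align) == lvl)
			n++;
	return n;
}
```

Under these assumptions the 36 alignments split evenly, nine per level, so even the naive sweep reaches the PML4 level that only exists with 48-bit PPGTT.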

I also noticed that constructing the bitmaps for va_alloc_range tracking
was very expensive, even in the trivial no-op case (rebinding to the
same location). A benchmark measuring that allocation overhead would be
very useful. For that, I think a synthetic benchmark, e.g. using softpin
to move an object through the entire address space or simply to flip it
between two locations, would do the job.
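A minimal timing harness for that kind of synthetic benchmark might look like the following C sketch. The bind callback is a hypothetical stand-in for "softpin the object at this offset and submit"; no i915, libdrm or IGT call is modelled, and bench_offset(), bench_ns_per_bind() and count_bind() are illustrative names. count_bind() only counts calls, so running the harness with it measures just the loop overhead, a baseline to subtract from the real numbers.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical hook: in a real test this would softpin the object at
 * 'offset' and submit an execbuf; here it is just a function pointer.
 */
typedef void (*bind_fn)(uint64_t offset, void *ctx);

enum bench_pattern { BENCH_SWEEP, BENCH_FLIP };

/*
 * Offset for iteration i. SWEEP walks the object through [0, end) in
 * steps of 'stride', wrapping around; FLIP alternates between 0 and
 * 'stride'. Offsets are always multiples of 'stride'.
 */
static uint64_t bench_offset(enum bench_pattern p, uint64_t i,
			     uint64_t stride, uint64_t end)
{
	assert(stride > 0 && end > stride);
	if (p == BENCH_FLIP)
		return (i & 1) ? stride : 0;
	return (i % (end / stride)) * stride;
}

/* Average wall-clock nanoseconds per bind() call over 'iters' calls. */
static double bench_ns_per_bind(bind_fn bind, void *ctx,
				enum bench_pattern p, uint64_t iters,
				uint64_t stride, uint64_t end)
{
	struct timespec t0, t1;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (uint64_t i = 0; i < iters; i++)
		bind(bench_offset(p, i, stride, end), ctx);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}

/* Baseline stub: counts calls so the harness overhead alone can be measured. */
static void count_bind(uint64_t offset, void *ctx)
{
	(void)offset;
	++*(uint64_t *)ctx;
}
```

The FLIP pattern isolates the cost of rebinding between two already-populated locations, while SWEEP forces new page-table levels to be populated as the object moves.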
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre