[Intel-gfx] [PATCH v4 00/18] 48-bit PPGTT

Sat Jul 11 12:52:54 PDT 2015

On Tue, Jul 07, 2015 at 04:14:45PM +0100, Michel Thierry wrote:
> These are the rebased patches, after Mika's final ppgtt clean-up series landed
> (it relies on the macros added) and Akash's review comments.
> 
> In order to expand the GPU address space, a 4th level translation is added, the
> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
> each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
> used, only adding the 4th level changes. I also updated some remaining
> variables that were 32b only.
> 
> There are 2 hardware workarounds needed to allow correct operation with 48b
> addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). This
> new patchset version includes the comments and suggestions from Chris Wilson.
> A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can be
> allocated outside the first 4 PDPs; if not, the end range is forced to 4GB. Also,
> more objects now use the DRM_MM_CREATE_TOP flag. To maintain compatibility, in
> libdrm I added a new drm_intel_bo_emit_reloc_48bit function that will flag
> these objects, while the existing drm_intel_bo_emit_reloc clears it.
> 
> Finally, this feature is only available in BDW and Gen9, requires LRC submission
> mode (execlists) and it can be detected by i915.enable_ppgtt=3.
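
For readers following the thread, a minimal sketch of the placement rule
described in the cover letter: an object that does not carry
EXEC_OBJECT_SUPPORTS_48B_ADDRESS has its end range forced to 4GB. The helper
name placement_end() and the bit value chosen for the flag are assumptions
made for illustration only; they are not taken from the patches.

```c
#include <stdint.h>

/* Assumed bit position, for illustration only; the authoritative
 * definition is whatever the series adds to the uapi header. */
#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1u << 3)

/* Hypothetical helper: returns the exclusive upper bound of the address
 * range in which an object may be placed, given the size of the vm. */
static uint64_t placement_end(uint64_t vm_total, uint32_t obj_flags)
{
	const uint64_t limit_4g = 1ULL << 32;

	/* Objects that did not opt in stay below 4GB (the first 4 PDPs). */
	if (!(obj_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
	    vm_total > limit_4g)
		return limit_4g;

	return vm_total;
}
```

With a 48b vm, placement_end(1ULL << 48, 0) gives 4GB, while passing the
flag gives the full vm size; a vm that is already 4GB or smaller is
unaffected either way.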

We should be adding the ppgtt size to, say, get_aperture_ioctl so that we can
allow it to change in the future.

It looks like we can switch between 32bit and 48bit contexts on the fly.
As we require opt-in from userspace to even make use of the extended vm,
should we also not create 32bit contexts by default? (i.e. are there any
noticeable regressions from switching to 48bit contexts?)
> 
> Also note that this expanded address space is only available for full PPGTT;
> aliasing PPGTT and Global GTT remain 32-bit.
> 
> Michel Thierry (18):
>   drm/i915: Remove unnecessary gen8_clamp_pd
>   drm/i915/gen8: Make pdp allocation more dynamic
>   drm/i915/gen8: Add PML4 structure
>   drm/i915/gen8: Abstract PDP usage
>   drm/i915/gen8: Add dynamic page trace events
>   drm/i915/gen8: implement alloc/free for 4lvl
>   drm/i915/gen8: Add 4 level switching infrastructure and lrc support
>   drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
>   drm/i915/gen8: Pass sg_iter through pte inserts

sg_page_iter can be a rate-limiting step in some workloads, though I hope
that with 48bit support the need for evictions will be reduced and so
insertion will have less impact. However, do you have any ideas for speeding
up ppgtt_insert?

>   drm/i915/gen8: Add 4 level support in insert_entries and clear_range
>   drm/i915/gen8: Initialize PDPs
>   drm/i915: Expand error state's address width to 64b
>   drm/i915/gen8: Add ppgtt info and debug_dump
>   drm/i915: object size needs to be u64
>   drm/i915: batch_obj vm offset must be u64

Or just track the batch_vma.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre