[Intel-gfx] [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror

Ben Widawsky benjamin.widawsky at intel.com
Sat May 10 05:58:55 CEST 2014

Just as before, these patches are based on my Broadwell branch, here:

These are the follow-on patches to [1].

This patch series brings 3 things:
1. Dynamic page table allocation for gen6-8
2. 64b (48b canonical) graphics virtual address space for Broadwell
3. An interface to specify a specific offset for a BO.

It's taken way longer than I thought to get this work done, and given
the current state of our driver, I fear I may not have time to see this
through to the end before I am pulled onto other things. If people want
to send me smallish bugfixes, I will gladly do my best to fix them
quickly. If there are more substantial change requests wrt design or
patch reorganization, I will not be able to accommodate. Someone else
must take over this patch series at that point if they want these
features. I do believe that everything up until the userptr patch is in
decent shape though, so we'll see, I guess. (If you are qualified to
take this over and are interested, please let me know.)

The patch series is highly volatile and not manicured. I've run exactly
1 test on the GPU mirror (see below for what that means), though many
more on the prior stuff. The series depends on full PPGTT, which is not
yet enabled by default, and has a few outstanding issues. It also has
been developed exclusively on pre-production hardware. I am only sending
out now because I will be on vacation for the next 10 days, and I know
there are people that can benefit from this code before I return. With
that, I got the last parts of this working very recently, and they're
very hackish. The lack of refinement has two causes: I expect the
interfaces for letting userspace dictate things to change (more on this
later), and I just ran out of time before my vacation.
Throughout development, I've been hitting issues, and I am not yet sure
whether they are bugs in my code, bugs in full PPGTT, bugs in userptr,
or general flakiness. A few patches in here are marked TESTME to
reflect this. Also, if you want to run this, I highly recommend turning
off semaphores and rc6. (To be honest, I've not tried it recently.) You
also need to turn on PPGTT since it is disabled by default:

modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0

What you get in this series is what I'm going to call GPU mirror. This
patch series allows one to choose an arbitrary address for a GPU buffer
object and map it at that specific offset within the GPU's address
space. This is only possible because on Broadwell we get a 64b canonical
GPU address space, and this allows us to map any CPU address as a GPU
address. The obvious usage here is malloc(). malloc() returns a pointer
that is valid on the CPU. Now that address can be identical on the GPU.
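
As a rough illustration of the constraint this implies, the sketch below
checks whether a CPU range could be mirrored at the same GPU address. It
assumes that "48b canonical" follows the x86-64 convention (bits 63:48
are copies of bit 47); the helper names gpu_va_is_canonical() and
range_can_mirror() are hypothetical and are not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: canonical means bits 63:48 all equal bit 47 (x86-64 style). */
#define GPU_VA_BITS 48

/* Hypothetical helper: true if addr is canonical for a 48b address space. */
static bool gpu_va_is_canonical(uint64_t addr)
{
	/* Bits 63:47 (17 bits) must be all zeroes or all ones. */
	uint64_t top = addr >> (GPU_VA_BITS - 1);

	return top == 0 || top == (UINT64_MAX >> (GPU_VA_BITS - 1));
}

/* Hypothetical helper: true if [ptr, ptr + size) could be mirrored 1:1. */
static bool range_can_mirror(const void *ptr, uint64_t size)
{
	uint64_t start = (uint64_t)(uintptr_t)ptr;
	uint64_t end;

	if (size == 0)
		return false;

	end = start + size - 1;
	if (end < start)	/* range wraps around the 64b space */
		return false;

	/* Both ends must be canonical and lie in the same half, so the
	 * range cannot straddle the non-canonical hole. */
	return gpu_va_is_canonical(start) && gpu_va_is_canonical(end) &&
	       (start >> (GPU_VA_BITS - 1)) == (end >> (GPU_VA_BITS - 1));
}
```

Pointers returned by malloc() on x86-64 Linux sit in the lower canonical
half, so under these assumptions they would pass such a check.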

The interface provided is identical to the userptr interface previously
posted by Chris Wilson. I've added a flag to that interface that
indicates this new functionality. This is not necessarily the final
version, and it's arguably not the best idea either. The reason for this
choice is we had users of userptr that wanted to try out this concept
and not have to do much porting.

To get to the userptr interface, I had to make a few things happen
first. I needed to get dynamic page table allocation and teardown
working. This was posted previously for gen6-7 [1] (with very rough code
for gen8). I've now added more robust support for gen8 dynamic page
table allocations. Doing the allocations dynamically was important
because preallocating all 4 levels of page tables is not feasible in a
real system. 4 level page tables are required in order to be able to
support the 64b canonical address space.
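
To put a number on "not feasible", here is a back-of-the-envelope sketch.
It assumes 4 KiB pages and 512 eight-byte entries per table at each of
the 4 levels (a 9+9+9+9+12 bit split of a 48b address, as on x86-64);
the cover letter does not spell out this layout, and the names below are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 512 entries per table, 4 levels. */
#define PAGE_SHIFT	12
#define LEVEL_BITS	9
#define LEVELS		4
#define ENTRIES		(1u << LEVEL_BITS)

/* Index into the table at `level` (0 = leaf page table, 3 = top level). */
static unsigned int va_index(uint64_t addr, unsigned int level)
{
	assert(level < LEVELS);
	return (addr >> (PAGE_SHIFT + LEVEL_BITS * level)) & (ENTRIES - 1);
}

/* Bytes needed to preallocate every table in a full 4-level tree. */
static uint64_t full_prealloc_bytes(void)
{
	uint64_t tables = 0, at_level = 1;

	for (unsigned int level = 0; level < LEVELS; level++) {
		tables += at_level;	/* 1, 512, 512^2, 512^3 tables */
		at_level *= ENTRIES;
	}
	return tables << PAGE_SHIFT;	/* each table occupies one page */
}
```

Under these assumptions full_prealloc_bytes() comes to roughly 513 GiB
of page tables per address space, almost all of it in the leaf level,
which is why tables have to be allocated on demand as ranges are bound.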

With that all done, I was able to make a few minor hacks to userptr,
take the intel-gpu-tools test from Tvrtko, and see at least one pass.
FWIW, I am currently running,
./tests/gem_userptr_blits --run-subtest coherency-unsync

Since I feel the interface will likely change, I do not feel compelled
to post either my libdrm or my IGT changes. If you want the modified
test, let me know, as I don't think it's really relevant here.

One last thing. Intel GPU tools, as it stands today, makes a lot of
assumptions about using an address space > 32b. I have not had time to
fix this. It is something which needs fixing before this series could
even be considered testable.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html

Ben Widawsky (54):
  drm/i915: Fix flush before context switch comment
  Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
  drm/i915: Wrap VMA binding
  drm/i915: Make pin global flags explicit
  drm/i915: Split out aliasing binds
  drm/i915: fix gtt_total_entries()
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Make the pdp switch a bit less hacky
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Scratch unused pages
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: Optimize PDP loads
  TESTME: Either drop the last patch or fix it.
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: implement alloc/teardown for 4lvl
  drm/i915/bdw: 4 level pages tables
  drm/i915: Restructure map vs. insert entries
  drm/i915/bdw: make aliasing PPGTT dynamic
  drm/i915: Expand error state's address width to 64b
  drm/i915/bdw: Flip the 48b switch
  TESTME: Always force invalidate
  drm/i915: Track userptr VMAs
  drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)

Chris Wilson (2):
  drm/i915: Prevent signals from interrupting close()
  drm/i915: Introduce mapping of user pages into video memory (userptr)

 drivers/gpu/drm/i915/Kconfig               |    1 +
 drivers/gpu/drm/i915/Makefile              |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  112 +-
 drivers/gpu/drm/i915/i915_dma.c            |   15 +-
 drivers/gpu/drm/i915/i915_drv.h            |   40 +-
 drivers/gpu/drm/i915/i915_gem.c            |   61 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   31 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c     |    5 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   22 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1810 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  354 +++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  767 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   21 +-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_trace.h          |  140 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 include/uapi/drm/i915_drm.h                |   20 +
 17 files changed, 2823 insertions(+), 580 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c

