[Intel-gfx] [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror
Ben Widawsky
benjamin.widawsky at intel.com
Sat May 10 05:58:55 CEST 2014
As before, these patches are based on my Broadwell branch:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=gpu_mirror
These are the follow-on patches to [1].
This patch series brings three things:
1. Dynamic page table allocation for gen6-8
2. 64b (48b canonical) graphics virtual address space for Broadwell
3. An interface for specifying a fixed GPU offset for a BO (buffer object).
It's taken far longer than I expected to get this work done, and given
the current state of our driver, I fear I may not have time to see this
through to the end before I am pulled onto other things. If people send
me smallish bug reports, I will gladly do my best to fix them quickly.
If there are more substantial change requests regarding design or patch
reorganization, I will not be able to accommodate them; someone else
must take over this patch series at that point if they want these
features. I do believe that everything up until the userptr patch is in
decent shape though, so we'll see, I guess. (If you are qualified to
take this over and are interested, please let me know.)
The patch series is highly volatile and not manicured. I've run exactly
one test on the GPU mirror (see below for what that means), though many
more on the prior stuff. The series depends on full PPGTT, which is not
yet enabled by default and has a few outstanding issues. It has also
been developed exclusively on pre-production hardware. I am only sending
it out now because I will be on vacation for the next 10 days, and I
know there are people who can benefit from this code before I return.
That said, I only got the last parts of this working very recently, and
they're very hackish. The lack of refinement is partly because I expect
the interfaces that let userspace dictate things to change (more on
this later), and partly because I simply ran out of time before my
vacation.
Throughout development, I've been hitting issues that I can't yet
attribute to bugs in my code, bugs in full PPGTT, bugs in userptr, or
general flakiness. A few patches in here are marked TESTME to reflect
this. Also, if you want to run this, I highly recommend turning off
semaphores and RC6. (To be honest, I've not tried it with them enabled
recently.) You also need to turn on full PPGTT, since it is disabled by
default:
modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0
What you get in this series is what I'm going to coin the GPU mirror.
This patch series allows one to choose an arbitrary address for a GPU
buffer object and map it at that exact location within the GPU's
address space. This is only possible because on Broadwell we get a 64b
(48b canonical) GPU address space, which lets us map any CPU address as
the same GPU address. The obvious usage here is malloc(): malloc()
returns a pointer that is valid on the CPU, and now that address can be
identical on the GPU.
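To illustrate why a malloc() pointer can double as a GPU address, here is a minimal C sketch of the 48b canonical-address rule: bits 63 through 47 must all equal bit 47. The helper names are hypothetical and not part of the driver, and the sketch assumes an x86-64 userspace, whose pointers live in the lower canonical half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not i915 code. */
#define GPU_VA_BITS	48				/* Broadwell 48b PPGTT width */
#define GPU_VA_MASK	((1ULL << GPU_VA_BITS) - 1)	/* low 48 bits */

/* True if bits 63..47 are all copies of bit 47 (all zeros or all ones). */
static bool is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> (GPU_VA_BITS - 1);	/* the 17 bits 63..47 */

	return top == 0 || top == (UINT64_MAX >> (GPU_VA_BITS - 1));
}

/* Sign-extend a raw 48-bit address into canonical 64-bit form. */
static uint64_t canonicalize48(uint64_t addr)
{
	assert((addr & ~GPU_VA_MASK) == 0);	/* caller passes a raw 48b value */
	if (addr & (1ULL << (GPU_VA_BITS - 1)))
		addr |= ~GPU_VA_MASK;
	return addr;
}

/* Hypothetical check: can this CPU pointer be used unchanged as a GPU VA? */
static bool cpu_ptr_is_mirrorable(const void *p)
{
	return is_canonical48((uint64_t)(uintptr_t)p);
}
```

A typical x86-64 user pointer such as 0x7f0000001000 passes cpu_ptr_is_mirrorable(), while 0x800000000000 is not canonical and would first need canonicalize48(), which yields 0xffff800000000000.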
The interface reuses the userptr interface previously posted by Chris
Wilson; I've added a flag to it that indicates this new functionality.
This is not necessarily the final version, and it's arguably not the
best idea either. I chose it because we had userptr users who wanted to
try out this concept without having to do much porting.
To get to the userptr interface, I first had to make a few things
happen. I needed to get dynamic page table allocation and teardown
working. This was posted previously for gen6-7 [1] (with very rough code
for gen8), and I've now added more robust support for gen8 dynamic page
table allocation. Allocating dynamically was important because
preallocating all four levels of page tables is not feasible on a real
system, and four-level page tables are required to support the 48b
canonical address space.
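To make the four-level structure and the cost of preallocation concrete, here is a hedged C sketch. It assumes an x86-64-style layout with 512 eight-byte entries per 4KB table at every level; all names are illustrative, not i915 code. It splits a 48b GPU virtual address into per-level indices and counts how much page-table memory mapping an entire address width up front would take.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (not i915 names): 4 KiB pages, and 4 KiB tables
 * of 512 eight-byte entries at every level. */
#define PAGE_SHIFT		12
#define LEVEL_BITS		9
#define LEVEL_MASK		((1u << LEVEL_BITS) - 1)
#define TABLE_BYTES		4096ULL
#define ENTRIES_PER_TABLE	(1ULL << LEVEL_BITS)

struct gpu_va_indices {
	unsigned pml4e;		/* bits 47..39 */
	unsigned pdpe;		/* bits 38..30 */
	unsigned pde;		/* bits 29..21 */
	unsigned pte;		/* bits 20..12 */
	unsigned offset;	/* bits 11..0, byte within the page */
};

/* Split a GPU VA into one index per level; bits above 47 are ignored. */
static struct gpu_va_indices split_gpu_va(uint64_t va)
{
	struct gpu_va_indices idx;

	idx.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
	idx.pte    = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
	idx.pde    = (unsigned)((va >> (PAGE_SHIFT + 1 * LEVEL_BITS)) & LEVEL_MASK);
	idx.pdpe   = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
	idx.pml4e  = (unsigned)((va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK);
	return idx;
}

/* Bytes of page-table memory needed to map all 2^va_bits bytes up front. */
static uint64_t prealloc_bytes(unsigned va_bits)
{
	uint64_t entries, tables, total = 0;

	assert(va_bits > PAGE_SHIFT && va_bits <= 48);
	entries = 1ULL << (va_bits - PAGE_SHIFT);	/* one PTE per 4 KiB page */
	for (;;) {
		/* Whole tables needed to hold this level's entries. */
		tables = (entries + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE;
		total += tables * TABLE_BYTES;
		if (tables == 1)
			break;		/* reached the single top-level table */
		entries = tables;	/* each table needs an entry one level up */
	}
	return total;
}
```

Under these assumptions, prealloc_bytes(48) returns 2^39 + 2^30 + 2^21 + 2^12 bytes, just over 512GB and almost all of it leaf page tables. Allocating tables only when a range is actually bound avoids paying that cost.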
With all that done, I was able to make a few minor hacks to userptr,
take Tvrtko's intel-gpu-tools test, and see at least one subtest pass.
FWIW, I am currently running:
./tests/gem_userptr_blits --run-subtest coherency-unsync
Since I feel the interface will likely change, I do not feel compelled
to post either my libdrm or my IGT changes. If you want the modified
test, let me know; I don't think it's really relevant here.
One last thing: Intel GPU tools, as it stands today, makes a lot of
assumptions that break with an address space larger than 32b. I have not
had time to fix this, and it needs fixing before this series can even be
considered testable.
[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html
Ben Widawsky (54):
drm/i915: Fix flush before context switch comment
Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
drm/i915: Wrap VMA binding
drm/i915: Make pin global flags explicit
drm/i915: Split out aliasing binds
drm/i915: fix gtt_total_entries()
drm/i915: Rename to GEN8_LEGACY_PDPES
drm/i915: Split out verbose PPGTT dumping
drm/i915: s/pd/pdpe, s/pt/pde
drm/i915: rename map/unmap to dma_map/unmap
drm/i915: Setup less PPGTT on failed pagedir
drm/i915: clean up PPGTT init error path
drm/i915: Un-hardcode number of page directories
drm/i915: Make gen6_write_pdes gen6_map_page_tables
drm/i915: Range clearing is PPGTT agnostic
drm/i915: Page table helpers, and define renames
drm/i915: construct page table abstractions
drm/i915: Complete page table structures
drm/i915: Create page table allocators
drm/i915: Generalize GEN6 mapping
drm/i915: Clean up pagetable DMA map & unmap
drm/i915: Always dma map page table allocations
drm/i915: Consolidate dma mappings
drm/i915: Always dma map page directory allocations
drm/i915: Track GEN6 page table usage
drm/i915: Extract context switch skip logic
drm/i915: Force pd restore when PDEs change, gen6-7
drm/i915: Finish gen6/7 dynamic page table allocation
drm/i915/bdw: Use dynamic allocation idioms on free
drm/i915/bdw: pagedirs rework allocation
drm/i915/bdw: pagetable allocation rework
drm/i915/bdw: Make the pdp switch a bit less hacky
drm/i915: num_pd_pages/num_pd_entries isn't useful
drm/i915: Extract PPGTT param from pagedir alloc
drm/i915/bdw: Split out mappings
drm/i915/bdw: begin bitmap tracking
drm/i915/bdw: Dynamic page table allocations
drm/i915/bdw: Scratch unused pages
drm/i915/bdw: Add ppgtt info for dynamic pages
drm/i915/bdw: Optimize PDP loads
TESTME: Either drop the last patch or fix it.
drm/i915/bdw: Add dynamic page trace events
drm/i915/bdw: Make pdp allocation more dynamic
drm/i915/bdw: Abstract PDP usage
drm/i915/bdw: implement alloc/teardown for 4lvl
drm/i915/bdw: 4 level pages tables
drm/i915: Restructure map vs. insert entries
drm/i915/bdw: make aliasing PPGTT dynamic
drm/i915: Expand error state's address width to 64b
drm/i915/bdw: Flip the 48b switch
TESTME: GFX_TLB_INVALIDATE_EXPLICIT
TESTME: Always force invalidate
drm/i915: Track userptr VMAs
drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)
Chris Wilson (2):
drm/i915: Prevent signals from interrupting close()
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
drivers/gpu/drm/i915/Kconfig | 1 +
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/i915_debugfs.c | 112 +-
drivers/gpu/drm/i915/i915_dma.c | 15 +-
drivers/gpu/drm/i915/i915_drv.h | 40 +-
drivers/gpu/drm/i915/i915_gem.c | 61 +-
drivers/gpu/drm/i915/i915_gem_context.c | 31 +-
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 5 +
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 22 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 1810 +++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_gem_gtt.h | 354 +++++-
drivers/gpu/drm/i915/i915_gem_userptr.c | 767 ++++++++++++
drivers/gpu/drm/i915/i915_gpu_error.c | 21 +-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/i915_trace.h | 140 +++
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
include/uapi/drm/i915_drm.h | 20 +
17 files changed, 2823 insertions(+), 580 deletions(-)
create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
--
1.9.2