[Intel-gfx] [PATCH 00/26] [RFCish] GEN7 dynamic page tables

Tue Mar 18 06:48:32 CET 2014

These patches live here, based on my temporary Broadwell branch:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=dynamic_pt_alloc

First, and most importantly, this work should have no impact on current
drm-intel code because PPGTT is currently shut off there. To actually
test this patch series, one must re-enable PPGTT. On a single run of IGT
on IVB, it seems this doesn't introduce any regressions, but y'know, it's
PPGTT, so there's some instability, and it's hard to claim for certain
this doesn't break anything on top. Also, as stated below, the gen8 work
is only partially done.

Before I go too much further with this, I wanted to get eyes on it. I am
really open to any feedback. Before you do request a change though,
please realize that I've gone through several iterations of the
functions/interfaces. So please, spare me some pain and try to think
through what your request is before rattling it off. Daniel has
expressed to me already that he is unwilling to merge certain things
until PPGTT problems are fixed and PPGTT can be enabled by default.
That's okay. In my opinion, many of the patches don't really have any
major behavioral changes, and only make the code so much more readable
and easy to deal with, that I believe merging it would only improve
PPGTT debugging in the future. There are several cleanups in the series
which could also go in relatively harmlessly.

Okay, so what does this do?
The patch series /dynamicizes/ page table allocation and teardown for
GEN7. It also starts to introduce GEN8, but the tricky stuff is still
not done. Up until now, all our page tables are pre-allocated when the
address space is created. That's actually okay for current GENs since we
don't use many address spaces, and the page tables occupy only 2MB each.
However, on GEN8 we can use a deeper page table, and to preallocate such
an address space would be very costly. This work was done for GEN7 first
because this is the most well tested with full PPGTT, and stable
platforms are readily available.
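
To put rough numbers on the preallocation cost, here is a minimal
sketch, not code from the series. It assumes 4KB pages, and for GEN7 a
2GB (31-bit) address space with 4-byte PTEs. The 48-bit, 8-byte-PTE
case is only an illustrative assumption for a deeper page table. The
function name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4KB pages */

/*
 * Bytes of last-level page tables needed to map an entire address
 * space of 2^va_bits bytes up front, at pte_bytes per entry.
 * Upper-level directories are ignored; they only add to the total.
 */
uint64_t sketch_prealloc_pt_bytes(unsigned int va_bits, unsigned int pte_bytes)
{
	uint64_t num_ptes = 1ULL << (va_bits - SKETCH_PAGE_SHIFT);

	return num_ptes * pte_bytes;
}
```

Under those assumptions, sketch_prealloc_pt_bytes(31, 4) comes to the
2MB mentioned above, while sketch_prealloc_pt_bytes(48, 8) comes to
512GB per address space.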

In this patch series, I've demonstrated how we will manage tracking used
page tables (bitmaps), and broken things out into much more discrete
functions. I'm hoping I'll get feedback on the way I've implemented
things (primarily if it seems fundamentally flawed in any way). The real
goal was to prove out the dynamic allocation so we can begin to enable
GEN8 in the same way. I'll emphasize now that I put in a lot of effort
to limit risk with each patch, and this does result in some excess churn.
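
The bitmap tracking described above can be sketched roughly as follows.
This is a userspace illustration only, not the code in the series: all
names are hypothetical, calloc() stands in for page allocation, and the
512 x 1024 layout is an assumption based on the 2MB figure for GEN7.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PT_ENTRIES 1024u /* PTEs per page table (assumed) */
#define SKETCH_PD_ENTRIES 512u  /* PDEs per page directory (assumed) */

struct sketch_pt {
	uint64_t used_ptes[SKETCH_PT_ENTRIES / 64]; /* one bit per PTE */
	unsigned int used_count;
};

struct sketch_pd {
	uint64_t used_pdes[SKETCH_PD_ENTRIES / 64]; /* one bit per page table */
	struct sketch_pt *pt[SKETCH_PD_ENTRIES];
};

static inline bool sketch_test_bit(const uint64_t *map, unsigned int i)
{
	return (map[i / 64] >> (i % 64)) & 1;
}

static inline void sketch_set_bit(uint64_t *map, unsigned int i)
{
	map[i / 64] |= 1ULL << (i % 64);
}

static inline void sketch_clear_bit(uint64_t *map, unsigned int i)
{
	map[i / 64] &= ~(1ULL << (i % 64));
}

/* Allocate page tables on demand for [start, start + length). */
int sketch_alloc_va_range(struct sketch_pd *pd, uint64_t start, uint64_t length)
{
	if (length == 0)
		return 0;

	uint64_t first = start >> SKETCH_PAGE_SHIFT;
	uint64_t last = (start + length - 1) >> SKETCH_PAGE_SHIFT;

	for (uint64_t page = first; page <= last; page++) {
		uint64_t pde = page / SKETCH_PT_ENTRIES;
		unsigned int pte = page % SKETCH_PT_ENTRIES;

		if (pde >= SKETCH_PD_ENTRIES)
			return -1; /* outside the address space */
		if (!sketch_test_bit(pd->used_pdes, pde)) {
			pd->pt[pde] = calloc(1, sizeof(*pd->pt[pde]));
			if (!pd->pt[pde])
				return -1;
			sketch_set_bit(pd->used_pdes, pde);
		}
		if (!sketch_test_bit(pd->pt[pde]->used_ptes, pte)) {
			sketch_set_bit(pd->pt[pde]->used_ptes, pte);
			pd->pt[pde]->used_count++;
		}
	}
	return 0;
}

/* Release [start, start + length); free page tables that become empty. */
void sketch_teardown_va_range(struct sketch_pd *pd, uint64_t start, uint64_t length)
{
	if (length == 0)
		return;

	uint64_t first = start >> SKETCH_PAGE_SHIFT;
	uint64_t last = (start + length - 1) >> SKETCH_PAGE_SHIFT;

	for (uint64_t page = first; page <= last; page++) {
		uint64_t pde = page / SKETCH_PT_ENTRIES;
		unsigned int pte = page % SKETCH_PT_ENTRIES;

		if (pde >= SKETCH_PD_ENTRIES)
			break;
		if (!sketch_test_bit(pd->used_pdes, pde))
			continue;
		struct sketch_pt *pt = pd->pt[pde];
		if (!sketch_test_bit(pt->used_ptes, pte))
			continue;
		sketch_clear_bit(pt->used_ptes, pte);
		if (--pt->used_count == 0) {
			free(pt);
			pd->pt[pde] = NULL;
			sketch_clear_bit(pd->used_pdes, pde);
		}
	}
}

/* Number of page tables currently allocated. */
unsigned int sketch_used_page_tables(const struct sketch_pd *pd)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < SKETCH_PD_ENTRIES; i++)
		n += sketch_test_bit(pd->used_pdes, i);
	return n;
}
```

Checking whether a page table exists is then a bit test on the
directory's bitmap, with no need to touch the page table memory itself.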

My next step is to bring GEN8 up to par with GEN7. Once GEN8 is working
and clean, we can find where GEN7 and GEN8 overlap, and then recombine
where I haven't done so already. It's possible this plan will not work
out, and the above 2 steps will end up as one. After that, I plan to
merge the VA range allocation and teardown into the insert/clear
entries (currently it's two steps). I think both of those steps should
be distinct.

On x86 code overlap:
I spent more time than I would have liked trying to conjoin our
pagetable management with x86 code. In the end I decided not to depend
on any of the x86 definitions (other than PAGE_SIZE) because I found the
maze of conditional compiles and defines a bit too cumbersome.  I also
didn't feel the abstract pagetable topology used in x86 code was
worthwhile given that with about 6 #defines, we achieve the same thing.
We just don't support nearly as many configurations, and our page table
format differs in too many places. One thing I had really considered,
and toyed around with was not having data structures to track the page
tables we've allocated and simply use the one that's in memory (which is
what x86 does). I was not able to make this work because of IOMMU. The
address we write into our page tables is an IOMMU address.  This means
we need to know, or be able to easily derive both the physical address
(or pfn, or struct page), and the DMA address. I failed to accomplish
this. I think using the bitmaps should be a faster way than having to kmap
the pagetables to determine their status anyway. And, one thing to keep
in mind is currently we don't have any GPU faulting capability. This
will greatly limit the ability to map things sparsely, which also will
greatly limit the effective virtual address space we can use.
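
To illustrate the IOMMU point, here is a minimal userspace sketch of
why a side structure is needed. Everything in it is hypothetical: the
names, the PDE encoding, and sketch_fake_dma_map(), which merely stands
in for a real DMA mapping call by handing out bus addresses that have
no relation to the CPU pointer.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t sketch_dma_addr_t;

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PDE_VALID 1u /* hypothetical valid bit */

struct sketch_page_table {
	void *cpu_page;          /* what the CPU uses (stand-in for struct page) */
	sketch_dma_addr_t daddr; /* what gets written into the PDE */
};

/* Stand-in for an IOMMU mapping: the result is unrelated to cpu_page. */
static sketch_dma_addr_t sketch_fake_dma_map(void *cpu_page)
{
	static sketch_dma_addr_t next = 0x100000;
	sketch_dma_addr_t addr = next;

	(void)cpu_page;
	next += SKETCH_PAGE_SIZE;
	return addr;
}

/* Allocate a page table and record both of its addresses. */
int sketch_pt_init(struct sketch_page_table *pt)
{
	pt->cpu_page = calloc(1, SKETCH_PAGE_SIZE);
	if (!pt->cpu_page)
		return -1;
	pt->daddr = sketch_fake_dma_map(pt->cpu_page);
	return 0;
}

void sketch_pt_fini(struct sketch_page_table *pt)
{
	free(pt->cpu_page);
	pt->cpu_page = NULL;
}

/* The PDE carries only the DMA address; cpu_page cannot be derived from it. */
uint32_t sketch_encode_pde(const struct sketch_page_table *pt)
{
	return (uint32_t)pt->daddr | SKETCH_PDE_VALID;
}
```

Because the value stored in the directory is the DMA address, walking
the in-memory tables alone does not lead back to the CPU-side page,
which is why both are kept in the tracking structure here.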

Ben Widawsky (26):
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: Extract switch to default context
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: Wrap VMA binding
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Split out gtt specific header file
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915: Print used ppgtt pages for gen6 in debugfs
  FOR REFERENCE ONLY

 drivers/gpu/drm/i915/i915_debugfs.c        |  47 +-
 drivers/gpu/drm/i915/i915_drv.h            | 169 +----
 drivers/gpu/drm/i915/i915_gem.c            |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  25 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 995 +++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        | 417 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   1 -
 drivers/gpu/drm/i915/i915_trace.h          | 108 ++++
 9 files changed, 1198 insertions(+), 584 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h

-- 
1.9.0