[Intel-gfx] [RFC 00/22] Gen7 batch buffer command parser
bradley.d.volkin at intel.com
bradley.d.volkin at intel.com
Tue Nov 26 17:51:17 CET 2013
From: Brad Volkin <bradley.d.volkin at intel.com>
Certain OpenGL features (e.g. transform feedback, performance monitoring)
require userspace code to submit batches containing commands such as
MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
generations of the hardware will noop these commands in "unsecure" batches
(which includes all userspace batches submitted via i915) even though the
commands may be safe and represent the intended programming model of the device.
This series introduces a software command parser similar in operation to the
command parsing done in hardware for unsecure batches. However, the software
parser allows some operations that would be noop'd by hardware, if the parser
determines the operation is safe, and submits the batch as "secure" to prevent
hardware parsing. Currently the series implements this on IVB and HSW.
The series is divided into several phases:
patches 01-09: These implement infrastructure and the command parsing algorithm,
all behind a module parameter. I expect some discussion and
rework, but hopefully there's nothing too controversial.
patches 10-17: These define the checks performed by the parser.
I expect much discussion :)
patches 18-20: In a final pass over the command checks, I found some issues with
the definitions. They looked painful to rebase in, so I've added
them here.
patches 21-22: These enable the parser by default. It runs on all batches except
those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
basic and do not test all of the commands used by the parser on the assumption
that I'm likely to make the same mistakes in both the parser and the test.
I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
days ago), and generally used an Ubuntu 13.10 IVB system with the parser
running. Aside from a failure described below, I don't think there are any
regressions. That is, piglit claims some regressions, but from manually running
the tests I think these are false positives. However, I could use help in
getting broader testing, particularly around performance. In general, I see less
than 3% performance impact on HSW, with more like 10% impact for pathological
batch sizes. But we'll certainly want to run relevant benchmarks beyond what
I've done.
At this point there are a couple of known issues and potential improvements.
1) VLV. The parser is currently disabled for VLV. One type of check performed by
the parser is that commands which access memory do so via PPGTT. VLV does not
have PPGTT enabled at this time. I chose to implement the PPGTT checks via
generic bit checking infrastructure in the parser, so they are not easily
disabled for VLV. For now, I'm disabling parsing altogether in the hope that
PPGTT can be enabled for VLV in the near future.
2) Coherency. I've found two types of coherency issues when reading the batch
buffer from the CPU during execbuffer2. Looking for help with both issues.
i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
parser isn't properly waiting for the operation to complete before
parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
but that actually caused more failures.
ii. Second, on VLV, I've seen cache coherency issues when userspace writes
the batch via pwrite fast path before calling execbuffer2. The parser
reads stale data. This works fine on IVB and HSW, so I believe it's an
LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
synchronization is for this scenario.
3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
in userspace batches without parsing them. The media driver uses 2nd-level
batches, so a solution is required. I have some ideas but don't want to delay
the review process for what I have so far. It may be that the 2nd-level
parsing is only needed for VCS and the current code (plus rejecting BBS)
would be sufficient for RCS.
4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
to avoid GPU modifications to buffers via EUs or commands in the batch, we
should copy the userspace batch buffer to memory that userspace does not
have access to, map it into GGTT, and execute that batch buffer. I have a
sense of how to do this for 1st-level batches, but it would need changes to
tie in with the 2nd-level batch parsing I think, so I've again held off.
Brad Volkin (22):
drm/i915: Add data structures for command parser
drm/i915: Initial command parser table definitions
drm/i915: Hook command parser tables up to rings
drm/i915: Add per-ring command length decode functions
drm/i915: Implement command parsing
drm/i915: Add a HAS_CMD_PARSER getparam
drm/i915: Add support for rejecting commands during parsing
drm/i915: Add support for checking register accesses
drm/i915: Add support for rejecting commands via bitmasks
drm/i915: Reject unsafe commands
drm/i915: Add register whitelists for mesa
drm/i915: Enable register whitelist checks
drm/i915: Enable bit checking for some commands
drm/i915: Enable PPGTT command parser checks
drm/i915: Reject commands that would store to global HWS page
drm/i915: Reject additional commands
drm/i915: Add parser data for perf monitoring GL extensions
drm/i915: Reject MI_ARB_ON_OFF on VECS
drm/i915: Fix length handling for MFX_WAIT
drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
drm/i915: Clean up command parser enable decision
drm/i915: Enable command parsing by default
drivers/gpu/drm/i915/Makefile | 3 +-
drivers/gpu/drm/i915/i915_cmd_parser.c | 712 +++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_dma.c | 3 +
drivers/gpu/drm/i915/i915_drv.c | 5 +
drivers/gpu/drm/i915/i915_drv.h | 96 ++++
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +
drivers/gpu/drm/i915/i915_reg.h | 66 +++
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 25 +
include/uapi/drm/i915_drm.h | 1 +
10 files changed, 927 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
--
1.8.4.4
More information about the Intel-gfx
mailing list