[Intel-gfx] [PATCH v10 0/8] Per context dynamic (sub)slice power-gating
Tvrtko Ursulin
tursulin at ursulin.net
Tue Aug 14 14:45:08 UTC 2018
On 14/08/18 15:40, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Updated series after continuing Lionel's work.
>
> Userspace for the feature is the media-driver project on GitHub. Please see
> https://github.com/intel/media-driver/pull/271/commits.
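>
> For a flavour of the uapi, here is a minimal sketch of how a client might
> select a reduced slice configuration via the new context parameter. This is
> illustrative only; the authoritative definition is in the "Expose RPCS
> (SSEU) configuration to userspace" patch. It assumes libdrm's drmIoctl and
> the updated i915_drm.h:
>
>   struct drm_i915_gem_context_param_sseu sseu = {
>           .slice_mask = 0x1, /* ask for a single slice */
>           /* subslice/EU fields omitted for brevity; a real caller
>            * must fill those in with valid values as well */
>   };
>   struct drm_i915_gem_context_param arg = {
>           .ctx_id = ctx_id,
>           .param  = I915_CONTEXT_PARAM_SSEU,
>           .size   = sizeof(sseu),
>           .value  = (uintptr_t)&sseu,
>   };
>
>   if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg))
>           /* rejected: unsupported platform or insufficient privilege */;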
>
> Headline changes:
>
> 1.
>
> No more master allow/disallow sysfs switch. The feature is unconditionally
> enabled on Gen11, while on other platforms it requires CAP_SYS_ADMIN.
>
> *** To be discussed if this is a good idea or not. ***
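>
> For illustration, the gate amounts to something like the sketch below
> (names and placement are made up, not the exact code from the series):
>
>   static int sseu_configure_allowed(struct drm_i915_private *i915)
>   {
>           /* Gen11 onwards gets dynamic SSEU unconditionally. */
>           if (INTEL_GEN(i915) >= 11)
>                   return 0;
>
>           /* Other platforms require CAP_SYS_ADMIN. */
>           if (!capable(CAP_SYS_ADMIN))
>                   return -EPERM;
>
>           return 0;
>   }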
>
> 2.
>
> Two new patches, due to a) breaking out the global barrier, and b) fixing one
> GEM_BUG_ON regarding incorrect kernel context classification by i915_is_ggtt.
>
>
> Otherwise please see the individual patch change logs.
>
> The main topic for this cover letter, though, is the question of the
> performance impact of dynamic slice re-configuration.
>
> By way of introduction to this problem space: changing the (sub)slice
> configuration has a cost at context switch time on the order of tens of
> milliseconds. (It varies per Gen and with different slice count transitions.)
> To get a feel for the magnitude, at a hypothetical 10 ms per transition, as
> few as a hundred transitions per second would consume the entire GPU.
>
> So the question is whether a malicious unprivileged workload can negatively
> impact other clients. To try and answer this question I have extended gem_wsim
> and created some test workloads. (Note that my testing was done on a Gen9
> system. The overall message is likely the same on Gen11, but that needs to be
> verified.)
>
> The first test was a simulated video playback client running in parallel with
> a simulated game of either medium or high complexity (using around 60% or 90%
> of the render engine respectively, plus 7% of the blitter engine). I had two
> flavours of the playback client: one which runs normally and one which
> requests a reduced slice configuration. Both workloads target 60fps.
>
> Results are achieved frames per second as observed from the game client:
>
>              No player   Normal player   SSEU enabled player
> Medium game       59.6            59.6                  59.6
> Heavy game        59.7            58.4                  58.1
>
> Here we can see that the medium workload was not affected by either the
> normal or the SSEU player, while the heavy workload did see a performance
> hit: both with the video player running in parallel, and a slightly larger
> one when the player was SSEU enabled.
>
> The second test ran a malicious client (or clients) in parallel to the same
> simulated game workloads. These clients try to trigger many context switches
> by using multiple contexts with dependencies set up so that request
> coalescing is defeated as much as possible (the exact workloads are listed
> below).
>
> I tested with both normal and SSEU-enabled malicious clients:
>
>              DoS client   SSEU DoS client
> Medium game        59.5              59.6
> Heavy game         57.8              55.4
>
> Here we can see a similar picture as with the first test: the medium game
> client is not affected by either DoS client, while the heavy game client is,
> more so with the SSEU-enabled attacker.
>
> From both tests I think the conclusion is that dynamic SSEU switching does
> increase the magnitude of performance loss, especially with over-subscribed
> engines, due to the cost being proportional to context switch frequency.
>
> The likelihood is that it slightly lowers the utilization level at which this
> starts to happen, but it does not introduce a completely new vector of attack;
> that is, where it was possible to DoS a system from an unprivileged client, it
> still is. In both cases (SSEU enabled or not), a malicious client has the
> option of grinding the system to a halt, albeit it may need fewer submission
> threads to do so when SSEU enabled.
For reference, here are the gem_wsim workloads used for this testing (even
though the number of people familiar with the syntax is quite low):
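(A rough key to the syntax, with the gem_wsim README in igt-gpu-tools as the
authoritative reference: a "ctx.engine.duration.dependency.sync" step submits
a batch of the given duration, or random duration range, in microseconds, on
the named engine from the numbered context; a negative dependency refers that
many steps back, and the final field selects whether to wait for completion.
"P.ctx.prio" sets a context's priority, "p.usecs" makes the workload repeat
with the given period, and "S.ctx.value" requests a reduced slice
configuration for a context.)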

Medium game workload:
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Heavy game workload:
1.RCS.500.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Normal video player:
1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667

SSEU enabled video player:
S.1.1
S.2.1
1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667
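(The only difference from the normal player is the two leading S steps, which
request the reduced slice configuration for contexts 1 and 2. The p.16667
period makes each iteration take 16667 microseconds, i.e. the 60fps target
mentioned above.)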

Malicious client:
1.RCS.1.0.0
2.RCS.1.-1.0

SSEU enabled malicious client:
S.2.1
1.RCS.1.0.0
2.RCS.1.-1.0
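(The malicious pair submits minimal one-microsecond batches from two
contexts, with the second step depending on the previous one, so that
consecutive requests cannot be coalesced into a single submission and every
batch forces a context switch. The SSEU-enabled variant additionally runs
context 2 with a changed slice configuration, so each of those context
switches also pays the re-configuration cost.)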

Regards,
Tvrtko
>
> Chris Wilson (3):
> drm/i915: Program RPCS for Broadwell
> drm/i915: Record the sseu configuration per-context & engine
> drm/i915: Expose RPCS (SSEU) configuration to userspace
>
> Lionel Landwerlin (3):
> drm/i915/perf: simplify configure all context function
> drm/i915/perf: reuse intel_lrc ctx regs macro
> drm/i915/perf: lock powergating configuration to default when active
>
> Tvrtko Ursulin (2):
> drm/i915: Add global barrier support
> drm/i915: Explicitly mark Global GTT address spaces
>
> drivers/gpu/drm/i915/i915_drv.h | 56 +++++++
> drivers/gpu/drm/i915/i915_gem.c | 2 +
> drivers/gpu/drm/i915/i915_gem_context.c | 189 +++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_gem_context.h | 4 +
> drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +
> drivers/gpu/drm/i915/i915_gem_gtt.h | 5 +-
> drivers/gpu/drm/i915/i915_perf.c | 68 +++++----
> drivers/gpu/drm/i915/i915_request.c | 16 ++
> drivers/gpu/drm/i915/i915_request.h | 10 ++
> drivers/gpu/drm/i915/intel_lrc.c | 87 ++++++++---
> drivers/gpu/drm/i915/intel_lrc.h | 3 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 4 +
> include/uapi/drm/i915_drm.h | 43 ++++++
> 13 files changed, 439 insertions(+), 50 deletions(-)
>