[Intel-gfx] [PATCH v10 0/8] Per context dynamic (sub)slice power-gating
Tvrtko Ursulin
tursulin at ursulin.net
Tue Aug 14 14:45:08 UTC 2018
On 14/08/18 15:40, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Updated series after continuing Lionel's work.
>
> Userspace for the feature is the media-driver project on GitHub. Please see
> https://github.com/intel/media-driver/pull/271/commits.
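>
> For a flavour of the uapi, here is a minimal sketch of how a client might
> select a reduced slice configuration via the new context parameter. This is
> illustrative only; the authoritative definition is in the "Expose RPCS
> (SSEU) configuration to userspace" patch. It assumes libdrm's drmIoctl and
> the updated i915_drm.h:
>
>   struct drm_i915_gem_context_param_sseu sseu = {
>           .slice_mask = 0x1, /* ask for a single slice */
>           /* subslice/EU fields omitted for brevity; a real caller
>            * must fill those in with valid values as well */
>   };
>   struct drm_i915_gem_context_param arg = {
>           .ctx_id = ctx_id,
>           .param  = I915_CONTEXT_PARAM_SSEU,
>           .size   = sizeof(sseu),
>           .value  = (uintptr_t)&sseu,
>   };
>
>   if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg))
>           /* rejected: unsupported platform or insufficient privilege */;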
>
> Headline changes:
>
> 1.
>
> No more master allow/disallow sysfs switch. The feature is unconditionally
> enabled on Gen11, while on other platforms it requires CAP_SYS_ADMIN.
>
> *** To be discussed if this is a good idea or not. ***
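>
> For illustration, the gate amounts to something like the sketch below
> (names and placement are made up, not the exact code from the series):
>
>   static int sseu_configure_allowed(struct drm_i915_private *i915)
>   {
>           /* Gen11 onwards gets dynamic SSEU unconditionally. */
>           if (INTEL_GEN(i915) >= 11)
>                   return 0;
>
>           /* Other platforms require CAP_SYS_ADMIN. */
>           if (!capable(CAP_SYS_ADMIN))
>                   return -EPERM;
>
>           return 0;
>   }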
>
> 2.
>
> Two new patches, due to a) breaking out the global barrier, and b) fixing one
> GEM_BUG_ON regarding incorrect kernel context classification by i915_is_ggtt.
>
>
> Otherwise please see the individual patch change logs.
>
> The main topic for this cover letter, though, is the question of the
> performance impact of dynamic slice re-configuration.
>
> By way of introduction to this problem space: changing the (sub)slice
> configuration has a cost at context switch time on the order of tens of
> milliseconds. (It varies per Gen and with different slice count transitions.)
> To get a feel for the magnitude, at a hypothetical 10 ms per transition, as
> few as a hundred transitions per second would consume the entire GPU.
>
> So the question is whether a malicious unprivileged workload can negatively
> impact other clients. To try and answer this question I have extended gem_wsim
> and created some test workloads. (Note that my testing was done on a Gen9
> system. The overall message is likely the same on Gen11, but that needs to be
> verified.)
>
> The first test was a simulated video playback client running in parallel with
> a simulated game of either medium or high complexity (using around 60% or 90%
> of the render engine respectively, plus 7% of the blitter engine). I had two
> flavours of the playback client: one which runs normally and one which
> requests a reduced slice configuration. Both workloads target 60fps.
>
> Results are achieved frames per second as observed from the game client:
>
>              No player   Normal player   SSEU enabled player
> Medium game       59.6            59.6                  59.6
> Heavy game        59.7            58.4                  58.1
>
> Here we can see that the medium workload was not affected by either the
> normal or the SSEU player, while the heavy workload did see a performance
> hit: both with the video player running in parallel, and a slightly larger
> one when the player was SSEU enabled.
>
> The second test ran a malicious client (or clients) in parallel to the same
> simulated game workloads. These clients try to trigger many context switches
> by using multiple contexts with dependencies set up so that request
> coalescing is defeated as much as possible (the exact workloads are listed
> below).
>
> I tested with both normal and SSEU-enabled malicious clients:
>
>              DoS client   SSEU DoS client
> Medium game        59.5              59.6
> Heavy game         57.8              55.4
>
> Here we can see a similar picture as with the first test: the medium game
> client is not affected by either DoS client, while the heavy game client is,
> more so with the SSEU-enabled attacker.
>
> From both tests I think the conclusion is that dynamic SSEU switching does
> increase the magnitude of performance loss, especially with over-subscribed
> engines, due to the cost being proportional to context switch frequency.
>
> The likelihood is that it slightly lowers the utilization level at which this
> starts to happen, but it does not introduce a completely new vector of attack;
> that is, where it was possible to DoS a system from an unprivileged client, it
> still is. In both cases (SSEU enabled or not), a malicious client has the
> option of grinding the system to a halt, albeit it may need fewer submission
> threads to do so when SSEU enabled.
For reference, here are the gem_wsim workloads used for this testing (even
though the number of people familiar with the syntax is quite low):
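(A rough key to the syntax, with the gem_wsim README in igt-gpu-tools as the
authoritative reference: a "ctx.engine.duration.dependency.sync" step submits
a batch of the given duration, or random duration range, in microseconds, on
the named engine from the numbered context; a negative dependency refers that
many steps back, and the final field selects whether to wait for completion.
"P.ctx.prio" sets a context's priority, "p.usecs" makes the workload repeat
with the given period, and "S.ctx.value" requests a reduced slice
configuration for a context.)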

Medium game workload:
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Heavy game workload:
1.RCS.500.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Normal video player:
1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667

SSEU enabled video player:
S.1.1
S.2.1
1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667
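(The only difference from the normal player is the two leading S steps, which
request the reduced slice configuration for contexts 1 and 2. The p.16667
period makes each iteration take 16667 microseconds, i.e. the 60fps target
mentioned above.)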

Malicious client:
1.RCS.1.0.0
2.RCS.1.-1.0

SSEU enabled malicious client:
S.2.1
1.RCS.1.0.0
2.RCS.1.-1.0
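(The malicious pair submits minimal one-microsecond batches from two
contexts, with the second step depending on the previous one, so that
consecutive requests cannot be coalesced into a single submission and every
batch forces a context switch. The SSEU-enabled variant additionally runs
context 2 with a changed slice configuration, so each of those context
switches also pays the re-configuration cost.)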

Regards,
Tvrtko
>
> Chris Wilson (3):
> drm/i915: Program RPCS for Broadwell
> drm/i915: Record the sseu configuration per-context & engine
> drm/i915: Expose RPCS (SSEU) configuration to userspace
>
> Lionel Landwerlin (3):
> drm/i915/perf: simplify configure all context function
> drm/i915/perf: reuse intel_lrc ctx regs macro
> drm/i915/perf: lock powergating configuration to default when active
>
> Tvrtko Ursulin (2):
> drm/i915: Add global barrier support
> drm/i915: Explicitly mark Global GTT address spaces
>
> drivers/gpu/drm/i915/i915_drv.h | 56 +++++++
> drivers/gpu/drm/i915/i915_gem.c | 2 +
> drivers/gpu/drm/i915/i915_gem_context.c | 189 +++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_gem_context.h | 4 +
> drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +
> drivers/gpu/drm/i915/i915_gem_gtt.h | 5 +-
> drivers/gpu/drm/i915/i915_perf.c | 68 +++++----
> drivers/gpu/drm/i915/i915_request.c | 16 ++
> drivers/gpu/drm/i915/i915_request.h | 10 ++
> drivers/gpu/drm/i915/intel_lrc.c | 87 ++++++++---
> drivers/gpu/drm/i915/intel_lrc.h | 3 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 4 +
> include/uapi/drm/i915_drm.h | 43 ++++++
> 13 files changed, 439 insertions(+), 50 deletions(-)
>