[Intel-gfx] [PATCH v7 0/3] Dynamic EU configuration of Slice/Sub-slice/EU

Lionel Landwerlin lionel.g.landwerlin at intel.com
Sun Mar 15 16:56:39 UTC 2020


On 15/03/2020 02:12, Francisco Jerez wrote:
> srinivasan.s at intel.com writes:
>
>> From: Srinivasan S <srinivasan.s at intel.com>
>>
>>        drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
>>
>> This patch set improves GPU power consumption on Linux-kernel-based OSes such
>> as Chromium OS, Ubuntu, etc. The power savings are as follows.
>>
>> Power savings on the GLK-GT1 Bobba platform running Chrome OS:
>> -------------------------|-----------------------|
>> App / KPI                | Power Benefit (%)     |
>> -------------------------|-----------------------|
>> Hangout call, 20 minutes | 1.8%                  |
>> YouTube 4K VPB           | 14.13%                |
>> WebGL Aquarium           | 13.76%                |
>> Unity3D                  | 6.78%                 |
>> -------------------------|-----------------------|
>> Chrome PLT               | Battery life improves |
>>                          | by ~45 minutes        |
>> -------------------------|-----------------------|
>>
>> Power savings on KBL-GT3 running Android and Ubuntu (Linux):
>> -------------------------|------------|------------|
>> App / KPI                |    Power Benefit (%)    |
>>                          |------------|------------|
>>                          |  Android   |  Ubuntu    |
>> -------------------------|------------|------------|
>> 3DMark (Ice Storm)       |  2.30%     |  N.A.      |
>> T-Rex on-screen          |  2.49%     |  2.97%     |
>> Manhattan on-screen      |  3.11%     |  4.90%     |
>> Car Chase on-screen      |  N.A.      |  5.06%     |
>> AnTuTu 6.1.4             |  3.42%     |  N.A.      |
>> SynMark2                 |  N.A.      |  1.7%      |
>> -------------------------|------------|------------|
>>
> Did you get any performance (e.g. FPS) measurements from those
> test-cases?  There is quite some potential for this feature to constrain
> the GPU throughput inadvertently, which could lead to an apparent
> reduction in power usage not accompanied by an improvement in energy
> efficiency -- In fact AFAIUI there is some potential for this feature to
> *decrease* the energy efficiency of the system if the GPU would have
> been able to keep all EUs busy at a lower frequency, but the parallelism
> constraint forces it to run at a higher frequency above RPe in order to
> achieve the same throughput, because due to the convexity of the power
> curve of the EU we have:
>
>    P(k * f) > k * P(f)
>
> Where 'k' is the ratio between the EU parallelism without and with SSEU
> control, and f > RPe is the original GPU frequency without SSEU control.
>
> In scenarios like that we *might* seem to be using less power with SSEU
> control if the workload is running longer, but it would end up using
> more energy overall by the time it completes, so it would be good to
> have some performance-per-watt numbers to make sure that's not
> happening.
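>
> As a crude numerical illustration (a made-up power model, only to show
> the shape of the argument): take P(f) = c * f^3 and k = 2, i.e. half
> the EU parallelism and hence twice the frequency for the same
> throughput. Then
>
>    P(2 * f) = 8 * c * f^3  >  2 * c * f^3 = 2 * P(f)
>
> so in that model the narrow-but-fast configuration would draw roughly
> four times the power of the wide-and-slow one for the same throughput,
> which is exactly why performance-per-watt numbers are needed here.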
>
>> We have also observed that GPU core residency improves by 1.035%.
>>
>> Technical Insights of the patch:
>> The current GPU configuration code in i915 does not allow the
>> EU/slice/sub-slice configuration to be changed dynamically; it is set only
>> once, when the context is created.
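>>
>> For reference, the existing user-facing knob is the per-context SSEU
>> parameter (I915_CONTEXT_PARAM_SSEU), which user space has to set explicitly.
>> If I recall the uAPI correctly, it is used roughly like this (the masks and
>> EU counts below are only illustrative):
>>
>>     #include <stdint.h>
>>     #include <sys/ioctl.h>
>>     #include <drm/i915_drm.h>
>>
>>     /* Request a reduced slice/EU configuration on the render engine of
>>      * context ctx_id; fd is an open DRM file descriptor. */
>>     static int set_context_sseu(int fd, uint32_t ctx_id)
>>     {
>>             struct drm_i915_gem_context_param_sseu sseu = {
>>                     .engine = {
>>                             .engine_class = I915_ENGINE_CLASS_RENDER,
>>                             .engine_instance = 0,
>>                     },
>>                     .slice_mask = 0x1,          /* illustrative */
>>                     .subslice_mask = 0x1,       /* illustrative */
>>                     .min_eus_per_subslice = 2,  /* illustrative */
>>                     .max_eus_per_subslice = 2,
>>             };
>>             struct drm_i915_gem_context_param arg = {
>>                     .ctx_id = ctx_id,
>>                     .param = I915_CONTEXT_PARAM_SSEU,
>>                     .size = sizeof(sseu),
>>                     .value = (uint64_t)(uintptr_t)&sseu,
>>             };
>>
>>             return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>     }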
>>
>> While a graphics application is running, if we examine the command requests
>> coming from user space, we observe that the command density is not consistent.
>> This means there is scope to change the graphics configuration dynamically
>> even while a context is actively running. This patch series proposes a
>> solution that determines the pending load for all active contexts at a given
>> time and, based on that, dynamically reconfigures the graphics hardware for
>> each context.
>>
>> We use an hrtimer (high-resolution timer) in the i915 driver to get a
>> callback every few milliseconds (the timer period can be configured through
>> debugfs; the default is '0', meaning the timer is disabled, i.e. the original
>> system without any intervention). In the timer callback we examine the
>> commands pending in the queue for a context; essentially, we intercept them
>> before they are executed by the GPU and update the context with the required
>> number of EUs.
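>>
>> A rough sketch of that callback (field and helper names such as pred_timer,
>> req_cnt and i915_set_optimum_config() are illustrative here, not necessarily
>> the identifiers used in the patches; locking elided):
>>
>>     static enum hrtimer_restart predictive_load_cb(struct hrtimer *timer)
>>     {
>>             struct drm_i915_private *i915 =
>>                     container_of(timer, typeof(*i915), pred_timer);
>>             struct i915_gem_context *ctx;
>>
>>             /* Bucket each context by its pending-request count and ask
>>              * for a matching SSEU config on its next submission. */
>>             list_for_each_entry(ctx, &i915->gem.contexts.list, link) {
>>                     unsigned int pending = atomic_read(&ctx->req_cnt);
>>
>>                     if (pending > PENDING_REQ_THRESHOLD)
>>                             i915_set_optimum_config(ctx, LOAD_TYPE_HIGH);
>>                     else if (pending)
>>                             i915_set_optimum_config(ctx, LOAD_TYPE_MEDIUM);
>>                     else
>>                             i915_set_optimum_config(ctx, LOAD_TYPE_LOW);
>>             }
>>
>>             hrtimer_forward_now(timer,
>>                                 ms_to_ktime(i915->predictive_load_timer_ms));
>>             return HRTIMER_RESTART;
>>     }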
>>
> Given that the EU configuration update is synchronous with command
> submission, do you really need a timer?  It sounds like it would be less
> CPU overhead to adjust the EU count on demand whenever the counter
> reaches or drops below the threshold instead of polling some CPU-side
> data structure.
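>
> Something along the lines of the sketch below (names made up), called from
> the submission and retirement paths instead of from a timer tick:
>
>     static void intel_deu_update(struct i915_gem_context *ctx)
>     {
>             unsigned int type =
>                     atomic_read(&ctx->req_cnt) > PENDING_REQ_THRESHOLD ?
>                     LOAD_TYPE_HIGH : LOAD_TYPE_LOW;
>
>             /* Only touch the context when the bucket actually changes,
>              * so the common case costs a single compare per request. */
>             if (type != ctx->load_type) {
>                     ctx->load_type = type;
>                     i915_set_optimum_config(ctx, type);
>             }
>     }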
>
>> Two questions arise: how did we arrive at the right timer value, and what is
>> the right number of EUs? For the former, we relied on empirical data to
>> achieve the best performance at the least power. For the latter, we roughly
>> categorized the number of EUs logically per platform. We then compare the
>> number of pending commands against a threshold and set the number of EUs
>> accordingly through a context update; that threshold is also based on
>> experiments and findings. If the GPU is able to keep up with the CPU there
>> are typically no pending commands, and the EU configuration remains
>> unchanged. If more commands are pending, we reprogram the context with a
>> higher number of EUs. Note that we change the EU configuration even while the
>> context is running, by examining the pending commands every 'x' milliseconds.
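>>
>> As an illustration of the shape of that mapping (the numbers below are
>> placeholders; the real per-platform tables and thresholds come from the
>> experiments mentioned above):
>>
>>     /* Placeholder mapping from load bucket to requested SSEU values;
>>      * the actual values are tuned per platform from power/perf data. */
>>     static const struct {
>>             u8 slice_mask;
>>             u8 subslice_mask;
>>             u8 eus_per_subslice;
>>     } sseu_by_load[] = {
>>             [LOAD_TYPE_LOW]    = { 0x1, 0x1, 2 },
>>             [LOAD_TYPE_MEDIUM] = { 0x1, 0x3, 4 },
>>             [LOAD_TYPE_HIGH]   = { 0x1, 0x3, 6 },  /* full configuration */
>>     };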
>>
> I have doubts that the number of requests pending execution is a
> particularly reliable indicator of the optimal number of EUs the
> workload needs enabled, for starters because the execlists submission
> code seems to be able to merge multiple requests into the same port, so
> there might seem to be zero pending commands even if the GPU has a
> backlog of several seconds or minutes worth of work.
>
> But even if you were using an accurate measure of the GPU load, would
> that really be a good indicator of whether the GPU would run more
> efficiently with more or less EUs enabled?  I can think of many
> scenarios where a short-lived GPU request would consume less energy and
> complete faster while running with all EUs enabled (e.g. if it actually
> has enough parallelism to take advantage of all EUs in the system).
> Conversely I can think of some scenarios where a long-running GPU
> request would benefit from SSEU control (e.g. a poorly parallelizable
> but heavy 3D geometry pipeline or GPGPU workload).  The former seems
> more worrying than the latter since it could lead to performance or
> energy efficiency regressions.
>
> IOW it seems to me that the optimal number of EUs enabled is more of a
> function of the internal parallelism constraints of each request rather
> than of the overall GPU load.  You should be able to get some
> understanding of that by e.g. calculating the number of threads loaded
> on the average based on the EU SPM counters, but unfortunately the ones
> you'd need are only available on TGL+ IIRC.  On earlier platforms you
> should be able to achieve the same thing by sampling some FLEXEU
> counters, but you'd likely have to mess with the mux configuration which
> would interfere with OA sampling -- However it sounds like this feature
> may have to be disabled anytime OA is active anyway so that may not be a
> problem after all?


FLEXEU has to be configured on all contexts but does not need the mux 
configuration.

I think this feature would have to be shut off every time you end up using 
OA from userspace, though.


-Lionel


>
> Regards,
> Francisco.
>
>> Srinivasan S (3):
>>    drm/i915: Get active pending request for given context
>>    drm/i915: set optimum eu/slice/sub-slice configuration based on load
>>      type
>>    drm/i915: Predictive governor to control slice/subslice/eu
>>
>>   drivers/gpu/drm/i915/Makefile                     |   1 +
>>   drivers/gpu/drm/i915/gem/i915_gem_context.c       |  20 +++++
>>   drivers/gpu/drm/i915/gem/i915_gem_context.h       |   2 +
>>   drivers/gpu/drm/i915/gem/i915_gem_context_types.h |  38 ++++++++
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c    |   1 +
>>   drivers/gpu/drm/i915/gt/intel_deu.c               | 104 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/gt/intel_deu.h               |  31 +++++++
>>   drivers/gpu/drm/i915/gt/intel_lrc.c               |  44 ++++++++-
>>   drivers/gpu/drm/i915/i915_drv.h                   |   6 ++
>>   drivers/gpu/drm/i915/i915_gem.c                   |   4 +
>>   drivers/gpu/drm/i915/i915_params.c                |   4 +
>>   drivers/gpu/drm/i915/i915_params.h                |   1 +
>>   drivers/gpu/drm/i915/intel_device_info.c          |  74 ++++++++++++++-
>>   13 files changed, 325 insertions(+), 5 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c
>>   create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h
>>
>> -- 
>> 2.7.4
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>

