<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 07/09/2018 09:26, Tvrtko Ursulin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
<br>
On 06/09/2018 11:36, Lionel Landwerlin wrote:
<br>
<blockquote type="cite">On 06/09/2018 11:22, Chris Wilson wrote:
<br>
<blockquote type="cite">Quoting Lionel Landwerlin (2018-09-06
11:18:01)
<br>
<blockquote type="cite">On 06/09/2018 11:10, Chris Wilson
wrote:
<br>
<blockquote type="cite">Quoting Lionel Landwerlin
(2018-09-06 10:57:47)
<br>
<blockquote type="cite">On 05/09/2018 15:22, Tvrtko
Ursulin wrote:
<br>
<blockquote type="cite">From: Lionel Landwerlin
<a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
<br>
<br>
If some of the contexts submitting workloads to the
GPU have been
<br>
configured to shutdown slices/subslices, we might
loose the NOA
<br>
configurations written in the NOA muxes.
<br>
<br>
One possible solution to this problem is to reprogram
the NOA muxes
<br>
when we switch to a new context. We initially tried
this in the
<br>
workaround batchbuffer but some concerns where raised
about the cost
<br>
of reprogramming at every context switch. This
solution is also not
<br>
without consequences from the userspace point of view.
Reprogramming
<br>
of the muxes can only happen once the powergating
configuration has
<br>
changed (which happens after context switch). This
means for a window
<br>
of time during the recording, counters recorded by the
OA unit might
<br>
be invalid. This requires userspace dealing with OA
reports to discard
<br>
the invalid values.
<br>
<br>
Minimizing the reprogramming could be implemented by
tracking of the
<br>
last programmed configuration somewhere in GGTT and
use MI_PREDICATE
<br>
to discard some of the programming commands, but the
command streamer
<br>
would still have to parse all the MI_LRI instructions
in the
<br>
workaround batchbuffer.
<br>
<br>
Another solution, which this change implements, is to
simply disregard
<br>
the user requested configuration for the period of
time when i915/perf
<br>
is active. There is no known issue with this apart
from a performance
<br>
penality for some media workloads that benefit from
running on a
<br>
partially powergated GPU. We already prevent RC6 from
affecting the
<br>
programming so it doesn't sound completely
unreasonable to hold on
<br>
powergating for the same reason.
<br>
<br>
v2: Leave RPCS programming in intel_lrc.c (Lionel)
<br>
<br>
v3: Update for s/union intel_sseu/struct intel_sseu/
(Lionel)
<br>
More to_intel_context() (Tvrtko)
<br>
s/dev_priv/i915/ (Tvrtko)
<br>
<br>
Tvrtko Ursulin:
<br>
<br>
v4:
<br>
* Rebase for make_rpcs changes.
<br>
<br>
v5:
<br>
* Apply OA restriction from make_rpcs directly.
<br>
<br>
v6:
<br>
* Rebase for context image setup changes.
<br>
<br>
Signed-off-by: Lionel Landwerlin
<a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
<br>
Signed-off-by: Tvrtko Ursulin
<a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
<br>
---
<br>
drivers/gpu/drm/i915/i915_perf.c | 5 +++++
<br>
drivers/gpu/drm/i915/intel_lrc.c | 30
++++++++++++++++++++----------
<br>
drivers/gpu/drm/i915/intel_lrc.h | 3 +++
<br>
3 files changed, 28 insertions(+), 10 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/i915/i915_perf.c
b/drivers/gpu/drm/i915/i915_perf.c
<br>
index ccb20230df2c..dd65b72bddd4 100644
<br>
--- a/drivers/gpu/drm/i915/i915_perf.c
<br>
+++ b/drivers/gpu/drm/i915/i915_perf.c
<br>
@@ -1677,6 +1677,11 @@ static void
gen8_update_reg_state_unlocked(struct i915_gem_context
*ctx,
<br>
CTX_REG(reg_state, state_offset,
flex_regs[i], value);
<br>
}
<br>
+
<br>
+ CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
GEN8_R_PWR_CLK_STATE,
<br>
+ gen8_make_rpcs(dev_priv,
<br>
+
&to_intel_context(ctx,
<br>
+
dev_priv->engine[RCS])->sseu));
<br>
</blockquote>
I think there is one issue I missed on the previous
iterations of this
<br>
patch.
<br>
<br>
This gen8_update_reg_state_unlocked() is called when the
GPU is parked
<br>
on the kernel context.
<br>
<br>
It's supposed to update all contexts, but I think we
might not be able
<br>
to update the kernel context image while the GPU is
using it.
<br>
</blockquote>
The kernel context is only ever taken in extremis (you are
either
<br>
parking or stalling userspace) so I don't care.
<br>
</blockquote>
<br>
The patch exposing the RPCS configuration to userspace will
make use of
<br>
the kernel context while OA/perf is enabled. Even if it
reprograms the
<br>
locked value that will break the power configuration
stability on Gen11
<br>
(because the locked configuration will be different from the
kernel
<br>
context configuration).
<br>
</blockquote>
Sure, but as you point out that's only on changing
configuration.
<br>
<br>
What's missing in the patch is that we only bail early if the
new sseu
<br>
matches the ce->sseu, but that doesn't necessarily match
whats in the
<br>
context due to OA. (Or maybe I missed the conversion to rpcs
value and
<br>
checking.)
<br>
-Chris
<br>
<br>
</blockquote>
<br>
Yep, because the gen8_make_rpcs() post processes the values
store at the gem context level, we risk rerunning the kernel
context to write the exiting value.
<br>
Sorry this is all so messy :(
<br>
</blockquote>
<br>
Lets see if I managed to follow here.
<br>
<br>
The current code indeed bails out at the set ctx param level if
the requested state matches the ce->state. My thinking was that
ce->state is the master state and whatever happens in "post
processing" via gen8_make_rpcs should be hidden from it since the
design is that the i915_perf.c will re-configure all contexts when
the OA active status changes (to either direction).
<br>
<br>
So I don't see a problem in those two interactions.
<br>
</blockquote>
<p><br>
</p>
<p>Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff)
for ICL.<br>
</p>
<p>You then enable OA which locks the configuration at (0x1,0xf).</p>
<p>The kernel context has retained its (0x1/0xff) configuration.<br>
</p>
<p><br>
</p>
<p>And after you change the config of contextA to (0x1,0x7).</p>
<p><br>
</p>
<p>This would lead to the kernel context scheduled with (0x1,0xff)
while OA is active.<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
<br>
Apart from one, get_param_sseu will lie a bit - we can discuss
about this one more. At one point I suggested we have two sets of
masks in the uAPI, requested and active in a way. So userspace
could query what it set and what is actually active.
<br>
<br>
Now second issue is if i915_perf.c is able to reprogram the kernel
config.
<br>
<br>
Here its true, it will write to the context image and that will
get overwritten by context save.
<br>
<br>
If that is a problem for OA, I was initially if a throw-away
second "kernel" context could be use to re-program the real one,
but perhaps even simpler - what about a mmio write to program the
RPCS while kernel context is active?
<br>
</blockquote>
<p><br>
</p>
<p>Documentation says : "<span style="color: rgb(35, 35, 35);
font-family: Arial, sans-serif; font-size: 13.3333px;
font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing:
normal; orphans: 2; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255); text-decoration-style:
initial; text-decoration-color: initial; display: inline
!important; float: none;">This register must not be programmed
directly through CPU MMIO cycle.<span> "<br>
</span></span></p>
<p><br>
</p>
<p>Sorry :(</p>
<p><br>
</p>
<p>-</p>
<p>Lionel<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
<br>
Regards,
<br>
<br>
Tvrtko
<br>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>