<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 07/09/2018 09:26, Tvrtko Ursulin
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
      <br>
      On 06/09/2018 11:36, Lionel Landwerlin wrote:
      <br>
      <blockquote type="cite">On 06/09/2018 11:22, Chris Wilson wrote:
        <br>
        <blockquote type="cite">Quoting Lionel Landwerlin (2018-09-06
          11:18:01)
          <br>
          <blockquote type="cite">On 06/09/2018 11:10, Chris Wilson
            wrote:
            <br>
            <blockquote type="cite">Quoting Lionel Landwerlin
              (2018-09-06 10:57:47)
              <br>
              <blockquote type="cite">On 05/09/2018 15:22, Tvrtko
                Ursulin wrote:
                <br>
                <blockquote type="cite">From: Lionel Landwerlin
                  <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
                  <br>
                  <br>
                  If some of the contexts submitting workloads to the
                  GPU have been
                  <br>
                  configured to shutdown slices/subslices, we might
                  loose the NOA
                  <br>
                  configurations written in the NOA muxes.
                  <br>
                  <br>
                  One possible solution to this problem is to reprogram
                  the NOA muxes
                  <br>
                  when we switch to a new context. We initially tried
                  this in the
                  <br>
                  workaround batchbuffer but some concerns where raised
                  about the cost
                  <br>
                  of reprogramming at every context switch. This
                  solution is also not
                  <br>
                  without consequences from the userspace point of view.
                  Reprogramming
                  <br>
                  of the muxes can only happen once the powergating
                  configuration has
                  <br>
                  changed (which happens after context switch). This
                  means for a window
                  <br>
                  of time during the recording, counters recorded by the
                  OA unit might
                  <br>
                  be invalid. This requires userspace dealing with OA
                  reports to discard
                  <br>
                  the invalid values.
                  <br>
                  <br>
                  Minimizing the reprogramming could be implemented by
                  tracking of the
                  <br>
                  last programmed configuration somewhere in GGTT and
                  use MI_PREDICATE
                  <br>
                  to discard some of the programming commands, but the
                  command streamer
                  <br>
                  would still have to parse all the MI_LRI instructions
                  in the
                  <br>
                  workaround batchbuffer.
                  <br>
                  <br>
                  Another solution, which this change implements, is to
                  simply disregard
                  <br>
                  the user requested configuration for the period of
                  time when i915/perf
                  <br>
                  is active. There is no known issue with this apart
                  from a performance
                  <br>
                  penality for some media workloads that benefit from
                  running on a
                  <br>
                  partially powergated GPU. We already prevent RC6 from
                  affecting the
                  <br>
                  programming so it doesn't sound completely
                  unreasonable to hold on
                  <br>
                  powergating for the same reason.
                  <br>
                  <br>
                  v2: Leave RPCS programming in intel_lrc.c (Lionel)
                  <br>
                  <br>
                  v3: Update for s/union intel_sseu/struct intel_sseu/
                  (Lionel)
                  <br>
                         More to_intel_context() (Tvrtko)
                  <br>
                         s/dev_priv/i915/ (Tvrtko)
                  <br>
                  <br>
                  Tvrtko Ursulin:
                  <br>
                  <br>
                  v4:
                  <br>
                      * Rebase for make_rpcs changes.
                  <br>
                  <br>
                  v5:
                  <br>
                      * Apply OA restriction from make_rpcs directly.
                  <br>
                  <br>
                  v6:
                  <br>
                      * Rebase for context image setup changes.
                  <br>
                  <br>
                  Signed-off-by: Lionel Landwerlin
                  <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
                  <br>
                  Signed-off-by: Tvrtko Ursulin
                  <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
                  <br>
                  ---
                  <br>
                      drivers/gpu/drm/i915/i915_perf.c |  5 +++++
                  <br>
                      drivers/gpu/drm/i915/intel_lrc.c | 30
                  ++++++++++++++++++++----------
                  <br>
                      drivers/gpu/drm/i915/intel_lrc.h |  3 +++
                  <br>
                      3 files changed, 28 insertions(+), 10 deletions(-)
                  <br>
                  <br>
                  diff --git a/drivers/gpu/drm/i915/i915_perf.c
                  b/drivers/gpu/drm/i915/i915_perf.c
                  <br>
                  index ccb20230df2c..dd65b72bddd4 100644
                  <br>
                  --- a/drivers/gpu/drm/i915/i915_perf.c
                  <br>
                  +++ b/drivers/gpu/drm/i915/i915_perf.c
                  <br>
                  @@ -1677,6 +1677,11 @@ static void
                  gen8_update_reg_state_unlocked(struct i915_gem_context
                  *ctx,
                  <br>
                                  CTX_REG(reg_state, state_offset,
                  flex_regs[i], value);
                  <br>
                          }
                  <br>
                  +
                  <br>
                  +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
                  GEN8_R_PWR_CLK_STATE,
                  <br>
                  +             gen8_make_rpcs(dev_priv,
                  <br>
                  +                           
                  &to_intel_context(ctx,
                  <br>
                  +                                             
                  dev_priv->engine[RCS])->sseu));
                  <br>
                </blockquote>
                I think there is one issue I missed on the previous
                iterations of this
                <br>
                patch.
                <br>
                <br>
                This gen8_update_reg_state_unlocked() is called when the
                GPU is parked
                <br>
                on the kernel context.
                <br>
                <br>
                It's supposed to update all contexts, but I think we
                might not be able
                <br>
                to update the kernel context image while the GPU is
                using it.
                <br>
              </blockquote>
              The kernel context is only ever taken in extremis (you are
              either
              <br>
              parking or stalling userspace) so I don't care.
              <br>
            </blockquote>
            <br>
            The patch exposing the RPCS configuration to userspace will
            make use of
            <br>
            the kernel context while OA/perf is enabled. Even if it
            reprograms the
            <br>
            locked value that will break the power configuration
            stability on Gen11
            <br>
            (because the locked configuration will be different from the
            kernel
            <br>
            context configuration).
            <br>
          </blockquote>
          Sure, but as you point out that's only on changing
          configuration.
          <br>
          <br>
          What's missing in the patch is that we only bail early if the
          new sseu
          <br>
          matches the ce->sseu, but that doesn't necessarily match
          whats in the
          <br>
          context due to OA. (Or maybe I missed the conversion to rpcs
          value and
          <br>
          checking.)
          <br>
          -Chris
          <br>
          <br>
        </blockquote>
        <br>
        Yep, because the gen8_make_rpcs() post processes the values
        store at the gem context level, we risk rerunning the kernel
        context to write the exiting value.
        <br>
        Sorry this is all so messy :(
        <br>
      </blockquote>
      <br>
      Lets see if I managed to follow here.
      <br>
      <br>
      The current code indeed bails out at the set ctx param level if
      the requested state matches the ce->state. My thinking was that
      ce->state is the master state and whatever happens in "post
      processing" via gen8_make_rpcs should be hidden from it since the
      design is that the i915_perf.c will re-configure all contexts when
      the OA active status changes (to either direction).
      <br>
      <br>
      So I don't see a problem in those two interactions.
      <br>
    </blockquote>
    <p><br>
    </p>
    <p>Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff)
      for ICL.<br>
    </p>
    <p>You then enable OA which locks the configuration at (0x1,0xf).</p>
    <p>The kernel context has retained its (0x1/0xff) configuration.<br>
    </p>
    <p><br>
    </p>
    <p>And after you change the config of contextA to (0x1,0x7).</p>
    <p><br>
    </p>
    <p>This would lead to the kernel context scheduled with (0x1,0xff)
      while OA is active.<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
      <br>
      Apart from one, get_param_sseu will lie a bit - we can discuss
      about this one more. At one point I suggested we have two sets of
      masks in the uAPI, requested and active in a way. So userspace
      could query what it set and what is actually active.
      <br>
      <br>
      Now second issue is if i915_perf.c is able to reprogram the kernel
      config.
      <br>
      <br>
      Here its true, it will write to the context image and that will
      get overwritten by context save.
      <br>
      <br>
      If that is a problem for OA, I was initially if a throw-away
      second "kernel" context could be use to re-program the real one,
      but perhaps even simpler - what about a mmio write to program the
      RPCS while kernel context is active?
      <br>
    </blockquote>
    <p><br>
    </p>
    <p>Documentation says : "<span style="color: rgb(35, 35, 35);
        font-family: Arial, sans-serif; font-size: 13.3333px;
        font-style: normal; font-variant-ligatures: normal;
        font-variant-caps: normal; font-weight: 400; letter-spacing:
        normal; orphans: 2; text-align: start; text-indent: 0px;
        text-transform: none; white-space: normal; widows: 2;
        word-spacing: 0px; -webkit-text-stroke-width: 0px;
        background-color: rgb(255, 255, 255); text-decoration-style:
        initial; text-decoration-color: initial; display: inline
        !important; float: none;">This register must not be programmed
        directly through CPU MMIO cycle.<span> "<br>
        </span></span></p>
    <p><br>
    </p>
    <p>Sorry :(</p>
    <p><br>
    </p>
    <p>-</p>
    <p>Lionel<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:4bfe6648-203f-fbc8-9bcd-7b4e28279d02@linux.intel.com">
      <br>
      Regards,
      <br>
      <br>
      Tvrtko
      <br>
      <br>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>