<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 22/08/2018 18:07, Tvrtko Ursulin
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
      <br>
      On 22/08/2018 17:33, Lionel Landwerlin wrote:
      <br>
      <blockquote type="cite">On 22/08/2018 17:18, Tvrtko Ursulin wrote:
        <br>
        <blockquote type="cite">From: Tvrtko Ursulin
          <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
          <br>
          <br>
          Bitfield width for configuring the active slice count has
          grown in Gen11
          <br>
          so we need to program the GEN8_R_PWR_CLK_STATE accordingly.
          <br>
          <br>
          Current code was always requesting eight times the number of
          slices (due
          <br>
          writting to a bitfield starting three bits higher than it
          should). These
          <br>
          requests were luckily a) capped by the hardware to the
          available number of
          <br>
          slices, and b) we haven't yet exported the code to ask for
          reduced slice
          <br>
          configurations.
          <br>
          <br>
          Due both of the above there was no impact from this incorrect
          programming
          <br>
          but we should still fix it.
          <br>
          <br>
          Signed-off-by: Tvrtko Ursulin <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
          <br>
          Bspec: 12247
          <br>
          Reported-by: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a>
          <br>
          Suggested-by: Lionel Landwerlin
          <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
          <br>
          Cc: Lionel Landwerlin <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
          <br>
          Cc: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a>
          <br>
          ---
          <br>
            drivers/gpu/drm/i915/i915_reg.h  |  2 ++
          <br>
            drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++----
          <br>
            2 files changed, 10 insertions(+), 4 deletions(-)
          <br>
          <br>
          diff --git a/drivers/gpu/drm/i915/i915_reg.h
          b/drivers/gpu/drm/i915/i915_reg.h
          <br>
          index 59d06d0055bb..640f7b774a26 100644
          <br>
          --- a/drivers/gpu/drm/i915/i915_reg.h
          <br>
          +++ b/drivers/gpu/drm/i915/i915_reg.h
          <br>
          @@ -344,6 +344,8 @@ static inline bool
          i915_mmio_reg_valid(i915_reg_t reg)
          <br>
            #define   GEN8_RPCS_S_CNT_ENABLE    (1 << 18)
          <br>
            #define   GEN8_RPCS_S_CNT_SHIFT        15
          <br>
            #define   GEN8_RPCS_S_CNT_MASK        (0x7 <<
          GEN8_RPCS_S_CNT_SHIFT)
          <br>
          +#define   GEN11_RPCS_S_CNT_SHIFT    12
          <br>
          +#define   GEN11_RPCS_S_CNT_MASK        (0x3f <<
          GEN11_RPCS_S_CNT_SHIFT)
          <br>
            #define   GEN8_RPCS_SS_CNT_ENABLE    (1 << 11)
          <br>
            #define   GEN8_RPCS_SS_CNT_SHIFT    8
          <br>
            #define   GEN8_RPCS_SS_CNT_MASK        (0x7 <<
          GEN8_RPCS_SS_CNT_SHIFT)
          <br>
          diff --git a/drivers/gpu/drm/i915/intel_lrc.c
          b/drivers/gpu/drm/i915/intel_lrc.c
          <br>
          index 36050f085071..43b8b0675ba0 100644
          <br>
          --- a/drivers/gpu/drm/i915/intel_lrc.c
          <br>
          +++ b/drivers/gpu/drm/i915/intel_lrc.c
          <br>
          @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private
          *dev_priv)
          <br>
                 * enablement.
          <br>
                */
          <br>
                if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
          <br>
          -        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
          <br>
          -        rpcs |=
          hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
          <br>
          -            GEN8_RPCS_S_CNT_SHIFT;
          <br>
          -        rpcs |= GEN8_RPCS_ENABLE;
          <br>
          +        rpcs =
          hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
          <br>
          +
          <br>
          +        if (INTEL_GEN(dev_priv) >= 11)
          <br>
          +            rpcs <<= GEN11_RPCS_S_CNT_SHIFT;
          <br>
          +        else
          <br>
          +            rpcs <<= GEN8_RPCS_S_CNT_SHIFT;
          <br>
          +
          <br>
          +        rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE;
          <br>
        </blockquote>
        <br>
        <br>
        I don't know if you saw that wording in the documentation :
        <br>
        <br>
          "
        <br>
        <br>
        Note: In ICL, software programs this register as if GT consists
        of 2 slices with 4 subslices in each slice. Hardware maps this
        to the 1 slice/8-subslice physical layout.
        <br>
        <br>
        "
        <br>
        <br>
        <br>
        My understanding is that it would make this function a bit more
        complicated ;)
        <br>
        <br>
        It also implies that on ICL you cannot select 3 subslices, which
        is unfortunately what Tony was trying to do.
        <br>
        <br>
        Maybe some opens need to be raised as to what's possible on ICL.
        <br>
      </blockquote>
      <br>
      I interpreted the note in my head as "In ICL, _if_ _the_ software
      programs.." so did not see a problem. Thought that would be just
      some hidden/under the covers remapping hw would do. But I see now
      that was wrong, and you are most likely right. I'll try to do some
      digging to understand this better.
      <br>
      <br>
      But for the second part of it, I don't see why 1x3 configuration
      would be illegal. If software must assume hw is 2x4, even if in
      reality it is 1x8, then 1x3 would still be legal, no?
      <br>
    </blockquote>
    <p><br>
    </p>
    <p>We still seem to put a subslice number in the register for ICL
      (values being 0b001, 0b010, 0b011 & 0b100).</p>
    <p>My understanding is that the hardware will just multiply that
      value by 2 to map to the 1x8 underlying topology.</p>
    <p>So if that's the case, you can't really do odd numbers... <span
        style="color: rgb(0, 0, 0); font-family: "Helvetica
        Neue", Helvetica, Arial, sans-serif; font-size: 16px;
        font-style: normal; font-variant-ligatures: normal;
        font-variant-caps: normal; font-weight: 400; letter-spacing:
        normal; orphans: 2; text-align: start; text-indent: 0px;
        text-transform: none; white-space: normal; widows: 2;
        word-spacing: 0px; -webkit-text-stroke-width: 0px;
        text-decoration-style: initial; text-decoration-color: initial;
        display: inline !important; float: none;">¯\_(ツ)_/¯</span></p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
      <br>
      I thought the cause of the hang on ICL was that when Tony was
      asking for 1x3, the code was actually programming a request for
      8x3 - which is illegal (as in slice count must be 1 to enable
      subslice pg) and so would fail to actually turn of the unwanted
      subslices.
      <br>
      <br>
      But then I also though on ICL we deal with masks and not counts
      when programming the hardware. Since apparently it is counts both
      for slices and subslices, I am mightily confused as to how
      media-driver would even theoretically be able to turn off a
      _specific_ (sub)slice?!
      <br>
    </blockquote>
    <p><br>
    </p>
    <p>The fact that the feature needed isn't implemented at by the
      thread dispatcher is really strange to me too.</p>
    <p>It sounds like we're forced to use a bigger hammer than what we
      really need.</p>
    <p>As to how that maps to the right subslices is also unknown to me.</p>
    <p>The only explanation I have is that subslices with no VME
      samplers get turn off first in the list of subslices to turn off.</p>
    <p><br>
    </p>
    <p>Cheers,</p>
    <p><br>
    </p>
    <p>-</p>
    <p>Lionel<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
      <br>
      Regards,
      <br>
      <br>
      Tvrtko
      <br>
      <br>
      <blockquote type="cite">
        <br>
        <blockquote type="cite">      }
          <br>
                if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
          <br>
        </blockquote>
        <br>
        <br>
        _______________________________________________
        <br>
        Intel-gfx mailing list
        <br>
        <a class="moz-txt-link-abbreviated" href="mailto:Intel-gfx@lists.freedesktop.org">Intel-gfx@lists.freedesktop.org</a>
        <br>
        <a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/intel-gfx">https://lists.freedesktop.org/mailman/listinfo/intel-gfx</a>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>