<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 22/08/2018 18:07, Tvrtko Ursulin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
<br>
On 22/08/2018 17:33, Lionel Landwerlin wrote:
<br>
<blockquote type="cite">On 22/08/2018 17:18, Tvrtko Ursulin wrote:
<br>
<blockquote type="cite">From: Tvrtko Ursulin
<a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
<br>
<br>
Bitfield width for configuring the active slice count has
grown in Gen11
<br>
so we need to program the GEN8_R_PWR_CLK_STATE accordingly.
<br>
<br>
Current code was always requesting eight times the number of
slices (due
<br>
writting to a bitfield starting three bits higher than it
should). These
<br>
requests were luckily a) capped by the hardware to the
available number of
<br>
slices, and b) we haven't yet exported the code to ask for
reduced slice
<br>
configurations.
<br>
<br>
Due both of the above there was no impact from this incorrect
programming
<br>
but we should still fix it.
<br>
<br>
Signed-off-by: Tvrtko Ursulin <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a>
<br>
Bspec: 12247
<br>
Reported-by: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a>
<br>
Suggested-by: Lionel Landwerlin
<a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
<br>
Cc: Lionel Landwerlin <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>
<br>
Cc: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a>
<br>
---
<br>
drivers/gpu/drm/i915/i915_reg.h | 2 ++
<br>
drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++----
<br>
2 files changed, 10 insertions(+), 4 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/i915/i915_reg.h
b/drivers/gpu/drm/i915/i915_reg.h
<br>
index 59d06d0055bb..640f7b774a26 100644
<br>
--- a/drivers/gpu/drm/i915/i915_reg.h
<br>
+++ b/drivers/gpu/drm/i915/i915_reg.h
<br>
@@ -344,6 +344,8 @@ static inline bool
i915_mmio_reg_valid(i915_reg_t reg)
<br>
#define GEN8_RPCS_S_CNT_ENABLE (1 << 18)
<br>
#define GEN8_RPCS_S_CNT_SHIFT 15
<br>
#define GEN8_RPCS_S_CNT_MASK (0x7 <<
GEN8_RPCS_S_CNT_SHIFT)
<br>
+#define GEN11_RPCS_S_CNT_SHIFT 12
<br>
+#define GEN11_RPCS_S_CNT_MASK (0x3f <<
GEN11_RPCS_S_CNT_SHIFT)
<br>
#define GEN8_RPCS_SS_CNT_ENABLE (1 << 11)
<br>
#define GEN8_RPCS_SS_CNT_SHIFT 8
<br>
#define GEN8_RPCS_SS_CNT_MASK (0x7 <<
GEN8_RPCS_SS_CNT_SHIFT)
<br>
diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
<br>
index 36050f085071..43b8b0675ba0 100644
<br>
--- a/drivers/gpu/drm/i915/intel_lrc.c
<br>
+++ b/drivers/gpu/drm/i915/intel_lrc.c
<br>
@@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private
*dev_priv)
<br>
* enablement.
<br>
*/
<br>
if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
<br>
- rpcs |= GEN8_RPCS_S_CNT_ENABLE;
<br>
- rpcs |=
hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
<br>
- GEN8_RPCS_S_CNT_SHIFT;
<br>
- rpcs |= GEN8_RPCS_ENABLE;
<br>
+ rpcs =
hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
<br>
+
<br>
+ if (INTEL_GEN(dev_priv) >= 11)
<br>
+ rpcs <<= GEN11_RPCS_S_CNT_SHIFT;
<br>
+ else
<br>
+ rpcs <<= GEN8_RPCS_S_CNT_SHIFT;
<br>
+
<br>
+ rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE;
<br>
</blockquote>
<br>
<br>
I don't know if you saw that wording in the documentation :
<br>
<br>
"
<br>
<br>
Note: In ICL, software programs this register as if GT consists
of 2 slices with 4 subslices in each slice. Hardware maps this
to the 1 slice/8-subslice physical layout.
<br>
<br>
"
<br>
<br>
<br>
My understanding is that it would make this function a bit more
complicated ;)
<br>
<br>
It also implies that on ICL you cannot select 3 subslices, which
is unfortunately what Tony was trying to do.
<br>
<br>
Maybe some opens need to be raised as to what's possible on ICL.
<br>
</blockquote>
<br>
I interpreted the note in my head as "In ICL, _if_ _the_ software
programs.." so did not see a problem. Thought that would be just
some hidden/under the covers remapping hw would do. But I see now
that was wrong, and you are most likely right. I'll try to do some
digging to understand this better.
<br>
<br>
But for the second part of it, I don't see why 1x3 configuration
would be illegal. If software must assume hw is 2x4, even if in
reality it is 1x8, then 1x3 would still be legal, no?
<br>
</blockquote>
<p><br>
</p>
<p>We still seem to put a subslice number in the register for ICL
(values being 0b001, 0b010, 0b011 & 0b100).</p>
<p>My understanding is that the hardware will just multiply that
value by 2 to map to the 1x8 underlying topology.</p>
<p>So if that's the case, you can't really do odd numbers... <span
style="color: rgb(0, 0, 0); font-family: "Helvetica
Neue", Helvetica, Arial, sans-serif; font-size: 16px;
font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing:
normal; orphans: 2; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration-style: initial; text-decoration-color: initial;
display: inline !important; float: none;">¯\_(ツ)_/¯</span></p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
<br>
I thought the cause of the hang on ICL was that when Tony was
asking for 1x3, the code was actually programming a request for
8x3 - which is illegal (as in slice count must be 1 to enable
subslice pg) and so would fail to actually turn of the unwanted
subslices.
<br>
<br>
But then I also though on ICL we deal with masks and not counts
when programming the hardware. Since apparently it is counts both
for slices and subslices, I am mightily confused as to how
media-driver would even theoretically be able to turn off a
_specific_ (sub)slice?!
<br>
</blockquote>
<p><br>
</p>
<p>The fact that the feature needed isn't implemented at by the
thread dispatcher is really strange to me too.</p>
<p>It sounds like we're forced to use a bigger hammer than what we
really need.</p>
<p>As to how that maps to the right subslices is also unknown to me.</p>
<p>The only explanation I have is that subslices with no VME
samplers get turn off first in the list of subslices to turn off.</p>
<p><br>
</p>
<p>Cheers,</p>
<p><br>
</p>
<p>-</p>
<p>Lionel<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com">
<br>
Regards,
<br>
<br>
Tvrtko
<br>
<br>
<blockquote type="cite">
<br>
<blockquote type="cite"> }
<br>
if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
<br>
</blockquote>
<br>
<br>
_______________________________________________
<br>
Intel-gfx mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:Intel-gfx@lists.freedesktop.org">Intel-gfx@lists.freedesktop.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/intel-gfx">https://lists.freedesktop.org/mailman/listinfo/intel-gfx</a>
<br>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>