[Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

Tue Dec 26 16:58:43 UTC 2017

Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider that you have a massive load on GT and just a tiny one on IA. If GT programs the RING frequency to be lower than the IA frequency, then you fall into a situation where the RING frequency constantly transitions from the GT level to the IA level and back. Each transition of the RING frequency is a full system stall. Even with a "good" transition rate of a few transitions per few milliseconds, you will lose ~10% of performance. That's the case for media workloads, where you can easily step into this since 1) media utilizes few GPU engines, and with a few parallel workloads you can make sure that at least 1 engine is _always_ doing something, 2) media BBs are relatively small, so you have regular wakeups of the IA to manage requests. This affects Gen9 platforms due to a HW design change (we spotted this on SKL). This does not happen on Gen8 (old HW design). This will be fixed in Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than
stick to the highest? IIRC, the expectation was that we were setting a
requested minimum frequency for the ring/IA based on the GPU freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is
insufficient to keep the fixed functions occupied, and you need to
increase the ring frequency. Is there ever a case where we don't need
max ring frequency? (Perhaps we still need to set low frequency for GT
idle?) I guess media is more susceptible to this as that workload should
be sustainable at reduced clocks, whereas GL et al. are much more likely
to keep the clocks ramped all the way up.

Do you know anything about the opposite position? I heard a suggestion
that simply increasing the ringfreq universally caused thermal
throttling in some other workloads. Do you have any knowledge of those?

> So, effectively, what we need to do is to make sure that the RING frequency request from GT is _not_ below the request from IA. If IA requests 2GHz, we can't request 1.4GHz; we need to request at least 2GHz. The multiplier patch was intended to do exactly that, but manually. Can we somehow automate that, managing the IA frequency requests to the RING?
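
For illustration only, the constraint described in the quoted text can be sketched as below. This is not i915 code: ring_request_mhz() and its parameters are hypothetical names, and it assumes each side's ring request is simply its own frequency times its multiplier, as in the SKL example (GPU 700MHz x2 = 1.4GHz, IA 2GHz x1 = 2GHz).

```c
/*
 * Hypothetical helper, not part of i915 or intel_pstate.
 * Assumption: each agent's ring request is freq * multiplier.
 * Returns a GT-side ring request (in MHz) that is never below
 * what the IA would request, so the ring has no reason to
 * bounce between two different levels.
 */
static unsigned int ring_request_mhz(unsigned int gpu_mhz,
				     unsigned int gt_mult,
				     unsigned int ia_mhz,
				     unsigned int ia_mult)
{
	unsigned int gt_req = gpu_mhz * gt_mult;	/* e.g. 700 * 2 = 1400 */
	unsigned int ia_req = ia_mhz * ia_mult;		/* e.g. 2000 * 1 = 2000 */

	/* Clamp the GT request up to the IA request. */
	return gt_req > ia_req ? gt_req : ia_req;
}
```

With the numbers from the SKL case, ring_request_mhz(700, 2, 2000, 1) gives 2000 rather than 1400. Once the GPU-derived request already exceeds the IA one (say 1100MHz x2), it is passed through unchanged.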

You are thinking of plugging into intel_pstate to make it smarter for ia
freq transitions? That seems possible, certainly. I'm not sure if the
ring frequency is actually poked from anywhere else in the kernel, would
be interesting to find out.
-Chris