[Intel-gfx] [PATCH 2/2] drm/i915: Apply WAC6entrylatency to kbl/cfl
Chris Wilson
chris at chris-wilson.co.uk
Thu Oct 15 20:19:18 UTC 2020
Quoting Ville Syrjala (2020-07-16 20:04:26)
> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
>
> WAC6entrylatency is trying to fix excessive rc6 entry latency caused
> by the extra delay from FBC_LLC_READ_CTRL, which is there for some
> extra sync with uncore for frame buffer caching in LLC.
>
> Reading through the hsd the recommendation was to set the FBC_LLC_FULLY_OPEN
> bit to disable this extra delay entirely. This can be done whenever fb LLC
> caching is not used.
Ah, is that what it means by 'must not be set unless coordinated with
uncore?' Ok.
> The alternative suggestion was to reduce the delay to
> eg. 0x5 via updated BIOS programming instructions. But all the kbl/cfl
> machines I've seen still have the default 0xff programmed. As we never use
> fb LLC caching let's just apply the w/a to all skl derivatives to get
> consistent rc6 latencies.
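For reference, the 0x400000ff value that appears in the measurements is consistent with the default 0xff delay plus bit 30 set. A minimal sketch of that read-modify-write follows. It assumes FBC_LLC_FULLY_OPEN is bit 30 (inferred from that value, not taken from the patch), and the helper name is hypothetical. A plain integer stands in for the MMIO register.

```c
#include <stdint.h>

/* Assumption: FBC_LLC_FULLY_OPEN is bit 30, inferred from
 * 0x400000ff == 0xff | (1 << 30). Not copied from the patch. */
#define FBC_LLC_FULLY_OPEN (UINT32_C(1) << 30)

/* Hypothetical helper: return the register value with the w/a bit set,
 * leaving the delay field and every other bit untouched. */
static inline uint32_t wa_c6_entry_latency(uint32_t fbc_llc_read_ctrl)
{
	return fbc_llc_read_ctrl | FBC_LLC_FULLY_OPEN;
}
```

With the default 0xff programmed, wa_c6_entry_latency(0xff) yields 0x400000ff. Applying it a second time leaves the value unchanged.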
>
> I was able to measure the effect of FBC_LLC_READ_CTRL to rc6 latency
> via forcewake. Here's a graph of some of the results:
>
> sleep;fw_req=1;wait fw_ack==1;sleep;fw_req=0;wait fw_ack==0
> [ASCII plot, layout lost in archiving: fw_ack==1 duration (y axis,
> 60us to 160us) against sleep duration (x axis, 0ms to 60ms), with one
> trace each for FBC_LLC_READ_CTRL: 0x8000 (*), FBC_LLC_READ_CTRL: 0xffff (#)
> and FBC_LLC_READ_CTRL: 0x400000ff ($)]
>
> The default FBC_LLC_READ_CTRL value of 0xff is documented to give us
> a 170usec delay. That tracks well with the knees at 0xffff->~44msec and
> 0x8000->~22msec we see in the graph.
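The scaling can be checked with a little arithmetic. The sketch below assumes the delay grows linearly with the low 16 bits of the register, anchored at the documented 0xff -> 170usec point. Both the linear model and the helper name are assumptions for illustration, not something the documentation is known to state.

```c
#include <stdint.h>

/* Reference point quoted in the commit message: 0xff -> 170 usec. */
#define REF_VALUE    0xffu
#define REF_DELAY_US 170u

/* Assumption: the delay scales linearly with the low 16 bits.
 * Hypothetical helper, integer math, result in microseconds. */
static inline uint32_t fbc_llc_delay_us(uint32_t reg)
{
	uint32_t count = reg & 0xffffu;

	return count * REF_DELAY_US / REF_VALUE;
}
```

Under that assumption fbc_llc_delay_us(0xffff) comes out at 43690 usec and fbc_llc_delay_us(0x8000) at 21845 usec, i.e. roughly 44 msec and 22 msec on the sleep duration axis.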
>
> We can see that if we sleep longer than the FBC_LLC_READ_CTRL delay
> we always observe the full (~145usec) rc6 wakeup latency. But if we sleep
> for less than the FBC_LLC_READ_CTRL delay we see a quicker fw wakeup,
> presumably due to the hardware not having yet entered rc6 fully.
> The other plateaus in the graph I suspect correspond to some shallower
> internal rc states.
Hmm, so by setting LLC as fully open, there is always a fixed 140us
latency for rc6, implying that we always immediately try to enter rc6
rather than after ~50ms.

I realize that my rc6 power measurements should be with the display
already off and so will not show any effect. :|

The graph does imply that there should be a noticeable effect for
composited desktops, both in power saving and a latency penalty. 140us
is about the same sort of ballpark as all the other startup costs, and
one hopes they overlap.
> Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris