[Bug 100572] [SKL dmc] Headless mode media transcoding is 20-30% slower comparing to connected monitor use case

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Apr 7 12:30:03 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=100572

Imre Deak <imre.deak at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |imre.deak at intel.com

--- Comment #11 from Imre Deak <imre.deak at intel.com> ---
(In reply to Tvrtko Ursulin from comment #9)
> (In reply to Imre Deak from comment #7)
> > (In reply to Tvrtko Ursulin from comment #6)
> > > I tried not loading the DMC firmware and can confirm that the issue is not
> > > present in that case.
> > > 
> > > Also, it is possible to reproduce this in the default kernel config (no
> > > pinning is required) simply with igt/benchmarks/gem_latency -n 0 in which
> > > case the perf difference between the two setups was ~8x in my testing.
> > 
> > One possibility is that DC6 enables deeper system-level power states and
> > this causes latency elsewhere. What are the PC state residencies shown by
> > powertop or the kernel's tools/power/x86/turbostat when DMC is loaded and
> > when it is not?
> 
> 1. With DMC, idle system, no displays:
> 
> PKG is in PC2, 

PC2 vs. PC7 without DMC is weird; I have no idea what the reason is. Normally
you should reach PC8+ with the display off, but for that you'd also need to
enable power saving for other devices.

> CPU is in C7, GPU is in RC6.

Was this also measured after booting with 'intel_idle.max_cstate=1 i915.enable_rc6=0'?
Those should prevent C7 and RC6. Dmitry saw the problem even with these
settings, but it would be good to double-check on your side too, since RC6
would be the most logical root cause. Did you also check the CPU C-state when
you ran with max_cstate=0?
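A quick way to confirm that both limits actually made it onto the kernel command line is to parse /proc/cmdline. Below is a minimal Python sketch. It assumes whitespace-separated key=value tokens with no quoted values containing spaces, and the helper names (parse_cmdline, check_debug_params) are hypothetical. It only shows what was requested at boot, not whether the limit is actually in effect.

```python
def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ""."""
    params = {}
    for token in cmdline.split():
        # partition() keeps everything after the first "=" as the value
        key, _, value = token.partition("=")
        params[key] = value
    return params


def check_debug_params(cmdline: str) -> dict[str, bool]:
    """Report whether each limit is present on the command line with that exact value."""
    expected = {"intel_idle.max_cstate": "1", "i915.enable_rc6": "0"}
    params = parse_cmdline(cmdline)
    return {key: params.get(key) == value for key, value in expected.items()}
```

Usage: `check_debug_params(open("/proc/cmdline").read())` should return True for both keys on a correctly booted test machine.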

> Looking at i915_dmc_info, I can see that the "DC3 -> DC5" transition
> counter increases by exactly one each second. The "DC5 -> DC6" counter is zero.

Err, I forgot to say that reading that file itself increases the counter (if DC
states are enabled, i.e. the display is off) :/ So you should sample only at the
beginning and end of the test and subtract the increment caused by the sampling.
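The correction can be sketched in Python as follows. These are hypothetical helpers, not part of i915 or IGT. They assume each counter line in i915_dmc_info looks like "DC3 -> DC5 count: <n>" (check your kernel's actual output), and that every read of the file while DC states are enabled adds exactly one transition that shows up in later samples. Under that assumption, sampling once per second on an idle system would by itself account for a one-per-second increase, and the net count would be zero.

```python
import re
from pathlib import Path

# Path from the thread; needs debugfs mounted and root privileges.
DMC_INFO = "/sys/kernel/debug/dri/0/i915_dmc_info"

# Assumed line format, e.g. "DC3 -> DC5 count: 1234"; other lines are ignored.
_COUNT_RE = re.compile(r"^(DC\d) -> (DC\d) count:\s*(\d+)\s*$", re.MULTILINE)


def parse_dc_counts(text: str) -> dict[str, int]:
    """Map e.g. "DC3 -> DC5" to its counter value."""
    return {f"{src} -> {dst}": int(n) for src, dst, n in _COUNT_RE.findall(text)}


def read_dc_counts(path: str = DMC_INFO) -> dict[str, int]:
    """Read and parse the file once (this read itself may bump the counter)."""
    return parse_dc_counts(Path(path).read_text())


def net_transitions(start: int, end: int, reads_before_end: int = 1) -> int:
    """Transitions during the test, excluding those caused by sampling.

    reads_before_end counts every read of the file from the start sample up
    to (not including) the end sample; each is assumed to add exactly one.
    """
    return end - start - reads_before_end
```

For example, take one sample right before gem_latency starts and one right after it exits, then pass the two "DC3 -> DC5" values to net_transitions with the default reads_before_end=1.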

> 
> If I now run gem_latency -n 0:
> 
> "DC3 -> DC5" counter starts increasing by ~2k per second.

The same applies here, in case you sampled more frequently this time.

> 
> PKG is not in any deeper states now.
> CPU split between C2/C3/C6/C7 is approx. 42/2/10/40%.
> GPU is 0% RC6.
> 
> Benchmark goes slow.
> 
> 2. Now I force turn on a display (echo on | 
> tee /sys/class/drm/card0-HDMI-A-1/status).
> 
> "DC3 -> DC5" transition counter stops increasing.

Right, display-on keeps it in DC0.

> 
> PKG is still in PC2, CPU in C7 and GPU in RC6.
> 
> Benchmark is not normal speed and while it is running PKG is not in any low
> power states, RC6 is 0% and CPU C2/C3/C6/C7 is approx 52/0/0/25%.

Hm, so now we are constantly in DC0, and the DMC should be completely inactive
(it only ever activates when entering DC5 or DC6). Yet there is a slowdown,
seemingly caused by it.

> 
> 3. DMC not loaded, idle system, no displays
> 
> PKG is now in PC7 (not PC2 as above!), CPU is C7, GPU is RC6.
> 
> gem_latency is now normal speed with power states like above.
> 
> Out of curiosity I tried forcing the display on in this config. That makes
> the PKG go to ~3% PC2, rest in PC7. Turning it off again brings it back to
> <0.5% PC2 and the rest in PC7.
>  
> > What's the effect of limiting max_cstate to 0 (with DMC loaded)?
> 
> No effect on benchmark speed or reported "DC3 -> DC5" transitions.

As above, did you double-check that the C-state limit is really in effect?
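One way to check this is to see which cpuidle states are actually entered while the benchmark runs. Below is a minimal Python sketch, assuming the standard cpuidle sysfs layout (/sys/devices/system/cpu/cpuN/cpuidle/stateM/name and usage); the function names are hypothetical. If intel_idle.max_cstate=1 is in effect, nothing deeper than C1 (POLL aside) should show up between the two samples. Exact state names (e.g. "C1" vs. "C1-SKL") vary by kernel version.

```python
import glob
import os


def read_cpuidle_usage(cpu: int = 0) -> dict[str, int]:
    """Map each cpuidle state name of one CPU to its entry count."""
    usage = {}
    for state_dir in glob.glob(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle/state*"):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "usage")) as f:
            usage[name] = int(f.read())
    return usage


def states_entered(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Names of idle states entered at least once between two samples."""
    return sorted(name for name, count in after.items() if count > before.get(name, 0))
```

Sample with read_cpuidle_usage() before and after the run (per CPU, since each CPU has its own counters), then inspect states_entered(before, after). The value intel_idle actually picked up should also be readable from /sys/module/intel_idle/parameters/max_cstate.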

> 
> > An other problem could be that the GPU is trying to access the display,
> > (maybe checking scan line counts or something?).
> 
> You mean something behind the covers or explicitly by i915?

It was just a wild guess; I'm not at all sure it's possible. The kernel
shouldn't do anything while the display is off, unless you have runtime PM
enabled (i.e. /sys/bus/pci/devices/0000\:00\:02.0/power/control contains 'auto').
Ville said that X does the scan line readout when rendering to the front
buffer, but that shouldn't be the case here. Yeah, it could still be something
the HW itself does under the hood; DC transitions would be an indication of
that.
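Checking that condition from a script only requires reading the sysfs attribute. Below is a minimal Python sketch. The helper names are hypothetical, and the PCI address 0000:00:02.0 (taken from the comment) may differ on other machines. Under the standard runtime PM sysfs interface, 'auto' allows the device to be runtime-suspended and 'on' keeps it powered.

```python
from pathlib import Path

# PCI address of the integrated GPU as used in the comment; adjust if needed.
# Colons need no escaping here (the backslashes above are shell-style).
GPU_POWER_CONTROL = "/sys/bus/pci/devices/0000:00:02.0/power/control"


def runtime_pm_enabled(control_value: str) -> bool:
    """'auto' lets the kernel runtime-suspend the device; 'on' forbids it."""
    return control_value.strip() == "auto"


def gpu_runtime_pm_enabled(path: str = GPU_POWER_CONTROL) -> bool:
    """Read the power/control attribute and report whether runtime PM is on."""
    return runtime_pm_enabled(Path(path).read_text())
```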

>  
> > Does /sys/kernel/debug/dri/0/i915_dmc_info show any transitions during the
> > test when DMC is loaded?
> 
> Yes, see above. :)

So, still no good idea. One other thing to try would be to limit the package
C-state to PC2 in the BIOS, if there is an option for that, and boot with DMC;
that would show whether the PC7 vs. PC2 difference itself is the cause.



More information about the intel-gfx-bugs mailing list