[Intel-gfx] [igt-dev] [PATCH i-g-t v2] intel-gpu-top: Rewrite the tool to be safe to use
Eero Tamminen
eero.t.tamminen at intel.com
Wed Apr 4 14:23:05 UTC 2018
Hi,
On 04.04.2018 15:42, Tvrtko Ursulin wrote:
> On 04/04/2018 13:15, Eero Tamminen wrote:
>> I've now tested v5 with old Ubuntu kernel on KBL, and with latest
>> drm-tip kernel on SNB, HSW, BYT, BSW and BDW GT & GT3.
>>
>>
>> Generic test results
>> --------------------
>>
>> * Tool works on all of them
>>
>> * The new error messages and headings look good
>>
>> * Idle IMC read amounts correspond to expected values on SNB & HSW.
>> The much smaller values on BDW & SKL are due to FBC (how well
>> it compresses, naturally depends on screen content).
Unlike what I assumed earlier, it's not actually uncore bandwidth.
It's RAM bandwidth; I guess the IMC abbreviation stands for something
like Intel/Integrated Memory Controller.
Note that it also includes any memory bandwidth used by the CPU, and
that accesses which fit into the LLC don't show up in it at all.
However, knowing whether the CPU is using memory bandwidth is actually
useful, because RAM bandwidth is a resource shared with the GPU. One
can check other tasks' bandwidth usage before launching the GPU task.
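For example, if the platform exposes the client "uncore_imc" PMU
(presumably the same counters the tool shows), the existing memory
traffic can be watched with plain perf before starting the GPU work;
exact event names can vary per platform, so take this as a sketch:

  perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ -I 1000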
> Hm OK, you managed to explain it. Because in the meantime I have
> observed one oddity with write bandwidth on my headless SkullCanyon NUC.
> It idles around 28MiB/s,
On an idle machine, write bandwidth usage should be zero.
What is causing the writes?
> while when I load it up with some command
> streamer activity it drops to ~11MiB/s. I don't know, but just feels
> suspect. (Read bandwidth goes from ~215MiB/s at idle to ~4.5GiB/s in my
> load case.)
Is it possible that your test load directly affected whatever task
was causing the writes? E.g. if the write load and the read load
both use the render pipeline, your read load could slow down the
write load (by "flooding" the render pipeline).
The effect could also be indirect. E.g. read bandwidth usage could eat
into the write bandwidth, as reads and writes aren't completely
independent resources.
Or, if your test load is very heavy, it could cause TDP limiting for
the whole device, which could slow down other tasks a bit.
I would need to know more about what your write load is, to come up
with a good excuse. ;-)
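If RAPL package energy is exposed on that machine, one way to check the
TDP theory would be to watch package power while the read load runs and
compare it against the package TDP, e.g.:

  perf stat -a -e power/energy-pkg/ -I 1000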
- Eero
>> BYT & BSW
>> ---------
>>
>> * IMC, power usage and actual(?) freq values are missing.
>>
>> -> You can get actual freq by polling CAGF register, represented by:
>> /sys/class/drm/card0/gt_act_freq_mhz
>
> Yep, this is an i915 internal limitation; we cannot expose this for
> consumption from the PMU.
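In the meantime the actual frequency can still be polled on the side
from sysfs, e.g. (path assuming the GPU is card0):

  watch -n 1 cat /sys/class/drm/card0/gt_act_freq_mhz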
>
>>
>> Normally the i915 driver maps uncore power usage to GPU power usage,
>> but BYT is missing that (and RAM power usage). However, RAPL
>> does report package & core values...
>>
>>
>> Suggestions
>> -----------
>>
>> Maybe on platforms where RAPL doesn't report "uncore" power usage,
>> you could just subtract the RAPL-reported "core" power consumption
>> from the "package" power consumption, and report that as "GPU" power
>> usage? (Or do that in i915 directly.)
>
> What are you referring to as "uncore" in the context of RAPL?
>
> Did I understand correctly that you suggest using "energy-pkg -
> energy-cores" when "energy-gpu" is not available? If the former two are
> there on both BYT and BSW, this sounds okay to me.
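(For reference, whether both events are exposed on BYT & BSW can be
checked with plain perf, e.g.:

  perf stat -a -e power/energy-pkg/,power/energy-cores/ -I 1000

If both report sane values there, the "pkg - cores" estimate should be
doable.)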
>
>> You also need to either update the manual or implement the -o and -e
>
> There is a manual, will do!
>
>> options for the new version of intel_gpu_top. CSV output of all
>> the reported values would be nice.
>
> I would prefer to drop both -o and -e, since this is achievable via perf
> stat. For instance:
>
> perf stat -a -e power/energy-gpu/,i915/rcs0-busy/ -I 1000 -x, <command>
>
> Gives CSV samples once per second.
>
> On the other hand, one argument I can think of for actually implementing
> -o and -e is that we need to do some extra normalization on some i915
> counters which the perf tool would not do.
>
> I don't have a feel for whether anyone is actually using these options.
> If it's unlikely, we should probably drop them regardless.
>
>> You might mention in the manual, as an example, how to calculate
>> idle screen update bandwidth, and that it's impacted by:
>> - PSR (panel self refresh, depends on display supporting it):
>> /sys/kernel/debug/dri/0/i915_edp_psr_status
>> - FBC (frame buffer compression, enabled on newer GENs)
>> /sys/kernel/debug/dri/0/i915_fbc_status
>> - end-to-end RBC (render buffer compression, requires modifiers
>> support i.e. GEN9+ GPU and X & Mesa with DRI3 v1.2 [1] support)
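As a rough reference for such a calculation: an uncompressed 1920x1080,
32-bit framebuffer scanned out at 60 Hz means about
1920 * 1080 * 4 * 60 bytes/s, i.e. ~475 MiB/s of display reads, and how
far the measured idle value stays below that shows how much the above
features help.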
>
> Sounds useful for users, but I am a bit wary of feature creep. In this
> specific example I'd want to push it for follow-up work.
>
> Regards,
>
> Tvrtko