[Bug 111731] [GEN9(+)] large perf drop (up to 1/3) in most 3D benchmarks from force-enabling IOMMU
bugzilla-daemon at freedesktop.org
Thu Sep 19 15:58:30 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=111731
Eero Tamminen <eero.t.tamminen at intel.com> changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
i915 platform |                            |KBL, BXT
Summary       |[SKL GT4e] large perf drop  |[GEN9(+)] large perf drop
              |(up to 27%) in most 3D      |(up to 1/3) in most 3D
              |benchmarks from             |benchmarks from
              |force-enabling IOMMU        |force-enabling IOMMU
--- Comment #6 from Eero Tamminen <eero.t.tamminen at intel.com> ---
The largest performance drops from IOMMU with a recent user-space 3D/Media stack
are as follows (all machines have dual-channel memory):
SKL GT4e (i7-6770HQ)
--------------------
* 25-30% SynMark CSDof (fullscreen)
* 20-25% GpuTest windowed 1/2 screen Triangle, SynMark VSTangent
* 20-25% SynMark Fill* [1]
* 20% Unigine Heaven
* 20% GpuTest fullscreen Triangle, SynMark DeferredAA [2]
* 10-20% GLB 2.7 windowed 1/2 screen Egypt & T-Rex
* 10-20% SynMark ZBuffer & Deferred [2]
* 10-15% SynMark TexMem512, GPU write [1]
* 5-10% GfxBench T-Rex, Manhattan 3.0 & 3.1, CarChase, SynMark TerrainFly*
* 5% Unigine Valley, GfxBench AztecRuins
* 2-3% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
[1] Measured with the June user-space. With the latest Mesa, the drop in the
Fill* and GPU write tests is only 3%, and TexMem512 performance somehow improves
by 2%. The latest Mesa is several percent faster than the older one in these
tests thanks to its slice/subslice balancing optimization; it is unclear how
that could reduce the impact of IOMMU.
[2] Measured with the June user-space. With the latest user-space, the drop in
these specific tests is half of that, or less. For the fullscreen Triangle case,
a potentially relevant user-space change is that the latest X server disables
atomic commits, see:
https://gitlab.freedesktop.org/xorg/xserver/issues/888
KBL GT3e (i7-7567U)
-------------------
* 30-35% SynMark Fill* [1]
* 20-25% GpuTest windowed 1/2 screen Triangle, SynMark VSTangent
* 10-15% GLB 2.7 windowed 1/2 screen Egypt & T-Rex, SynMark ZBuffer
* 10-15% GpuTest fullscreen Triangle, GPU write [1]
* 10-15% 4K HEVC GPU decode + hwdownload
* 5-10% Unigine Heaven, GfxBench T-Rex, Manhattan 3.0 & 3.1 & CarChase,
SynMark CSDof
* 5-10% SynMark Deferred* [1]
* 5-10% 10-bit, 4K HEVC GPU transcode
* 2-3% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
[1] Measured with the June user-space. With the latest user-space, the drop in
these specific tests is about half of that.
SKL GT2 (i5-6600K)
------------------
* >25% GPU write, SynMark VSTangent
* 20% SynMark Fill*, GpuTest windowed Triangle
* 15-20% GLB Egypt
* 15% SynMark ZBuffer
* 10% GLB T-Rex, GpuTest fullscreen Triangle
* 5-10% GfxBench Manhattan 3.0 & 3.1, T-Rex, CarChase
* 5% GfxBench AztecRuins, Unigine Heaven
* 2-6% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
With the June user-space, the size of the drops differs somewhat, but nothing
as major as on GT3e & GT4e (where slice/subslice balancing had a noticeable
impact).
BXT J4205
---------
Results are similar to the other devices (not listed here, because this
machine's results have higher variance than theirs). As with SKL GT2, the
user-space version doesn't significantly affect how much IOMMU regresses
performance.
BDW GT2
-------
As expected, no impact (the kernel skips IOMMU for BDW).
Summary
-------
* IOMMU can cost up to a third of performance in the worst synthetic case, and
5-15% in real GPU (3D/Media) use-cases.
* Badly balanced slice/subslice utilization seems to noticeably increase the
IOMMU performance impact in some use-cases.
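The report does not spell out how the percentages are computed. Assuming each one is the relative drop in benchmark score (higher is better, e.g. FPS) between a run with IOMMU enabled and a baseline run without it, a minimal sketch of that arithmetic follows. The function names are hypothetical and the numbers in the demo are illustrative only, not measurements from this report. It also shows why "losing a third" is the same as the IOMMU-disabled run being 1.5x as fast.

```python
def perf_drop(baseline: float, with_iommu: float) -> float:
    """Relative score drop caused by enabling IOMMU (0.25 == 25% slower).

    Assumption: higher scores are better (e.g. frames per second) and
    both runs used identical benchmark settings.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - with_iommu) / baseline


def speedup_without_iommu(baseline: float, with_iommu: float) -> float:
    """How much faster the IOMMU-disabled run is (1.5 == 50% faster)."""
    if with_iommu <= 0:
        raise ValueError("IOMMU score must be positive")
    return baseline / with_iommu


if __name__ == "__main__":
    # Illustrative numbers only, not results from this bug report.
    base, iommu = 60.0, 40.0
    print(f"drop: {perf_drop(base, iommu):.1%}")                   # drop: 33.3%
    print(f"speedup: {speedup_without_iommu(base, iommu):.2f}x")   # speedup: 1.50x
```

Keeping the two views separate matters when comparing figures: a 1/3 drop and a 50% speedup describe the same pair of runs, so drops should only be compared against other drops.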
More information about the intel-gfx-bugs mailing list