<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:eero.t.tamminen@intel.com" title="Eero Tamminen <eero.t.tamminen@intel.com>"> <span class="fn">Eero Tamminen</span></a>
</span> changed
<a class="bz_bug_link
bz_status_NEW "
title="NEW - [GEN9(+)] large perf drop (up to 1/3) in most 3D benchmarks from force-enabling IOMMU"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111731">bug 111731</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">i915 platform</td>
<td>
</td>
<td>KBL, BXT
</td>
</tr>
<tr>
<td style="text-align:right;">Summary</td>
<td>[SKL GT4e] large perf drop (up to 27%) in most 3D benchmarks from force-enabling IOMMU
</td>
<td>[GEN9(+)] large perf drop (up to 1/3) in most 3D benchmarks from force-enabling IOMMU
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [GEN9(+)] large perf drop (up to 1/3) in most 3D benchmarks from force-enabling IOMMU"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111731#c6">Comment # 6</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [GEN9(+)] large perf drop (up to 1/3) in most 3D benchmarks from force-enabling IOMMU"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111731">bug 111731</a>
from <span class="vcard"><a class="email" href="mailto:eero.t.tamminen@intel.com" title="Eero Tamminen <eero.t.tamminen@intel.com>"> <span class="fn">Eero Tamminen</span></a>
</span></b>
<pre>The largest performance drops from IOMMU with a recent user-space 3D / Media
stack are the following (all machines have dual-channel memory):
SKL GT4e (i7-6770HQ)
--------------------
* 25-30% SynMark CSDof (fullscreen)
* 20-25% GpuTest windowed 1/2 screen Triangle, SynMark VSTangent
* 20-25% SynMark Fill* [1]
* 20% Unigine Heaven
* 20% GpuTest fullscreen Triangle, SynMark DeferredAA [2]
* 10-20% GLB 2.7 windowed 1/2 screen Egypt & T-Rex
* 10-20% SynMark ZBuffer & Deferred [2]
* 10-15% SynMark TexMem512, GPU write [1]
* 5-10% GfxBench T-Rex, Manhattan 3.0 & 3.1, CarChase, SynMark TerrainFly*
* 5% Unigine Valley, GfxBench AztecRuins
* 2-3% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
[1] With June user-space. With the latest Mesa, the drop in the Fill* & write
tests is only 3%, and TexMem512 performance somehow improves by 2%. The latest
Mesa is several percent faster than the older one in these tests, due to a Mesa
slice/subslice balance optimization; no idea how that can reduce the IOMMU impact.
[2] With June user-space. With the latest user-space, the drop in these specific
tests is half of that, or less. For the fullscreen Triangle case, a potentially
relevant user-space change could be the latest X server disabling atomic commits, see:
<a href="https://gitlab.freedesktop.org/xorg/xserver/issues/888">https://gitlab.freedesktop.org/xorg/xserver/issues/888</a>
KBL GT3e (i7-7567U)
-------------------
* 30-35% SynMark Fill* [1]
* 20-25% GpuTest windowed 1/2 screen Triangle, SynMark VSTangent
* 10-15% GLB 2.7 windowed 1/2 screen Egypt & T-Rex, SynMark ZBuffer
* 10-15% GpuTest fullscreen Triangle, GPU write [1]
* 10-15% 4K HEVC GPU decode + hwdownload
* 5-10% Unigine Heaven, GfxBench T-Rex, Manhattan 3.0 & 3.1 & CarChase,
  SynMark CSDof
* 5-10% SynMark Deferred* [1]
* 5-10% 10-bit, 4K HEVC GPU transcode
* 2-3% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
[1] With June user-space. With the latest user-space, the drop in these specific
tests is about half of that.
SKL GT2 (i5-6600K)
------------------
* >25% GPU write, SynMark VSTangent
* 20% SynMark Fill*, GpuTest windowed Triangle
* 15-20% GLB Egypt
* 15% SynMark ZBuffer
* 10% GLB T-Rex, GpuTest fullscreen Triangle
* 5-10% GfxBench Manhattan 3.0 & 3.1, T-Rex, CarChase
* 5% GfxBench AztecRuins, Unigine Heaven
* 2-6% 8-bit, max FullHD, FFmpeg/MediaSDK GPU transcode/downscale
With the June user-space, there are some differences in how much performance
drops, but nothing major like with GT3e & GT4e (where slice/subslice issue
balance had a noticeable impact).
BXT J4205
---------
Results are similar to the other devices (not reported here, as this machine has
higher variance than them). As with SKL GT2, the user-space version doesn't have
a significant impact on how much IOMMU regresses performance.
BDW GT2
-------
As expected, no impact (kernel skips IOMMU for BDW).
Summary
-------
* IOMMU can lose up to a third of performance in the worst synthetic case, and
  5-15% in real GPU (3D/Media) use-cases.
* It seems that badly balanced slice/subslice utilization could noticeably
  increase the IOMMU performance impact in some use-cases.</pre>
</div>
</p>
</body>
</html>