Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

Timur Kristóf timur.kristof at gmail.com
Tue Jul 2 09:49:34 UTC 2019


On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote:
> On 2019-07-01 6:01 p.m., Timur Kristóf wrote:
> > On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote:
> > > On 2019-06-28 2:21 p.m., Timur Kristóf wrote:
> > > > I haven't found a good way to measure the maximum PCIe
> > > > throughput
> > > > between the CPU and GPU,
> > > 
> > > amdgpu.benchmark=3
> > > 
> > > on the kernel command line will measure throughput for various
> > > transfer
> > > sizes during driver initialization.
> > 
> > Thanks, I will definitely try that.
> > Is this the only way to do this, or is there a way to benchmark it
> > after the system has already booted?
> 
> The former. At least in theory, it's possible to unload the amdgpu
> module while nothing is using it, then load it again.

Okay, so I booted my system with amdgpu.benchmark=3
You can find the full dmesg log here: https://pastebin.com/zN9FYGw4

The results range from about 1 to 5 Gbit/s depending on the transfer
size (larger transfers achieve higher throughput), which matches
neither the 8 Gbit/s limit the kernel thinks it is subject to, nor the
20 Gbit/s I measured earlier with pcie_bw. Since pcie_bw only reports
the maximum PCIe payload size (not the actual size), could it be so
inaccurate that the 20 Gbit/s figure is a fluke?
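For reference, here is a minimal Python sketch of how I interpret the
pcie_bw reading; it assumes the file reports the number of received and
sent PCIe packets over roughly the last second plus the maximum payload
size in bytes (the exact output format is my assumption here), so the
result is an upper bound rather than the actual throughput:

```python
def pcie_bw_upper_bound_gbit(line):
    """Upper-bound PCIe throughput in Gbit/s from one pcie_bw reading.

    Assumes the line has the form "<received> <sent> <max payload size>",
    with the packet counts covering roughly the last second and the
    payload size in bytes. Because only the *maximum* payload size is
    reported, the actual traffic may be much lower than this estimate.
    """
    received, sent, max_payload = (int(v) for v in line.split())
    return (received + sent) * max_payload * 8 / 1e9

# Example with made-up counter values:
# pcie_bw_upper_bound_gbit("1000000 500000 256") -> 3.072 Gbit/s
```

This over-estimation is exactly why the 20 Gbit/s figure could be a
fluke: if most packets carry far less than the maximum payload, the
real throughput could be a fraction of the computed bound.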

Side note: after booting with amdgpu.benchmark=3, the graphical session
was unusable and hung the system outright after I logged in, so I had
to reboot into runlevel 3 to be able to save the above dmesg log.

> 
> > > > but I did take a look at AMD's sysfs interface at
> > > > /sys/class/drm/card1/device/pcie_bw while running the
> > > > bottlenecked
> > > > game. The highest throughput I saw there was only 2.43 Gbit
> > > > /sec.
> > > 
> > > PCIe bandwidth generally isn't a bottleneck for games, since they
> > > don't
> > > constantly transfer large data volumes across PCIe, but store
> > > them in
> > > the GPU's local VRAM, which is connected at much higher
> > > bandwidth.
> > 
> > There are reasons why I think the problem is the bandwidth:
> > 1. The same issues don't happen when the GPU is not used with a TB3
> > enclosure.
> > 2. In case of radeonsi, the problem was mitigated once Marek's SDMA
> > patch was merged, which hugely reduces the PCIe bandwidth use.
> > 3. In less optimized cases (for example D9VK), the problem is still
> > very noticeable.
> 
> However, since you saw as much as ~20 Gbit/s under different
> circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard
> limit; there must be other limiting factors.

There may be other factors, yes. I can't offer a good explanation of
what exactly is happening, but it's pretty clear that amdgpu can't take
full advantage of the TB3 link, so it seemed like a good idea to start
investigating that first.
