Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

Michel Dänzer michel at daenzer.net
Wed Jul 3 08:07:22 UTC 2019


On 2019-07-02 11:49 a.m., Timur Kristóf wrote:
> On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote:
>> On 2019-07-01 6:01 p.m., Timur Kristóf wrote:
>>> On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote:
>>>> On 2019-06-28 2:21 p.m., Timur Kristóf wrote:
>>>>> I haven't found a good way to measure the maximum PCIe
>>>>> throughput between the CPU and GPU,
>>>>
>>>> amdgpu.benchmark=3
>>>>
>>>> on the kernel command line will measure throughput for various
>>>> transfer sizes during driver initialization.
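A minimal sketch of checking whether the running kernel was actually booted with this option. It assumes the usual space-separated module.param=value syntax, and the helper names module_param and running_cmdline are hypothetical. On a live system the string comes from /proc/cmdline.

```python
from typing import Optional


def module_param(cmdline: str, module: str, param: str) -> Optional[str]:
    """Return the value of "module.param=..." in a kernel command line, or None.

    Hypothetical helper: splits on whitespace only, so quoted values that
    contain spaces are not handled. If the option appears more than once,
    the last occurrence is returned.
    """
    key = f"{module}.{param}="
    value = None
    for token in cmdline.split():
        if token.startswith(key):
            value = token[len(key):]
    return value


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

After booting with amdgpu.benchmark=3, module_param(running_cmdline(), "amdgpu", "benchmark") returns "3".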
>>>
>>> Thanks, I will definitely try that.
>>> Is this the only way to do this, or is there a way to benchmark it
>>> after the system has already booted?
>>
>> The former. At least in theory, it's possible to unload the amdgpu
>> module while nothing is using it, then load it again.
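For the unload/reload route, a sketch that assumes root privileges and that nothing (display server, compositor, running GL/Vulkan clients) still holds the GPU, since modprobe -r refuses to remove a module that is in use. Passing benchmark=3 to modprobe sets the same module parameter as amdgpu.benchmark=3 on the kernel command line; the results then appear in dmesg. The function names are hypothetical.

```python
import subprocess
from typing import List


def reload_commands(level: int = 3) -> List[List[str]]:
    """Build the modprobe invocations for an amdgpu benchmark run."""
    return [
        # Fails while anything still uses the GPU (the module is in use).
        ["modprobe", "-r", "amdgpu"],
        # Module option equivalent of amdgpu.benchmark=N on the kernel command line.
        ["modprobe", "amdgpu", f"benchmark={level}"],
    ]


def reload_with_benchmark(level: int = 3) -> None:
    """Run the commands in order (needs root); raises on the first failure."""
    for cmd in reload_commands(level):
        subprocess.run(cmd, check=True)
```

Splitting command construction from execution keeps the risky part (actually unloading the driver) in one small function.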
> 
> Okay, so I booted my system with amdgpu.benchmark=3
> You can find the full dmesg log here: https://pastebin.com/zN9FYGw4
> 
> The result is between 1 and 5 Gbit/s depending on the transfer size
> (larger transfers are faster), which matches neither the 8 Gbit/s
> that the kernel thinks it is limited to, nor the 20 Gbit/s which I
> measured earlier with pcie_bw.

5 Gbit/s throughput could be consistent with 8 Gbit/s theoretical
bandwidth, due to various sources of overhead.
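As a rough illustration of where that overhead comes from: at 2.5 GT/s, PCIe uses 8b/10b line encoding, so an x4 link (the width assumed here for the Thunderbolt 3 PCIe tunnel) carries 8 Gbit/s of data. Each TLP then adds framing, a sequence number, a 3- or 4-DW header and an LCRC, roughly 20-24 bytes. The 128-byte max payload and 24-byte overhead below are assumptions, and DLLP traffic (ACKs, flow-control updates) is ignored, so the real ceiling is lower still.

```python
def raw_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Data rate after 8b/10b line encoding (used by 2.5 and 5 GT/s PCIe)."""
    return gt_per_s * lanes * 8 / 10


def payload_efficiency(payload_bytes: int, overhead_bytes: int = 24) -> float:
    """Fraction of link bytes that are payload when every TLP carries
    payload_bytes plus a fixed per-TLP overhead (assumed 24 bytes)."""
    return payload_bytes / (payload_bytes + overhead_bytes)


raw = raw_bandwidth_gbps(2.5, 4)   # 8.0 Gbit/s
eff = payload_efficiency(128)      # 128 / 152, about 0.84
print(f"raw {raw:.1f} Gbit/s, payload ceiling ~{raw * eff:.1f} Gbit/s")
# prints: raw 8.0 Gbit/s, payload ceiling ~6.7 Gbit/s
```

With smaller payloads the efficiency drops further, which makes ~5 Gbit/s of measured throughput plausible on such a link.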


> Since pcie_bw only shows the maximum PCIe packet size (and not the
> actual size), could it be that it's so inaccurate that the 20 Gbit/s
> is a fluke?

Seems likely or at least plausible.
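To see why a pcie_bw reading is only an upper bound, here is a sketch of turning one into Gbit/s. It assumes the file contains three numbers (packets received, packets sent, and the max payload size in bytes) counted over roughly the last second, as the amdgpu documentation describes. Charging every packet the full MPS overcounts small packets and ones that carry no payload at all (such as read requests), so a figure like 20 Gbit/s computed this way can sit far above the real traffic. The function names are hypothetical.

```python
from typing import Tuple


def parse_pcie_bw(text: str) -> Tuple[int, int, int]:
    """Split one reading into (received, sent, mps); assumes three integers."""
    received, sent, mps = (int(x) for x in text.split())
    return received, sent, mps


def upper_bound_gbps(received: int, sent: int, mps: int, window_s: float = 1.0) -> float:
    """Treat every counted packet as a full mps-byte payload: an upper bound."""
    return (received + sent) * mps * 8 / window_s / 1e9


def read_pcie_bw(path: str = "/sys/class/drm/card1/device/pcie_bw") -> float:
    """Read the sysfs file once (it may take about a second to return)."""
    with open(path) as f:
        return upper_bound_gbps(*parse_pcie_bw(f.read()))
```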


>>>>> but I did take a look at AMD's sysfs interface at
>>>>> /sys/class/drm/card1/device/pcie_bw while running the bottlenecked
>>>>> game. The highest throughput I saw there was only 2.43 Gbit/s.
>>>>
>>>> PCIe bandwidth generally isn't a bottleneck for games, since they
>>>> don't constantly transfer large data volumes across PCIe, but store
>>>> them in the GPU's local VRAM, which is connected at much higher
>>>> bandwidth.
>>>
>>> There are reasons why I think the problem is the bandwidth:
>>> 1. The same issues don't happen when the GPU is not used with a TB3
>>> enclosure.
>>> 2. In the case of radeonsi, the problem was mitigated once Marek's SDMA
>>> patch was merged, which hugely reduces the PCIe bandwidth use.
>>> 3. In less optimized cases (for example D9VK), the problem is still
>>> very noticeable.
>>
>> However, since you saw as much as ~20 Gbit/s under different
>> circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard
>> limit; there must be other limiting factors.
> 
> There may be other factors, yes. I can't offer a good explanation of
> what exactly is happening, but it's pretty clear that amdgpu can't take
> full advantage of the TB3 link, so it seemed like a good idea to
> investigate this first.

Yeah, actually, based on your benchmark measurements above, 2.43 Gbit/s
would be consistent with transfers at ~16-32 KB granularity, which is
plausible. So making sure that the driver doesn't artificially limit
the PCIe bandwidth might indeed help.

OTOH this also indicates a similar potential for improvement by using
larger transfers in Mesa and/or the kernel.
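One simple way to reason about the transfer-size dependence (a modelling assumption, not something stated in the thread): each transfer of s bytes costs a fixed setup time t0 plus s / B at a sustained bandwidth B. Then 1/throughput = t0/s + 1/B, which is linear in 1/s, so two (size, throughput) points from the amdgpu.benchmark output are enough to fit t0 and B. The model can then be inverted to ask which transfer size an observed rate such as 2.43 Gbit/s corresponds to. All function names are hypothetical; units are bytes, bytes/s and seconds.

```python
from typing import Tuple


def throughput(size: float, t0: float, bw: float) -> float:
    """Modelled throughput for one transfer: size / (t0 + size / bw)."""
    return size / (t0 + size / bw)


def fit(p1: Tuple[float, float], p2: Tuple[float, float]) -> Tuple[float, float]:
    """Solve for (t0, bw) from two (size, throughput) measurements, using
    1/throughput = t0 * (1/size) + 1/bw, which is linear in 1/size."""
    (s1, th1), (s2, th2) = p1, p2
    t0 = (1 / th1 - 1 / th2) / (1 / s1 - 1 / s2)
    bw = 1 / (1 / th1 - t0 / s1)
    return t0, bw


def size_for_throughput(th: float, t0: float, bw: float) -> float:
    """Invert the model: the transfer size that would yield throughput th."""
    if not 0 < th < bw:
        raise ValueError("throughput must be positive and below bw")
    return th * t0 / (1 - th / bw)
```

Under this model throughput only approaches B once s is large compared to t0 * B, which is why larger transfers in Mesa and/or the kernel could raise the effective rate.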


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the dri-devel mailing list