Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

Michel Dänzer michel at daenzer.net
Thu Jul 18 10:29:37 UTC 2019


On 2019-07-18 11:06 a.m., Timur Kristóf wrote:
>>> Thanks Marek, I didn't know about that option.
>>> Tried it, here is the output: https://pastebin.com/raw/9SAAbbAA
>>>
>>> I'm not quite sure how to interpret the numbers, they are inconsistent
>>> with the results from both pcie_bw and amdgpu.benchmark, for example
>>> GTT->VRAM at 128 KB is around 1400 MB/s (I assume that is megabytes /
>>> sec, right?).
>>
>> Based on the SDMA results, you have 2.4 GB/s. For 128KB, it's 2.2
>> GB/s for GTT->VRAM copies.
> 
> In the meantime I had a chat with Michel on IRC and he suggested that
> maybe amdgpu.benchmark=3 gives lower results because it uses a less
> than optimal way to do the benchmark.
> 
> Looking at the results from the mesa benchmark a bit more closely, I
> see that the SDMA can do:
> VRAM->GTT: 3087 MB/s = 24 Gbit/sec
> GTT->VRAM: 2433 MB/s = 19 Gbit/sec
> 
> So on Polaris at least, the SDMA is the fastest, and the other transfer
> methods can't match it. I also did the same test on Navi, where it's
> different: all other transfer methods are much closer to the SDMA, but
> the max speed is still around 20-24 Gbit / sec.
> 
> I still have a few questions:
> 
> 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT copy?
> 
> 2. Why is the bus limited to 24 Gbit/sec? I would expect the
> Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic.

That's unrealistic I'm afraid. As I said on IRC, from the GPU POV
there's an 8 GT/s x4 PCIe link, so ~29.8 Gibit/s (= 32 billion bit/s; I
missed this nuance on IRC) is the theoretical raw bandwidth. However, in
practice that's not achievable due to various overheads[0], and I'm only
seeing up to ~90% utilization of the theoretical bandwidth with a
"normal" x16 link as well. I wouldn't expect higher utilization without
seeing some evidence to suggest it's possible.
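
For reference, the arithmetic above can be sketched roughly as follows.
This is only a back-of-the-envelope illustration, not a measurement
tool. The function names are made up for this example, the 128b/130b
figure is the standard PCIe 3.0 line encoding, and it assumes that
"MB/s" in the quoted benchmark output means 10^6 bytes per second
(if it actually means MiB/s, the utilization comes out a bit higher).

```python
# Back-of-the-envelope PCIe link arithmetic for the numbers in this thread.
# All helper names here are hypothetical, chosen for illustration only.

TRANSFER_RATE = 8e9                 # 8 GT/s per lane (PCIe 3.0)
LANES = 4                           # x4 link as seen from the GPU
ENCODING_EFFICIENCY = 128 / 130     # PCIe 3.0 uses 128b/130b encoding


def raw_bits_per_s(rate=TRANSFER_RATE, lanes=LANES):
    """Theoretical raw link bandwidth in bit/s."""
    return rate * lanes


def to_gibit(bits_per_s):
    """Convert bit/s to Gibit/s (2**30 bits)."""
    return bits_per_s / 2**30


def utilization(mb_per_s, link_bits_per_s):
    """Fraction of the link used by a measured transfer rate.

    Assumption: mb_per_s is in units of 10**6 bytes per second.
    """
    return mb_per_s * 1e6 * 8 / link_bits_per_s


raw = raw_bits_per_s()
print(f"raw link bandwidth: {raw / 1e9:.1f} Gbit/s = {to_gibit(raw):.1f} Gibit/s")
print(f"encoding overhead:  {(1 - ENCODING_EFFICIENCY) * 100:.2f} %")
print(f"after encoding:     {raw * ENCODING_EFFICIENCY / 1e9:.2f} Gbit/s")

# SDMA results quoted earlier in the thread
for name, mb in (("VRAM->GTT", 3087), ("GTT->VRAM", 2433)):
    print(f"{name}: {mb * 8 / 1000:.1f} Gbit/s, "
          f"{utilization(mb, raw) * 100:.0f} % of raw bandwidth")
```

Note that the encoding overhead is only one of the overheads mentioned
in [0]; packet headers and flow control are not modelled here.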


[0] According to
https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/
, PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding.
Also keep in mind all CPU<->GPU communication has to go through the PCIe
link, e.g. for programming the transfers, in-band signalling from the
GPU to the PCIe port where the data is being transferred to/from, ...

-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer

