Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

Timur Kristóf timur.kristof at gmail.com
Mon Jul 22 09:39:32 UTC 2019


> > 
> > 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT
> > copy?
> > 
> > 2. Why is the bus limited to 24 Gbit/sec? I would expect the
> > Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic.
> 
> That's unrealistic I'm afraid. As I said on IRC, from the GPU POV
> there's an 8 GT/s x4 PCIe link, so ~29.8 Gbit/s (= 32 billion bit/s; I
> missed this nuance on IRC) is the theoretical raw bandwidth. However, in
> practice that's not achievable due to various overhead[0], and I'm only
> seeing up to ~90% utilization of the theoretical bandwidth with a
> "normal" x16 link as well. I wouldn't expect higher utilization without
> seeing some evidence to suggest it's possible.
> 
> 
> [0] According to
> https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/
> , PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding.
> Also keep in mind all CPU<->GPU communication has to go through the PCIe
> link, e.g. for programming the transfers, in-band signalling from the
> GPU to the PCIe port where the data is being transferred to/from, ...

Good point, I used 1024 and not 1000. My mistake.
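
For reference, the arithmetic behind the two figures quoted above can be
sketched as follows. This is only a sketch: the function names are made up
for illustration, and it assumes the 8 GT/s x4 link mentioned above together
with the 128b/130b line encoding that PCIe 3.0 uses.

```python
def raw_link_bits_per_sec(gt_per_sec, lanes):
    """Raw signalling rate of the link in bit/s (one bit per transfer per lane)."""
    return gt_per_sec * 1e9 * lanes


def encoding_overhead(payload_bits=128, total_bits=130):
    """Fraction of raw bits spent on line encoding (128b/130b assumed)."""
    return 1 - payload_bits / total_bits


raw = raw_link_bits_per_sec(8, 4)

print(f"{raw / 1000**3:.1f} Gbit/s  (divide by 1000^3)")
print(f"{raw / 1024**3:.1f} Gibit/s (divide by 1024^3)")
print(f"{encoding_overhead() * 100:.2f}% encoding overhead")
```

Dividing by 1024^3 instead of 1000^3 is what turns 32 into ~29.8.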

There is something else:
In the same benchmark there is a "fill->GTT  ,SDMA" row which reports
4035 MB/s. If that traffic goes through the TB3 interface then we just
found our 32 Gbit/sec.
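
A quick sanity check of that conversion, as a sketch only. The helper name
is hypothetical, and whether the benchmark's "MB" means 10^6 or 2^20 bytes
is an assumption, so both readings are shown.

```python
def mb_per_sec_to_gbit_per_sec(mb_per_sec, bytes_per_mb=1000**2):
    """Convert a throughput in MB/s to Gbit/s (1 Gbit = 10^9 bit)."""
    return mb_per_sec * bytes_per_mb * 8 / 1e9


fill_gtt = 4035  # MB/s, the "fill->GTT  ,SDMA" row

decimal = mb_per_sec_to_gbit_per_sec(fill_gtt)           # MB = 10^6 bytes
binary = mb_per_sec_to_gbit_per_sec(fill_gtt, 1024**2)   # MB = 2^20 bytes

print(f"{decimal:.2f} Gbit/s if MB = 10^6 bytes")
print(f"{binary:.2f} Gbit/s if MB = 2^20 bytes")
```

With either reading the number comes out at 32 Gbit/sec or a little above.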

Now the question is, if I understand this correctly and the SDMA can
indeed do 32 Gbit/sec for "fill->GTT", then why can't it do the same
with other kinds of transfers? Not sure if there is a good answer to
that question though.

Also I still don't fully understand why GTT->VRAM is slower than
VRAM->GTT, when the bandwidth is clearly available.

Best regards,
Tim



Side note: with regard to that 1.5% figure, the TB3 tech brief[0]
explicitly mentions this and says that it isn't carried over: "the
underlying protocol uses some data to provide encoding overhead which
is not carried over the Thunderbolt 3 link reducing the consumed
bandwidth by roughly 20 percent (DisplayPort) or 1.5 percent (PCI
Express Gen 3)"

[0] https://thunderbolttechnology.net/sites/default/files/Thunderbolt3_TechBrief_FINAL.pdf


