[PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from the INFO
Christian König
christian.koenig at amd.com
Mon Jan 16 11:31:00 UTC 2023
Mhm, that sucks. Could we have the automated builds check for paddings
in the UAPi data structure?
Christian.
Am 14.01.23 um 00:33 schrieb Marek Olšák:
> There is no hole on 32-bit unfortunately. It looks like the hole on
> 64-bit is now ABI.
>
> I moved the field to replace _pad1. The patch is attached (with your Rb).
>
> Marek
>
> On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeucher at gmail.com>
> wrote:
>
> On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <maraeo at gmail.com> wrote:
> >
> > i've added the comments and indeed pahole shows the hole as
> expected.
>
> What about on 32-bit?
>
> Alex
>
> >
> > Marek
> >
> > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher
> <alexdeucher at gmail.com> wrote:
> >>
> >> On Thu, Jan 12, 2023 at 6:50 AM Christian König
> >> <christian.koenig at amd.com> wrote:
> >> >
> >> > Am 11.01.23 um 21:48 schrieb Alex Deucher:
> >> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák
> <maraeo at gmail.com> wrote:
> >> > >> Yes, it's meant to be like a spec sheet. We are not
> interested in the current bandwidth utilization.
> >> > > After chatting with Marek on IRC and thinking about this
> more, I think
> >> > > this patch is fine. It's not really meant for bandwidth
> per se, but
> >> > > rather as a limit to determine what the driver should do in
> certain
> >> > > cases (i.e., when does it make sense to copy to vram vs
> not). It's
> >> > > not straightforward for userspace to parse the full topology to
> >> > > determine what links may be slow. I guess one potential
> pitfall would
> >> > > be that if you pass the device into a VM, the driver may
> report the
> >> > > wrong values. Generally in a VM the VM doesn't get the
> full view up
> >> > > to the root port. I don't know if the hypervisors report
> properly for
> >> > > pcie_bandwidth_available() in a VM or if it just shows the
> info about
> >> > > the endpoint in the VM.
> >> >
> >> > So this basically doesn't return the gen and lanes of the
> device, but
> >> > rather what was negotiated between the device and the
> upstream root port?
> >>
> >> Correct. It exposes the max gen and lanes of the slowest link
> between
> >> the device and the root port.
> >>
> >> >
> >> > If I got that correctly then we should probably document that
> cause
> >> > otherwise somebody will try to "fix" it at some time.
> >>
> >> Good point.
> >>
> >> Alex
> >>
> >> >
> >> > Christian.
> >> >
> >> > >
> >> > > Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> >> > >
> >> > > Alex
> >> > >
> >> > >> Marek
> >> > >>
> >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo
> <Lijo.Lazar at amd.com> wrote:
> >> > >>> [AMD Official Use Only - General]
> >> > >>>
> >> > >>>
> >> > >>> To clarify, with DPM in place, the current bandwidth will
> be changing based on the load.
> >> > >>>
> >> > >>> If apps/umd already has a way to know the current
> bandwidth utilisation, then possible maximum also could be part of
> the same API. Otherwise, this only looks like duplicate
> information. We have the same information in sysfs DPM nodes.
> >> > >>>
> >> > >>> BTW, I don't know to what extent app/umd really makes use
> of this. Take that memory frequency as an example (I'm reading it
> as 16GHz). It only looks like a spec sheet.
> >> > >>>
> >> > >>> Thanks,
> >> > >>> Lijo
> >> > >>> ________________________________
> >> > >>> From: Marek Olšák <maraeo at gmail.com>
> >> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
> >> > >>> To: Lazar, Lijo <Lijo.Lazar at amd.com>
> >> > >>> Cc: amd-gfx at lists.freedesktop.org
> <amd-gfx at lists.freedesktop.org>
> >> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen
> and lanes from the INFO
> >> > >>>
> >> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo
> <lijo.lazar at amd.com> wrote:
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
> >> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo
> <lijo.lazar at amd.com
> >> > >>>> <mailto:lijo.lazar at amd.com>> wrote:
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> On 1/4/2023 4:11 AM, Marek Olšák wrote:
> >> > >>>> > I see. Well, those sysfs files are not usable,
> and I don't think it
> >> > >>>> > would be important even if they were usable, but
> for completeness:
> >> > >>>> >
> >> > >>>> > The ioctl returns:
> >> > >>>> > pcie_gen = 1
> >> > >>>> > pcie_num_lanes = 16
> >> > >>>> >
> >> > >>>> > Theoretical bandwidth from those values: 4.0 GB/s
> >> > >>>> > My DMA test shows this write bandwidth: 3.5 GB/s
> >> > >>>> > It matches the expectation.
> >> > >>>> >
> >> > >>>> > Let's see the devices (there is only 1 GPU
> Navi21 in the system):
> >> > >>>> > $ lspci |egrep '(PCI|VGA).*Navi'
> >> > >>>> > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc.
> [AMD/ATI] Navi
> >> > >>>> 10 XL
> >> > >>>> > Upstream Port of PCI Express Switch (rev c3)
> >> > >>>> > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc.
> [AMD/ATI] Navi
> >> > >>>> 10 XL
> >> > >>>> > Downstream Port of PCI Express Switch
> >> > >>>> > 0c:00.0 VGA compatible controller: Advanced
> Micro Devices, Inc.
> >> > >>>> > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900
> XT] (rev c3)
> >> > >>>> >
> >> > >>>> > Let's read sysfs:
> >> > >>>> >
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0a:00.0/current_link_width
> >> > >>>> > 16
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0b:00.0/current_link_width
> >> > >>>> > 16
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0c:00.0/current_link_width
> >> > >>>> > 16
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
> >> > >>>> > 2.5 GT/s PCIe
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
> >> > >>>> > 16.0 GT/s PCIe
> >> > >>>> > $ cat
> /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
> >> > >>>> > 16.0 GT/s PCIe
> >> > >>>> >
> >> > >>>> > Problem 1: None of the speed numbers match 4 GB/s.
> >> > >>>>
> >> > >>>> US bridge = 2.5GT/s means operating at PCIe Gen 1
> speed. Total
> >> > >>>> theoretical bandwidth is then derived based on
> encoding and total
> >> > >>>> number
> >> > >>>> of lanes.
> >> > >>>>
> >> > >>>> > Problem 2: Userspace doesn't know the bus index
> of the bridges,
> >> > >>>> and it's
> >> > >>>> > not clear which bridge should be used.
> >> > >>>>
> >> > >>>> In general, modern ones have this arch= US->DS->EP.
> US is the one
> >> > >>>> connected to physical link.
> >> > >>>>
> >> > >>>> > Problem 3: The PCIe gen number is missing.
> >> > >>>>
> >> > >>>> Current link speed is based on whether it's
> Gen1/2/3/4/5.
> >> > >>>>
> >> > >>>> BTW, your patch makes use of capabilities flags
> which gives the maximum
> >> > >>>> supported speed/width by the device. It may not
> necessarily reflect the
> >> > >>>> current speed/width negotiated. I guess in NV, this
> info is already
> >> > >>>> obtained from PMFW and made available through
> metrics table.
> >> > >>>>
> >> > >>>>
> >> > >>>> It computes the minimum of the device PCIe gen and the
> motherboard/slot
> >> > >>>> PCIe gen to get the final value. These 2 lines do that.
> The low 16 bits
> >> > >>>> of the mask contain the device PCIe gen mask. The high
> 16 bits of the
> >> > >>>> mask contain the slot PCIe gen mask.
> >> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask &
> (adev->pm.pcie_gen_mask >> 16);
> >> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
> >> > >>>>
> >> > >>> With DPM in place on some ASICs, how much does this
> static info help for
> >> > >>> upper level apps?
> >> > >>>
> >> > >>>
> >> > >>> It helps UMDs make better decisions if they know the
> maximum achievable bandwidth. UMDs also compute the maximum memory
> bandwidth and compute performance (FLOPS). Right now it's printed
> by Mesa to give users detailed information about their GPU. For
> example:
> >> > >>>
> >> > >>> $ AMD_DEBUG=info glxgears
> >> > >>> Device info:
> >> > >>> name = NAVI21
> >> > >>> marketing_name = AMD Radeon RX 6800
> >> > >>> num_se = 3
> >> > >>> num_rb = 12
> >> > >>> num_cu = 60
> >> > >>> max_gpu_freq = 2475 MHz
> >> > >>> max_gflops = 19008 GFLOPS
> >> > >>> l0_cache_size = 16 KB
> >> > >>> l1_cache_size = 128 KB
> >> > >>> l2_cache_size = 4096 KB
> >> > >>> l3_cache_size = 128 MB
> >> > >>> memory_channels = 16 (TCC blocks)
> >> > >>> memory_size = 16 GB (16384 MB)
> >> > >>> memory_freq = 16 GHz
> >> > >>> memory_bus_width = 256 bits
> >> > >>> memory_bandwidth = 512 GB/s
> >> > >>> pcie_gen = 1
> >> > >>> pcie_num_lanes = 16
> >> > >>> pcie_bandwidth = 4.0 GB/s
> >> > >>>
> >> > >>> Marek
> >> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230116/6a2d8c44/attachment-0001.htm>
More information about the amd-gfx
mailing list