[PATCH] drm/amdgpu: add initial documentation for debugfs files

Russell, Kent Kent.Russell at amd.com
Wed Mar 5 14:59:38 UTC 2025


[Public]

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Alex
> Deucher
> Sent: Tuesday, March 4, 2025 11:50 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: [PATCH] drm/amdgpu: add initial documentation for debugfs files
>
> Describes what debugfs files are available and what
> they are used for.
>
> v2: fix some typos (Mark Glines)
>
> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> ---
>  Documentation/gpu/amdgpu/debugfs.rst | 202 +++++++++++++++++++++++++++
>  Documentation/gpu/amdgpu/index.rst   |   1 +
>  2 files changed, 203 insertions(+)
>  create mode 100644 Documentation/gpu/amdgpu/debugfs.rst
>
> diff --git a/Documentation/gpu/amdgpu/debugfs.rst
> b/Documentation/gpu/amdgpu/debugfs.rst
> new file mode 100644
> index 0000000000000..18bccb57c89fb
> --- /dev/null
> +++ b/Documentation/gpu/amdgpu/debugfs.rst
> @@ -0,0 +1,202 @@
> +==============
> +AMDGPU DebugFS
> +==============
> +
> +The amdgpu driver provides a number of debugfs files to aid in debugging
> +issues in the driver.  These are usually found in
> +/sys/kernel/debug/dri/<num>.
> +
> +DebugFS Files
> +=============
> +
> +amdgpu_benchmark
> +----------------
> +
> +Run benchmarks using the DMA engine the driver uses for GPU memory paging.
> +Write a number to the file to run the test.  The results are written to the
> +kernel log.  The following tests are available:
> +
> +- 1: simple test, VRAM to GTT and GTT to VRAM
> +- 2: simple test, VRAM to VRAM
> +- 3: GTT to VRAM, buffer size sweep, powers of 2
> +- 4: VRAM to GTT, buffer size sweep, powers of 2
> +- 5: VRAM to VRAM, buffer size sweep, powers of 2
> +- 6: GTT to VRAM, buffer size sweep, common modes
> +- 7: VRAM to GTT, buffer size sweep, common modes
> +- 8: VRAM to VRAM, buffer size sweep, common modes
> +
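
For anyone scripting this, a minimal sketch of how the test numbers above might be driven from Python. The helper names (`BENCHMARK_TESTS`, `benchmark_path`, `run_benchmark`) are made up for illustration and are not part of the patch or the driver. It assumes DRI minor 0 by default, debugfs mounted at /sys/kernel/debug, and root privileges. The results still only show up in the kernel log.

```python
from pathlib import Path

# Test numbers exactly as listed in the documentation above.
BENCHMARK_TESTS = {
    1: "simple test, VRAM to GTT and GTT to VRAM",
    2: "simple test, VRAM to VRAM",
    3: "GTT to VRAM, buffer size sweep, powers of 2",
    4: "VRAM to GTT, buffer size sweep, powers of 2",
    5: "VRAM to VRAM, buffer size sweep, powers of 2",
    6: "GTT to VRAM, buffer size sweep, common modes",
    7: "VRAM to GTT, buffer size sweep, common modes",
    8: "VRAM to VRAM, buffer size sweep, common modes",
}


def benchmark_path(minor: int = 0, root: str = "/sys/kernel/debug/dri") -> Path:
    """Build the path to amdgpu_benchmark for a DRI minor (hypothetical helper)."""
    return Path(root) / str(minor) / "amdgpu_benchmark"


def run_benchmark(test: int, minor: int = 0, root: str = "/sys/kernel/debug/dri") -> str:
    """Write the test number to the file and return its description."""
    if test not in BENCHMARK_TESTS:
        raise ValueError(f"unknown benchmark test {test}, expected 1-8")
    # Writing the number is what starts the test; output goes to the kernel log.
    benchmark_path(minor, root).write_text(f"{test}\n")
    return BENCHMARK_TESTS[test]
```

Something like `run_benchmark(2)` as root, followed by a look at dmesg, would be the intended use. The `root` parameter is only there so the helper can be pointed at a scratch directory.
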
> +amdgpu_test_ib
> +--------------
> +
> +Read this file to run simple IB (Indirect Buffer) tests on all kernel managed
> +rings.  IBs are command buffers usually generated by userspace applications
> +which are submitted to the kernel for execution on a particular GPU engine.
> +This just runs the simple IB tests included in the kernel.
> +
> +amdgpu_discovery
> +----------------
> +
> +Provides raw access to the IP discovery binary provided by the GPU.  Read this
> +file to access the raw binary.
> +
> +amdgpu_vbios
> +------------
> +
> +Provides raw access to the ROM binary image from the GPU.  Read this file to
> +access the raw binary.
> +
> +amdgpu_evict_gtt
> +----------------
> +
> +Evict all buffers from the GTT memory pool.  Read this file to evict all
> +buffers from this pool.
> +
> +amdgpu_evict_vram
> +-----------------
> +
> +Evict all buffers from the VRAM memory pool.  Read this file to evict all
> +buffers from this pool.
> +
> +amdgpu_gpu_recover
> +------------------
> +

If we're going for consistency, then you could add "Trigger a full GPU reset" or something like that beforehand. The other entries above are "Do a thing. Read this file to do the thing", so it doesn't match the same style. But it's honestly so nit-picky and pedantic that it's not a big deal.

> +Read this file to trigger a full GPU reset.  All work currently running
> +on the GPU will be lost.
> +
> +amdgpu_ring_<name>
> +------------------
> +
> +Provides read access to the kernel managed ring buffers for each ring <name>.
> +These are useful for debugging problems on a particular ring.  The ring buffer
> +is how the CPU sends commands to the GPU.  The CPU writes commands into the
> +buffer and then asks the GPU engine to process it.
> +
> +amdgpu_mqd_<name>
> +-----------------
> +
> +Provides read access to the kernel managed MQD (Memory Queue Descriptor) for
> +ring <name> managed by the kernel driver.  MQDs define the features of the ring
> +and are used to store the ring's state when it is not connected to hardware.
> +The driver writes the requested ring features and metadata (GPU addresses of
> +the ring itself and associated buffers) to the MQD and the firmware uses the MQD
> +to populate the hardware when the ring is mapped to a hardware slot.  Only
> +available on engines which use MQDs.
> +
> +amdgpu_error_<name>
> +-------------------
> +
> +Provides an interface to set an error on fences associated with ring <name>.
> +The error code specified is propagated to all fences associated with the
> +ring.
> +
> +amdgpu_pm_info
> +--------------
> +
> +Provides human readable information about the power management features
> +and state of the GPU.  This includes current GFX clock, Memory clock,
> +voltages, average SoC power, temperature, GFX load, Memory load, SMU
> +feature mask, VCN power state, clock and power gating features.
> +
> +amdgpu_firmware_info
> +--------------------
> +
> +Lists the firmware versions for all firmwares used by the GPU.  Only
> +entries with a non-0 version are valid.  If the version is 0, the firmware
> +is not valid for the GPU.
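
The "non-0 version" rule could be applied mechanically. A rough sketch, assuming each firmware line looks like `<name> feature version: <n>, firmware version: 0x<hex>`. That line shape is my assumption and is not stated in the patch, and the sample text and the `valid_firmwares` name are made up for illustration.

```python
import re

# Assumed line shape (not confirmed by the patch):
#   "<name> feature version: <n>, firmware version: 0x<hex>"
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<version>0x[0-9a-fA-F]+)$"
)


def valid_firmwares(text: str) -> dict:
    """Return {name: version} for entries whose firmware version is non-zero."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed shape
        version = int(match.group("version"), 16)
        if version != 0:
            result[match.group("name")] = version
    return result


# Made-up sample, only to exercise the filter.
SAMPLE = """\
VCE feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x0000002a
"""
```

With the sample above, only the ME entry survives the filter because the VCE version is 0.
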
> +
> +amdgpu_fence_info
> +-----------------
> +
> +Shows the last signalled and emitted fence sequence numbers for each
> +kernel driver managed ring.  Fences are associated with submissions
> +to the engine.  Emitted fences have been submitted to the ring
> +and signalled fences have been signalled by the GPU.  Rings with a
> +larger emitted fence value have outstanding work that is still being
> +processed by the engine that owns that ring.  When the emitted and
> +signalled fence values are equal, the ring is idle.
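
The idle rule here is simple enough to sketch. This assumes the per-ring signalled and emitted values have already been pulled out of the file. The parsing is left out on purpose since the patch does not describe the file layout, and the `busy_rings` name and the sample numbers are made up.

```python
def busy_rings(fences: dict) -> list:
    """Return the names of rings whose signalled and emitted values differ.

    `fences` maps ring name -> (signalled, emitted). Per the description
    above, a ring is idle when the two values are equal.
    """
    return sorted(
        name
        for name, (signalled, emitted) in fences.items()
        if signalled != emitted
    )


# Made-up values, only to illustrate the comparison.
EXAMPLE = {
    "gfx": (100, 100),   # idle: everything emitted has signalled
    "sdma0": (40, 42),   # busy: two submissions still outstanding
}
```
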
> +
> +amdgpu_gem_info
> +---------------
> +
> +Lists all of the PIDs using the GPU and the GPU buffers that they have
> +allocated.  This lists the buffer size, pool (VRAM, GTT, etc.), and buffer
> +attributes (CPU access required, CPU cache attributes, etc.).
> +
> +amdgpu_vm_info
> +--------------
> +
> +Lists all of the PIDs using the GPU and the GPU buffers that they have
> +allocated as well as the status of those buffers relative to that process'
> +GPU virtual address space (e.g., evicted, idle, invalidated, etc.).
> +
> +amdgpu_sa_info
> +--------------
> +
> +Prints out all of the suballocations by the suballocation manager in the
> +kernel driver.  Prints the GPU address, size, and fence info associated
> +with each suballocation.  The suballocations are used internally within
> +the kernel driver for various things.
> +
> +amdgpu_<pool>_mm
> +----------------
> +
> +Prints TTM information about the memory pool <pool>.
> +
> +amdgpu_vram
> +-----------
> +
> +Provides direct access to VRAM.  Used by tools like UMR to inspect
> +objects in VRAM.
> +
> +amdgpu_iomem
> +------------
> +
> +Provides direct access to GTT memory.  Used by tools like UMR to inspect
> +GTT memory.
> +
> +amdgpu_regs_*
> +-------------
> +
> +Provides direct access to various register apertures on the GPU.  Used
> +by tools like UMR to access GPU registers.
> +
> +amdgpu_regs2
> +------------
> +
> +Provides an IOCTL interface used by UMR for interacting with GPU registers.
> +
> +
> +amdgpu_sensors
> +--------------
> +
> +Provides an interface to query GPU power metrics (temperature, average
> +power, etc.).  Used by tools like UMR to query GPU power metrics.
> +
> +
> +amdgpu_gca_config
> +-----------------
> +
> +Provides an interface to query GPU details (GFX config, PCI config,
> +GPU family, etc.).  Used by tools like UMR to query GPU details.
> +
> +amdgpu_wave
> +-----------
> +
> +Used to query GFX/compute wave information from the hardware.  Used by tools
> +like UMR to query GFX/compute wave information.
> +
> +amdgpu_gpr
> +----------
> +
> +Used to      query GFX/compute GPR (General Purpose Register) information

Weird extra spaces here

 Kent

> from the
> +hardware.  Used by tools like UMR to query GPRs when debugging shaders.
> +
> +amdgpu_gprwave
> +--------------
> +
> +Provides an IOCTL interface used by UMR for interacting with shader waves.
> +
> +amdgpu_fw_attestation
> +---------------------
> +
> +Provides an interface for reading back firmware attestation records.
> diff --git a/Documentation/gpu/amdgpu/index.rst
> b/Documentation/gpu/amdgpu/index.rst
> index 302d039928ee8..5254f3a162f84 100644
> --- a/Documentation/gpu/amdgpu/index.rst
> +++ b/Documentation/gpu/amdgpu/index.rst
> @@ -17,4 +17,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA
> (CDNA) architectures.
>     driver-misc
>     debugging
>     process-isolation
> +   debugfs
>     amdgpu-glossary
> --
> 2.48.1


