[PATCH] drm/amdgpu: add initial documentation for debugfs files
Alex Deucher
alexander.deucher at amd.com
Thu Mar 6 15:53:41 UTC 2025
Describes what debugfs files are available and what
they are used for.
v2: fix some typos (Mark Glines)
v3: Address comments from Siqueira and Kent
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
---
Documentation/gpu/amdgpu/debugfs.rst | 210 +++++++++++++++++++++++++
Documentation/gpu/amdgpu/debugging.rst | 7 +
Documentation/gpu/amdgpu/index.rst | 1 +
3 files changed, 218 insertions(+)
create mode 100644 Documentation/gpu/amdgpu/debugfs.rst
diff --git a/Documentation/gpu/amdgpu/debugfs.rst b/Documentation/gpu/amdgpu/debugfs.rst
new file mode 100644
index 0000000000000..fdfc1a8773c72
--- /dev/null
+++ b/Documentation/gpu/amdgpu/debugfs.rst
@@ -0,0 +1,210 @@
+==============
+AMDGPU DebugFS
+==============
+
+The amdgpu driver provides a number of debugfs files to aid in debugging
+issues in the driver. Thse are usually found in
+/sys/kernel/debug/dri/<num>.
+
+DebugFS Files
+=============
+
+amdgpu_benchmark
+----------------
+
+Run benchmarks using the DMA engine the driver uses for GPU memory paging.
+Write a number to the file to run the test. The results are written to the
+kernel log. VRAM is on device memory (dGPUs) or cave out (APUs) and GTT
+(Graphics Translation Tables) is system memory that is accessible by the GPU.
+The following tests are available:
+
+- 1: simple test, VRAM to GTT and GTT to VRAM
+- 2: simple test, VRAM to VRAM
+- 3: GTT to VRAM, buffer size sweep, powers of 2
+- 4: VRAM to GTT, buffer size sweep, powers of 2
+- 5: VRAM to VRAM, buffer size sweep, powers of 2
+- 6: GTT to VRAM, buffer size sweep, common display sizes
+- 7: VRAM to GTT, buffer size sweep, common display sizes
+- 8: VRAM to VRAM, buffer size sweep, common display sizes
+
+amdgpu_test_ib
+--------------
+
+Read this file to run simple IB (Indirect Buffer) tests on all kernel managed
+rings. IBs are command buffers usually generated by userspace applications
+which are submitted to the kernel for execution on an particular GPU engine.
+This just runs the simple IB tests included in the kernel. These tests
+are engine specific and verify that IB submission works.
+
+amdgpu_discovery
+----------------
+
+Provides raw access to the IP discovery binary provided by the GPU. Read this
+file to acess the raw binary. This is useful for verifying the contents of
+the IP discovery table. It is chip specific.
+
+amdgpu_vbios
+------------
+
+Provides raw access to the ROM binary image from the GPU. Read this file to
+access the raw binary. This is useful for verifying the contents of the
+video BIOS ROM. It is board specific.
+
+amdgpu_evict_gtt
+----------------
+
+Evict all buffers from the GTT memory pool. Read this file to evict all
+buffers from this pool.
+
+amdgpu_evict_vram
+-----------------
+
+Evict all buffers from the VRAM memory pool. Read this file to evict all
+buffers from this pool.
+
+amdgpu_gpu_recover
+------------------
+
+Trigger a GPU reset. Read this file to trigger reset the entire GPU.
+All work currently running on the GPU will be lost.
+
+amdgpu_ring_<name>
+------------------
+
+Provides read access to the kernel managed ring buffers for each ring <name>.
+These are useful for debugging problems on a particular ring. The ring buffer
+is how the CPU sends commands to the GPU. The CPU writes commands into the
+buffer and then asks the GPU engine to process it. This is the raw binary
+contents of the ring buffer. Use a tool like UMR to decode the rings into human
+readable form.
+
+amdgpu_mqd_<name>
+-----------------
+
+Provides read access to the kernel managed MQD (Memory Queue Descriptor) for
+ring <name> managed by the kernel driver. MQDs define the features of the ring
+and are used to store the ring's state when it is not connected to hardware.
+The driver writes the requested ring features and metadata (GPU addresses of
+the ring itself and associated buffers) to the MQD and the firmware uses the MQD
+to populate the hardware when the ring is mapped to a hardware slot. Only
+available on engines which use MQDs. This provides access to the raw MQD
+binary.
+
+amdgpu_error_<name>
+-------------------
+
+Provides an interface to set an error code on the dma fences associated with
+ring <name>. The error code specified is propogated to all fences associated
+with the ring. Use this to inject a fence error into a ring.
+
+amdgpu_pm_info
+--------------
+
+Provides human readable information about the power management features
+and state of the GPU. This includes current GFX clock, Memory clock,
+voltages, average SoC power, temperature, GFX load, Memory load, SMU
+feature mask, VCN power state, clock and power gating features.
+
+amdgpu_firmware_info
+--------------------
+
+Lists the firmware versions for all firmwares used by the GPU. Only
+entries with a non-0 version are valid. If the version is 0, the firmware
+is not valid for the GPU.
+
+amdgpu_fence_info
+-----------------
+
+Shows the last signalled and emitted fence sequence numbers for each
+kernel driver managed ring. Fences are associated with submissions
+to the engine. Emitted fences have been submitted to the ring
+and signalled fences have been signalled by the GPU. Rings with a
+larger emitted fence value have outstanding work that is still being
+processed by the engine that owns that ring. When the emitted and
+signalled fence values are equal, the ring is idle.
+
+amdgpu_gem_info
+---------------
+
+Lists all of the PIDs using the GPU and the GPU buffers that they have
+allocated. This lists the buffer size, pool (VRAM, GTT, etc.), and buffer
+attributes (CPU access required, CPU cache attributes, etc.).
+
+amdgpu_vm_info
+--------------
+
+Lists all of the PIDs using the GPU and the GPU buffers that they have
+allocated as well as the status of those buffers relative to that process'
+GPU virtual address space (e.g., evicted, idle, invalidated, etc.).
+
+amdgpu_sa_info
+--------------
+
+Prints out all of the suballocations (sa) by the suballocation manager in the
+kernel driver. Prints the GPU address, size, and fence info associated
+with each suballocation. The suballocations are used internally within
+the kernel driver for various things.
+
+amdgpu_<pool>_mm
+----------------
+
+Prints TTM information about the memory pool <pool>.
+
+amdgpu_vram
+-----------
+
+Provides direct access to VRAM. Used by tools like UMR to inspect
+objects in VRAM.
+
+amdgpu_iomem
+------------
+
+Provides direct access to GTT memory. Used by tools like UMR to inspect
+GTT memory.
+
+amdgpu_regs_*
+-------------
+
+Provides direct access to various register aperatures on the GPU. Used
+by tools like UMR to access GPU registers.
+
+amdgpu_regs2
+------------
+
+Provides an IOCTL interface used by UMR for interacting with GPU registers.
+
+
+amdgpu_sensors
+--------------
+
+Provides an interface to query GPU power metrics (temperature, average
+power, etc.). Used by tools like UMR to query GPU power metrics.
+
+
+amdgpu_gca_config
+-----------------
+
+Provides an interface to query GPU details (Graphics/Compute Array config,
+PCI config, GPU family, etc.). Used by tools like UMR to query GPU details.
+
+amdgpu_wave
+-----------
+
+Used to query GFX/compute wave infomation from the hardware. Used by tools
+like UMR to query GFX/compute wave information.
+
+amdgpu_gpr
+----------
+
+Used to query GFX/compute GPR (General Purpose Register) infomation from the
+hardware. Used by tools like UMR to query GPRs when debugging shaders.
+
+amdgpu_gprwave
+--------------
+
+Provides an IOCTL interface used by UMR for interacting with shader waves.
+
+amdgpu_fw_attestation
+---------------------
+
+Provides an interface for reading back firmware attestation records.
diff --git a/Documentation/gpu/amdgpu/debugging.rst b/Documentation/gpu/amdgpu/debugging.rst
index e75f97d0e4eaf..7cbfea0606e15 100644
--- a/Documentation/gpu/amdgpu/debugging.rst
+++ b/Documentation/gpu/amdgpu/debugging.rst
@@ -2,6 +2,13 @@
GPU Debugging
===============
+General Debugging Options
+=========================
+
+The DebugFS section provides documentation on a number files to aid in debugging
+issues on the GPU.
+
+
GPUVM Debugging
===============
diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst
index 302d039928ee8..4c75567854cb2 100644
--- a/Documentation/gpu/amdgpu/index.rst
+++ b/Documentation/gpu/amdgpu/index.rst
@@ -16,5 +16,6 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.
thermal
driver-misc
debugging
+ debugfs
process-isolation
amdgpu-glossary
--
2.48.1
More information about the amd-gfx
mailing list