[PATCH] drm/amd/amdgpu: add tracing of kernel DMA mappings and enable on init (v2)

Tom St Denis tom.stdenis at amd.com
Tue Aug 15 13:38:09 UTC 2017


On 15/08/17 08:16 AM, Tom St Denis wrote:
> On 15/08/17 08:11 AM, Christian König wrote:
>> Am 15.08.2017 um 13:33 schrieb Tom St Denis:
>>> On 15/08/17 07:22 AM, Christian König wrote:
>>>> Am 14.08.2017 um 13:37 schrieb Tom St Denis:
>>>>> This patch adds trace points to the amdgpu bo_kmap/bo_kunmap functions
>>>>> which capture internal allocations.
>>>>>
>>>>> (v2): Simply add a sleep to the init.  Users can use this script to
>>>>> load with map debugging enabled:
>>>>>
>>>>>     #!/bin/bash
>>>>>     modprobe amdgpu map_debug=1 &
>>>>>     sleep 1
>>>>>     echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_ttm_tt_unpopulate/enable
>>>>>
>>>>>     echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_ttm_tt_populate/enable
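
(For reference, the trace points themselves live in amdgpu_trace.h; a rough
sketch of what one such declaration looks like, with the exact fields possibly
differing from the patch:)

/* Sketch only: the field layout here is illustrative, see the patch for
 * the real definition. */
TRACE_EVENT(amdgpu_ttm_tt_populate,
	    TP_PROTO(struct amdgpu_device *adev, uint64_t dma_address,
		     uint64_t phys_address),
	    TP_ARGS(adev, dma_address, phys_address),
	    TP_STRUCT__entry(
			__field(uint16_t, domain)
			__field(uint8_t, bus)
			__field(uint8_t, slot)
			__field(uint8_t, func)
			__field(uint64_t, dma)
			__field(uint64_t, phys)
			),
	    TP_fast_assign(
			/* identify which GPU the mapping belongs to */
			__entry->domain = pci_domain_nr(adev->pdev->bus);
			__entry->bus = adev->pdev->bus->number;
			__entry->slot = PCI_SLOT(adev->pdev->devfn);
			__entry->func = PCI_FUNC(adev->pdev->devfn);
			__entry->dma = dma_address;
			__entry->phys = phys_address;
			),
	    TP_printk("%04x:%02x:%02x.%x: dma=0x%llx phys=0x%llx",
		      __entry->domain, __entry->bus, __entry->slot,
		      __entry->func, __entry->dma, __entry->phys)
);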
>>>>
>>>> I've just realized that there is a far simpler method than that, or than 
>>>> enabling trace points using the module command line.
>>>>
>>>> Assuming your GPU is PCI device 01:00.0 connected through bridge 
>>>> 00:02.1 (use lspci -t to figure the connection out):
>>>>
>>>> # Temporarily remove the PCIe device from the bus
>>>> echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
>>>> # Load amdgpu; this allows you to enable the trace points you want 
>>>> and also set probes etc.
>>>> modprobe amdgpu
>>>> # Rescan the bus; the device subsystem will automatically probe 
>>>> amdgpu with this device
>>>> echo 1 > /sys/bus/pci/devices/0000\:00\:02.1/rescan
>>>
>>> That would definitely be a fair bit trickier to script up.  Nobody will 
>>> run this by hand, so we'd need to remove all devices with the AMD vendor 
>>> ID that are GPUs only (i.e. not other controllers), which means parsing 
>>> the lspci output for VGA controller related text.
>>>
>>> Definitely doable and probably "cleaner" from a race point of view.
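
Roughly, such a script could look like this (just a sketch, untested; it
assumes lspci -D output and that every AMD VGA controller should be detached):

    #!/bin/bash
    # Sketch: detach every AMD/ATI VGA controller (vendor 0x1002), load
    # amdgpu, enable the map trace points, then rescan so the driver
    # probes the GPUs.
    for dev in $(lspci -D -d 1002: | grep "VGA compatible controller" | cut -d' ' -f1); do
        echo 1 > /sys/bus/pci/devices/$dev/remove
    done
    modprobe amdgpu
    echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_ttm_tt_populate/enable
    echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_ttm_tt_unpopulate/enable
    echo 1 > /sys/bus/pci/rescan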
>>>
>>>> Apart from that I would really like to see those trace points in TTM 
>>>> instead. I also don't mind adding a dev or pdev pointer to the 
>>>> ttm_bo_device structure for this.
>>>
>>> Not against this in principle.  There's a bit of confusion... our ttm 
>>> amdgpu functions are passed
>>>
>>> struct ttm_tt *ttm
>>>
>>> which we cast to
>>>
>>> struct amdgpu_ttm_tt {
>>>     struct ttm_dma_tt    ttm;
>>>     struct amdgpu_device    *adev;
>>
>> BTW: Isn't this adev pointer here superfluous? I mean the ttm_tt has a 
>> bdev pointer to use, doesn't it?
> 
> Honestly without cscoping it I wouldn't know :-)


Yes, you can find the adev via

static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
{
	return container_of(bdev, struct amdgpu_device, mman.bdev);
}
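
So from the ttm_tt side something like

	struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);

should work (assuming the bdev pointer in ttm_tt is what you meant and is 
always populated there).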

But looking at ttm_bo_device I don't see an obvious path to a device 
structure (or pci structure).  So that would need to be added (as you 
previously commented).
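
For illustration only, it could be as simple as this (field name and init 
site are hypothetical, not an actual patch):

/* Sketch: give ttm_bo_device a pointer to the backing struct device so
 * TTM-level trace points could reference it.  Name/placement made up. */
struct ttm_bo_device {
	/* ... existing members ... */
	struct device *dev;
};

/* amdgpu would then set it once during amdgpu_ttm_init(): */
adev->mman.bdev.dev = adev->dev;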

Tom

