[PATCH] drm/amd/amdgpu: Add tracepoint for DMA page mapping
Tom St Denis
tom.stdenis at amd.com
Tue Aug 1 14:26:08 UTC 2017
On 01/08/17 10:10 AM, Christian König wrote:
> You need to cover multiple code path here:
> 1. The one you currently implemented which uses ttm_dma_populate() and
> pci_map_page().
> 2. The one using ttm_dma_populate().
I'll have to look into this one though it's my understanding that code
path is only followed if there's no (hardware) IOMMU right? Which while
it'd have to be covered isn't a high priority right now (our devel
platforms have hardware IOMMU).
None the less I'll look into it once I figure out #3 and #4 per your
comments.
> 3. The one using drm_prime_sg_to_page_addr_arrays() a bit above for
> imported BOs.
This one seems rather straight forward but the only catch is I don't
have access to "adev" inside that drm function. Would it be taboo to do
something like
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 376078334f54..cd97ef2144c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -916,6 +916,12 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm)
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
gtt->ttm.dma_address,
ttm->num_pages);
ttm->state = tt_unbound;
+ for (i = 0; i < ttm->num_pages; i++) {
+ trace_amdgpu_ttm_tt_populate(
+ adev,
+ gtt->ttm.dma_address[i],
+ page_to_phys(ttm->pages[i]));
+ }
return 0;
}
This would normally result in a for loop around a nop which shouldn't be
a huge performance hit but is one none the less.
> 4. The in amdgpu_ttm_tt_pin_userptr() which uses dma_map_sg() for userptrs.
>
> Basically all just take gtt->ttm.ttm.pages and fill gtt->ttm.dma_address.
Same comment as #3. I can tackle this with a for loop around the trace
if that is ok.
> I suggest to add a helper you can call to trace all pages->dma_address
> mappings inside a ttm.
One thing at a time :-). That would probably be a bit better since the
trace log gets filled with remappings (which is why my proof-of-concept
umr patch only grabs the latest mapping as it reads the trace).
Is there a handy place we can grab a list of ttm_tt's bound to our
device? If so then in theory I could write a debugfs interface instead.
Thanks for your patience as I'm definitely less familiar with the VM
side of things :-)
Cheers,
Tom
More information about the amd-gfx
mailing list