drm/radeon: "ring test failed" on PA-RISC Linux

Konrad Rzeszutek Wilk konrad.wilk at oracle.com
Wed Sep 25 10:28:16 PDT 2013


On Wed, Sep 25, 2013 at 08:29:07PM +0400, Alex Ivanov wrote:
> 24.09.2013, 00:11, "Konrad Rzeszutek Wilk" <konrad.wilk at oracle.com>:
> > On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:
> >
> >>  On 21.09.2013, at 1:27, Alex Deucher <alexdeucher at gmail.com> wrote:
> >>>  The register writes seem to be going through the register backbone correctly:
> >>>
> >>>  [0x00B] 0x15E0=0x00000000
> >>>  [0x00C] 0x15E4=0xCAFEDEAD
> >>>  [0x00D] 0x4274=0x0000000F
> >>>  [0x00E] 0x42C8=0x00000007
> >>>  [0x00F] 0x4018=0x0000001D
> >>>  [0x010] 0x170C=0x80000000
> >>>  [0x011] 0x3428=0x00020100
> >>>  [0x012] 0x15E4=0xCAFEDEAD
> >>>
> >>>  You can see the 0xCAFEDEAD written to the scratch register via MMIO
> >>>  from the ring_test(). The CP fifo however seems to be full of garbage.
> >>>  The CP is busy though, so it seems to be functional.  I guess it's
> >>>  just fetching garbage rather than commands.
> >
> > If it is fetching garbage, that would imply the DMA (or bus addresses)
> > that are programmed in the GART are bogus. If you dump them and try
> > to figure out if bus address -> physical address -> virtual address ==
> > virtual address -> bus address, that could help. And perhaps seeing what
> > the virtual address holds, and/or poisoning it with known data?
> >
> > Or perhaps the card has picked up an incorrect page table? Meaning
> > the (bus) address given to it is not the correct one?
> >
> 
> Konrad,
> 
> Let's see. Please note that I'm not a PA-RISC or general Linux kernel
> developer, just a user, so I may do things completely wrong.
> I was hoping that the PA-RISC smarties would join me here, but they seem
> to be busy with other duties. Even the port's mailing list activity has
> been low during the last weeks.

I took a look at arch/parisc/kernel/pci-dma.c and I see that it
is mostly a flat platform. That is, bus addresses == physical addresses.
Unless it is a pcxl or pcxl2 CPU type (huh?) - if it is, then any
calls to dma_alloc_coherent will map memory out of a pool.
In essence it will look like a SWIOTLB bounce buffer.

But interestingly enough there are a lot of 'flush_kernel_dcache_range'
calls, one for every DMA operation. And I think you need to do a
dma_sync_for_cpu call in radeon_test_writeback for it to
use flush_kernel_dcache_range. I don't know what
flush_kernel_dcache_range does though, so I could be wrong.

That means you can ignore the little code below I wrote and
see about doing something like this:


diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
index 3cae2bb..9e5923d 100644
--- a/drivers/gpu/drm/radeon/radeon_cp.c
+++ b/drivers/gpu/drm/radeon/radeon_cp.c
@@ -876,6 +876,7 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv)
 
 	RADEON_WRITE(RADEON_SCRATCH_REG1, 0xdeadbeef);
 
+	flush_kernel_dcache_range(dev_priv->ring_rptr, PAGE_SIZE);
 	for (tmp = 0; tmp < dev_priv->usec_timeout; tmp++) {
 		u32 val;

But that is probably a shot in the dark. I have no clue what the flush_..
is doing.
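
For what it's worth, the reason a flush or invalidate could matter here can be
shown with a toy model. This is plain user-space C, not kernel code: the struct
and the toy_* names are made up for illustration, and it does not claim to
describe what flush_kernel_dcache_range actually does on PA-RISC. It only shows
how a CPU that polls a cached copy can keep seeing a stale value after the
device has written to memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not kernel code: one word of RAM plus a CPU-side cached copy. */
struct toy_line {
	uint32_t ram;    /* what the device sees and writes */
	uint32_t cached; /* what the CPU last loaded */
	int valid;       /* nonzero while the cached copy is in use */
};

/* CPU load: served from the cached copy while the line is valid. */
static uint32_t toy_cpu_read(struct toy_line *l)
{
	assert(l != NULL);
	if (!l->valid) {
		l->cached = l->ram;
		l->valid = 1;
	}
	return l->cached;
}

/* Device DMA write: updates RAM only; the CPU cache is not snooped. */
static void toy_device_write(struct toy_line *l, uint32_t v)
{
	assert(l != NULL);
	l->ram = v;
}

/* Hypothetical stand-in for invalidating the line before the CPU reads. */
static void toy_invalidate(struct toy_line *l)
{
	assert(l != NULL);
	l->valid = 0;
}
```

With this model, a read after toy_device_write() still returns the old value
until toy_invalidate() is called, which is the kind of symptom a missing sync
in a polling loop would produce on a machine without coherent DMA.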

[edit: And then I noticed sba_iommu.c, which is a complete IOMMU driver
where bus and physical addresses are different. sigh. What type of machine
is this? Does it have the IOMMU in it?] 
> 
> > If you dump them and try
> > to figure out if bus address -> physical address -> virtual address ==
> > virtual address -> bus address that could help
> 
> With following
> 
> radeon/radeon_ttm.c:
> 
> radeon_ttm_tt_populate():
> ...
> for (i = 0; i < ttm->num_pages; i++) {
>                 gtt->ttm.dma_address[i] = pci_map_page(rdev->pdev, ttm->pages[i],
>                                                        0, PAGE_SIZE,
>                                                        PCI_DMA_BIDIRECTIONAL);
> 
>                 void *va = bus_to_virt(gtt->ttm.dma_address[i]);
>                 if ((phys_addr_t) va != virt_to_bus(va)) {

You are missing a translation here (you were comparing the virtual address
to the bus address). I was thinking of something along these lines:

		unsigned long pfn = page_to_pfn(ttm->pages[i]);
		dma_addr_t bus = gtt->ttm.dma_address[i];
		void *va_bus, *va_pfn;

		/* If this fires, bus addresses differ from physical addresses. */
		if ((pfn << PAGE_SHIFT) != bus)
			printk("Bus 0x%llx != PFN 0x%lx\n",
			       (unsigned long long)bus, pfn << PAGE_SHIFT);

		va_bus = bus_to_virt(bus);
		va_pfn = __va(pfn << PAGE_SHIFT);

		if (!virt_addr_valid(va_bus))
			printk("va_bus (%p) not good!\n", va_bus);
		if (!virt_addr_valid(va_pfn))
			printk("va_pfn (%p) not good!\n", va_pfn);

		/* We got VA for both bus -> va, and pfn -> va. Should be the
		   same if bus and physical addresses are on the same namespace. */
		if (va_bus != va_pfn)
			printk("va bus:%p != va pfn: %p\n", va_bus, va_pfn);

		/* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus address.
		   The bus address should be the same */
		if (bus != virt_to_bus(va_bus))
			printk("bus:%llx != bus->pa->va->ba: %llx\n",
			       (unsigned long long)bus,
			       (unsigned long long)virt_to_bus(va_bus));
		
>                      DRM_INFO("MISMATCH: %p != %p\n", va, (void *) virt_to_bus(va));
>                      /*DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));*/ // Leads to a Kernel Fault

That is odd. I would have thought it would be usable.

>                      ...
>                 }
> 
> I'm getting the output:
> 
> [drm] MISMATCH: 0000000080280000 != 0000000040280000

In theory that means the bus address that is programmed in (gtt->ttm.dma_address[i])
is 0000000040280000 (which is what virt_to_bus(va) should have resolved itself to).
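
A small user-space model may make the MISMATCH line easier to read. It assumes
that bus_to_virt() and virt_to_bus() are a constant-offset linear map, and takes
the offset from the two printed values (0x80280000 - 0x40280000 = 0x40000000);
the model_* names and MODEL_PAGE_OFFSET are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the linear map is a constant offset, inferred from the
 * two values in the MISMATCH line (0x80280000 - 0x40280000). */
#define MODEL_PAGE_OFFSET 0x40000000ULL

/* Model of bus_to_virt() on a flat platform: add the offset. */
static uint64_t model_bus_to_virt(uint64_t bus)
{
	return bus + MODEL_PAGE_OFFSET;
}

/* Model of virt_to_bus() on a flat platform: subtract the offset. */
static uint64_t model_virt_to_bus(uint64_t va)
{
	assert(va >= MODEL_PAGE_OFFSET);
	return va - MODEL_PAGE_OFFSET;
}

/* The quoted check: compares a virtual address with a bus address. */
static int original_check_mismatches(uint64_t bus)
{
	uint64_t va = model_bus_to_virt(bus);

	return va != model_virt_to_bus(va);
}

/* The round trip: bus -> va -> bus should return the same bus address. */
static int round_trip_mismatches(uint64_t bus)
{
	uint64_t va = model_bus_to_virt(bus);

	return bus != model_virt_to_bus(va);
}
```

In this model the quoted check reports a mismatch for every address, since it
compares a virtual address with a bus address, while the round trip
bus -> va -> bus comes back unchanged. So the MISMATCH output by itself does
not show that the mapping is wrong.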


That you can't get access to 'va' (0000000080280000) is odd. One way to try to
access it is to do:

	va = __va(page_to_pfn(ttm->pages[i]) << PAGE_SHIFT);
	DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));

As that would get it via the page -> va.

