drm/radeon: "ring test failed" on PA-RISC Linux

Konrad Rzeszutek Wilk konrad.wilk at oracle.com
Mon Sep 23 13:11:32 PDT 2013


On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:
> On 21.09.2013, at 1:27, Alex Deucher <alexdeucher at gmail.com> wrote:
> 
> > On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov <gnidorah at p0n4ik.tk> wrote:
> >> On 17.09.2013, at 18:24, Alex Deucher <alexdeucher at gmail.com> wrote:
> >> 
> >>> On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov <gnidorah at p0n4ik.tk> wrote:
> >>>> Alex,
> >>>> 
> >>>> On 10.09.2013, at 16:37, Alex Deucher <alexdeucher at gmail.com> wrote:
> >>>> 
> >>>>> The dummy page isn't really going to help much.  That page is just
> >>>>> used as a safety placeholder for gart entries that aren't mapped on
> >>>>> the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
> >>>>> the backing pages for the gart.
> >>>> 
> >>>>> You may want to look there.
> >>>> 
> >>>> Ah, sorry. Indeed. Though, my idea with:
> >>>> 
> >>>> On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov <gnidorah at p0n4ik.tk> wrote:
> >>>> 
> >>>>> Thanks! I'll try. Meanwhile I've tried switching from alloc_page() to
> >>>>> dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(
> >>>> 
> >>>> doesn't make sense for the TTM part either.
> >>> 
> >>> After the driver is loaded, you can dump some info from debugfs:
> >>> r100_rbbm_info
> >>> r100_cp_ring_info
> >>> r100_cp_csq_fifo
> >>> These will dump a bunch of registers and internal fifos so we can see
> >>> what the chip actually processed.
> >>> 
> >>> Alex
> >> 
> >> Reading r100_cp_ring_info leads to a kernel panic (KP):
> >> 
> >> r100_debugfs_cp_ring_info():
> >> count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
> >> i = (rdp + j) & ring->ptr_mask;
> >> 
> >>        for (j = 0; j <= count; j++) {
> >>                i = (rdp + j) & ring->ptr_mask;
> >>                --> Here at first iteration <--
> >>                --> count = 262080, i = 0 <--
> >>                seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
> >>        }
> >> 
> >> Reading radeon_ring_gfx (which I additionally tried to read)
> >> throws an MCE:
> >> 
> >> radeon_debugfs_ring_info():
> >> count = (ring->ring_size / 4) - ring->ring_free_dw;
> >> i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;
> >> 
> >>        for (j = 0; j <= (count + 32); j++) {
> >>                --> Here at first iteration <--
> >>                --> i = 262112, j = 0 <--
> >>                seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
> >>                i = (i + 1) & ring->ptr_mask;
> >>        }
> >> 
> >> I'm attaching debug output from a kernel built with these loops commented out.
> > 
> > The register writes seem to be going through the register backbone correctly:
> > 
> > [0x00B] 0x15E0=0x00000000
> > [0x00C] 0x15E4=0xCAFEDEAD
> > [0x00D] 0x4274=0x0000000F
> > [0x00E] 0x42C8=0x00000007
> > [0x00F] 0x4018=0x0000001D
> > [0x010] 0x170C=0x80000000
> > [0x011] 0x3428=0x00020100
> > [0x012] 0x15E4=0xCAFEDEAD
> > 
> > You can see the 0xCAFEDEAD written to the scratch register via MMIO
> > from the ring_test(). The CP fifo however seems to be full of garbage.
> > The CP is busy though, so it seems to be functional.  I guess it's
> > just fetching garbage rather than commands.

If it is fetching garbage, that would imply the DMA (or bus) addresses
that are programmed in the GART are bogus. It could help to dump them and
check whether bus address -> physical address -> virtual address gives the
same result as virtual address -> bus address. It might also help to see
what the virtual address holds, and/or to poison it with known data.

Or perhaps the card has picked up an incorrect page table? Meaning
the (bus) address given to it is not the correct one?
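
To make the round-trip and poisoning idea concrete, here is a rough
user-space sketch. It is only a toy model, not radeon or TTM code: the
toy_* names, the fixed offsets standing in for the platform's address
translations, and the four-page table are all assumptions made up for
illustration. In the kernel the same checks would be done against the real
GART entries and the pages TTM handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SIZE 4096u
#define GART_NUM_PAGES 4u
#define POISON_WORD    0xA5A5A5A5u
#define WORDS_PER_PAGE (GART_PAGE_SIZE / sizeof(uint32_t))

/* Hypothetical offsets standing in for the platform's real translations. */
#define TOY_PHYS_BASE  0x10000000ull
#define TOY_BUS_OFFSET 0x40000000ull

/* Backing pages for the toy GART (zero-initialised). */
static uint32_t toy_pages[GART_NUM_PAGES][WORDS_PER_PAGE];

static uint64_t toy_virt_to_phys(const void *virt)
{
    return TOY_PHYS_BASE +
           (uint64_t)((const uint8_t *)virt - (const uint8_t *)toy_pages);
}

/* Returns NULL unless phys names a whole, page-aligned backing page. */
static void *toy_phys_to_virt(uint64_t phys)
{
    uint64_t off = phys - TOY_PHYS_BASE;

    if (phys < TOY_PHYS_BASE || off >= sizeof(toy_pages) ||
        off % GART_PAGE_SIZE != 0)
        return NULL;
    return (uint8_t *)toy_pages + off;
}

static uint64_t toy_phys_to_bus(uint64_t phys) { return phys + TOY_BUS_OFFSET; }
static uint64_t toy_bus_to_phys(uint64_t bus)  { return bus - TOY_BUS_OFFSET; }

/* Program every entry with the bus address of its backing page. */
static void toy_gart_fill(uint64_t *gart)
{
    for (unsigned i = 0; i < GART_NUM_PAGES; i++)
        gart[i] = toy_phys_to_bus(toy_virt_to_phys(toy_pages[i]));
}

/* 1 if entry idx agrees with its page in both directions, else 0. */
static int toy_gart_entry_ok(const uint64_t *gart, unsigned idx)
{
    assert(idx < GART_NUM_PAGES);
    const void *virt = toy_pages[idx];

    /* virtual -> physical -> bus must match the programmed entry */
    if (toy_phys_to_bus(toy_virt_to_phys(virt)) != gart[idx])
        return 0;
    /* bus -> physical -> virtual must lead back to the same page */
    return toy_phys_to_virt(toy_bus_to_phys(gart[idx])) == virt;
}

/* Fill one backing page with a known pattern through its CPU address. */
static void toy_poison_page(unsigned idx)
{
    assert(idx < GART_NUM_PAGES);
    for (size_t w = 0; w < WORDS_PER_PAGE; w++)
        toy_pages[idx][w] = POISON_WORD;
}

/* Count poison words in the page that a given bus address resolves to. */
static size_t toy_count_poison_via_bus(uint64_t bus)
{
    const uint32_t *p = toy_phys_to_virt(toy_bus_to_phys(bus));
    size_t hits = 0;

    if (!p)
        return 0;
    for (size_t w = 0; w < WORDS_PER_PAGE; w++)
        if (p[w] == POISON_WORD)
            hits++;
    return hits;
}
```

The intended use is: call toy_gart_fill() on a table of GART_NUM_PAGES
entries, run toy_gart_entry_ok() over each index, then poison one page and
confirm that the pattern is visible through that entry's bus address and
not through the others. An entry that fails the first check, or that shows
poison it should not, points at a bad address in the table.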

> > 
> > Does doing a posted write when writing to the ring buffer help?
> 
> Unfortunately, no.
> 
> > 
> > diff --git a/drivers/gpu/drm/radeon/radeon_ring.c
> > b/drivers/gpu/drm/radeon/radeon_ring.c
> > index a890756..b4f04d2 100644
> > --- a/drivers/gpu/drm/radeon/radeon_ring.c
> > +++ b/drivers/gpu/drm/radeon/radeon_ring.c
> > @@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct
> > radeon_device *rdev, struct radeon_ri
> >  */
> > void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
> > {
> > +       u32 tmp;
> > #if DRM_DEBUG_CODE
> >        if (ring->count_dw <= 0) {
> >                DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
> >        }
> > #endif
> >        ring->ring[ring->wptr++] = v;
> > +       tmp = ring->ring[ring->wptr - 1];
> >        ring->wptr &= ring->ptr_mask;
> >        ring->count_dw--;
> >        ring->ring_free_dw--;
> 

