[PATCH RFC 1/1] drm/radeon: fix bad DMA from INTERRUPT_CNTL2

Alex Deucher alexdeucher at gmail.com
Thu Nov 14 23:09:26 UTC 2019


On Thu, Nov 14, 2019 at 6:08 PM Sam Bobroff <sbobroff at linux.ibm.com> wrote:
>
> On Thu, Nov 14, 2019 at 12:05:19AM -0500, Alex Deucher wrote:
> > On Wed, Nov 13, 2019 at 9:53 PM Sam Bobroff <sbobroff at linux.ibm.com> wrote:
> > >
> > > Currently, binding the radeon driver to a Radeon FirePro (PCI
> > > 1002:68f2) that has been passed through to a guest on a Power8 machine
> > > causes a bad DMA read which causes the device to be frozen, leading to
> > > the driver failing to bind and other problems.
> > >
> > > The bad DMA is to the address written into the INTERRUPT_CNTL2
> > > register during r600_irq_init(), and it can be fixed by substituting a
> > > valid (dummy) DMA address.
> >
> > The patch is correct.  INTERRUPT_CNTL2 takes a bus address not a GPU
> > MC address.  Care to update si.c and cik.c as well?
>
> Sure (although I can't test those cases). I see that there's one other
> case in si_ih.c (although it's for amdgpu, not radeon); should I include
> that as well?
>

Yes, please.

thanks,

Alex

> > Thanks,
> >
> > Alex
> >
> > > ---
> > > Hello,
> > >
> > > I've been tracking down a bug, described above, and I've been able to hack
> > > in a workaround but I could use some help understanding how to fix it
> > > properly.
> > >
> > > What seems to be happening is that when the first CRTC is enabled (when
> > > evergreen_irq_set() writes to GRPH_INT_CONTROL) the device immediately
> > > performs a DMA read from the address that's been programmed into
> > > INTERRUPT_CNTL2 by r600_irq_init().
> > >
> > > That address isn't a valid DMA address, so it triggers the problem. The
> > > address comes from rdev->ih.gpu_addr, which seems to be a 'linear GPU
> > > address', calculated from the size of the card's VRAM. It definitely hasn't
> > > come from a DMA mapping operation.
> > >
> > > Based on the nearby comment, "set dummy read address to ring address", I
> > > tried substituting a valid dummy DMA address (using the address mapped to
> > > the 'dummy page' used by the GART) and it does prevent the problem.
> > > However, I don't know what that register is supposed to do or what
> > > information the device might be communicating via that DMA, so presumably
> > > it breaks something.
> > >
> > > Note: I have tried loading the patched driver with "test=1" and all the
> > > self tests succeed.
> > >
> > > Could anyone offer some insight into this problem?
> > > What does that register do? What kind of address is it expecting?
> > > What might be a good way of fixing it?
> > >
> > > Thanks in advance,
> > > Sam.
> > >
> > >  drivers/gpu/drm/radeon/r600.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> > > index e937cc01910d..033bc466a862 100644
> > > --- a/drivers/gpu/drm/radeon/r600.c
> > > +++ b/drivers/gpu/drm/radeon/r600.c
> > > @@ -3696,8 +3696,8 @@ int r600_irq_init(struct radeon_device *rdev)
> > >         }
> > >
> > >         /* setup interrupt control */
> > > -       /* set dummy read address to ring address */
> > > -       WREG32(INTERRUPT_CNTL2, rdev->ih.gpu_addr >> 8);
> > > +       /* set dummy read address to dummy page address */
> > > +       WREG32(INTERRUPT_CNTL2, rdev->dummy_page.addr >> 8);
> > >         interrupt_cntl = RREG32(INTERRUPT_CNTL);
> > >         /* IH_DUMMY_RD_OVERRIDE=0 - dummy read disabled with msi, enabled without msi
> > >          * IH_DUMMY_RD_OVERRIDE=1 - dummy read controlled by IH_DUMMY_RD_EN
> > > --
> > > 2.22.0.216.g00a2a96fc9
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel


More information about the dri-devel mailing list