amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

Deucher, Alexander Alexander.Deucher at amd.com
Tue Mar 21 16:01:53 UTC 2017


> -----Original Message-----
> From: joro at 8bytes.org [mailto:joro at 8bytes.org]
> Sent: Tuesday, March 21, 2017 11:57 AM
> To: Alex Deucher
> Cc: Daniel Drake; Deucher, Alexander; Chris Chiu;
> amd-gfx at lists.freedesktop.org; Nath, Arindam;
> iommu at lists.linux-foundation.org; Suthikulpanit, Suravee;
> Linux Upstreaming Team
> Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> On Fri, Mar 17, 2017 at 11:53:09AM -0400, Alex Deucher wrote:
> > On Fri, Mar 17, 2017 at 8:15 AM, Daniel Drake <drake at endlessm.com> wrote:
> > > Hi,
> > >
> > > On Mon, Mar 13, 2017 at 2:01 PM, Deucher, Alexander
> > > <Alexander.Deucher at amd.com> wrote:
> > >> > We are unable to boot Acer Aspire E5-553G (AMD FX-9800P RADEON R7) nor
> > >> > Acer Aspire E5-523 with standard configurations because during boot
> > >> > the screen is flooded with the following error message over and over:
> > >> >
> > >> >   AMD-Vi: Completion-Wait loop timed out
> > >>
> > >> We ran into similar issues and bisected it to commit
> > >> b1516a14657acf81a587e9a6e733a881625eee53.  I'm not too familiar with the
> > >> IOMMU hardware to know if this is an iommu or display driver issue yet.
> > >
> > > We can confirm that reverting this commit solves the issue.
> > >
> > > Given that that commit is an optimization, but it has introduced a
> > > regression on multiple platforms, and has been like this for 8 months,
> > > it would be common practice to now revert this patch upstream until
> > > the regression is fixed. Could you please send a new patch to do this?
> > >
> > > Also, we would be happy to test any real solutions to this issue while
> > > we still have the affected units in hand.
> >
> > No objections to a revert here.
> 
> Big objection here. Since this only happens with amdgpu so far we
> shouldn't rule out a display-driver issue.
> 
> Reverting that patch basically destroys iommu-performance on AMD
> systems. Doing this for all devices just to make amdgpu work is
> overkill at this stage of the debugging.

The problem seems to affect only Stoney systems, not others (Carrizo, Bristol, etc.).  Maybe we could just disable the optimization on Stoney until we root-cause it.

Alex


