amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

Deucher, Alexander Alexander.Deucher at amd.com
Tue Mar 21 16:30:55 UTC 2017


> -----Original Message-----
> From: 'joro at 8bytes.org' [mailto:joro at 8bytes.org]
> Sent: Tuesday, March 21, 2017 12:26 PM
> To: Deucher, Alexander
> Cc: Bridgman, John; Alex Deucher; Daniel Drake; Chris Chiu;
> amd-gfx at lists.freedesktop.org; Nath, Arindam;
> iommu at lists.linux-foundation.org; Suthikulpanit, Suravee;
> Linux Upstreaming Team
> Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> On Tue, Mar 21, 2017 at 04:17:40PM +0000, Deucher, Alexander wrote:
> > > -----Original Message-----
> > > From: 'joro at 8bytes.org' [mailto:joro at 8bytes.org]
> > > Sent: Tuesday, March 21, 2017 12:11 PM
> > > To: Deucher, Alexander
> > > Cc: Alex Deucher; Daniel Drake; Chris Chiu;
> > > amd-gfx at lists.freedesktop.org; Nath, Arindam;
> > > iommu at lists.linux-foundation.org; Suthikulpanit, Suravee;
> > > Linux Upstreaming Team
> > > Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi:
> > > Completion-Wait loop timed out
> > >
> > > On Tue, Mar 21, 2017 at 04:01:53PM +0000, Deucher, Alexander wrote:
> > > > It seems to only affect Stoney systems, but not others (Carrizo,
> > > > Bristol, etc.).  Maybe we could just disable it on Stoney until we
> > > > root cause it.
> > >
> > > Completion-wait loop timeouts indicate something is seriously wrong.
> > > How can I detect whether I am running on a 'Stoney' system?
> >
> > + John
> >
> > I'm not sure if the IOMMU IDs are different on Stoney systems compared
> > to Carrizo/Bristol systems.  The PCI IDs of the GPUs are different.
> > Stoney parts have 0x98E4 as the PCI ID for the GPU.
> >
> > >
> > > Another question, a shot in the dark: does the GPU on these systems
> > > have ATS? Probably yes, as they are likely HSA compatible.
> >
> > Stoney is a small APU, kind of a mini Carrizo.  While it may claim to
> > support ATS, I don't think it was ever validated on Stoney, only on
> > Carrizo/Bristol.
> 
> Okay, so maybe ATS is broken in some way on these chips. When queue
> flushes happen, the driver also sends ATS invalidates, and a queue
> flush can cause a storm of those. This may be the issue.
> 
> I am preparing a debug-patch that disables ATS for these GPUs so someone
> with such a chip can test it.
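
For anyone following along, the check discussed above could be sketched
roughly as below. This is a plain userspace illustration, not the debug
patch itself and not kernel code. 0x1002 is the AMD/ATI PCI vendor ID and
0x98E4 is the Stoney GPU device ID mentioned earlier in this thread; the
helper names is_stoney_gpu() and ats_allowed() are made up for the
example, and the policy (never use ATS on Stoney, otherwise trust what
the device advertises) is only an assumption about what such a patch
might do.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMD/ATI PCI vendor ID. */
#define VENDOR_ID_ATI        0x1002
/* Stoney GPU PCI device ID, as given in this thread. */
#define DEVICE_ID_STONEY_GPU 0x98E4

/* Hypothetical helper: true if the vendor/device pair is a Stoney GPU. */
static bool is_stoney_gpu(uint16_t vendor, uint16_t device)
{
	return vendor == VENDOR_ID_ATI && device == DEVICE_ID_STONEY_GPU;
}

/*
 * Hypothetical policy (an assumption, not the real patch): never use ATS
 * on a Stoney GPU, even if it advertises the capability; for any other
 * device, follow what the device advertises.
 */
static bool ats_allowed(uint16_t vendor, uint16_t device, bool advertises_ats)
{
	if (is_stoney_gpu(vendor, device))
		return false;
	return advertises_ats;
}
```

Matching on the GPU's PCI ID rather than on the IOMMU avoids relying on
the IOMMU IDs, which may or may not differ between Stoney and
Carrizo/Bristol.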

Thanks Joerg.

Alex


