[Intel-gfx] Failure with swiotlb
zhen78 at gmail.com
Tue Jan 5 07:20:03 PST 2010
On 2010.01.04 13:11:56 -0800, Eric Anholt wrote:
> On Mon, 4 Jan 2010 17:27:45 +0800, Zhenyu Wang <zhenyuw at linux.intel.com> wrote:
> > On 2009.12.31 12:33:06 +0800, Zhenyu Wang wrote:
> > > On 2009.12.30 10:26:27 +0000, David Woodhouse wrote:
> > > > On Wed, 2009-12-30 at 11:02 +0800, Zhenyu Wang wrote:
> > > > > We have .31->.32 regression as reported in
> > > > > http://bugs.freedesktop.org/show_bug.cgi?id=25690
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=14627
> > > > >
> > > > > It's triggered on non VT-d machine (or machine that should have VT-d,
> > > > > but no way to turn it on in BIOS.) and with large memory, and swiotlb
> > > > > is used for PCI dma ops. swiotlb uses a bounce buffer to copy between
> > > > > CPU pages and real pages made for DMA, but we can't make it truly coherent
> > > > > as we don't call pci_dma_sync_single_for_cpu() or similar APIs. And in GEM
> > > > > domain change, we also can't flush pages for bounce buffer. It looks like
> > > > > our usual non-cache-coherent graphics device can't love swiotlb.
> > > > >
> > > > > This patch tries to only handle pci dma mapping in case of real iommu
> > > > > hardware detected, the only case for that is VT-d. And fall back to the original
> > > > > method to insert physical page directly in other cases. This fixes the
> > > > > GPU hang on our Q965 with 8G memory in 64-bit OS. Comments?
> > > >
> > > > I don't understand. Why is swiotlb doing anything here anyway, when the
> > > > device has a dma_mask of 36 bits?
> > > >
> > > > Shouldn't dma_capable() return 1, causing swiotlb_map_page() to return
> > > > the original address unmangled?
> > >
> > > Good point, I didn't look into swiotlb code, coz my debug showed it returned
> > > mangled dma address. So looks the real problem is 36 bit dma mask got corrupted
> > > somehow, which matches first report in fd.o bug 25690.
> > >
> > > Looks we should setup dma mask in drm/i915 driver too, as they both operate on
> > > graphics device. But I can't test that on our 8G mem machine until after new year.
> > >
> > Finally caught it! It's within drm_pci_alloc() which will try to setup dma mask
> > for pci_dev again! That is used for physical address based hardware status page
> > for 965G (i915_init_phys_hws()), as alloc with pci coherent interface. But trying
> > to set mask again in an alloc function looks wrong to me, and driver should setup
> > their own consistent dma mask according to hw.
> > So the following patch tries to remove mask setting in drm_pci_alloc(), which fixed
> > the original problem as dma mask now has the right 36bit setting on intel hw. I
> > can't test if ati bits looks correct, Dave?
> > As intel hws page does support 36bit physical address, that will be another patch
> > for setup pci consistent 36bit mask for it. Any comment?
> Looks like this patch doesn't set the dma mask that used to get set for
> the drivers that were relying on it. Once all the drivers are fixed to
> set it up at load time, this seems like a good interface fix.
In my patch all the removed settings were the 32bit mask, which is the pci dma default mask.
So if a driver didn't set a dma mask before, it should also be fine with this patch.
Radeon KMS driver has already handled dma mask for AGP or PCI-e, but
radeon_cp_init() looks like it always tries to set a 32bit mask. Is there a problem here?
As in my patch I tried to keep mask setting for radeon and r128 in drm_ati_pcigart_init(),
not sure if that would create problem on some model of radeon...
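
To make the mask issue discussed in this thread concrete, here is a rough userspace sketch of the kind of check dma_capable() performs. It is only a model under my own assumptions, not the kernel code: dma_bit_mask(), dma_capable_model() and needs_bounce() are hypothetical names, and the 6 GiB address in the usage note is just an example of a page above 4G on an 8G machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set, in the spirit of the kernel's
 * DMA_BIT_MASK(). Hypothetical helper for this sketch only. */
static uint64_t dma_bit_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Hypothetical model of the capability check: true when the whole
 * range [addr, addr + size) is addressable under `mask`. Written to
 * avoid overflow in addr + size. */
static bool dma_capable_model(uint64_t mask, uint64_t addr, uint64_t size)
{
    if (addr > mask)
        return false;
    if (size == 0)
        return true;
    return size - 1 <= mask - addr;
}

/* In this model a mapping is bounced (and the returned dma address
 * differs from the physical one) exactly when the device is not
 * capable of reaching the page directly. */
static bool needs_bounce(uint64_t mask, uint64_t addr, uint64_t size)
{
    return !dma_capable_model(mask, addr, size);
}
```

With a page at 6 GiB, needs_bounce(dma_bit_mask(36), UINT64_C(6) << 30, 4096) is false while needs_bounce(dma_bit_mask(32), UINT64_C(6) << 30, 4096) is true, which matches what we saw: once the mask was reset to 32bit, pages above 4G came back with mangled addresses.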