[Intel-gfx] Failure with swiotlb

Mathieu Taillefumier mathieu.taillefumier at free.fr
Wed Jan 6 16:25:51 CET 2010


On 01/05/2010 04:20 PM, Zhenyu Wang wrote:
> On 2010.01.04 13:11:56 -0800, Eric Anholt wrote:
>> On Mon, 4 Jan 2010 17:27:45 +0800, Zhenyu Wang<zhenyuw at linux.intel.com>  wrote:
>>> On 2009.12.31 12:33:06 +0800, Zhenyu Wang wrote:
>>>> On 2009.12.30 10:26:27 +0000, David Woodhouse wrote:
>>>>> On Wed, 2009-12-30 at 11:02 +0800, Zhenyu Wang wrote:
>>>>>> We have .31->.32 regression as reported in
>>>>>> http://bugs.freedesktop.org/show_bug.cgi?id=25690
>>>>>> http://bugzilla.kernel.org/show_bug.cgi?id=14627
>>>>>>
>>>>>> It's triggered on non-VT-d machines (or machines that should have VT-d
>>>>>> but offer no way to turn it on in the BIOS) with large memory, where
>>>>>> swiotlb is used for the PCI DMA ops. swiotlb uses a bounce buffer to copy
>>>>>> between CPU pages and the real pages used for DMA, but we can't keep that
>>>>>> coherent because we never call the pci_dma_sync_single_for_cpu()-style
>>>>>> APIs, and in the GEM domain change we also can't flush the bounce-buffer
>>>>>> pages. It looks like our usual non-cache-coherent graphics device simply
>>>>>> doesn't get along with swiotlb.
>>>>>>
>>>>>> This patch tries to use the PCI DMA mapping only when real IOMMU hardware
>>>>>> is detected (the only such case being VT-d), and falls back to the original
>>>>>> method of inserting the physical page directly otherwise. This fixes the
>>>>>> GPU hang on our Q965 with 8G of memory on a 64-bit OS. Comments?
>>>>>
>>>>> I don't understand. Why is swiotlb doing anything here anyway, when the
>>>>> device has a dma_mask of 36 bits?
>>>>>
>>>>> Shouldn't dma_capable() return 1, causing swiotlb_map_page() to return
>>>>> the original address unmangled?
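For context, swiotlb only bounces a buffer when it falls outside the device's DMA
mask. Roughly, the relevant checks in kernels of this era look like the sketch
below (paraphrased from lib/swiotlb.c and the dma_capable() helper; details may
differ between versions):

    /* dma_capable(): true if the whole buffer is addressable by the device */
    static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
    {
            if (!dev->dma_mask)
                    return false;
            return addr + size <= *dev->dma_mask;
    }

    /* in swiotlb_map_page(): if the page already sits inside the device's DMA
     * window, return the bus address as-is and skip the bounce buffer */
    if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
            return dev_addr;

With a correct 36-bit mask, every page on an 8G machine passes this test and no
bouncing occurs, so a mangled address coming back from swiotlb suggests a
corrupted mask, as the follow-up below confirms.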
>>>>
>>>> Good point. I hadn't looked into the swiotlb code, because my debugging showed
>>>> it returned a mangled DMA address. So the real problem looks to be that the
>>>> 36-bit DMA mask got corrupted somehow, which matches the first report in fd.o
>>>> bug 25690.
>>>>
>>>> It looks like we should set up the DMA mask in the drm/i915 driver too, as they
>>>> both operate on the graphics device. But I can't test that on our 8G machine
>>>> until after new year.
>>>>
>>>
>>> Finally caught it! It's within drm_pci_alloc(), which tries to set up the DMA mask
>>> for the pci_dev again! That is used for the physical-address-based hardware status
>>> page on 965G (i915_init_phys_hws()), which is allocated through the PCI coherent
>>> interface. But setting the mask again inside an allocation function looks wrong to
>>> me; drivers should set up their own consistent DMA mask according to the hardware.
>>>
>>> So the following patch removes the mask setting in drm_pci_alloc(), which fixes
>>> the original problem, as the DMA mask now keeps its correct 36-bit setting on
>>> Intel hardware. I can't test whether the ATI bits are correct; Dave?
>>>
>>> Since the Intel hardware status page does support 36-bit physical addresses,
>>> there will be another patch to set up the 36-bit PCI consistent mask for it.
>>> Any comments?
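The load-time setup being described would look something like the sketch below (an
illustration only, not the actual follow-up patch; the exact call site in the i915
load path is an assumption):

    /* hypothetical placement early in the i915 load path, before any DMA
     * allocations: the 965-class hardware discussed here can address 36 bits */
    if (pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(36)))
            return -EIO;
    if (pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(36)))
            return -EIO;

With the masks owned by the driver, drm_pci_alloc() no longer needs to touch them,
and swiotlb keeps seeing the correct 36-bit limit.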
>>
>> Looks like this patch doesn't set the dma mask that used to get set for
>> the drivers that were relying on it.  Once all the drivers are fixed to
>> set it up at load time, this seems like a good interface fix.
>
> In my patch, all the removed settings were 32-bit masks, which is the default PCI
> DMA mask. So a driver that didn't set a DMA mask before should also be fine with
> this change.

This failure also seems to be responsible for bug 25510, since applying the patch
to the latest git kernel fixes it. I will add a comment to bug 25510. I was not
able to apply the patch to the v2.6.32.x series, though, because of multiple
declaration errors.

Mathieu



