amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

Mon Jan 14 17:22:18 UTC 2019

On 2019-01-10 3:48 p.m., Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
>>>  From the trace it looks like we git the case where swiotlb tries
>>> to copy back data from a bounce buffer, but hits a dangling or NULL
>>> pointer.  So a couple questions for the submitter:
>>>
>>>   - does the system have more than 4GB memory and thus use swiotlb?
>>>     (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>>>   - does the device this happens on have a DMA mask smaller than
>>>     the available memory, that is should swiotlb be used here to start
>>>     with?
>>
>> Rather unlikely. The device is an AMD GPU, so we can address memory up to 
>> 1TB.
> 
> So we probably somehow got a false positive.
> 
> For now I'like the reported to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
> 
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.

Any progress? https://bugzilla.kernel.org/show_bug.cgi?id=202261 was
also filed about this.

I hope everyone's clear that this needs to be resolved one way or
another by 5.0 final (though the sooner, the better :).

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer