Adreno crash on i.MX53 running 5.3-rc6

Robin Murphy robin.murphy at arm.com
Tue Sep 10 14:35:11 UTC 2019


On 05/09/2019 20:05, Rob Clark wrote:
> On Thu, Sep 5, 2019 at 10:03 AM Rob Clark <robdclark at gmail.com> wrote:
>>
>> On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy <robin.murphy at arm.com> wrote:
>>>
>>> On 04/09/2019 01:12, Rob Clark wrote:
>>>> On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam <festevam at gmail.com> wrote:
>>>>>
>>>>> Hi Jonathan,
>>>>>
>>>>> On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek <jonathan at marek.ca> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried this and it works with patches 4+5 from Rob's series and
>>>>>> changing gpummu to use sg_phys(sg) instead of sg->dma_address
>>>>>> (dma_address isn't set now that dma_map_sg isn't used).
>>>>>
>>>>> Thanks for testing it. I haven't had a chance to test it yet.
>>>>>
>>>>> Rob,
>>>>>
>>>>> I assume your series is targeted to 5.4, correct?
>>>>
>>>> maybe, although Christoph Hellwig didn't seem like a big fan of
>>>> exposing cache ops, and would rather add a new allocation API for
>>>> uncached pages.. so I'm not entirely sure what the way forward will
>>>> be.
>>>
>>> TBH, the use of map/unmap looked reasonable in the context of
>>> "start/stop using these pages for stuff which may include DMA", so even
>>> if it was cheekily ignoring sg->dma_address I'm not sure I'd really
>>> consider it "abuse" - in comparison, using sync without a prior map
>>> unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG
>>> will be rendered useless with false positives if this driver is active
>>> while trying to debug something else.
>>>
>>> The warning referenced in 0036bc73ccbe represents something being
>>> unmapped which didn't match a corresponding map - from what I can make
>>> of get_pages()/put_pages() it looks like that would need msm_obj->flags
>>> or msm_obj->sgt to change during the lifetime of the object, neither of
>>> which sounds like a thing that should legitimately happen. Are you sure
>>> this isn't all just hiding a subtle bug elsewhere? After all, if what
>>> was being unmapped wasn't right, who says that what's now being synced is?
>>>
>>
>> Correct, msm_obj->flags/sgt should not change.
>>
>> I reverted the various patches, and went back to the original setup
>> that used dma_{map,unmap}_sg() to reproduce the original issue that
>> prompted the change in the first place.  It is a pretty massive flood
>> of splats, which pretty quickly overflowed the dmesg ring buffer, so I
>> might be missing some things, but I'll poke around some more.
>>
>> The one thing I wonder about, what would happen if the buffer is
>> allocated and dma_map_sg() called before drm/msm attaches it's own
>> iommu_domains, and then dma_unmap_sg() afterwards.  We aren't actually
>> ever using the iommu domain that DMA API is creating for the device,
>> so all the extra iommu_map/unmap (and tlb flush) is at best
>> unnecessary.  But I'm not sure if it could be having some unintended
>> side effects that cause this sort of problem.

Right, one of the semi-intentional side-effects of 43c5bf11a610 is that 
iommu-dma no longer interferes with unmanaged domains - it will still go 
and make its own redundant mappings in the unattached default domain, 
but as long as the DMA API usage is fundamentally sound then it 
shouldn't actually get in the way.
> it seems like every time (or at least every time we splat), we end up
> w/ iova=fffffffffffff000 .. which doesn't sound likely to be right.
> Although from just looking at the dma-iommu.c code, I'm not sure how
> this happens.  And adding some printk's results in enough traces that
> I can't boot for some reason..

Yeah, that's a bogus IOVA for sure, so regardless of how we actually 
make Adreno happy it would still be interesting to figure out how it 
came about. Do you see any WARNs from io-pgtable-arm before the one from 
__iommu_dma_unmap()?

Robin.


More information about the dri-devel mailing list