Adreno crash on i.MX53 running 5.3-rc6

Rob Clark robdclark at gmail.com
Sat Sep 7 01:21:20 UTC 2019


On Thu, Sep 5, 2019 at 3:30 PM Rob Clark <robdclark at gmail.com> wrote:
>
> On Thu, Sep 5, 2019 at 12:05 PM Rob Clark <robdclark at gmail.com> wrote:
> >
> > On Thu, Sep 5, 2019 at 10:03 AM Rob Clark <robdclark at gmail.com> wrote:
> > >
> > > On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy <robin.murphy at arm.com> wrote:
> > > >
> > > > On 04/09/2019 01:12, Rob Clark wrote:
> > > > > On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam <festevam at gmail.com> wrote:
> > > > >>
> > > > >> Hi Jonathan,
> > > > >>
> > > > >> On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek <jonathan at marek.ca> wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I tried this and it works with patches 4+5 from Rob's series and
> > > > >>> changing gpummu to use sg_phys(sg) instead of sg->dma_address
> > > > >>> (dma_address isn't set now that dma_map_sg isn't used).
> > > > >>
> > > > >> Thanks for testing it. I haven't had a chance to test it yet.
> > > > >>
> > > > >> Rob,
> > > > >>
> > > > >> I assume your series is targeted to 5.4, correct?
> > > > >
> > > > > maybe, although Christoph Hellwig didn't seem like a big fan of
> > > > > exposing cache ops, and would rather add a new allocation API for
> > > > > uncached pages.. so I'm not entirely sure what the way forward will
> > > > > be.
> > > >
> > > > TBH, the use of map/unmap looked reasonable in the context of
> > > > "start/stop using these pages for stuff which may include DMA", so even
> > > > if it was cheekily ignoring sg->dma_address I'm not sure I'd really
> > > > consider it "abuse" - in comparison, using sync without a prior map
> > > > unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG
> > > > will be rendered useless with false positives if this driver is active
> > > > while trying to debug something else.
> > > >
> > > > The warning referenced in 0036bc73ccbe represents something being
> > > > unmapped which didn't match a corresponding map - from what I can make
> > > > of get_pages()/put_pages() it looks like that would need msm_obj->flags
> > > > or msm_obj->sgt to change during the lifetime of the object, neither of
> > > > which sounds like a thing that should legitimately happen. Are you sure
> > > > this isn't all just hiding a subtle bug elsewhere? After all, if what
> > > > was being unmapped wasn't right, who says that what's now being synced is?
> > > >
> > >
> > > Correct, msm_obj->flags/sgt should not change.
> > >
> > > I reverted the various patches, and went back to the original setup
> > > that used dma_{map,unmap}_sg() to reproduce the original issue that
> > > prompted the change in the first place.  It is a pretty massive flood
> > > of splats, which pretty quickly overflowed the dmesg ring buffer, so I
> > > might be missing some things, but I'll poke around some more.
> > >
> > > The one thing I wonder about, what would happen if the buffer is
> > > allocated and dma_map_sg() called before drm/msm attaches it's own
> > > iommu_domains, and then dma_unmap_sg() afterwards.  We aren't actually
> > > ever using the iommu domain that DMA API is creating for the device,
> > > so all the extra iommu_map/unmap (and tlb flush) is at best
> > > unnecessary.  But I'm not sure if it could be having some unintended
> > > side effects that cause this sort of problem.
> > >
> >
> > it seems like every time (or at least every time we splat), we end up
> > w/ iova=fffffffffffff000 .. which doesn't sound likely to be right.
> > Although from just looking at the dma-iommu.c code, I'm not sure how
> > this happens.  And adding some printk's results in enough traces that
> > I can't boot for some reason..
> >
>
> Ok, I see better what is going on.. at least on the kernel that I'm
> using on the yoga c630 laptop, where I have a patch[1] to skip domain
> attach.  That results in to_smmu_domain(domain)->pgtbl_ops being null,
> so arm_smmu_map() fails.  So we skip __finalise_sg() which sets the
> sg_dma_address().  Which causes the failure on unmap.
>
> That said, I'm pretty sure I've seen (or had reported) a similar splat
> (although maybe not so frequent) on devices without that patch (where
> the bootloader isn't enabling scanout).  I'll have to switch over to a
> different device that doesn't light up display from bootloader, so
> that I can drop that skip-domain-attach patch
>
> All that said, this would be much easier if I could do the cache
> operations without all this unneeded iommu stuff.  (Not to mention the
> unnecessary TLB flushes that I suspect are also happening.)
>
> [1] https://patchwork.kernel.org/patch/11038793/
>

fwiw, with https://patchwork.freedesktop.org/series/63096/ we could go
back to simply using dma_{map,unmap}_sg() in all cases, as the iommu
dma_ops would no longer get in the way.

BR,
-R


More information about the dri-devel mailing list