Adreno crash on i.MX53 running 5.3-rc6

Rob Clark robdclark at gmail.com
Thu Sep 5 17:03:43 UTC 2019


On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy <robin.murphy at arm.com> wrote:
>
> On 04/09/2019 01:12, Rob Clark wrote:
> > On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam <festevam at gmail.com> wrote:
> >>
> >> Hi Jonathan,
> >>
> >> On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek <jonathan at marek.ca> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I tried this and it works with patches 4+5 from Rob's series and
> >>> changing gpummu to use sg_phys(sg) instead of sg->dma_address
> >>> (dma_address isn't set now that dma_map_sg isn't used).
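
(For reference, the gpummu change being described is roughly the sketch
below. It uses illustrative names rather than the exact msm_gpummu_map()
code, and just walks the sgt by CPU physical address, since dma_map_sg()
no longer runs and sg->dma_address is never filled in.)

#include <linux/scatterlist.h>
#include <linux/sizes.h>

#define GPUMMU_PAGE_SIZE SZ_4K	/* page size used by the a2xx gpummu */

/* Sketch only: fill the gpummu pagetable from sg_phys() instead of
 * sg->dma_address. */
static void gpummu_map_sketch(u32 *table, u64 iova, struct sg_table *sgt,
			      u32 prot_bits)
{
	struct scatterlist *sg;
	unsigned int i, j, idx = iova / GPUMMU_PAGE_SIZE;

	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
		phys_addr_t addr = sg_phys(sg);	/* was: sg->dma_address */

		for (j = 0; j < sg->length / GPUMMU_PAGE_SIZE; j++, idx++) {
			table[idx] = addr | prot_bits;
			addr += GPUMMU_PAGE_SIZE;
		}
	}
}
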
> >>
> >> Thanks for testing it. I haven't had a chance to test it yet.
> >>
> >> Rob,
> >>
> >> I assume your series is targeted to 5.4, correct?
> >
> > maybe, although Christoph Hellwig didn't seem like a big fan of
> > exposing cache ops, and would rather add a new allocation API for
> > uncached pages.. so I'm not entirely sure what the way forward will
> > be.
>
> TBH, the use of map/unmap looked reasonable in the context of
> "start/stop using these pages for stuff which may include DMA", so even
> if it was cheekily ignoring sg->dma_address I'm not sure I'd really
> consider it "abuse" - in comparison, using sync without a prior map
> unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG
> will be rendered useless with false positives if this driver is active
> while trying to debug something else.
>
> The warning referenced in 0036bc73ccbe represents something being
> unmapped which didn't match a corresponding map - from what I can make
> of get_pages()/put_pages() it looks like that would need msm_obj->flags
> or msm_obj->sgt to change during the lifetime of the object, neither of
> which sounds like a thing that should legitimately happen. Are you sure
> this isn't all just hiding a subtle bug elsewhere? After all, if what
> was being unmapped wasn't right, who says that what's now being synced is?
>

Correct, msm_obj->flags/sgt should not change.

I reverted the various patches and went back to the original setup that
used dma_{map,unmap}_sg(), in order to reproduce the original issue that
prompted the change in the first place.  It produced a pretty massive
flood of splats, which quickly overflowed the dmesg ring buffer, so I
might be missing some things, but I'll poke around some more.

The one thing I wonder about is what would happen if the buffer is
allocated and dma_map_sg() is called before drm/msm attaches its own
iommu_domain, and dma_unmap_sg() is called afterwards.  We never
actually use the iommu domain that the DMA API creates for the device,
so all the extra iommu_map/unmap (and TLB flush) is at best
unnecessary.  But I'm not sure whether it could be having some
unintended side effects that cause this sort of problem.
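
Roughly the sequence I have in mind, as a sketch with made-up helper
names rather than the actual get_pages()/put_pages() paths:

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/scatterlist.h>

/* Sketch only: 'msm_domain' stands in for the domain drm/msm attaches
 * itself, 'iova' for wherever the BO ends up mapped in it.  Error
 * handling is elided. */
static int ordering_sketch(struct device *dev,
			   struct iommu_domain *msm_domain,
			   struct sg_table *sgt, unsigned long iova)
{
	/* 1) BO pages get dma_map_sg()'d while the device is still
	 *    attached to the default DMA-API domain. */
	if (!dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL))
		return -ENOMEM;

	/* 2) drm/msm attaches its own domain and maps the BO there
	 *    itself; the mapping from step 1 is never used for DMA. */
	iommu_attach_device(msm_domain, dev);
	iommu_map_sg(msm_domain, iova, sgt->sgl, sgt->nents,
		     IOMMU_READ | IOMMU_WRITE);

	/* 3) Much later, dma_unmap_sg() tears down the step-1 mapping
	 *    (with a TLB flush), all for a mapping the GPU never
	 *    actually used. */
	dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	return 0;
}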

BR,
-R
