[PATCH] of/device: add blacklist for iommu dma_ops

Thierry Reding thierry.reding at gmail.com
Mon Jun 3 13:54:08 UTC 2019


On Mon, Jun 03, 2019 at 06:20:57AM -0700, Rob Clark wrote:
> On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy <robin.murphy at arm.com> wrote:
> >
> > On 03/06/2019 11:47, Rob Clark wrote:
> > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa <tfiga at chromium.org> wrote:
> > >>
> > >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark <robdclark at gmail.com> wrote:
> > >>>
> > >>> So, another case I've come across, on the display side.. I'm working
> > >>> on handling the case where bootloader enables display (and takes iommu
> > >>> out of reset).. as soon as DMA domain gets attached we get iommu
> > >>> faults, because bootloader has already configured display for scanout.
> > >>> Unfortunately this all happens before actual driver is probed and has
> > >>> a chance to intervene.
> > >>>
> > >>> It's rather unfortunate that we tried to be clever rather than just
> > >>> making drivers call some function to opt-in to the hookup of dma iommu
> > >>> ops :-(
> > >>
> > >> I think it still works for the 90% of cases and if 10% needs some
> > >> explicit work in the drivers, that's better than requiring 100% of the
> > >> drivers to do things manually.
> >
> > Right, it's not about "being clever", it's about not adding opt-in code
> > to the hundreds and hundreds and hundreds of drivers which *might* ever
> > find themselves used on a system where they would need the IOMMU's help
> > for DMA.
> 
> Well, I mean, at one point we didn't do the automatic iommu hookup, we
> could have just stuck with that and added a helper so drivers could
> request the hookup.  Things wouldn't have been any more broken than
> before, and when people bring up systems where iommu is required, they
> could have added the necessary dma_iommu_configure() call.  But that
> is water under the bridge now.
> 
> > >> Adding Marek who had the same problem on Exynos.
> > >
> > > I do wonder how many drivers need to iommu_map in their ->probe()?
> > > I'm thinking moving the auto-hookup to after a successful probe(),
> > > with some function a driver could call if they need mapping in probe,
> > > might be a way to eventually get rid of the blacklist.  But I've no
> > > idea how to find the subset of drivers that would be broken without a
> > > dma_setup_iommu_stuff() call in their probe.
> >
> > Wouldn't help much. That particular issue is nothing to do with DMA ops
> > really, it's about IOMMU initialisation. On something like SMMUv3 there
> > is literally no way to turn the thing on without blocking unknown
> > traffic - it *has* to have stream table entries programmed before it can
> > even allow stuff to bypass.
> 
> fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
> db845c) the SMMU (v2) is taken out of bypass by the bootloader.  Bjorn
> has some patches for arm-smmu to read-back the state at boot.
> 
> (Although older devices were booting with display enabled but SMMU in bypass.)
> 
> > The answer is either to either pay attention to the "Quiesce all DMA
> > capable devices" part of the boot protocol (which has been there since
> > pretty much forever), or to come up with some robust way of
> > communicating "live" boot-time mappings to IOMMU drivers so that they
> > can program themselves appropriately during probe.
> 
> Unfortunately display lit up by bootloader is basically ubiquitous.
> Every single android phone does it.  All of the windows-arm laptops do
> it.  Basically 99.9% of things that have a display do it.  It's a
> tough problem to solve involving clks, gdsc, regulators, etc, in
> addition to the display driver.. and sadly now smmu.  And devices
> where we can make changes to and update the firmware are rather rare.
> So there is really no option to ignore this problem.

I think this is going to require at least some degree of cooperation
from the bootloader. See my other thread on that. Unfortunately I think
this is an area where everyone has kind of been doing their own thing
even if standard bindings for this have been around for quite a while
(actually 5 years by now). I suspect that most bootloaders that run
today are not that old, but as always downstream doesn't follow closely
where upstream is guiding.

> I guess if we had some early-quirks mechanism like x86 does, we could
> mash the display off early in boot.  That would be an easy solution.
> Although I'd prefer a proper solution so that android phones aren't
> carrying around enormous stacks of hack patches to achieve a smooth
> flicker-free boot.

The proper solution, I think, is for bootloader and kernel to work
together. Unfortunately that means we'll just have to bite the bullet
and get things fixed across the stack. I think this is just the latest
manifestation of the catch-up that upstream has been playing. Only now
that we're starting to enable all of these features upstream are we
running into interoperability issues.

If upstream had been further along we would've caught these issues way
ahead of time and could've influenced the designs of bootloader much
earlier. Now, unless we get all the vendors to go back and modify 5 year
old code that's going to be difficult.

However, I think Robin has a point here: it's clearly documented in the
boot protocol, so technically bootloaders are buggy and we can't always
go and fix things so that buggy bootloaders continue to work. There's
not a whole lot of incentive for anyone to fix the bootloaders if we
keep doing that, ey?

> I suppose arm-smmu could realize that the context bank is already
> taken out of bypass..  although I'm not entirely sure if we can assume
> that the CPU would be able to read back the pagetable setup by the
> bootloader.  Maybe Vivek has an idea about that?

I've been thinking, though, and I'm not sure I see this issue on newer
Tegras that use the ARM SMMU. I suppose this could be because the
bootloader doesn't enable the SMMU, but then again, I should be getting
these faults even so. Interestingly I don't see those on 64-bit ARM, so
perhaps there's something special about that.

Let me reload a bit of context here so I can give you more than just
speculation.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20190603/565df2e2/attachment.sig>


More information about the dri-devel mailing list