[Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v10

Fri Jun 21 16:27:53 UTC 2019

On Fri, Jun 21, 2019 at 02:06:54PM +0200, Christian König wrote:
> Am 21.06.19 um 12:32 schrieb Daniel Vetter:
> > On Fri, Jun 21, 2019 at 11:55 AM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> > > Am 21.06.19 um 11:20 schrieb Daniel Vetter:
> > > > On Tue, Jun 18, 2019 at 01:54:50PM +0200, Christian König wrote:
> > > > [SNIP]
> > > > Imo the below semantics would be much cleaner:
> > > > 
> > > > - invalidate may add new fences
> > > > - invalidate _must_ unmap its mappings
> > > > - an unmap must wait for current fences before the mapping can be
> > > >     released.
> > > > 
> > > > Imo there's no reason why unmap is special, and the only thing where we
> > > > don't use fences to gate access to resources/memory when it's in the
> > > > process of getting moved around.
> > > Well in general I want to avoid waiting for fences as much as possible.
> > > But the key point here is that this actually won't help with the problem
> > > I'm trying to solve.
> > The point of using fences is not to wait on them. I mean if you have
> > the shadow ttm bo on the lru you also don't wait for that fence to
> > retire before you insert the shadow bo onto the lru. You don't even
> > wait when you try to use that memory again, you just pipeline more
> > stuff on top.
> 
> Correct.
> 
> Ok, if I understand it correctly your suggestion is to move the
> responsibility to delay destruction of mappings until they are no longer
> used from the importer to the exporter based on the fences of the
> reservation object.
> 
> I seriously don't think that this is a good idea because you need to move
> the tracking who is using which mapping from the importer to the exporter as
> well. So duplicating quite a bunch of housekeeping.
> 
> On the other hand that we have this house keeping in the importer because we
> get it for free from TTM. But I can't think of a way other memory management
> backends would do this without keeping the sg table around either.
> 
> > In the end it will be the exact same amount of fences and waiting in
> > both solutions. One just leaks less implementationt details (at least
> > in my opinion) across the dma-buf border.
> 
> I agree that leaking implementation details over the DMA-buf border is a bad
> idea.
> 
> But I can assure you that this has absolutely nothing todo with the ghost
> object handling of TTM. ghost objects doesn't even receive an invalidation,
> they are just a possible implementation of the delayed destruction of sg
> tables.
> 
> > > > btw this is like the 2nd or 3rd time I'm typing this, haven't seen your
> > > > thoughts on this yet.
> > > Yeah, and I'm responding for the 3rd time now that you are
> > > misunderstanding why we need this here :)
> > > 
> > > Maybe I can make that clear with an example:
> > > 
> > > 1. You got a sharing between device A (exporter) and B (importer) which
> > > uses P2P.
> > > 
> > > 2. Now device C (importer) comes along and wants to use the DMA-buf
> > > object as well.
> > > 
> > > 3. The handling now figures out that we can't do P2P between device A
> > > and device C (for whatever reason).
> > > 
> > > 4. The map_attachment implementation in device driver A doesn't want to
> > > fail with -EBUSY and migrates the DMA-buf somewhere where both device A
> > > and device C can access it.
> > > 
> > > 5. This migration will result in sending an invalidation event around.
> > > And here it doesn't make sense to send this invalidation event to device
> > > C, because we know that device C is actually causing this event and
> > > doesn't have a valid mapping.
> > Hm I thought the last time around there was a different scenario, with
> > just one importer:
> > 
> > - importer has a mapping, gets an ->invalidate call.
> > - importer arranges for the mappings/usage to get torn down, maybe
> > updating fences, all from ->invalidate. But the mapping itself wont
> > disappear.
> > - exporter moves buffer to new places (for whatever reasons it felt
> > that was the thing to do).
> > - importer does another execbuf, the exporter needs to move the buffer
> > back. Again it calls ->invalidate, but on a mapping it already has
> > called ->invalidate on, and to prevent that silliness we take the
> > importer temporary off the list.
> 
> Mhm, strange I don't remember giving this explanation. It also doesn't make
> to much sense, but see below.

Yeah maybe I mixed things up somewhere. I guess you started with the first
scenario, I replied with "why don't we track this in the exporter or
dma-buf.c then?" and then the thread died out or something.
> 
> > Your scenario here is new, and iirc my suggestion back then was to
> > count the number of pending mappings so you don't go around calling
> > ->invalidate on mappings that don't exist.
> 
> Well the key point is we don't call invalidate on mappings, but we call
> invalidate on attachments.
> 
> When the invalidate on an attachment is received all the importer should at
> least start to tear down all mappings.

Hm, so either we invalidate mappings instead (pretty big change for
dma-buf, but maybe worth it). Or importers need to deal with invalidate on
stuff they're don't even have mapped anywhere anyway.

> > But even if you fix your scenario here there's still the issue that we
> > can receive invalidates on a mapping we've already torn down and which
> > is on the process of disappearing. That's kinda the part I don't think
> > is great semantics.
> 
> Yeah, that is a rather valid point.
> 
> Currently it is perfectly valid to receive an invalidation when you don't
> even have any mappings at all.
> 
> But this is intentional, because otherwise I would need to move the
> housekeeping which mappings are currently made from the importer to the
> exporter.
> 
> And as explained above that would essentially mean double housekeeping.
> 
> > [SNIP]
> > > The reason I don't have that on unmap is that I think migrating things
> > > on unmap doesn't make sense. If you think otherwise it certainly does
> > > make sense to add that there as well.
> > The problem isn't the recursion, but the book-keeping. There's imo two cases:
> 
> Completely agree, yes.
> 
> > - your scenario, where you call ->invalidate on an attachment which
> > doesn't have a mapping. I'll call that very lazy accounting, feels
> > like a bug :-) It's also very easy to fix by keeping track who
> > actually has a mapping, and then you fix it everywhere, not just for
> > the specific case of a recursion into the same caller.
> 
> Yeah, exactly. Unfortunately it's not so easy to handle as just a counter.
> 
> When somebody unmaps a mapping you need to know if that is already
> invalidated or not. And this requires tracking of each mapping.

Yeah we'd need to track mappings. Well, someone has to track mappings, and
atm it seems to be a mix of both importer and exporter (and dma-buf.c).

> > - calling invalidate multiple times. That's my scenario (or your older
> > one), where you call invalidate again on something that's already
> > invalidated. Just keeping track of who actually has a mapping wont fix
> > that, imo the proper fix is to to pipeline the unmapping using fences.
> 
> Unmapping using fences is exactly what I implemented using the TTM ghost
> objects.
> 
> The question is really who should implements this? The exporter or the
> importer?
> 
> > But I guess there's other fixes too possible.
> > 
> > Either way none of this is about recursion, I think the recursive case
> > is simply the one where you've hit this already. Drivers will have to
> > handle all these additional ->invalidates no matter what with your
> > current proposal. After all the point here is that the exporter can
> > move the buffers around whenever it feels like, for whatever reasons.
> 
> The recursion case is still perfectly valid. In the importer I need to
> ignore invalidations which are caused by creating a mapping.
> 
> Otherwise it is perfectly possible that we invalidate a mapping because of
> its creation which will result in creating a new one....
> 
> So even if you fix up your mapping case, you absolutely still need this to
> prevent recursion :)

Hm, but if we stop tracking attachments and instead start tracking
mappings, then how is that possible:
1. importer has no mappings
2. importer creates attachment. still no mapping
3. importer calls dma_buf_attach_map_sg, still no mapping at this point
4. we call into the exporter implementation. still no mapping
5. exporter does whatever it does. still no mapping
6. exporter finishes. conceptually from the dma-buf pov, _this_ is where
the mapping starts to exist.
7. invalidates (hey the exporter maybe changed its mind!) are totally
fine, and will be serialized with ww_mutex.

So I kinda don understand why the exporter here is allowed to call
invalidate too early (the mapping doesn't exist yet from dma-buf pov), and
dma-buf needs to filter it out.

But anywhere else where we might call ->invalidate and there's not yet a
mapping (again purely from dma-buf pov), there the importer is supposed to
do the filter.

Someone needs to keep track of all this, and I want clear
responsibilities. What they are exactly is not that important.

> > For solutions I think there's roughly three:
> > - importers need to deal. You don't like that, I agree
> > - exporters need to deal, probably not much better, but I think
> > stricter contract is better in itself at least.
> > - dma-buf.c keeps better track of mappings and which have been
> > invalidated already
> > 
> > We could also combine the last two with some helpers, e.g. if your
> > exporter really expects importers to delay the unmap until it's no
> > longer in use, then we could do a small helper which puts all these
> > unmaps onto a list with a worker. But I think you want to integrate
> > that into your exporters lru management directly.
> > 
> > > So this is just the most defensive thing I was able to come up with,
> > > which leaves the least possibility for drivers to do something stupid.
> > Maybe we're still talking past each another, but I feel like the big
> > issues are all still there. Problem identified, yes, solved, no.
> 
> Yeah, agreed your last mail sounded like we are absolutely on the same page
> on the problem now.

So I pondered a few ideas while working out:

1) We drop this filtering. Importer needs to keep track of all its
mappings and filter out invalidates that aren't for that specific importer
(either because already invalidated, or not yet mapped, or whatever).
Feels fragile.

2) dma-buf.c keeps track of all the mappings. Will be quite invasive I
think, and will duplicate what either the importer or exporter are already
doing anyway. We might need a map_dynamic_sg so that the invalidate
happens on that, and the invalidate is one-shot (i.e. once unmapped you
can never use the same mapping again, you need to create a new one). Will
probably be quite a bit of code churn.

3) Like two, but instead of creating something new we change the semantics
of attachments. For dynamic dma-buf importers, create a new
dma_buf_attach_and_map_sg, which does both, together, atomically. Same for
unmap. For unmap I see a clever option here:  These attachements are
one-shot only, i.e. you attach_and_map_sg, then you get an invalidate,
after that the attachment is dead. Can't ever remap it. I think this would
neatly solve the "you've invalidated this already" issue. But it is a bit
more code churn I guess. Also, it breaks the idea of using the attachment
to indicate for which devices a buffer should be accessible, so it's not
accidentally pinned into a bad spot. We might still need that even for
dynamic importers.

4) The exporter keeps track of which attachments have a mapping, and
invalidates them individually. Not sure this would actually solve
anything because of the double-invalidate thing. If we go with this we'd
probably need to put the responsibility of delaying the unmap after the
corresponding fence has signalled also onto the exporter. So importer
would unmap from it's ->invalidate callback, but exporter needs to obey
the current fences attached to the resv_obj. Also feels a bit fragile
since it depends upon the exporter instead of shared code to do the right
thing.

I think there's probably a few more variants that might work, but those is
what I came up with.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch