[Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v10

Christian König ckoenig.leichtzumerken at gmail.com
Fri Jun 21 12:06:54 UTC 2019


Am 21.06.19 um 12:32 schrieb Daniel Vetter:
> On Fri, Jun 21, 2019 at 11:55 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> Am 21.06.19 um 11:20 schrieb Daniel Vetter:
>>> On Tue, Jun 18, 2019 at 01:54:50PM +0200, Christian König wrote:
>>> [SNIP]
>>> Imo the below semantics would be much cleaner:
>>>
>>> - invalidate may add new fences
>>> - invalidate _must_ unmap its mappings
>>> - an unmap must wait for current fences before the mapping can be
>>>     released.
>>>
>>> Imo there's no reason why unmap is special; it would be the only place
>>> where we don't use fences to gate access to resources/memory while they
>>> are in the process of getting moved around.
>> Well in general I want to avoid waiting for fences as much as possible.
>> But the key point here is that this actually won't help with the problem
>> I'm trying to solve.
> The point of using fences is not to wait on them. I mean if you have
> the shadow ttm bo on the lru you also don't wait for that fence to
> retire before you insert the shadow bo onto the lru. You don't even
> wait when you try to use that memory again, you just pipeline more
> stuff on top.

Correct.

Ok, if I understand it correctly, your suggestion is to move the 
responsibility for delaying the destruction of mappings until they are no 
longer used from the importer to the exporter, based on the fences in the 
reservation object.

I seriously don't think that this is a good idea, because you would need 
to move the tracking of who is using which mapping from the importer to 
the exporter as well, duplicating quite a bit of housekeeping.

On the other hand, we have this housekeeping in the importer because we 
get it for free from TTM. But I can't think of a way other memory 
management backends could do this without keeping the sg table around 
either.
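
Just to illustrate what I mean, the importer side housekeeping we are 
talking about looks roughly like this (all structure and function names 
here are made up for the example, they are not from the patch set):

#include <linux/dma-buf.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* One entry per mapping the importer currently has of the DMA-buf. */
struct importer_mapping {
        struct list_head        node;
        struct sg_table         *sgt;
        bool                    invalidated;
};

/* Importer side state hanging off the attachment. */
struct importer_attachment {
        struct dma_buf_attachment       *attach;
        struct mutex                    lock;
        struct list_head                mappings;
        bool                            mapping_in_progress;
};

/* Record a new mapping so later invalidate/unmap calls know about it. */
static int importer_track_mapping(struct importer_attachment *ia,
                                  struct sg_table *sgt)
{
        struct importer_mapping *m = kzalloc(sizeof(*m), GFP_KERNEL);

        if (!m)
                return -ENOMEM;

        m->sgt = sgt;
        mutex_lock(&ia->lock);
        list_add_tail(&m->node, &ia->mappings);
        mutex_unlock(&ia->lock);

        return 0;
}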

> In the end it will be the exact same amount of fences and waiting in
> both solutions. One just leaks fewer implementation details (at least
> in my opinion) across the dma-buf border.

I agree that leaking implementation details over the DMA-buf border is a 
bad idea.

But I can assure you that this has absolutely nothing to do with the 
ghost object handling of TTM. Ghost objects don't even receive an 
invalidation, they are just one possible implementation of the delayed 
destruction of sg tables.

>>> btw this is like the 2nd or 3rd time I'm typing this, haven't seen your
>>> thoughts on this yet.
>> Yeah, and I'm responding for the 3rd time now that you are
>> misunderstanding why we need this here :)
>>
>> Maybe I can make that clear with an example:
>>
>> 1. You got a sharing between device A (exporter) and B (importer) which
>> uses P2P.
>>
>> 2. Now device C (importer) comes along and wants to use the DMA-buf
>> object as well.
>>
>> 3. The handling now figures out that we can't do P2P between device A
>> and device C (for whatever reason).
>>
>> 4. The map_attachment implementation in device driver A doesn't want to
>> fail with -EBUSY and migrates the DMA-buf somewhere where both device A
>> and device C can access it.
>>
>> 5. This migration will result in sending an invalidation event around.
>> And here it doesn't make sense to send this invalidation event to device
>> C, because we know that device C is actually causing this event and
>> doesn't have a valid mapping.
> Hm I thought the last time around there was a different scenario, with
> just one importer:
>
> - importer has a mapping, gets an ->invalidate call.
> - importer arranges for the mappings/usage to get torn down, maybe
> updating fences, all from ->invalidate. But the mapping itself won't
> disappear.
> - exporter moves buffer to new places (for whatever reasons it felt
> that was the thing to do).
> - importer does another execbuf, the exporter needs to move the buffer
> back. Again it calls ->invalidate, but on a mapping it already has
> called ->invalidate on, and to prevent that silliness we take the
> importer temporarily off the list.

Mhm, strange, I don't remember giving this explanation. It also doesn't 
make too much sense, but see below.

> Your scenario here is new, and iirc my suggestion back then was to
> count the number of pending mappings so you don't go around calling
> ->invalidate on mappings that don't exist.

Well, the key point is that we don't call invalidate on mappings, we 
call invalidate on attachments.

When the invalidate on an attachment is received, the importer should at 
least start to tear down all of its mappings.
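
On the importer side that callback then looks roughly like the sketch 
below, reusing the bookkeeping structures from the example above. The 
callback signature is only an assumption about what this series 
provides, and importer_from_attachment()/my_driver_stop_using_mapping() 
are made-up helpers:

static void my_importer_invalidate(struct dma_buf_attachment *attach)
{
        /* However the importer gets from the attachment to its state. */
        struct importer_attachment *ia = importer_from_attachment(attach);
        struct importer_mapping *m;

        mutex_lock(&ia->lock);
        list_for_each_entry(m, &ia->mappings, node) {
                if (m->invalidated)
                        continue;

                /*
                 * Block new command submissions from using this mapping
                 * and make sure fences for work still in flight end up
                 * in the buffer's reservation object; the actual unmap
                 * can then be pipelined behind those fences.
                 */
                m->invalidated = true;
                my_driver_stop_using_mapping(ia, m);
        }
        mutex_unlock(&ia->lock);
}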

> But even if you fix your scenario here there's still the issue that we
> can receive invalidates on a mapping we've already torn down and which
> is in the process of disappearing. That's kinda the part I don't think
> is great semantics.

Yeah, that is a rather valid point.

Currently it is perfectly valid to receive an invalidation when you 
don't even have any mappings at all.

But this is intentional, because otherwise I would need to move the 
housekeeping of which mappings currently exist from the importer to the 
exporter.

And as explained above that would essentially mean double housekeeping.

> [SNIP]
>> The reason I don't have that on unmap is that I think migrating things
>> on unmap doesn't make sense. If you think otherwise it certainly does
>> make sense to add that there as well.
> The problem isn't the recursion, but the book-keeping. There's imo two cases:

Completely agree, yes.

> - your scenario, where you call ->invalidate on an attachment which
> doesn't have a mapping. I'll call that very lazy accounting, feels
> like a bug :-) It's also very easy to fix by keeping track who
> actually has a mapping, and then you fix it everywhere, not just for
> the specific case of a recursion into the same caller.

Yeah, exactly. Unfortunately it's not as easy to handle as just a counter.

When somebody unmaps a mapping, you need to know whether it has already 
been invalidated or not, and that requires tracking each individual 
mapping.
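
That is also why the sketch above keeps an invalidated flag per mapping 
instead of a plain counter; the unmap path then needs to do something 
like this (made-up names again):

static void importer_unmap(struct importer_attachment *ia,
                           struct importer_mapping *m,
                           enum dma_data_direction dir)
{
        mutex_lock(&ia->lock);
        list_del(&m->node);
        mutex_unlock(&ia->lock);

        /*
         * Only tear the mapping down here if it is still live; an
         * invalidated mapping is (or will be) torn down by the
         * invalidation path, so only the bookkeeping entry is dropped.
         */
        if (!m->invalidated)
                dma_buf_unmap_attachment(ia->attach, m->sgt, dir);

        kfree(m);
}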

> - calling invalidate multiple times. That's my scenario (or your older
> one), where you call invalidate again on something that's already
> invalidated. Just keeping track of who actually has a mapping wont fix
> that, imo the proper fix is to pipeline the unmapping using fences.

Unmapping using fences is exactly what I implemented using the TTM ghost 
objects.

The question is really who should implement this: the exporter or the 
importer?
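
And independent of which side ends up owning it, pipelining the 
destruction behind a fence boils down to something like the following 
generic sketch (not the actual TTM ghost object code; the real thing of 
course also has to tear down the DMA mappings and the backing storage):

#include <linux/dma-fence.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Keep an sg table alive until the given fence has signaled. */
struct deferred_sgt {
        struct dma_fence_cb     cb;
        struct work_struct      work;
        struct dma_fence        *fence;
        struct sg_table         *sgt;
};

static void deferred_sgt_work(struct work_struct *work)
{
        struct deferred_sgt *d = container_of(work, struct deferred_sgt, work);

        sg_free_table(d->sgt);
        kfree(d->sgt);
        dma_fence_put(d->fence);
        kfree(d);
}

/* Fence callbacks run in signaling context, so only schedule the work. */
static void deferred_sgt_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        struct deferred_sgt *d = container_of(cb, struct deferred_sgt, cb);

        schedule_work(&d->work);
}

static void deferred_sgt_queue(struct sg_table *sgt, struct dma_fence *fence)
{
        struct deferred_sgt *d = kzalloc(sizeof(*d), GFP_KERNEL);

        if (!d) {
                /* Out of memory, fall back to blocking on the fence. */
                dma_fence_wait(fence, false);
                sg_free_table(sgt);
                kfree(sgt);
                return;
        }

        d->fence = dma_fence_get(fence);
        d->sgt = sgt;
        INIT_WORK(&d->work, deferred_sgt_work);

        if (dma_fence_add_callback(fence, &d->cb, deferred_sgt_cb))
                /* The fence has already signaled, release right away. */
                schedule_work(&d->work);
}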

> But I guess there's other fixes too possible.
>
> Either way none of this is about recursion, I think the recursive case
> is simply the one where you've hit this already. Drivers will have to
> handle all these additional ->invalidates no matter what with your
> current proposal. After all the point here is that the exporter can
> move the buffers around whenever it feels like, for whatever reasons.

The recursion case is still perfectly valid. In the importer I need to 
ignore invalidations which are caused by creating a mapping.

Otherwise it is perfectly possible that we invalidate a mapping because 
of its own creation, which would then result in creating a new one, and 
so on....

So even if you fix up your mapping case, you absolutely still need this 
to prevent recursion :)
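
With the bookkeeping sketched earlier that boils down to something like 
this (the mapping_in_progress flag is again just made up for the example):

static struct sg_table *
importer_map(struct importer_attachment *ia, enum dma_data_direction dir)
{
        struct sg_table *sgt;

        /*
         * Creating the mapping can itself trigger an invalidation, e.g.
         * when the exporter has to migrate the buffer first, so flag the
         * attachment while map_attachment is in flight.
         */
        ia->mapping_in_progress = true;
        sgt = dma_buf_map_attachment(ia->attach, dir);
        ia->mapping_in_progress = false;

        return sgt;
}

The invalidate callback from the earlier sketch then simply bails out 
early when ia->mapping_in_progress is set.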

> For solutions I think there's roughly three:
> - importers need to deal. You don't like that, I agree
> - exporters need to deal, probably not much better, but I think
> stricter contract is better in itself at least.
> - dma-buf.c keeps better track of mappings and which have been
> invalidated already
>
> We could also combine the last two with some helpers, e.g. if your
> exporter really expects importers to delay the unmap until it's no
> longer in use, then we could do a small helper which puts all these
> unmaps onto a list with a worker. But I think you want to integrate
> that into your exporters lru management directly.
>
>> So this is just the most defensive thing I was able to come up with,
>> which leaves the least possibility for drivers to do something stupid.
> Maybe we're still talking past each other, but I feel like the big
> issues are all still there. Problem identified, yes, solved, no.

Yeah, agreed, your last mail sounded like we are absolutely on the same 
page about the problem now.

Christian.

>
> Thanks, Daniel
>
>> Thanks,
>> Christian.
>>


