[PATCH 10/12] drm/amdgpu: add independent DMA-buf export v3

Daniel Vetter daniel at ffwll.ch
Mon May 6 15:10:46 UTC 2019


On Mon, May 06, 2019 at 10:05:07AM +0000, Koenig, Christian wrote:
> Am 06.05.19 um 10:04 schrieb Daniel Vetter:
> > [SNIP]
> >>>> + /* pin buffer into GTT */
> >>>> + return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
> >>> This is kinda what I mean with "shouldn't we pin the attachment" - afaiui
> >>> this can fail if someone already pinned the buffer into vram. And that
> >>> kind of checking is supposed to happen in the buffer attachment.
> >> Why is that supposed to happen on the attachment? I mean it could be nice to
> >> have for debugging, but I still don't see any practical reason for this.
> > dma_buf_attach is supposed to make sure the buffer won't land somewhere
> > where you can't get at it anymore. Wrt pin that means the exporter needs
> > to make sure it can't get pinned into a wrong place, and also isn't already
> > pinned into a wrong place. That's why I think pinning ties in with
> > dma_buf_attach and not the overall buffer.
> >
> > In a way there's two pieces of a pin:
> > - Do not move the buffer anymore.
> > - Make sure I can still get at it.
> >
> > Internally the 2nd part is encoded in the domain parameter you pass to
> > amdgpu_bo_pin. When going through dma-buf that information is derived
> > from the attachment (e.g. if it's a p2p one, then you can put it wherever
> > you feel like, if it's a normal one it must be in system ram). The dma-buf
> > alone doesn't tell you _where_ to pin something.
> 
> Ok, that finally makes some sense. So the attachment describes where the 
> buffer needs to be for the attaching device/use case to be able to 
> access it.
> 
> Going to change it to use an attachment instead.

btw one thing we've discussed, and which is much easier to implement on most
SoCs, is telling apart the render vs display attachment. Most SoCs have
different devices for those, with different pin requirements. We could do
something similar on PCI too, where you attach to some sub-device
representing scanout. Or maybe a flag on the attachment. Either way, that
would indicate an attachment where the pin means "must be in VRAM".
Might be useful for SLI-rendered scanout (in case that ever happens).

> >> Well completely amdgpu internal handling here. Key point is we have both
> >> preferred_domains as well as allowed_domains.
> >>
> >> During command submission we always try to move a BO to the
> >> preferred_domains again.
> >>
> >> Now what could happen if we don't have this check is the following:
> >>
> >> 1. The BO is allocated in VRAM. preferred_domains says only VRAM please, but
> >> allowed_domains says VRAM or GTT.
> >>
> >> 2. DMA-buf Importer comes along and moves the BO to GTT, which is perfectly
> >> valid because of the allowed_domains.
> >>
> >> 3. Command submission is made and moves the BO to VRAM again.
> >>
> >> 4. Importer comes along and moves the BO to GTT.
> >> ....
> >>
> >> I.e. a nice ping/pong situation which just eats up memory bandwidth.
> > Hm yeah the ping/pong is bad, but I figure you have to already handle that
> > (with some bias or whatever). Outright disabling invalidate/dynamic
> > dma-buf seems like overkill.
> >
> > What about upgrading preferred_domains to include GTT here? De facto what
> > you do is forcing GTT, so just adding GTT as a possible domain seems like
> > the better choice. Bonus points for undoing that when the last importer
> > disappears.
> 
> Well that's exactly what we want to avoid here.
> 
> The preferred_domains is where userspace likes the buffer to be and 
> should never be changed by the kernel.
> 
> The allowed_domains is where the buffer should be based on the hardware 
> restrictions and is usually updated by the kernel driver.
> 
> > In general I think dynamic dma-buf needs to be able to handle this
> > somehow, or it won't really work. Simplest is probably to just stop moving
> > buffers around so much for buffers that are dynamically exported (so maybe
> > could also change that in the CS ioctl to not move exported buffers
> > anymore, would achieve the same).
> 
> Yeah, that's the obvious alternative. But I didn't want to add even
> more complexity to the patch right now.
> 
> Cleaning this up is purely amdgpu internal, e.g. we need to make sure
> not to move buffers around so much on command submission.

Ok, if the comment is clear enough for you folks, that's all fine. Maybe add
a FIXME about how this is a bit pessimistic and what could/should be done
instead.

I agree that it's probably not a use-case you really care about, since the
big piles of dynamically shared buffers will be p2p. Dynamic but not p2p is
a bit of an edge case really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch