[PATCH 10/12] drm/amdgpu: add independent DMA-buf export v3

Mon May 6 08:04:55 UTC 2019

On Fri, May 03, 2019 at 02:35:02PM +0200, Christian König wrote:
> Am 30.04.19 um 16:16 schrieb Daniel Vetter:
> > [SNIP]
> > >   /**
> > > - * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
> > > - * @dma_buf: Shared DMA buffer
> > > + * amdgpu_gem_pin_dma_buf - &dma_buf_ops.pin_dma_buf implementation
> > > + *
> > > + * @dma_buf: DMA-buf to pin in memory
> > > + *
> > > + * Pin the BO which is backing the DMA-buf so that it can't move any more.
> > > + */
> > > +static int amdgpu_gem_pin_dma_buf(struct dma_buf *dma_buf)
> > > +{
> > > +	struct drm_gem_object *obj = dma_buf->priv;
> > > +	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> > > +
> > > +	/* pin buffer into GTT */
> > > +	return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
> > This is kinda what I mean with "shouldn't we pin the attachment" - afaiui
> > this can fail is someone already pinned the buffer into vram. And that
> > kind of checking is supposed to happen in the buffer attachment.
> 
> Why is that supposed to happen on the attachment? I mean it could be nice to
> have for debugging, but I still don't see any practical reason for this.

dma_buf_attach is supposed to make sure the buffer won't land somewhere
where you can't get at it anymore. Wrt pin that means the exporter needs
to make sure it can't get pinned into a wrong place, and also isn't pinned
into a wrong place anymore. That's why I think pinning ties in with
dma_buf_attach and not the overall buffer.

In a way there's two pieces of a pin:
- Do not move the buffer anymore.
- Make sure I can still get at it.

Internally the 2nd part is encoded in the domain parameter you pass to
amdgpu_bo_pin. When going through dma-buf that information is derived
from the attachment (e.g. if it's a p2p one, then you can put it wherever
you feel like, if it's a normal one it must be in system ram). The dma-buf
alone doesn't tell you _where_ to pin something.

> > Also will p2p pin into VRAM if all attachments are p2p capable? Or is your
> > plan to require dynamic invalidate to avoid fragmenting vram badly with
> > pinned stuff you can't move?
> 
> My plan was to make dynamic invalidation a must have for P2P, exactly for
> the reason you noted.
> 
> > +/**
> > + * amdgpu_gem_dma_buf_attach - &dma_buf_ops.attach implementation
> > + *
> > + * @dma_buf: DMA-buf we attach to
> >    * @attach: DMA-buf attachment
> >    *
> > + * Returns:
> > + * Always zero for success.
> > + */
> > +static int amdgpu_gem_dma_buf_attach(struct dma_buf *dma_buf,
> > +				     struct dma_buf_attachment *attach)
> > +{
> > +	struct drm_gem_object *obj = dma_buf->priv;
> > +	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> > +
> > +	/* Make sure the buffer is pinned when userspace didn't set GTT as
> > +	 * preferred domain. This avoid ping/pong situations with scan out BOs.
> > +	 */
> > +	if (!(bo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT))
> > +		attach->invalidate = NULL;
> > Not following here at all. If the BO can't be in GTT I'd guess you should
> > reject the attach outright, since the pinning/map later on will fail I
> > guess? At least I'm not making the connection with why dynamic dma-buf
> > won't work anymore, since dynamic dma-buf is to make p2p of bo in vram
> > work better, which is exactly what this here seems to check for.
> > 
> > Or is this just a quick check until you add full p2p support?
> > 
> > Count me confused ...
> 
> Well completely amdgpu internal handling here. Key point is we have both
> preferred_domains as well as allowed_domains.
> 
> During command submission we always try to move a BO to the
> preferred_domains again.
> 
> Now what could happen if we don't have this check is the following:
> 
> 1. BO is allocate in VRAM. And preferred_domains says only VRAM please, but
> allowed_domains says VRAM or GTT.
> 
> 2. DMA-buf Importer comes along and moves the BO to GTT, which is perfectly
> valid because of the allowed_domains.
> 
> 3. Command submission is made and moves the BO to VRAM again.
> 
> 4. Importer comes along and moves the BO to GTT.
> ....
> 
> E.g. a nice ping/pong situation which just eats up memory bandwidth.

Hm yeah the ping/pong is bad, but I figure you have to already handle that
(with some bias or whatever). Outright disabling invalidate/dynamic
dma-buf seems like overkill.

What about upgradging preferred_domains to include GTT here? Defacto what
you do is forcing GTT, so just adding GTT as a possible domain seems like
the better choice. Bonus points for undoing that when the last importer
disappears.

In general I think dynamic dma-buf needs to be able to handle this
somehow, or it won't really work. Simplest is probably to just stop moving
buffers around so much for buffers that are dynamically exported (so maybe
could also change that in the CS ioctl to not move exported buffers
anymore, would achieve the same).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch