[RFC] drm/radeon: userfence IOCTL

Mon Apr 13 10:46:01 PDT 2015

On Mon, Apr 13, 2015 at 06:55:19PM +0200, Christian König wrote:
> On 13.04.2015 18:08, Jerome Glisse wrote:
> >On Mon, Apr 13, 2015 at 05:45:21PM +0200, Christian König wrote:
> >>On 13.04.2015 17:31, Jerome Glisse wrote:
> >>>On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
> >>>>Hello everyone,
> >>>>
> >>>>we have a requirement for a bit different kind of fence handling.
> >>>>Currently we handle fences completely inside the kernel, but in
> >>>>the future we would like to emit multiple fences inside the same
> >>>>IB as well.
> >>>>
> >>>>This works by adding multiple fence commands into an IB which
> >>>>just write their value to a specific location inside a BO and
> >>>>trigger the appropriate hardware interrupt.
> >>>>
> >>>>The user part of the driver stack should then be able to call an
> >>>>IOCTL to wait for the interrupt and block for the value (or
> >>>>something larger) to be written to the specific location.
> >>>>
> >>>>This has the advantage that you can have multiple synchronization
> >>>>points in the same IB and don't need to split up your draw commands
> >>>>over several IBs so that the kernel can insert kernel fences in
> >>>>between.
> >>>>
> >>>>The following set of patches tries to implement exactly this IOCTL.
> >>>>The big problem with that IOCTL is that TTM needs the BO to be
> >>>>kept in the same place while it is mapped inside the kernel page
> >>>>table. So this requires that we pin down the BO for the duration
> >>>>of the wait IOCTL.
> >>>>
> >>>>This practically gives userspace a way of pinning down BOs for as
> >>>>long as it wants, without the ability for the kernel for intervention.
> >>>>
> >>>>Any ideas how to avoid those problems? Or better ideas how to handle
> >>>>the new requirements?
> >>>So i think the simplest solution is to only allow such "fence" bo to
> >>>be inside system memory (no vram for them). My assumption here is that
> >>>such BO will barely see more than couple dword write so it is not a
> >>>bandwidth intensive BO. Or do you have a requirement for such BO to
> >>>be in VRAM ?
> >>Not that I know off.
> >>
> >>>Now to do that i would just add a property to buffer object that
> >>>effectively forbid such BO to be place anywhere else than GTT. Doing
> >>>that would make the ioctl code simpler, just check the BO as the
> >>>GTT only property set and if not return -EINVAL. Then its a simple
> >>>matter of kmapping the proper page.
> >>I've also considered adding an internal flag that when a buffer is kmapped
> >>we avoid moving it to VRAM / swapping it out, but see below.
> >>
> >>>Note that the only thing that would be left to forbid is the swaping
> >>>of the buffer due to memory pressure (from various ttm/core shrinker).
> >>Yeah, how the heck would I do that? That's internals of TTM that I never got
> >>into.
> >Actualy i think it is easier then i first thought, in the wait ioctl
> >check if the buffer has a pending fence ie gpu is still using it, if
> >not return -EAGAIN because it means that it is pointless to wait for
> >next GPU interrupt.
> >
> >For as long as the BO has an active fence it will not be swapped out
> >(see ttm_bo_swapout()). So in the wait event test both the value and
> >the pending fence. If the pending fence signal but not the value then
> >return -EAGAIN. In all case just keep a kmap of the page (do not kmap
> >the using existing kmap helper we would need something new to not
> >interfer with the shrinker). Not that after testing the value you would
> >need to check that the BO was not move and thus the page you were
> >looking into is still the one the BO is using.
> >
> >That way userspace can not abuse this ioctl to block the shrinker from
> >making progress.
> 
> So what we do on the start of the IOCTL is to check the BOs fences and see
> if it actually is still used and note it's current placement.
> 
> Then map it so the kernel can access it and in the waiting loop we check if
> it still has a fence and is still in the same place.
> 
> If there isn't any fences left or the placement has changed we simply assume
> that the fence is signaled.

Well no code is easier so pseudo below :

radeon_user_fence_wait_ioctl()
{
	struct radeon_bo rbo;
	struct page *page;
	uint64_t *fenceptr;

	radeon_bo_reserve(rbo, false);
	if (!(radeon_bo_property(rbo) & GTT_ONLY)) {
		radeon_bo_reserve(rbo, false);
		return -EINVAL;
	}

	if (reservation_object_test_signaled_rcu(robj->tbo.resv, true)) {
		radeon_bo_reserve(rbo, false);
		/* There is no pending work on this BO so just have
		 * userspace check for the fence itself. Proper user
		 * space will check fence has not signaled when getting
		 * -EAGAIN.
		 */
		return -EAGAIN;
	}

	page = ttm_get_page_at(&rbo->tbo, args->offset);
	// maybe we just want to return -EINVAL but if !page the
	// something is seriously wrong here.
	BUG_ON(!page);

	fenceptr = kmap(page) + (arg->offset & ~PAGE_MASK);
	radeon_bo_unreserve(robj);

	// Do not hold object

	radeon_irq_kms_sw_irq_get(rdev, ring);
	// FIXME timeout
	r = wait_event_interruptible(rdev->fence_queue,
				     (!robj->tbo.resv ||
				      *fenceptr >= args->fence ||
				       rdev->needs_reset));
	radeon_irq_kms_sw_irq_put(rdev, ring);

	radeon_bo_reserve(rbo, false);
	if (*fenceptr >= args->fence &&
	    page == ttm_get_page_at(&rbo->tbo, args->offset)) {
		radeon_bo_unreserve(robj);
		kunmap(page);
		/* success the user fence signaled. */
		return 0;
	}
	radeon_bo_unreserve(robj);
	kunmap(page);

	/* Fence is not signaled and either bo no longer have pending
	 * work or we timedout or gpu need reset.
	 */
	return -EAGAIN;
}

The point here is that it's ok to map the page in the first place has
object is still in use. But then even if object is swaped out it is
ok to keep that page mapped, worst case we are just accessing a page
use by someone else but we are just reading.

That is why we have to check again after the wait that we are still
reading from the right page to know fore sure if this is actually the
fence value we are looking at or just some random page.

That way because we do not have anything on the object while waiting
we are not blocking anything to happen. The object can be swapped out
and thus there is no way for userspace to abuse this ioctl.

The new GTT only property only ensure us that ttm_get_page_at() will
actually be successfull if the object is active. If the object is
swapped out then ttm_get_page_at() return NULL. But in the first part
of the ioctl as we checked that the object is still active we know
it can not be swapped out so ttm_get_page_at() must return a proper
page.

Cheers,
Jérôme

> 
> Yeah, that actually should work.
> 
> Thanks for the tip,
> Christian.
> 
> >
> >I need to look at ttm kmap code to see if it is actually useable without
> >disrupting the shrinker. Will do that after lunch.
> >
> >Cheers,
> >Jérôme
> >
> >>Thanks for the ideas,
> >>Christian.
> >>
> >>>Cheers,
> >>>Jérôme
>