[Intel-gfx] [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT

Ankitprasad Sharma ankitprasad.r.sharma at intel.com
Thu Dec 10 02:23:03 PST 2015


On Wed, 2015-12-09 at 13:57 +0000, Tvrtko Ursulin wrote:
> On 09/12/15 12:46, ankitprasad.r.sharma at intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma at intel.com>
> >
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out non-shmem-backed objects.
> > Currently we intend to use this only for buffers allocated from the
> > stolen region.
> >
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> >
> > v3: Map the object page by page to the GTT if pinning of the whole
> > object to the GGTT fails; corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma at intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> >   				    int *needs_clflush);
> >
> >   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >
> >   static inline int __sg_page_count(struct scatterlist *sg)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> >   	drm_gem_object_unreference(&obj->base);
> >   	return ERR_PTR(ret);
> >   }
> > +
> > +/**
> > + * i915_gem_object_clear() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +	int ret, i;
> > +	char __iomem *base;
> > +	size_t size = obj->base.size;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_mm_node node;
> > +
> > +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Hm, I thought Chris's suggestion was not to even try mapping all of it 
> into the GTT but just go page by page?
> 
Yes, I will modify this to use only the page-by-page approach.
> If I misunderstood that, then I agree with Dave's comment that it should 
> be split into two helper functions.
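For reference, splitting out a per-page helper might look roughly like the
sketch below. It reuses only the calls already present in the patch above;
the helper name __i915_gem_object_clear_page is hypothetical and this is an
untested sketch, not a proposed final implementation:

	/* Hypothetical helper: clear one backing page through a
	 * single reserved GTT node. Sketch only.
	 */
	static int __i915_gem_object_clear_page(struct drm_i915_private *i915,
						struct drm_mm_node *node,
						dma_addr_t addr)
	{
		char __iomem *base;

		/* Point the reserved GTT page at this backing page */
		i915->gtt.base.insert_page(&i915->gtt.base, addr,
					   node->start, I915_CACHE_NONE, 0);
		/* Make sure the PTE write lands before we map and write */
		wmb();

		base = ioremap_wc(i915->gtt.mappable_base + node->start,
				  PAGE_SIZE);
		if (base == NULL)
			return -ENOMEM;
		memset_io(base, 0, PAGE_SIZE);
		iounmap(base);

		return 0;
	}

With such a helper, i915_gem_object_clear() would reduce to reserving a
single-page node via drm_mm_insert_node_in_range_generic() and looping the
helper over i915_gem_object_get_dma_address(obj, i).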
> 
> > +	if (ret) {
> > +		memset(&node, 0, sizeof(node));
> > +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > +							  &node, 4096, 0,
> > +							  I915_CACHE_NONE, 0,
> > +							  i915->gtt.mappable_end,
> > +							  DRM_MM_SEARCH_DEFAULT,
> > +							  DRM_MM_CREATE_DEFAULT);
> > +		if (ret)
> > +			goto out;
> > +
> > +		i915_gem_object_pin_pages(obj);
> > +	} else {
> > +		node.start = i915_gem_obj_ggtt_offset(obj);
> > +		node.allocated = false;
> 
> This looks very hacky anyway and I would not recommend it.
> 
> > +	}
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto unpin;
> > +
> > +	if (node.allocated) {
> > +		for (i = 0; i < size/PAGE_SIZE; i++) {
> > +			wmb();
> 
> What is this barrier for? Shouldn't the one after writing out the PTEs 
> and before remapping be enough?
This is purely to be on the safe side: it prevents the compiler from
overlapping operations across iterations and ensures the loop body is
executed strictly in program order.

Having just one barrier will ensure that insert_page and the subsequent
ioremap are done in that order, but the end of one iteration could still
overlap the start of the next iteration.
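To spell out the intended ordering, here is the same loop as in the patch
with each barrier annotated (a sketch for discussion, not a change to the
patch itself):

	for (i = 0; i < size / PAGE_SIZE; i++) {
		/* Barrier 1: the previous iteration's memset_io and
		 * iounmap must be fully done before the PTE is rewritten,
		 * so one iteration cannot overlap the start of the next.
		 */
		wmb();
		i915->gtt.base.insert_page(&i915->gtt.base,
				i915_gem_object_get_dma_address(obj, i),
				node.start, I915_CACHE_NONE, 0);
		/* Barrier 2: the PTE write must have landed before the
		 * page is mapped and written through the new translation.
		 */
		wmb();
		base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
		memset_io(base, 0, 4096);
		iounmap(base);
	}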
> 
> > +			i915->gtt.base.insert_page(&i915->gtt.base,
> > +					i915_gem_object_get_dma_address(obj, i),
> > +					node.start,
> > +					I915_CACHE_NONE,
> > +					0);
> > +			wmb();
> > +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> > +			memset_io(base, 0, 4096);
> > +			iounmap(base);
> > +		}
> > +	} else {
> > +		/* Get the CPU virtual address of the buffer */
> > +		base = ioremap_wc(i915->gtt.mappable_base +
> > +				  node.start, size);
> > +		if (base == NULL) {
> > +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> > +			ret = -ENOSPC;
> > +			goto unpin;
> > +		}
> > +
> > +		memset_io(base, 0, size);
> > +		iounmap(base);
> > +	}
> > +unpin:
> > +	if (node.allocated) {
> > +		wmb();
> 
> I don't understand this one either?
This is to make sure the last memset_io has completed before we move on to
clear_range.
> 
> > +		i915->gtt.base.clear_range(&i915->gtt.base,
> > +				node.start, node.size,
> > +				true);
> > +		drm_mm_remove_node(&node);
> > +		i915_gem_object_unpin_pages(obj);
> > +	}
> > +	else {
> > +		i915_gem_object_ggtt_unpin(obj);
> > +	}
> > +out:
> > +	return ret;
> > +}
> >
> 
> Regards,
> 
> Tvrtko
