[Intel-gfx] [PATCH 14/41] drm/i915: Use a radixtree for random access to the object's backing storage

Mon Oct 17 09:56:27 UTC 2016

On 14/10/2016 15:07, Chris Wilson wrote:
> On Fri, Oct 14, 2016 at 02:32:03PM +0100, Tvrtko Ursulin wrote:
>> On 14/10/2016 13:18, Chris Wilson wrote:
>>> A while ago we switched from a contiguous array of pages into an sglist,
>>> for that was both more convenient for mapping to hardware and avoided
>>> the requirement for a vmalloc array of pages on every object. However,
>>> certain GEM API calls (like pwrite, pread as well as performing
>>> relocations) do desire access to individual struct pages. A quick hack
>>> was to introduce a cache of the last access such that finding the
>>> following page was quick - this works so long as the caller desired
>>> sequential access. Walking backwards, or multiple callers, still hits a
>>> slow linear search for each page. One solution is to store each
>>> successful lookup in a radix tree.
>>>
>>> v2: Rewrite building the radixtree for clarity, hopefully.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  69 +++++-------
>>>   drivers/gpu/drm/i915/i915_gem.c         | 179 +++++++++++++++++++++++++++++---
>>>   drivers/gpu/drm/i915/i915_gem_stolen.c  |   4 +-
>>>   drivers/gpu/drm/i915/i915_gem_userptr.c |   4 +-
>>>   4 files changed, 193 insertions(+), 63 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 38467dde1efe..53cf4b0e5359 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2273,9 +2273,12 @@ struct drm_i915_gem_object {
>>>   	struct sg_table *pages;
>>>   	int pages_pin_count;
>>> -	struct get_page {
>>> -		struct scatterlist *sg;
>>> -		int last;
>>> +	struct i915_gem_object_page_iter {
>>> +		struct scatterlist *sg_pos;
>>> +		unsigned long sg_idx;
>> We are not consistent in the code with type used for number of pages
>> in an object. sg_alloc_table even takes an unsigned int for it, so
>> for as long as we build them as we do, unsigned long is a waste.
> I know. It's worrying, today there is a possibility that we overflow a
> 32bit size. If this was counting in pages, we would have a few more
> years of grace. All I can say is that we are fortunate that memory
> remains expensive in the exabyte range.
>
>>> @@ -4338,6 +4349,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>>>   	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
>>>   	obj->madv = I915_MADV_WILLNEED;
>>> +	INIT_RADIX_TREE(&obj->get_page.radix, GFP_ATOMIC | __GFP_NOWARN);
>> Pros & cons of GFP_ATOMIC here? Versus perhaps going with the mutex?
>> I don't know how much data radix tree allocates with this, per node,
>> but we can have a lot of pages. Would this create extra pressure on
>> slab shrinking, and in turn out objects?
> The problem is that we require sg lookup on a !pagefault path, hence
> mutexes and GFP_KERNEL turn out to be illegal. :|

Bummer.  I don't know enough about the atomic reserve to judge how bad 
this might be then. Because userspace could drain it easily after this 
work by just pread/pwrite on large objects.

>>> +		/* If we cannot allocate and insert this entry, or the
>>> +		 * individual pages from this range, cancel updating the
>>> +		 * sg_idx so that on this lookup we are forced to linearly
>>> +		 * scan onwards, but on future lookups we will try the
>>> +		 * insertion again (in which case we need to be careful of
>>> +		 * the error return reporting that we have already inserted
>>> +		 * this index).
>>> +		 */
>>> +		ret = radix_tree_insert(&iter->radix, idx, sg);
>>> +		if (ret && ret != -EEXIST)
>>> +			goto scan;
>> What other error can we get here? Internal allocation failure?
> Yes. ENOMEM is the only other error.
>>> +
>>> +		exception =
>>> +			RADIX_TREE_EXCEPTIONAL_ENTRY |
>>> +			idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
>>> +		for (i = 1; i < count; i++) {
>>> +			ret = radix_tree_insert(&iter->radix, idx + i,
>>> +						(void *)exception);
>>> +			if (ret && ret != -EEXIST)
>>> +				goto scan;
>>> +		}
>>> +
>>> +		idx += count;
>>> +		sg = ____sg_next(sg);
>>> +		count = __sg_page_count(sg);
>>> +	}
>>> +
>>> +scan:
>>> +	iter->sg_pos = sg;
>>> +	iter->sg_idx = idx;
>>> +
>>> +	spin_unlock(&iter->lock);
>>> +
>>> +	if (unlikely(n < idx)) /* insertion completed by another thread */
>>> +		goto lookup;
>>> +
>>> +	/* In case we failed to insert the entry into the radixtree, we need
>>> +	 * to look beyond the current sg.
>>> +	 */
>>> +	while (idx + count <= n) {
>>> +		idx += count;
>>> +		sg = ____sg_next(sg);
>>> +		count = __sg_page_count(sg);
>>> +	}
>>> +
>> Hmm... I assume failures happen then since you added this fallback.
>> Due GFP_ATOMIC?
> No, this was always in the code, because malloc failures happen. Quite
> often if you run igt ;)
>   

But GFP_ATOMIC failures primarily, yes?

It is a bit unfortunate that with this fancy lookup we can easily and 
silently fall back to linear lookup and lose the performance benefit. Do 
you know how often this happens? Should we have some stats collected and 
exported via debugfs to evaluate on realistically busy/loaded systems?

>>> +	*offset = n - idx;
>>> +	return sg;
>>> +
>>> +lookup:
>>> +	rcu_read_lock();
>>> +
>> Property of the radix tree implementation that the RCU lock is sufficient?
> Yes. Lookups are RCU safe (the slots are freed via RCU). Writers must be
> serialised by the caller.
>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> index f4f6d3a48b05..70e61bc35c60 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> @@ -595,8 +595,8 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
>>>   	if (obj->pages == NULL)
>>>   		goto cleanup;
>>> -	obj->get_page.sg = obj->pages->sgl;
>>> -	obj->get_page.last = 0;
>>> +	obj->get_page.sg_pos = obj->pages->sgl;
>>> +	obj->get_page.sg_idx = 0;
>>>   	i915_gem_object_pin_pages(obj);
>>>   	obj->stolen = stolen;
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> index 1c891b92ac80..cb95789da76e 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> @@ -526,8 +526,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
>>>   			if (ret == 0) {
>>>   				list_add_tail(&obj->global_list,
>>>   					      &to_i915(dev)->mm.unbound_list);
>>> -				obj->get_page.sg = obj->pages->sgl;
>>> -				obj->get_page.last = 0;
>>> +				obj->get_page.sg_pos = obj->pages->sgl;
>>> +				obj->get_page.sg_idx = 0;
>> Almost like these ones would be better in a helper like
>> i915_gem_object_init_page_lookup or something.
> Yup.  I think I have a patch for that, with your r-b on it :)

Ok then. :)

Regards,

Tvrtko