[Intel-gfx] [PATCH 11/42] drm/i915: Introduce an internal allocator for disposable private objects
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Sat Oct 8 08:12:13 UTC 2016
On 07/10/2016 18:08, Chris Wilson wrote:
> On Fri, Oct 07, 2016 at 05:52:47PM +0100, Tvrtko Ursulin wrote:
>> On 07/10/2016 10:46, Chris Wilson wrote:
>>> Quite a few of our objects used for internal hardware programming do not
>>> benefit from being swappable or from being zero initialised. As such
>>> they do not benefit from using a shmemfs backing storage and since they
>>> are internal and never directly exposed to the user, we do not need to
>>> worry about providing a filp. For these we can use a
>>> drm_i915_gem_object wrapper around an sg_table of plain struct page.
>>> They are not swap backed and not automatically pinned. If they are
>>> reaped by the shrinker, the pages are released and the contents
>>> discarded. For the internal use case this is fine since, for example,
>>> ringbuffers stay pinned from the point they are written by a request
>>> until they are read by the hardware. Once idle, they can be discarded
>>> entirely. As such they are a good match for execlist ringbuffers and
>>> a small variety of other internal objects.
>>>
>>> In the first iteration, this is limited to the scratch batch buffers we
>>> use (for command parsing and state initialisation).
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>  drivers/gpu/drm/i915/Makefile                |   1 +
>>>  drivers/gpu/drm/i915/i915_drv.h              |   5 +
>>>  drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  28 ++---
>>>  drivers/gpu/drm/i915/i915_gem_internal.c     | 161 +++++++++++++++++++++++++++
>>>  drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
>>>  drivers/gpu/drm/i915/intel_engine_cs.c       |   2 +-
>>>  drivers/gpu/drm/i915/intel_ringbuffer.c      |  14 ++-
>>>  7 files changed, 189 insertions(+), 24 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/i915/i915_gem_internal.c
>>>
>>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>> index a998c2bce70a..b94a90f34d2d 100644
>>> --- a/drivers/gpu/drm/i915/Makefile
>>> +++ b/drivers/gpu/drm/i915/Makefile
>>> @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
>>> 	  i915_gem_execbuffer.o \
>>> 	  i915_gem_fence.o \
>>> 	  i915_gem_gtt.o \
>>> +	  i915_gem_internal.o \
>>> 	  i915_gem.o \
>>> 	  i915_gem_render_state.o \
>>> 	  i915_gem_request.o \
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index fee5cc92e2f2..bad97f1e5265 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -3538,6 +3538,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>>> 						u32 gtt_offset,
>>> 						u32 size);
>>> +/* i915_gem_internal.c */
>>> +struct drm_i915_gem_object *
>>> +i915_gem_object_create_internal(struct drm_device *dev,
>>> +				unsigned int size);
>>> +
>> Wasn't size_t our convention for GEM objects?
> Not our convention, no. Using size_t has caused so many bugs that we
> started to eradicate it (in our usual piecemeal approach).
Oh wow, I missed that decision. Last thing I remember was that someone
was trying to convert it all to size_t. Fine by me.
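For the benefit of the archive, I assume the class of bug meant here is
the usual truncation one, where size_t is only 32 bits wide on 32-bit
kernels while GTT offsets are 64 bits. A contrived userspace
illustration (not code from our tree, uint32_t standing in for a 32-bit
size_t):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* a page count worth 8GiB of 4KiB pages */
		uint32_t npages = 1 << 21;

		/* shift happens in 32 bits and wraps to 0 before widening */
		uint64_t bad = npages << 12;

		/* widening first gives the intended 8GiB */
		uint64_t good = (uint64_t)npages << 12;

		printf("bad=%llu good=%llu\n",
		       (unsigned long long)bad, (unsigned long long)good);
		return 0;
	}

With a plain unsigned int for object sizes the limit is at least
explicit, and offsets stay in u64 throughout.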
>>> +		/* 965gm cannot relocate objects above 4GiB. */
>>> +		gfp &= ~__GFP_HIGHMEM;
>>> +		gfp |= __GFP_DMA32;
>>> +	}
>>> +
>>> +	for (i = 0; i < npages; i++) {
>>> +		struct page *page;
>>> +
>>> +		page = alloc_page(gfp);
>>> +		if (!page)
>>> +			goto err;
>>> +
>>> +#ifdef CONFIG_SWIOTLB
>>> +		if (swiotlb_nr_tbl()) {
>>> +			st->nents++;
>>> +			sg_set_page(sg, page, PAGE_SIZE, 0);
>>> +			sg = sg_next(sg);
>>> +			continue;
>>> +		}
>>> +#endif
>>> +		if (!i || page_to_pfn(page) != last_pfn + 1) {
>>> +			if (i)
>>> +				sg = sg_next(sg);
>>> +			st->nents++;
>>> +			sg_set_page(sg, page, PAGE_SIZE, 0);
>>> +		} else {
>>> +			sg->length += PAGE_SIZE;
>>> +		}
>>> +		last_pfn = page_to_pfn(page);
>>> +	}
>>> +#ifdef CONFIG_SWIOTLB
>>> +	if (!swiotlb_nr_tbl())
>>> +#endif
>>> +		sg_mark_end(sg);
>> Looks like the loop above could be moved into a helper and shared
>> with i915_gem_object_get_pages_gtt. Maybe just page-alloc and
>> page-alloc-error callbacks would be required.
> So just the entire thing as a callback... I would have thought you might
> suggest trying high-order allocations and falling back to low-order.
I am not sure higher-order attempts would be worth it. What I was
thinking was to implement the loop above generically in one place, and
then have the two (or even three, with userptr) callers call it with
their own alloc-page and alloc-page-on-error callbacks, so that there
is a single implementation of the coalescing logic.
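Roughly what I have in mind (an untested sketch; i915_page_ops and
i915_sg_alloc_pages are invented names, and the CONFIG_SWIOTLB
no-coalesce special case is left out for brevity):

	struct i915_page_ops {
		struct page *(*alloc)(void *data, gfp_t gfp);
		void (*release)(void *data, struct page *page);
	};

	static int
	i915_sg_alloc_pages(struct sg_table *st, unsigned int npages,
			    gfp_t gfp, const struct i915_page_ops *ops,
			    void *data)
	{
		unsigned long last_pfn = 0;
		struct scatterlist *sg;
		unsigned int i, n;

		if (sg_alloc_table(st, npages, GFP_KERNEL))
			return -ENOMEM;

		sg = st->sgl;
		st->nents = 0;
		for (i = 0; i < npages; i++) {
			struct page *page;

			page = ops->alloc(data, gfp);
			if (!page)
				goto err;

			/* coalesce runs of physically contiguous pages */
			if (!i || page_to_pfn(page) != last_pfn + 1) {
				if (i)
					sg = sg_next(sg);
				st->nents++;
				sg_set_page(sg, page, PAGE_SIZE, 0);
			} else {
				sg->length += PAGE_SIZE;
			}
			last_pfn = page_to_pfn(page);
		}
		sg_mark_end(sg);

		return 0;

	err:
		/* hand every page allocated so far back to the caller */
		sg_mark_end(sg);
		for (sg = st->sgl; sg; sg = sg_next(sg))
			for (n = 0; n < sg->length >> PAGE_SHIFT; n++)
				ops->release(data, nth_page(sg_page(sg), n));
		sg_free_table(st);
		return -ENOMEM;
	}

The internal, gtt and userptr paths would then differ only in the ops
they pass in; for the internal case the release callback would just be
__free_page().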
Regards,
Tvrtko
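
P.S. On the new allocator itself, my guess at how a caller would use it
(names taken from the patch; I am assuming the return follows our usual
ERR_PTR convention):

	struct drm_i915_gem_object *obj;

	/* disposable scratch: contents are not zeroed and are lost if
	 * the shrinker reaps the object while it is unpinned */
	obj = i915_gem_object_create_internal(dev, 4096);
	if (IS_ERR(obj))
		return PTR_ERR(obj);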