[Intel-gfx] [PATCH v7 6/7] drm/i915: refactor duplicate object vmap functions (the final rework?)
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Mar 23 12:23:44 UTC 2016
On 22/03/16 15:25, Dave Gordon wrote:
> On 08/03/16 09:43, Tvrtko Ursulin wrote:
>>
>> On 02/03/16 15:40, Dave Gordon wrote:
>>> On 02/03/16 12:08, Chris Wilson wrote:
>>>> On Tue, Mar 01, 2016 at 04:33:58PM +0000, Dave Gordon wrote:
>>>>> This is essentially Chris Wilson's patch of a similar name,
>>>>> reworked on
>>>>> top of Alex Dai's recent patch:
>>>>> | drm/i915: Add i915_gem_object_vmap to map GEM object to virtual
>>>>> space
>>>>>
>>>>> Chris' original commentary said:
>>>>> | We now have two implementations for vmapping a whole object, one for
>>>>> | dma-buf and one for the ringbuffer. If we couple the vmapping into
>>>>> | the obj->pages lifetime, then we can reuse an obj->vmapping for both
>>>>> | and at the same time couple it into the shrinker.
>>>>> |
>>>>> | v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
>>>>> | v3: Call unpin_vmap from the right dmabuf unmapper
>>>>>
>>>>> v4: reimplements the same functionality, but now as wrappers round the
>>>>> recently-introduced i915_gem_object_vmap_range() from Alex's
>>>>> patch
>>>>> mentioned above.
>>>>>
>>>>> v5: separated from two minor but unrelated changes [Tvrtko Ursulin];
>>>>> this is the third and most substantial portion.
>>>>>
>>>>> Decided not to hold onto vmappings after the pin count goes to
>>>>> zero.
>>>>> This may reduce the benefit of Chris' scheme a bit, but does
>>>>> avoid
>>>>> any increased risk of exhausting kernel vmap space on 32-bit
>>>>> kernels
>>>>> [Tvrtko Ursulin]. Potentially, the vunmap() could be deferred
>>>>> until
>>>>> the put_pages() stage if a suitable notifier were written, but
>>>>> we're
>>>>> not doing that here. Nonetheless, the simplification of both
>>>>> dmabuf
>>>>> and ringbuffer code makes it worthwhile in its own right.
>>>>>
>>>>> v6: change BUG_ON() to WARN_ON(). [Tvrtko Ursulin]
>>>>>
>>>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>>>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Alex Dai <yu.dai at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_drv.h | 22 ++++++++++++++-----
>>>>> drivers/gpu/drm/i915/i915_gem.c | 39
>>>>> +++++++++++++++++++++++++++++++++
>>>>> drivers/gpu/drm/i915/i915_gem_dmabuf.c | 36
>>>>> ++++--------------------------
>>>>> drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++----
>>>>> 4 files changed, 65 insertions(+), 41 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index b3ae191..f1ad3b3 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -2172,10 +2172,7 @@ struct drm_i915_gem_object {
>>>>> struct scatterlist *sg;
>>>>> int last;
>>>>> } get_page;
>>>>> -
>>>>> - /* prime dma-buf support */
>>>>> - void *dma_buf_vmapping;
>>>>> - int vmapping_count;
>>>>> + void *vmapping;
>>>>>
>>>>> /** Breadcrumb of last rendering to the buffer.
>>>>> * There can only be one writer, but we allow for multiple
>>>>> readers.
>>>>> @@ -2980,7 +2977,22 @@ static inline void
>>>>> i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
>>>>> static inline void i915_gem_object_unpin_pages(struct
>>>>> drm_i915_gem_object *obj)
>>>>> {
>>>>> BUG_ON(obj->pages_pin_count == 0);
>>>>> - obj->pages_pin_count--;
>>>>> + if (--obj->pages_pin_count == 0 && obj->vmapping) {
>>>>> + /*
>>>>> + * Releasing the vmapping here may yield less benefit than
>>>>> + * if we kept it until put_pages(), but on the other hand
>>>>
>>>> Yields no benefit. Makes the patch pointless.
>>>> Plus there is also pressure to enable WC vmaps.
>>>> -Chris
>>>
>>> The patch is not pointless -- at the very least, it:
>>> + reduces the size of "struct drm_i915_gem_object" (OK, only by 4 bytes)
>>> + replaces special-function code for dmabufs with more generic code that
>>> can be reused for other objects (for now, ringbuffers; next GuC-shared
>>> objects -- see Alex's patch "drm/i915/guc: Simplify code by keeping vmap
>>> of guc_client object" which will eliminate lot of short-term
>>> kmap_atomics with persistent kmaps).
>>> + provides a shorthand for the sequence of { get_pages(), pin_pages(),
>>> vmap() } so we don't have to open-code it (and deal with all the error
>>> paths) in several different places
>>>
>>> Thus there is an engineering benefit even if this version doesn't
>>> provide any performance benefit. And if, as the next step, you want to
>>> extend the vmap lifetime, you just have to remove those few lines in
>>> i915_gem_object_unpin_pages() and incorporate the notifier that you
>>> prototyped earlier -- if it actually provides any performance boost.
>>
>> So Chris do you ack on this series on the basis of the above - that it
>> consolidates the current code and following GuC patch will be another
>> user of the pin_vmap API?
>>
>> Regards,
>> Tvrtko
>
> I see that Chris has posted a patch to add a vmap notifier, although it
> hasn't yet got its R-B. So I suggest we merge this patch series now, and
> then update it by moving the vunmap() into put_pages() when Chris has
> the notifier finalised. IIRC you wanted Daniel to merge the new DRM bits
> (patches 3 and 7, which already have their R-Bs) ?
>
> Or we can merge 1-5+7, all of which already have R-Bs, and I can turn
> 6 into a GuC-private version, without the benefit of simplifying and
> unifying the corresponding object-mapping management in the DMAbuf and
> ringbuffer code.
>
> Or I can repost just the bits that don't rely on drm_malloc_gfp() and
> exclude the final patch so that we can move ahead on the bits we
> actually want for improving the performance of the GuC interface and
> reducing the number of kmap_atomic calls elsewhere, and then the omitted
> bits can be added back once drm_malloc_gfp() has been merged upstream
> and the notifier is working.
I've chatted with Chris and Daniel on IRC and here is the summary and
way forward I think.
1. Drop 6/7, and probably 7/7 unless you can get etnaviv people to r-b/ack.
2. Add the patch which fixes the actual scheduling while atomic in GuC
to the end of the series with a Bugzila & Testcase tag in that patch.
(This step should allow Chris to provide an Acked-by.)
3. Cc dri-devel on all patches of the series since some touch DRM core.
(This is standard recommended practice).
4. Rebase & resend as new series.
5. Review the new patch in the series.
6. Explain CI results.
7. Merge. :)
Regards,
Tvrtko
More information about the Intel-gfx
mailing list