[Intel-gfx] [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
chris at chris-wilson.co.uk
Tue Feb 23 11:38:02 UTC 2016
On Tue, Feb 23, 2016 at 10:31:08AM +0000, Tvrtko Ursulin wrote:
> On 23/02/16 10:06, Chris Wilson wrote:
> >On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote:
> >>[Cc Chris as the author of the idea.]
> >>On 22/02/16 15:18, Dave Gordon wrote:
> >>>This is essentially Chris Wilson's patch of a similar name, reworked on
> >>>top of Alex Dai's recent patch:
> >>> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
> >>>Chris' original commentary said:
> >>> We now have two implementations for vmapping a whole object, one for
> >>> dma-buf and one for the ringbuffer. If we couple the vmapping into
> >>> the obj->pages lifetime, then we can reuse an obj->vmapping for both
> >>> and at the same time couple it into the shrinker.
> >>As a general concept my worry is that by implementing this approach
> >>we tie two unrelated concepts together.
> >>Shrinker is about backing storage (used/free pages in a machine),
> >>while vmap is about kernel address space. And then on 32-bit with
> >>its limited vmap space (128MiB, right?), it can become exhausted
> >>much sooner than the shrinker would be triggered. And we would rely
> >>on the shrinker running to free up address space as well now, right?
> >Yes, we use the shrinker to free address space.
> >>So unless I am missing something that doesn't fit well.
> >The opposite. Even more reason for the shrinker to be able to recover
> >vmap space on 32bit systems (for external users, not just ourselves).
> How? I don't see that a failed vmapping will trigger shrinking. What
> will prevent i915 from accumulating objects whose vmaps potentially
> stick around for too long?
I read what you wrote as implying the shrinker would be run (which is
what I hoped). I made the mistake of just thinking of the allocations
around that path, rather than alloc_vmap_area().
Indeed, we would need a new notifier, pretty much for the sole use of
32b. Grr, that will be a nuisance.
> >>But as a related question, I wonder why doesn't the LRC require the
> >>same !HAS_LLC approach when mapping as ring buffer does here?
> >We don't try to use stolen for LRC. The main difficulty lies in
> >deciding how to map the state object, stolen forces us to use an
> >ioremapping through the GTT and so only suitable for write-only
> >mappings. However, we could be using the per-context HWS, for which we
> >want a CPU accessible, direct pointer.
> I wasn't asking about stolen but the !HAS_LLC path. Even non-stolen
> ring buffers will be mapped via the aperture on !HAS_LLC platforms.
> That implies it is about cache coherency and we don't have the same
> treatment for the LRC page.
Oh, completely missed the point. Yes, I also wonder how the kmap() on
the context object works for Braswell without an explicit clflush of at
least the TAIL update between requests.
> Until your vma->iomap prototype which added the same uncached access
> to the LRC as well.
> So my question was, do we need this for cache considerations today,
> irrespective of caching the pointer in the VMA.
Yes, I think so, but it's hard to say as Braswell breaks in so many
different ways when the littlest of stress is applied.
Chris Wilson, Intel Open Source Technology Centre