[Intel-gfx] [PATCH 3/3] drm/i915: Use insert_page for pwrite_fast

Sat Nov 7 02:13:13 PST 2015

On Thu, Nov 05, 2015 at 02:38:30PM +0000, Tvrtko Ursulin wrote:
> 
> On 05/11/15 12:58, Chris Wilson wrote:
> >On Thu, Nov 05, 2015 at 12:53:20PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 05/11/15 12:42, Chris Wilson wrote:
> >>>On Thu, Nov 05, 2015 at 12:37:46PM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 05/11/15 11:45, ankitprasad.r.sharma at intel.com wrote:
> >>>>>From: Ankitprasad Sharma <ankitprasad.r.sharma at intel.com>
> >>>>>
> >>>>>In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> >>>>>we try a nonblocking pin for the whole object (since that is fastest if
> >>>>>reused), then failing that we try to grab one page in the mappable
> >>>>>aperture. It also allows us to handle objects larger than the mappable
> >>>>>aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> >>>>>to a measly 8MiB or something like that).
> >>>>
> >>>>Aperture in aperture, reminds me of those "Yo dawg I've heard you
> >>>>like X so I've put X in your X so you can Y while you Y" jokes. :D
> >>>>
> >>>>Would using the partial view code be interesting for this? Might be
> >>>>faster due to larger chunks being possible, or slower due to more
> >>>>expensive setup time, I don't know.
> >>>
> >>>It's the wrong abstraction.
> >>
> >>Looks the same to me, only difference is the size.
> >
> >There are many places that insert-page is used where we cannot do a
> >partial-pin.
> >
> >>Why not just do the page aperture then for simplicity? If there is
> >>any performance gain from trying the full VMA first then why
> >>wouldn't there be some from trying the partial VMA?
> >
> >obj->base.size >> PAGE_SHIFT x partial pages is not even funny.
> 
> Well I did not suggest that but larger chunks so I will repeat my question.
> 
> If going page by page is fine for performance then why have the two
> code paths at all? One which tries to pin the whole object first,
> and a second which goes page by page if that fails. Why not just do it
> page by page and avoid having two copy loops etc?

If we already have the vma or can allocate it without impacting the
system, using it is best (since we expect to reuse it again). If we
cannot allocate it, our natural iterator size is 4096 bytes, which is
also our best chance of finding space in the aperture.
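
To make the two paths concrete, here is a rough userspace sketch of the
policy described above. It is not the patch: every sketch_* name is
hypothetical, the "aperture" is just a free-space counter, and plain
memcpy() stands in for the real GGTT binding and insert_page calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the mappable aperture: tracks free space only. */
struct sketch_aperture {
	size_t free_bytes;
};

/* Hypothetical stand-in for an object, backed by ordinary memory. */
struct sketch_object {
	unsigned char *backing;
	size_t size;
};

/* Nonblocking reservation: succeeds only if the space is free right now. */
static bool sketch_reserve(struct sketch_aperture *ap, size_t bytes)
{
	if (ap->free_bytes < bytes)
		return false;
	ap->free_bytes -= bytes;
	return true;
}

static void sketch_release(struct sketch_aperture *ap, size_t bytes)
{
	ap->free_bytes += bytes;
}

/*
 * Returns 0 if the whole object was pinned, the number of page-sized
 * windows used if we fell back to page by page, or -1 on failure.
 */
static int sketch_pwrite(struct sketch_aperture *ap, struct sketch_object *obj,
			 size_t offset, const void *data, size_t len)
{
	const unsigned char *src = data;
	int windows = 0;

	assert(obj->backing != NULL);
	if (offset > obj->size || len > obj->size - offset)
		return -1;

	/* Preferred path: pin the whole object and do a single copy. */
	if (sketch_reserve(ap, obj->size)) {
		memcpy(obj->backing + offset, src, len);
		sketch_release(ap, obj->size);
		return 0;
	}

	/* Fallback: reserve one page of aperture and reuse it per page. */
	if (!sketch_reserve(ap, SKETCH_PAGE_SIZE))
		return -1;

	while (len > 0) {
		size_t page_off = offset % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - page_off;

		if (chunk > len)
			chunk = len;

		/* The driver would point the reserved slot at this page here. */
		memcpy(obj->backing + offset, src, chunk);

		offset += chunk;
		src += chunk;
		len -= chunk;
		windows++;
	}

	sketch_release(ap, SKETCH_PAGE_SIZE);
	return windows;
}
```

With a one-page aperture and a three-page object, a 5000 byte write at
offset 100 goes through two windows (3996 bytes, then 1004), while the
same write with a roomy aperture takes the single-copy path. Unlike the
real code the sketch drops its reservation immediately, so it does not
model the reuse that makes the full pin worthwhile.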

Partial vmas carry a high overhead and, more importantly, are a massive
impedance mismatch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre