[Intel-gfx] [PATCH 1/6] RFCish: write only mappings (aka non-blocking)

Wed Sep 21 00:19:53 CEST 2011

On Tue, 20 Sep 2011 13:06:43 +0200
Daniel Vetter <daniel at ffwll.ch> wrote:

> On Mon, Sep 19, 2011 at 09:25:00PM -0700, Ben Widawsky wrote:
> > I'm going to keep this short...
> > Patch 5 is my test case.
> > On Gen6 I see slightly better performance. On Gen5 I see really
> > really improvements (like 3x) for non GTT write only maps over
> > regular mmaps. GTT mappings don't really show any improvements as a
> > whole.
> >
> > Better tests would be nice, but without more significant Mesa
> > changes, or better benchmarks, I'm not sure how to get those.
> > While I think these patches are mostly complete, ideas for better
> > testing are very welcome. Also of course, general optimizations or
> > pointing out my errors would be nice.
> 
> Ok, I'm gonna be the dense annoying bastard here:
> 
> - Can we stop calling this mappings write-only. Afaics the
> distinguishing feature is that they're non-blocking. And yes, current
> users only use non-blocking paths to upload data because the amount
> of data we're currently downloading is so small. Hence we can use on
> bo for each download without wasting too much space and still avoid
> unnecessary blocking. Bit I think this will change, e.g. with designs
> like sna that tightly integrate gpu and sw rendering. Or OpenCL.

I really wanted to argue this point, but you're right. I think what I'll do is
make the libdrm stub be called non-blocking, and then per gen we can do
whatever with set_domain.

> 
> - Why do we need any patches for gtt non-blocking mmaps? I've re-read
> our code, and afaics we're only calling wait_rendering from gem_fault
> if obj->gtt_space == NULL. I.e. there's no way the gpu is currently
> using the data and hence no way for us to block on it.

I think you're right. I misread this before, we take an early exit if
its write domain is already GTT, so we don't sync flush at that point. I
believe this is rooted in the fact that my original versions of the
patch wouldn't call set domain at all (it used create() and mmap()
ioctls to mark buffers as write only), but with those changes, I think
this is unnecessary with prefaulting.

> I think the only thing needed is a small libdrm batch to enable non-blocking
> gtt mmaps
> 
>   void drm_intel_enable_non_blocking_gtt_mmap(obj)
> 
>   which sets a bit somewhere and moves the obj (once) into the gtt domain.
>   And a corresponding change in gtt_mmap to disable the set_domain call. This
>   only works as long as no one else access the object from the cpu domain,
>   but afaics we'll use non-blocking mmaps only for unshared buffers, so that
>   should be fine.
> 
>   I might also just be dense and not see the issue ...

I should have documented that these buffers should not be flinked. I
should probably enforce that in the driver. I will take a note to change
that. Anyway, I think you're right, and the result of this is to remove
the hunk from i915_gem_fault, and prefault do you agree?

> 
> - I'm sorry having suggested to implement the clflush ioctl, I think
> it's a foolish idea, now. Non-blocking mmaps is a performance
> optimization, needing to sync caches with clflush is very much the
> opposite. So I think we can dustbin this.

I disagree. I think it's nice function to add for people too lazy to do
micro-optimizations. The flushing of the full object is almost
guaranteed to make performance worse though, that should really only be
for testing purposes.

> 
>   Now non-blocking cpu mmaps make very much sense on llc/snooped
> buffer objects. So I think we actually need an ioctl to get
> obj->cache_level so userspace can decide whether it should use
> non-blocking gtt mmaps or cpu (non-blocking) cpu mmaps. We might as
> well go full-circle, make Chris happy and merge the corresponding
> set_cache_level ioclt to enable snooped buffers on machines with
> ilk-like coherency (i.e. that atom thing I'm hearing about ...). But
> imo that's material for non-blocking mmaps, step 2.

I'd need to research this a bit more, let me defer response on this
part. By the way, which set_cache_level ioctl are you referring to?

> 
> Cheers, Daniel

Ben