[Intel-gfx] [PATCH 1/6] RFCish: write only mappings (aka non-blocking)

Tue Sep 20 13:06:43 CEST 2011

On Mon, Sep 19, 2011 at 09:25:00PM -0700, Ben Widawsky wrote:
> I'm going to keep this short...
> Patch 5 is my test case.
> On Gen6 I see slightly better performance. On Gen5 I see really really
> improvements (like 3x) for non GTT write only maps over regular mmaps.
> GTT mappings don't really show any improvements as a whole.
>
> Better tests would be nice, but without more significant Mesa changes,
> or better benchmarks, I'm not sure how to get those.  While I think
> these patches are mostly complete, ideas for better testing are very
> welcome. Also of course, general optimizations or pointing out my errors
> would be nice.

Ok, I'm gonna be the dense annoying bastard here:

- Can we stop calling these mappings write-only? Afaics the distinguishing
  feature is that they're non-blocking. And yes, current users only use
  non-blocking paths to upload data because the amount of data we're
  currently downloading is so small. Hence we can use one bo for each
  download without wasting too much space and still avoid unnecessary
  blocking. But I think this will change, e.g. with designs like sna that
  tightly integrate gpu and sw rendering. Or OpenCL.

- Why do we need any patches for gtt non-blocking mmaps? I've re-read our
  code, and afaics we're only calling wait_rendering from gem_fault if
  obj->gtt_space == NULL. I.e. there's no way the gpu is currently using
  the data and hence no way for us to block on it. I think the only thing
  needed is a small libdrm patch to enable non-blocking gtt mmaps:

  void drm_intel_enable_non_blocking_gtt_mmap(obj)

  which sets a bit somewhere and moves the obj (once) into the gtt domain.
  And a corresponding change in gtt_mmap to disable the set_domain call.
  This only works as long as no one else accesses the object from the cpu
  domain, but afaics we'll use non-blocking mmaps only for unshared
  buffers, so that should be fine.

  I might also just be dense and not see the issue ...

- I'm sorry for having suggested implementing the clflush ioctl; I now
  think it's a foolish idea. Non-blocking mmaps are a performance
  optimization, and needing to sync caches with clflush is very much the
  opposite. So I think we can dustbin this.

  Now non-blocking cpu mmaps make a lot of sense on llc/snooped buffer
  objects. So I think we actually need an ioctl to get obj->cache_level so
  userspace can decide whether it should use non-blocking gtt mmaps or
  non-blocking cpu mmaps. We might as well go full-circle, make Chris
  happy and merge the corresponding set_cache_level ioctl to enable
  snooped buffers on machines with ilk-like coherency (i.e. that atom
  thing I'm hearing about ...). But imo that's material for non-blocking
  mmaps, step 2.
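
To make the two proposals above a bit more concrete, here is a toy model in
plain C. It only sketches the intended control flow: every type and function
(toy_bo, toy_set_domain_gtt, toy_gtt_mmap, toy_pick_non_blocking_map, ...) is
hypothetical and exists neither in libdrm nor in i915. The set_domain stand-in
just counts calls where the real ioctl could wait for the gpu, and the
two-value cache level is a simplification of obj->cache_level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in libdrm or i915. */
enum toy_domain { TOY_DOMAIN_CPU, TOY_DOMAIN_GTT };
enum toy_cache_level { TOY_CACHE_NONE, TOY_CACHE_COHERENT /* llc or snooped */ };
enum toy_map_kind { TOY_MAP_GTT, TOY_MAP_CPU };

struct toy_bo {
    enum toy_domain domain;
    enum toy_cache_level cache_level;
    bool non_blocking_gtt;     /* the "bit somewhere" from the proposal */
    unsigned set_domain_calls; /* how often we took the possibly blocking path */
};

/* Stand-in for the set_domain call, which in reality may wait for the gpu. */
static void toy_set_domain_gtt(struct toy_bo *bo)
{
    assert(bo != NULL);
    bo->set_domain_calls++;
    bo->domain = TOY_DOMAIN_GTT;
}

/* Hypothetical analogue of drm_intel_enable_non_blocking_gtt_mmap(obj):
 * set the bit and move the object into the gtt domain exactly once. */
static void toy_enable_non_blocking_gtt_mmap(struct toy_bo *bo)
{
    assert(bo != NULL);
    if (bo->non_blocking_gtt)
        return;
    toy_set_domain_gtt(bo);
    bo->non_blocking_gtt = true;
}

/* Hypothetical gtt_mmap path: skip set_domain once the bit is set. */
static void toy_gtt_mmap(struct toy_bo *bo)
{
    assert(bo != NULL);
    if (!bo->non_blocking_gtt)
        toy_set_domain_gtt(bo);
}

/* Userspace policy, assuming some way to query the cache level:
 * coherent (llc/snooped) objects take the cpu mmap, the rest the gtt mmap. */
static enum toy_map_kind toy_pick_non_blocking_map(const struct toy_bo *bo)
{
    assert(bo != NULL);
    return bo->cache_level == TOY_CACHE_COHERENT ? TOY_MAP_CPU : TOY_MAP_GTT;
}
```

In this model an object that opted in pays for set_domain once, at enable
time, and every later toy_gtt_mmap call skips it. As noted above, that is only
safe as long as nobody else touches the object from the cpu domain.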

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48