[Intel-gfx] [PATCH 1/6] RFCish: write only mappings (aka non-blocking)
Daniel Vetter
daniel at ffwll.ch
Tue Sep 20 13:06:43 CEST 2011
On Mon, Sep 19, 2011 at 09:25:00PM -0700, Ben Widawsky wrote:
> I'm going to keep this short...
> Patch 5 is my test case.
> On Gen6 I see slightly better performance. On Gen5 I see really really
> improvements (like 3x) for non GTT write only maps over regular mmaps.
> GTT mappings don't really show any improvements as a whole.
>
> Better tests would be nice, but without more significant Mesa changes,
> or better benchmarks, I'm not sure how to get those. While I think
> these patches are mostly complete, ideas for better testing are very
> welcome. Also of course, general optimizations or pointing out my errors
> would be nice.
Ok, I'm gonna be the dense annoying bastard here:
- Can we stop calling this mappings write-only. Afaics the distinguishing
feature is that they're non-blocking. And yes, current users only use
non-blocking paths to upload data because the amount of data we're
currently downloading is so small. Hence we can use on bo for each
download without wasting too much space and still avoid unnecessary
blocking. Bit I think this will change, e.g. with designs like sna that
tightly integrate gpu and sw rendering. Or OpenCL.
- Why do we need any patches for gtt non-blocking mmaps? I've re-read our
code, and afaics we're only calling wait_rendering from gem_fault if
obj->gtt_space == NULL. I.e. there's no way the gpu is currently using
the data and hence no way for us to block on it. I think the only thing
needed is a small libdrm batch to enable non-blocking gtt mmaps
void drm_intel_enable_non_blocking_gtt_mmap(obj)
which sets a bit somewhere and moves the obj (once) into the gtt domain.
And a corresponding change in gtt_mmap to disable the set_domain call.
This only works as long as no one else access the object from the cpu
domain, but afaics we'll use non-blocking mmaps only for unshared
buffers, so that should be fine.
I might also just be dense and not see the issue ...
- I'm sorry having suggested to implement the clflush ioctl, I think it's
a foolish idea, now. Non-blocking mmaps is a performance optimization,
needing to sync caches with clflush is very much the opposite. So I
think we can dustbin this.
Now non-blocking cpu mmaps make very much sense on llc/snooped buffer
objects. So I think we actually need an ioctl to get obj->cache_level so
userspace can decide whether it should use non-blocking gtt mmaps or cpu
(non-blocking) cpu mmaps. We might as well go full-circle, make Chris
happy and merge the corresponding set_cache_level ioclt to enable
snooped buffers on machines with ilk-like coherency (i.e. that atom
thing I'm hearing about ...). But imo that's material for non-blocking
mmaps, step 2.
Cheers, Daniel
--
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48
More information about the Intel-gfx
mailing list