GEM-related desktop sluggishness due to linear-time arch_get_unmapped_area_topdown()

The killer solution would be to have no mapping at all and a decent
upload/download ioctl that can take user pages.
Doesn't this sound like GEM's read/write interface implemented by e.g.
the i915 driver?  But if I understand correctly, a mmap-like interface
should still be necessary if we want to implement e.g. glMapBuffer()
without extra copying.
glMapBuffer should not be used; it's really not a good way to do things.
Anyway, the extra copy might be unavoidable, given that sometimes the
front/back buffer might either be in unmappable VRAM or have a memory
layout that is not the one specified at buffer creation (this is very
common when using tiling, for instance). So even considering MapBuffer
or a similar function, I believe it's a lot better not to allow buffer
mapping in userspace, but to provide upload/download hooks that can use
user pages to avoid extra copies as much as possible.
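
To make the tiling point concrete, here is a minimal userspace sketch of
upload/download hooks, not actual DRM code. Everything in it is
hypothetical: struct demo_bo, the demo_bo_* functions and the 4x4 tile
size are made up for illustration, and a malloc'ed array stands in for
the real backing store. It only shows that when the stored layout
differs from the linear one the application sees, the hook can do the
layout conversion while copying straight from or to the caller's pages.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TILE_W 4 /* hypothetical tile width in bytes */
#define TILE_H 4 /* hypothetical tile height in rows */

/* Hypothetical buffer object; storage stands in for tiled VRAM/GART. */
struct demo_bo {
    uint32_t width;   /* bytes per row, multiple of TILE_W */
    uint32_t height;  /* rows, multiple of TILE_H */
    uint8_t *storage; /* width * height bytes, stored tile by tile */
};

/* Byte offset of linear position (x, y) inside the tiled storage. */
static size_t tiled_offset(const struct demo_bo *bo, uint32_t x, uint32_t y)
{
    uint32_t tiles_per_row = bo->width / TILE_W;
    uint32_t tile = (y / TILE_H) * tiles_per_row + (x / TILE_W);

    return (size_t)tile * TILE_W * TILE_H
         + (size_t)(y % TILE_H) * TILE_W + (x % TILE_W);
}

/* Returns 0 on success, -1 on bad dimensions or allocation failure. */
static int demo_bo_init(struct demo_bo *bo, uint32_t width, uint32_t height)
{
    if (width == 0 || height == 0 || width % TILE_W || height % TILE_H)
        return -1;
    bo->width = width;
    bo->height = height;
    bo->storage = calloc((size_t)width * height, 1);
    return bo->storage ? 0 : -1;
}

/* "Upload": read the caller's linear memory, write the tiled layout. */
static void demo_bo_upload(struct demo_bo *bo, const uint8_t *linear)
{
    for (uint32_t y = 0; y < bo->height; y++)
        for (uint32_t x = 0; x < bo->width; x++)
            bo->storage[tiled_offset(bo, x, y)] =
                linear[(size_t)y * bo->width + x];
}

/* "Download": read the tiled layout, write the caller's linear memory. */
static void demo_bo_download(const struct demo_bo *bo, uint8_t *linear)
{
    for (uint32_t y = 0; y < bo->height; y++)
        for (uint32_t x = 0; x < bo->width; x++)
            linear[(size_t)y * bo->width + x] =
                bo->storage[tiled_offset(bo, x, y)];
}
```

With a plain mapping the application would see the tiled bytes directly
and would have to detile them itself, or the driver would have to stage
a linear copy first.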
Wouldn't this give us a performance penalty for short-lived resources
like VBOs, which are located in GART memory? Mmap allows us to write
directly to this DRM-controlled portion of sysram. With a copy-based
implementation we would have to allocate the buffer in sysram just to
copy it over to another portion of sysram, which seems a little insane
to me, but I'm not an expert here.
Short-lived and small BOs definitely wouldn't work well with this kind
of API; it would all be a function of the ioctl cost. But I am not
sure the drawback would be that big. Intel tested with pread/pwrite
and gave up; I don't remember why. For the VBO case you describe, the
scheme I was thinking of would be: allocate a BO, and on the buffer data
call, upload to the allocated BO using the bind-user-page feature, which
would mean zero extra copy operations. For the fire-and-forget case of
VBOs, some kind of transient buffer would likely be more appropriate.
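
The message does not spell out what a transient buffer would look like.
One common reading, sketched below purely as an assumption, is a single
long-lived buffer from which small fire-and-forget allocations are
carved linearly and then all discarded at once (for example per frame),
so that no per-VBO ioctl is needed. struct transient_pool and the
transient_* functions are hypothetical names, and an ordinary byte array
stands in for the backing buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRANSIENT_FAIL ((size_t)-1)

/* Hypothetical pool carved out of one long-lived backing buffer. */
struct transient_pool {
    uint8_t *base; /* start of the backing buffer */
    size_t size;   /* total bytes available */
    size_t head;   /* offset of the next free byte */
};

static void transient_pool_init(struct transient_pool *p,
                                void *backing, size_t size)
{
    p->base = backing;
    p->size = size;
    p->head = 0;
}

/*
 * Copy len bytes of vertex data into the pool at the next offset that
 * satisfies align (a power of two). Returns the offset of the data
 * inside the pool, or TRANSIENT_FAIL if align is invalid or the pool
 * is full.
 */
static size_t transient_alloc(struct transient_pool *p, const void *data,
                              size_t len, size_t align)
{
    size_t start;

    if (align == 0 || (align & (align - 1)) != 0)
        return TRANSIENT_FAIL;

    start = (p->head + align - 1) & ~(align - 1);
    if (start < p->head || start > p->size || len > p->size - start)
        return TRANSIENT_FAIL;

    memcpy(p->base + start, data, len);
    p->head = start + len;
    return start;
}

/* Discard every allocation at once, e.g. after the frame is submitted. */
static void transient_pool_reset(struct transient_pool *p)
{
    p->head = 0;
}
```

A real implementation would also have to wait until the GPU is done with
the old contents before resetting; that synchronization is left out of
the sketch.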


