making a COW mapping on the fly from existing vma

Mon Apr 18 13:58:06 UTC 2016

On Sat, Apr 16, 2016 at 06:18:38AM +1000, Dave Airlie wrote:
> This was just a random thought process I was having last night, and
> wondered if it was possible.
> 
> We have a scenario with OpenGL where certain APIs hand large amounts
> of data from the user to the API and when you return from the API call
> the user can then free/overwrite/do whatever they want with the data
> they gave you, which pretty much means you have to straight away
> process the data.
> 
> Now there have been attempts at threading the GL API, but one thing
> they usually hit is they have to do a lot of unthreaded processing for
> these scenarios, so I was wondering could we do some COW magic with
> the data.
> 
> More than likely the data will be anonymous mappings though maybe some
> filebacked, and my idea would be you'd in the main thread create a new
> readonly VMA from the old pages and set the original mapping to do COW
> on all of its pages. Then the thread would pick up the readonly VMA
> mapping and do whatever background processing it wants while the main
> thread continues happily on its way.
> 
> I'm not sure if anyone who's done glthread has thought around this, or
> if the kernel APIs are in place to do something like this so I just
> thought I'd throw it out there.
> 

So iirc, I discussed doing that with Thomas while upstreaming ttm, a long
time ago in a far far away universe. There are 2 issues. For file-backed pages
we just do not have any infrastructure to write-protect a valid & uptodate
page. Even if we did, such a file-backed page might be mapped so many times
that the cost of walking all the mappings and flushing the TLB might be worse
than just doing a memcpy. Finally, handling things like the write() syscall
would also be problematic and require a major code overhaul (especially if we
consider direct I/O). So for file-backed pages I would say this is a no go,
unless I am unaware of some magic kernel infrastructure that already does that.

For anonymous memory the issues mostly revolve around the TLB flush: if we are
talking about a few pages then you are very likely better off doing a memcpy.
So it would need some heuristic for that. That being said, the reason why I
never tried to implement it in the end is that you end up just deferring the
memcpy. So the application still pays the memcpy cost, since you can not
expect a userspace free() to do an munmap() after uploading a texture. So I am
not sure it is worth doing. One thing that might make sense is some new madvise
flag, kind of like MADV_DONTNEED, maybe MADV_STEAL or MADV_GIFT, which would
mean that memory with that flag can be stolen and replaced by the zero page. I
know this sounds like splice(SPLICE_F_MOVE) but we can not use splice here
because we can not change the OpenGL API.
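
For reference, here is a minimal userspace sketch of what MADV_DONTNEED
already does today on Linux for a private anonymous mapping: once the call
returns, the range reads back as zero-filled pages. MADV_STEAL and MADV_GIFT
do not exist; the only part this illustrates is the "replaced by zero page"
behaviour the application would observe, not the stealing of the pages by a
driver. The function name and the 0xAB fill pattern are made up for the
example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, Linux-specific behaviour.
 * Returns 1 if a private anonymous mapping reads back as all zeros after
 * madvise(MADV_DONTNEED), 0 if it does not, and -1 on error.
 */
int dontneed_reads_zero(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;

    size_t len = npages * (size_t)page;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Stand-in for the data the application handed to the GL call. */
    memset(buf, 0xAB, len);

    /* Tell the kernel the contents are no longer needed. */
    if (madvise(buf, len, MADV_DONTNEED) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* The mapping is still valid; touching it faults in zero pages. */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(buf, len);
    return all_zero;
}
```

Calling dontneed_reads_zero(4) should return 1 on Linux. A steal-style flag
would differ in that the old pages would be handed to the consumer instead of
being dropped.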

So we could add a new get_user_pages_steal or get_user_pages_cow, and it is
probably best to implement the latter first and see if it already helps
with real world apps, but I have my doubts.

Cheers,
Jérôme