Pushing image transport logic down the stack
Owen Taylor
otaylor at redhat.com
Tue Sep 5 11:48:28 PDT 2006
On Mon, 2006-09-04 at 18:21 -0400, Owen Taylor wrote:
> - You could reference an external shared memory buffer; it's clear
> that at some image size, allocating a new shared memory buffer is
> better than copying data, but I have no idea what that point
> is - is it 100k, 1M, 10M?
So, I wrote a client/server test pair and experimented with simulating
the effect of an XPutImage to an in-memory pixmap:
Method A:

  Process 1 writes N bytes into the socket
  Process 2 reads N bytes from the socket into a buffer
  Process 2 scans over the N bytes read in a separate pass and adds
    them up (adding the bytes touches them, as a simulation of
    copying, compositing, or whatever)
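For anyone who wants to reproduce the shape of Method A, here is a minimal Python sketch. It is not the code from the tarball linked in this message. For brevity a thread stands in for Process 1 and the function name method_a is hypothetical; the timing harness is left out.

```python
import socket
import threading


def method_a(data: bytes) -> int:
    """Sketch of Method A: push N bytes through a socket, then sum them."""
    n = len(data)
    writer_sock, reader_sock = socket.socketpair()

    def writer() -> None:
        # "Process 1": write N bytes into the socket.
        try:
            writer_sock.sendall(data)
        finally:
            writer_sock.close()

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # "Process 2": read N bytes from the socket into a buffer.
        buf = bytearray(n)
        view = memoryview(buf)
        received = 0
        while received < n:
            count = reader_sock.recv_into(view[received:])
            if count == 0:
                raise EOFError("writer closed the socket early")
            received += count
    finally:
        reader_sock.close()
        thread.join()
    # Separate pass: touch every byte by adding the bytes up.
    return sum(buf)
```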
Method B:

  Process 1 opens a POSIX shared memory file and mmaps it
  Process 1 fills in the mmapped buffer with N bytes
  Process 1 passes the file descriptor over the socket
  Process 2 gets the file descriptor from the socket and mmaps it
  Process 2 scans over the contents of the buffer and adds them up
Not surprisingly, for small values of N, Method A is much faster. The
overhead for creating and mapping a shared memory buffer in my tests
was about 50 usec (roughly 100,000 cycles). Method A was noticeably
faster for N up to about 1M.
But contrary to my expectations, for larger values of N, Method B is
*not* faster; from N = 10M to N = 100M, the two methods ran at about
the same speed.
I think the explanation is that the limiting factor is simply how many
times the data has to be accessed uncached, i.e., how many passes over
it go all the way out to system memory.
In Method A, you have:

  1 access: read bytes from source, write into socket, read from
            socket, write into output buffer
  1 access: read from output buffer, add up

In Method B, you have:

  1 access: read bytes from source, write into shared memory buffer
  1 access: read from shared memory buffer, add up

(The trip through the socket in Method A presumably doesn't add an
access because the data crosses it in small, cache-sized pieces.)
So, it appears:

  Small images: the number of cycles matters; reducing copies by using
    shared memory might be a win if there were zero setup cost (a shm
    X protocol transport, perhaps).
  Large images: the only thing that matters is the number of times we
    have to hit system memory. Shared memory doesn't help unless we
    can share the actual source data.
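One way to make the large-image argument concrete is a rough cost model. This is an assumption, not something measured here: each method pays a fixed setup cost S plus p passes over uncached memory at some effective bandwidth W, with p = 2 for both methods per the accounting above.

```latex
T_A(N) \approx S_A + p_A \frac{N}{W}, \qquad
T_B(N) \approx S_B + p_B \frac{N}{W}, \qquad
p_A = p_B = 2 \;\Rightarrow\; T_B(N) - T_A(N) \approx S_B - S_A
```

Under this model the gap stays fixed while both totals grow with N, so it becomes negligible for large images. If the source data itself could be shared, p_B would drop to 1 and Method B would win once N/W exceeds S_B - S_A.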
This actually supports the API proposal that I made at the beginning
of this thread: if you want to format-convert a source image and then
do an XPutImage with it, it's much better to do the conversion in a
streaming fashion as XPutImage writes into the X socket than to
convert the whole image ahead of time.
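A sketch of the streaming idea, with hypothetical names throughout: put_image_streaming() is not an Xlib call, convert() is a stand-in per-byte conversion (inverting every byte) rather than a real pixel format conversion, and the 64 KiB chunk size is an assumed cache-friendly value.

```python
import socket

# Assumed chunk size: small enough that a converted chunk is still in
# cache when the socket write copies it.
CHUNK_SIZE = 64 * 1024

# Hypothetical stand-in for a pixel format conversion: invert every byte.
_INVERT_TABLE = bytes(255 - i for i in range(256))


def convert(chunk: bytes) -> bytes:
    return chunk.translate(_INVERT_TABLE)


def put_image_streaming(sock: socket.socket, src: bytes) -> None:
    """Convert and send src one chunk at a time (hypothetical helper)."""
    for offset in range(0, len(src), CHUNK_SIZE):
        # Convert a small piece, then write it while it is still in cache.
        sock.sendall(convert(src[offset:offset + CHUNK_SIZE]))


def put_image_preconverted(sock: socket.socket, src: bytes) -> None:
    """For comparison: convert the whole image first, then send it."""
    converted = convert(src)  # full-size buffer written out to memory
    sock.sendall(converted)   # ...and read back again for the socket write
```

The pre-converting variant writes a full-size converted buffer out to memory and then reads it back for the socket write, which is the extra pass over system memory that the streaming version avoids.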
- Owen
My test program can be found at:
http://fishsoup.net/software/buffer-passing/buffer-passing-0.1.tar.gz
if anyone wants to take a look or make it more accurately simulate
different types of operations.