Pushing image transport logic down the stack

Tue Sep 5 11:48:28 PDT 2006

On Mon, 2006-09-04 at 18:21 -0400, Owen Taylor wrote:

>  - You could reference an external shared memory buffer; it's clear
>    that at some image size, allocating a new shared memory buffer is
>    better than copying data, but I have no idea what that point
>    is - is it 100k, 1M, 10M?

So, I wrote a simulated client/server pair and did some experiments
to model the effect of XPutImage to an in-memory pixmap:

 Method A:

  Process 1 writes N bytes into the socket
  Process 2 reads N bytes from the socket into a buffer
  Process 2 scans over the N bytes read in a separate pass and 
     adds them up (adding the bytes to touch them as a simulation
     of copying, compositing, or whatever.)
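
A rough sketch of Method A, for anyone who wants to see the shape of it.
This is not the code from the test program: the names write_all, read_all
and method_a_sum are made up for illustration, a socketpair plus fork
stands in for the real client and server, and the source buffer is filled
with 1s so that the expected sum is simply N.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write exactly n bytes, looping over short writes. */
static int write_all(int fd, const unsigned char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: read exactly n bytes, looping over short reads. */
static int read_all(int fd, unsigned char *p, size_t n)
{
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Method A: returns the sum of the n bytes received, or -1 on error. */
long method_a_sum(size_t n)
{
    int sv[2];
    long sum = -1;

    assert(n > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Process 1: writes N bytes into the socket. */
        unsigned char *src = malloc(n);
        if (src == NULL)
            _exit(1);
        memset(src, 1, n);
        _exit(write_all(sv[1], src, n) == 0 ? 0 : 1);
    }

    /* Process 2: reads N bytes into a buffer ... */
    close(sv[1]);
    unsigned char *buf = malloc(n);
    if (buf != NULL && read_all(sv[0], buf, n) == 0) {
        /* ... then touches them in a separate pass. */
        sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
    }
    free(buf);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return sum;
}
```

Calling method_a_sum(N) inside a timing loop gives the cost of one
simulated XPutImage of N bytes.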

 Method B:

  Process 1 opens a POSIX shared memory file and mmaps it
  Process 1 fills in the mmaped buffer with N bytes
  Process 1 passes the file descriptor over the socket
  Process 2 gets the file descriptor from the socket and mmaps it
  Process 2 scans over the contents of the buffer and adds them up
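
A rough sketch of Method B along the same lines. Again, this is not the
code from the test program: send_fd, recv_fd, producer and method_b_sum
are made-up names, the shared memory object name is invented, and the
buffer is filled with 1s so that the expected sum is N. The descriptor is
passed with SCM_RIGHTS over a Unix socket. On older glibc versions,
shm_open needs -lrt at link time.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

union fd_ctl { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; };

/* Hypothetical helper: pass descriptor fd over a Unix socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Process 1: create, map and fill the buffer, then pass its descriptor. */
static int producer(int sock, size_t n)
{
    char name[64];  /* invented object name, unlinked right after opening */
    snprintf(name, sizeof name, "/buffer-passing-%ld", (long)getpid());
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);
    if (ftruncate(fd, (off_t)n) < 0)
        return -1;
    unsigned char *p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 1, n);
    return send_fd(sock, fd);
}

/* Method B: returns the sum of the n shared bytes, or -1 on error. */
long method_b_sum(size_t n)
{
    int sv[2];
    long sum = -1;

    assert(n > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(producer(sv[1], n) == 0 ? 0 : 1);

    /* Process 2: get the descriptor, map it, and touch every byte. */
    close(sv[1]);
    int fd = recv_fd(sv[0]);
    unsigned char *p = fd < 0 ? MAP_FAILED
                              : mmap(NULL, n, PROT_READ, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
        sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += p[i];
        munmap(p, n);
    }
    if (fd >= 0)
        close(fd);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return sum;
}
```

The setup work here (shm_open, ftruncate, two mmaps, one sendmsg and one
recvmsg) is the fixed cost that Method A does not pay.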

Not surprisingly, for small values of N, method A is much faster. The
overhead for creating and mapping a shared memory buffer in my tests
was about 50 usec, or roughly 100,000 cycles. Method A was noticeably
faster up to about N = 1M.

But contrary to my expectations, for larger values of N, method B is
*not* faster; from N = 10M to N = 100M, the two methods were about
the same speed.

I think the explanation here is that the limiting factor is simply how
many times the data has to be fetched from uncached memory.

In Method A, you have:

 1 access: read bytes from source, write into socket, read from socket,
    write into output buffer
 1 access: read from output buffer, add up

In Method B, you have:

 1 access: read bytes from source, write into shared memory buffer
 1 access: read from shared memory buffer, add up

So, it appears:

 Small images: the number of cycles matters: reducing copies by using
   shared memory might be a win if there were zero setup cost. (shm
   X protocol transport, perhaps)

 Large images: the only thing that matters is the number of times we
   have to hit system memory. Shared memory doesn't help unless
   we can share the actual source data.

This actually supports the API proposal that I made at the beginning
of this thread - if you want to format-convert a source image and then
do an XPutImage with it, it's much better to do the format conversion
in a streaming fashion as XPutImage writes into the X socket than to
convert the whole image ahead of time. Converting ahead of time adds
another full pass over system memory for a large image.
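
To make the streaming idea concrete, here is a minimal sketch. It does
not use Xlib at all: a plain file descriptor stands in for the X socket,
the function name stream_converted and the chunk size are invented, and
the "format conversion" is just a red/blue swap on 32-bit pixels chosen
for illustration. The point is only that each chunk is converted into a
small buffer that stays in cache and is written out right away, so the
converted image never makes a round trip through system memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed chunk size: 4096 pixels = 16 KB, small enough to stay cached. */
#define CHUNK_PIXELS 4096

/* Illustrative conversion: swap the red and blue channels of one pixel. */
static uint32_t swap_rb(uint32_t p)
{
    return (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16)
                             | ((p & 0x000000FFu) << 16);
}

/*
 * Convert npixels from src and write them to fd one chunk at a time.
 * Returns 0 on success, -1 on a write error.
 */
int stream_converted(int fd, const uint32_t *src, size_t npixels)
{
    uint32_t chunk[CHUNK_PIXELS];

    assert(src != NULL || npixels == 0);
    while (npixels > 0) {
        size_t count = npixels < CHUNK_PIXELS ? npixels : CHUNK_PIXELS;

        /* Convert one chunk into the small buffer. */
        for (size_t i = 0; i < count; i++)
            chunk[i] = swap_rb(src[i]);

        /* Write the chunk out immediately, looping over short writes. */
        const unsigned char *p = (const unsigned char *)chunk;
        size_t left = count * sizeof chunk[0];
        while (left > 0) {
            ssize_t w = write(fd, p, left);
            if (w <= 0)
                return -1;
            p += w;
            left -= (size_t)w;
        }

        src += count;
        npixels -= count;
    }
    return 0;
}
```

The convert-ahead alternative would allocate a second image-sized
buffer, fill it completely, and only then start writing, which is the
extra pass over memory that the measurements above suggest avoiding.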

                                   - Owen

My test program can be found at:

 http://fishsoup.net/software/buffer-passing/buffer-passing-0.1.tar.gz

if anyone wants to take a look or make it more accurately simulate 
different types of operations.