[Mesa-dev] IDEA: Kernel threaded dma dispatch with a shared area per process

Jeff Hartmann jeff.hartmann at gmail.com
Tue Sep 28 22:30:25 PDT 2010


Hello everyone, it's been a while.

I have been thinking about something lately, and I thought I would share
some very rough proof-of-concept code for a lock-free single-reader /
single-writer ring buffer.  It uses only atomic operations while the ring
is neither empty nor full, and falls back to the futex system call to block
when it is.  I was thinking that something like this could be useful for
building a way to communicate with a kernel-based dma dispatch thread.
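
To make the idea concrete, here is a minimal sketch of such a ring, written
from scratch with C11 atomics.  It is not the attached fastpipe code.  The
names (struct ring, ring_put, ring_get, RING_SLOTS) and the two "waiting"
flags are assumptions made for illustration, and it assumes Linux, where an
_Atomic uint32_t has the same representation as a plain uint32_t so it can
be handed to futex.  The fast path is only loads and stores; the futex
system call is made only when the ring is full or empty, or when the other
side has announced that it is sleeping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define RING_SLOTS 8u                 /* hypothetical size for the sketch */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
              "RING_SLOTS must be a power of two");

struct ring {
    _Atomic uint32_t head;            /* items written so far; producer stores */
    _Atomic uint32_t tail;            /* items read so far; consumer stores */
    _Atomic uint32_t reader_waiting;  /* consumer is (about to be) asleep */
    _Atomic uint32_t writer_waiting;  /* producer is (about to be) asleep */
    uint32_t slot[RING_SLOTS];
};

static void futex_op(_Atomic uint32_t *word, int op, uint32_t val)
{
    /* Assumption: _Atomic uint32_t is laid out like uint32_t (true on Linux
     * with GCC and Clang).  Errors such as EAGAIN/EINTR are ignored because
     * every caller re-checks the word in a loop. */
    syscall(SYS_futex, (uint32_t *)word, op, val, NULL, NULL, 0);
}

/* Sleep while *word == value.  Returns 1 if the caller had to block. */
static int block_while(_Atomic uint32_t *word, uint32_t value,
                       _Atomic uint32_t *waiting)
{
    int blocked = 0;
    while (atomic_load(word) == value) {
        blocked = 1;
        atomic_store(waiting, 1);
        if (atomic_load(word) == value)   /* re-check after raising the flag */
            futex_op(word, FUTEX_WAIT, value);
        atomic_store(waiting, 0);
    }
    return blocked;
}

/* Producer side.  Blocks only when the ring is full. */
static int ring_put(struct ring *r, uint32_t v)
{
    uint32_t head = atomic_load(&r->head);
    /* Full means tail == head - RING_SLOTS (unsigned wraparound is fine). */
    int blocked = block_while(&r->tail, head - RING_SLOTS, &r->writer_waiting);
    r->slot[head % RING_SLOTS] = v;
    atomic_store(&r->head, head + 1);     /* publish the slot */
    if (atomic_load(&r->reader_waiting))  /* wake only a sleeping reader */
        futex_op(&r->head, FUTEX_WAKE, 1);
    return blocked;
}

/* Consumer side.  Blocks only when the ring is empty. */
static int ring_get(struct ring *r, uint32_t *out)
{
    uint32_t tail = atomic_load(&r->tail);
    /* Empty means head == tail. */
    int blocked = block_while(&r->head, tail, &r->reader_waiting);
    *out = r->slot[tail % RING_SLOTS];
    atomic_store(&r->tail, tail + 1);     /* release the slot */
    if (atomic_load(&r->writer_waiting))  /* wake only a sleeping writer */
        futex_op(&r->tail, FUTEX_WAKE, 1);
    return blocked;
}
```

One thread (or process, if struct ring lives in a shared mapping) calls
ring_put and exactly one other calls ring_get.  The return values could be
summed to get counts like the "blocking writes" and "blocking reads"
figures in the output.  The sketch uses the default sequentially consistent
atomics so the flag handshake is easy to reason about; weaker orderings
would need more care.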

The basic idea is like this:

Each process would call into the kernel when it initializes the DRI.  The
kernel would set up a set of shared pages to use for command dispatch.  The
producer end would be used by the user space driver, and the consumer end
would terminate in a kernel thread.  This single thread would service all
dma requests for the DRI processes in the system, and could even be made
more complex if necessary or desired (saving and restoring hardware state
between the different processes, handling memory management requests,
etc.).  The obvious advantages of a design like this are that very few
system calls are involved, and that some bookkeeping can be offloaded to
another cpu core.
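
As a rough user space model of that layout (not kernel code, and not part
of the attachment), the sketch below gives each process its own command
area and has a single dispatcher make round-robin passes over all of them.
Everything here is hypothetical: the names cmd_area, cmd_submit and
dispatch_pass, the ring size, and the per-client budget are illustrative
only.  A real dispatch thread would program the hardware where this one
only counts commands, and would sleep when every ring is empty rather than
poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_SLOTS 16u                 /* hypothetical ring size */
static_assert((CMD_SLOTS & (CMD_SLOTS - 1)) == 0,
              "CMD_SLOTS must be a power of two");

/* One shared area per process: user space produces, the dispatcher consumes. */
struct cmd_area {
    _Atomic uint32_t head;            /* stored only by the user space driver */
    _Atomic uint32_t tail;            /* stored only by the dispatch thread */
    uint32_t cmd[CMD_SLOTS];          /* stand-in for dma command descriptors */
};

/* Producer side, no system call.  Returns 0 on success, -1 if the ring is full. */
static int cmd_submit(struct cmd_area *a, uint32_t cmd)
{
    uint32_t head = atomic_load_explicit(&a->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&a->tail, memory_order_acquire);
    if (head - tail == CMD_SLOTS)
        return -1;
    a->cmd[head % CMD_SLOTS] = cmd;
    atomic_store_explicit(&a->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer side: one round-robin pass over all clients.  At most `budget`
 * commands are taken from each ring so that one busy process cannot starve
 * the others.  fired[i] counts commands dispatched for client i.  Returns
 * the total number of commands dispatched in this pass. */
static size_t dispatch_pass(struct cmd_area *areas, size_t nclients,
                            uint32_t budget, uint32_t *fired)
{
    size_t done = 0;
    for (size_t i = 0; i < nclients; i++) {
        struct cmd_area *a = &areas[i];
        uint32_t tail = atomic_load_explicit(&a->tail, memory_order_relaxed);
        uint32_t head = atomic_load_explicit(&a->head, memory_order_acquire);
        for (uint32_t k = 0; k < budget && tail != head; k++) {
            uint32_t cmd = a->cmd[tail % CMD_SLOTS];
            (void)cmd;                /* a real dispatcher would fire dma here */
            fired[i]++;
            tail++;
            done++;
        }
        atomic_store_explicit(&a->tail, tail, memory_order_release);
    }
    return done;
}
```

The per-client budget is one possible answer to fairness between processes;
state save and restore between clients would slot in at the point where the
outer loop moves from one area to the next.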

Was anything like this explored when you guys did the redesign for DRI2?
Has anyone measured what kind of raw performance is possible with some of
the current dma dispatch mechanisms used by the DRI2 drivers?

This code is quite rough and is just meant as a demonstration; no attempt
was made to optimize it.  It might have bugs and races, but it seems pretty
solid in my limited testing.  It will probably only run on 32-bit x86, and
no attempt was made to make anything else work.  I would be interested if
some people could compile and test the code I've attached and see what kind
of output it produces.  I only have access to an older Core 2 laptop running
Linux right now, so I wonder how well or badly it would do on a recent
desktop.

I guess this sort of approach might also be useful for any other bulk data
transfer between processes (Xlib transport maybe) or with the kernel.

It runs three basic tests against each of three buffer configurations (one
64k buffer, four 16k buffers, and eight 4k buffers): write only, read/write,
and write/validate (a Tiger hash is calculated on the buffers as they are
read).  Here is the output on my system:

Beginning test : write only, read is a noop (one 64k buffer).
Starting write test
Starting read test
Time 1: 0, Time 2: 5380000
Number of seconds: 5.3800
Number of bytes: 10240000000
Rate (MB/sec): 1815.1719
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 92673
Number of blocking reads: 713363
Write block ratio: 0.0093
Read block ratio: 0.0713
Beginning test : read and write. (one 64k buffer)
Starting write test
Starting read test
Time 1: 5380000, Time 2: 9530000
Number of seconds: 4.1500
Number of bytes: 10240000000
Rate (MB/sec): 2353.1627
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 91006
Number of blocking reads: 96142
Write block ratio: 0.0091
Read block ratio: 0.0096
Beginning test : data that is written is validated as it is read. (one 64k buffer)
Starting write validate test
Starting read validate test
Matches: 2000000, failures: 0, success ratio: 1.0000
Time 1: 9530000, Time 2: 15830000
Number of seconds: 6.3000
Number of bytes: 2048000000
Rate (MB/sec): 310.0198
Number of read operations: 2000000
Number of write operations: 2000000
Number of blocking writes: 1987287
Number of blocking reads: 93
Write block ratio: 0.9936
Read block ratio: 0.0000
Beginning test : write only, read is a noop (four 16k buffers).
Starting write test
Starting read test
Time 1: 15830000, Time 2: 21200000
Number of seconds: 5.3700
Number of bytes: 10240000000
Rate (MB/sec): 1818.5521
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 200612
Number of blocking reads: 541775
Write block ratio: 0.0201
Read block ratio: 0.0542
Beginning test : read and write. (four 16k buffers)
Starting write test
Starting read test
Time 1: 21200000, Time 2: 27460000
Number of seconds: 6.2600
Number of bytes: 10240000000
Rate (MB/sec): 1560.0040
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 636012
Number of blocking reads: 349444
Write block ratio: 0.0636
Read block ratio: 0.0349
Beginning test : data that is written is validated as it is read. (four 16k buffers)
Starting write validate test
Starting read validate test
Matches: 2000000, failures: 0, success ratio: 1.0000
Time 1: 27460000, Time 2: 33830000
Number of seconds: 6.3700
Number of bytes: 2048000000
Rate (MB/sec): 306.6130
Number of read operations: 2000000
Number of write operations: 2000000
Number of blocking writes: 1998643
Number of blocking reads: 79
Write block ratio: 0.9993
Read block ratio: 0.0000
Beginning test : write only, read is a noop (eight 4k buffers).
Starting write test
Starting read test
Time 1: 33830000, Time 2: 53320000
Number of seconds: 19.4900
Number of bytes: 10240000000
Rate (MB/sec): 501.0582
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 1514672
Number of blocking reads: 5898047
Write block ratio: 0.1515
Read block ratio: 0.5898
Beginning test : read and write. (eight 4k buffers)
Starting write test
Starting read test
Time 1: 53320000, Time 2: 69760000
Number of seconds: 16.4400
Number of bytes: 10240000000
Rate (MB/sec): 594.0161
Number of read operations: 10000000
Number of write operations: 10000000
Number of blocking writes: 3061476
Number of blocking reads: 3021581
Write block ratio: 0.3061
Read block ratio: 0.3022
Beginning test : data that is written is validated as it is read. (eight 4k buffers)
Starting write validate test
Starting read validate test
Matches: 2000000, failures: 0, success ratio: 1.0000
Time 1: 69760000, Time 2: 76320000
Number of seconds: 6.5600
Number of bytes: 2048000000
Rate (MB/sec): 297.7325
Number of read operations: 2000000
Number of write operations: 2000000
Number of blocking writes: 1999559
Number of blocking reads: 203
Write block ratio: 0.9998
Read block ratio: 0.0001
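
For anyone comparing numbers, the derived figures above can be reproduced
from the raw ones.  The small sketch below is inferred from the output
itself rather than copied from the attached code, so the function names are
made up and the units are assumptions: the two Time values are taken to be
microseconds (5380000 prints as 5.3800 seconds), and "MB" is taken to be
2^20 bytes, which matches the printed rates.  Each operation moves 1024
bytes (10240000000 bytes over 10000000 operations).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the Time 1 / Time 2 stamps are in microseconds. */
static double elapsed_seconds(uint64_t time1, uint64_t time2)
{
    assert(time2 >= time1);
    return (double)(time2 - time1) / 1e6;
}

/* Assumption: one "MB" in the output is 1024 * 1024 bytes. */
static double rate_mb_per_sec(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Fraction of operations that had to block on the futex. */
static double block_ratio(uint64_t blocking_ops, uint64_t total_ops)
{
    assert(total_ops > 0);
    return (double)blocking_ops / (double)total_ops;
}
```

With the first test's numbers, rate_mb_per_sec(10240000000, 5.38) comes out
at about 1815.17, and block_ratio(92673, 10000000) at about 0.0093, in line
with the figures printed above.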

Thanks,
-Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastpipe.tar.gz
Type: application/x-gzip
Size: 25620 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20100929/d86d0e4d/attachment-0001.bin>