roland.mainz at nrubsig.org
Thu Nov 4 16:21:40 PST 2004
Adam Jackson wrote:
> On Thursday 04 November 2004 16:46, Roland Mainz wrote:
> > Jim Gettys wrote:
> > > > It would be interesting to see if the results in this paper can be
> > > > improved upon by using linux futexes rather than the Unix socket for
> > > > synchronization. The implementation referred to in this paper is still
> > > > available on a branch of the DRI xc tree, if anyone feels like some
> > > > archaeology.
> > >
> > > Not to mention the fact that Unix domain sockets on Linux are really,
> > > *really* fast;
> > ... which is a myth. It's fast but shared memory can usually beat it
> > without problems.
> As mentioned in the SMT paper I referenced earlier, shared memory needs a
> synchronization primitive. Unix domain sockets give that to you for free, in
> the form of poll(). Shared memory transports are fine if your implementation
> overcomes the sync latency. Alternatively, the common case is the server
> sleeping in poll(), and the kernel can wake the server up directly once the
> send() completes from the client. Tough to beat that.
> Like I said, it'd be interesting to see if shm+futexes are faster.
I would prefer that Sun contribute their current X11 SHM transport
code. It has been working quite well for several years and has even
been actively maintained over the last two years. Once that code has
been integrated, the work on futexes could start on top of that codebase...
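To make the poll()-based wake-up from the quoted paragraph concrete, here
is a minimal sketch. It is not taken from any X server or from Sun's
transport; the function name poll_wakeup_demo and the payload are made up
for illustration, and both "client" and "server" ends live in one process.
It only shows that the server end of a Unix domain socket pair reports
nothing while idle and becomes readable once the client has sent data.

```python
import select
import socket


def poll_wakeup_demo(payload: bytes = b"request"):
    """Return (idle_events, woke, data) for one send/poll/recv round trip.

    Hypothetical helper for illustration only.
    """
    # A connected pair of Unix domain stream sockets.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        poller = select.poll()
        poller.register(server.fileno(), select.POLLIN)

        # Nothing has been sent yet, so a zero-timeout poll reports no events.
        idle_events = poller.poll(0)

        # The "client" writes a request ...
        client.sendall(payload)

        # ... and the "server" side of poll() now reports the socket readable.
        ready = poller.poll(1000)  # timeout in milliseconds
        woke = any(
            fd == server.fileno() and (events & select.POLLIN)
            for fd, events in ready
        )
        data = server.recv(len(payload)) if woke else b""
        return idle_events, woke, data
    finally:
        client.close()
        server.close()
```

A shared memory transport has to provide an equivalent "data is ready"
signal by some other means (a futex, a semaphore, or a socket used only
as a doorbell), which is where the sync latency mentioned above comes in.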
> > You save at least one data copy (which is important
> > when you shuffle around large amounts of image/texture/etc. data) and
> > don't have to split the data (there's the BIGREQUESTS extension but Xsun
> > doesn't support it right now).
> I am not concerned with implementation deficiencies in other people's X
> servers ;). And yes, you'll have to split the data eventually. BIGREQUESTS
> gives you a 16M X11 packet by default, but that gets split over 1.5K TCP
> segments or 64K Unix segments. Surprisingly enough that can actually be
> faster than a naive shared memory implementation due to cache friendliness.
That depends on the type of machine, e.g. single-CPU vs. SMP. It's very
likely that most future machines will have at least two CPUs (dual-core,
or even octa-core, meaning 8 cores per die and 4 threads per core, as in
Sun's Niagara machines), which will cause a change in the statistics.
Somehow I think the threaded Xserver would rock on such a design :)
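For a feel of how much splitting the quoted numbers imply, here is a small
back-of-the-envelope sketch. The interpretation of the sizes is my
assumption: "16M" is taken as 16 MiB, "1.5K" as a 1500-byte segment and
"64K" as 64 KiB, and protocol headers are ignored. The helper name
segments_needed is made up for illustration.

```python
def segments_needed(request_bytes: int, segment_bytes: int) -> int:
    """Number of segments needed to carry one request (ceiling division)."""
    return -(-request_bytes // segment_bytes)


# Assumed sizes, see the text above; headers are not accounted for.
BIGREQUESTS_MAX = 16 * 1024 * 1024   # "16M" read as 16 MiB
TCP_SEGMENT = 1500                   # "1.5K" read as 1500 bytes
UNIX_SEGMENT = 64 * 1024             # "64K" read as 64 KiB

tcp_segments = segments_needed(BIGREQUESTS_MAX, TCP_SEGMENT)
unix_segments = segments_needed(BIGREQUESTS_MAX, UNIX_SEGMENT)
```

Under these assumptions a maximum-size request is cut into roughly eleven
thousand pieces over TCP, but only a few hundred over a Unix socket.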
> In principle there's no reason we can't skip the endian shuffling on Unix
> sockets too.
Erm... not really. You can skip the if() but you may still need extra
copies for alignment etc. In the SHM-transport case you can simply move
this work to the Xserver side, where this job can be done more
efficiently (via acceleration etc.).
> > For example if you measure Sun's shared memory transport with
> > "x11perf" you only get a one-digit percent improvement - but when you
> > test it with Mozilla's DHTML perf. tests then you get a perf.
> > improvement in the three-digit percent range (!!!; this is the reason
> > why Mozilla/Firefox use the Xsun shared memory transport by default),
> > assuming you use the default buffer size - which is far too small for
> > today's applications; increasing it makes applications even faster.
> Sun boxes are not x86 boxes. (Well. amd64 excluded.) Sparc chips have a
> sensible MMU;
Actually only UltraSPARCs, as they have a (partially) software-managed
MMU; the Fujitsu SPARC64 uses a different design here. And the whole MMU
issue on the UltraSPARCs can be avoided by locking the shared memory
pages in memory, e.g. the pages will not be swapped out and the hardware
MMU entries can be shared between processes (e.g. no address
translation). Usually these are 4MB large pages; Solaris >= 9 could use
any page size ranging from 8KB, 64KB, 512KB, 4MB etc. for that.
> the ratio of GPU:CPU performance can be much higher than it is
> on a typical Linux desktop; they have high bandwidth relative to their clock
> speed. They are, in short, not subject to the same design considerations.
> So, yes, SMT works for Sun. We don't know that it works for x86. Previous
> research indicates that it doesn't, but that may have changed by now.
Solaris/x86 has the shared memory transport, too. It would be nice if
Sun could contribute that code 1:1 to Xorg.
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 7950090
(;O/ \/ \O;)