Compression branch ready

José Fonseca jose.r.fonseca at gmail.com
Thu Aug 25 15:56:43 PDT 2011


On Thu, Aug 25, 2011 at 10:12 PM, Zack Rusin <zack at kde.org> wrote:
> On Thursday, August 25, 2011 01:17:30 PM José Fonseca wrote:
>> On Thu, Aug 25, 2011 at 5:58 PM, Zack Rusin <zack at kde.org> wrote:
>> > I think we need two extra threads:
>> > - compression thread - the rendering thread writes the data to a
>> > ringbuffer, the compression thread compresses it and sticks the
>> > compressed data in another ringbuffer
>> > - disk io thread - which reads from the compressed data ringbuffer and
>> > writes it to disk.
>>
>> We could have async I/O instead of another thread.
>
> Yea, that'd be ideal and could be done independently, but we'd need to either
> write our own aio abstraction or use something like boost asio, which doesn't
> seem to me like a whole lot of fun.
>
> Also interestingly I just did some tests:
> - ipers default: ~150fps
> - ipers trace current: ~55fps
> - ipers trace with rawWrite in Trace::File::write commented out: ~80fps
> - ipers trace with rawWrite and contents of OS::Acquire/ReleaseMutex commented
> out: ~112fps
> At this point callgrind says that most of the time is spent in strlen in
> Writer::writeString for every call...

Last time I checked, callgrind was not accurate -- it uses instruction
counts and CPU simulation instead of real CPU timings.  It would be
better to base decisions on timings from oprofile/sysprof/linux perf.
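
Going back to the two-thread pipeline you sketched at the top, it could
be wired up roughly like this. (Just a sketch -- a mutex-guarded
std::deque stands in for a real ringbuffer, and all the names here are
made up.)

#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>

#include <snappy.h>

// Mutex-guarded queue standing in for a lock-free ringbuffer.
struct ChunkQueue {
    std::mutex mutex;
    std::condition_variable ready;
    std::deque<std::string> chunks;
    bool done;

    ChunkQueue() : done(false) {}

    void push(const std::string &chunk) {
        std::lock_guard<std::mutex> lock(mutex);
        chunks.push_back(chunk);
        ready.notify_one();
    }

    // Blocks until a chunk arrives; returns false once drained and closed.
    bool pop(std::string &chunk) {
        std::unique_lock<std::mutex> lock(mutex);
        while (chunks.empty() && !done)
            ready.wait(lock);
        if (chunks.empty())
            return false;
        chunk = chunks.front();
        chunks.pop_front();
        return true;
    }

    void close() {
        std::lock_guard<std::mutex> lock(mutex);
        done = true;
        ready.notify_all();
    }
};

static ChunkQueue rawQueue;        // rendering thread -> compression thread
static ChunkQueue compressedQueue; // compression thread -> disk I/O thread

static void compressionThread(void) {
    std::string raw, compressed;
    while (rawQueue.pop(raw)) {
        snappy::Compress(raw.data(), raw.size(), &compressed);
        compressedQueue.push(compressed);
    }
    compressedQueue.close();
}

static void diskIoThread(std::FILE *f) {
    std::string chunk;
    while (compressedQueue.pop(chunk))
        std::fwrite(chunk.data(), 1, chunk.size(), f);
}

The rendering thread would just rawQueue.push() its chunks and call
rawQueue.close() at shutdown; the two workers could be started with
std::thread (or pthreads, given C++11 support is still patchy).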

> So it seems that:
> 1) compressing/writing the data for ipers is less costly than the mutex we
> need to get and release at every call. I'm not sure what we could do to avoid
> it, but it seems pretty costly, so it might be worth taking a look at it.

We could defer taking the mutex at all until we actually see a second
thread. We could also make the compression chunks per-thread, and only
take the mutex when writing them to the file.
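
E.g., something along these lines -- a hypothetical sketch, where Chunk
is a made-up buffer class, __thread is the GCC-specific TLS qualifier,
and rawWrite stands for the unbuffered file write:

#include <cstring>
#include <stddef.h>

namespace OS { void AcquireMutex(void); void ReleaseMutex(void); }
void rawWrite(const void *data, size_t size); // unbuffered file write

// Made-up fixed-size buffer; one per thread.
struct Chunk {
    static const size_t SIZE = 64 * 1024;
    char buf[SIZE];
    size_t used;
    Chunk() : used(0) {}
};

static __thread Chunk *localChunk = 0;

// Writes larger than Chunk::SIZE are elided for brevity.
void write(const void *data, size_t size) {
    if (!localChunk)
        localChunk = new Chunk;
    if (localChunk->used + size > Chunk::SIZE) {
        OS::AcquireMutex();      // taken once per chunk, not once per call
        rawWrite(localChunk->buf, localChunk->used);
        OS::ReleaseMutex();
        localChunk->used = 0;
    }
    std::memcpy(localChunk->buf + localChunk->used, data, size);
    localChunk->used += size;
}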

> 2) We should probably add name_length, arg_lengths, member_lengths members to
> Trace::FunctionSig, Trace::StructSig and others and have the python generators
> also encode the lengths of each in the definitions. It would save millions of
> strlen calls for larger traces, which add up.

It's a good idea, but these don't add up to millions -- every
function/structure/etc. signature is processed only once. strlen must be
getting called for something else -- e.g., actual strings.

Again, I'm sceptical about any figure obtained with callgrind.
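
That said, caching the lengths would be cheap to do. FWIW, it could
look something like this -- the name_len field and the writeUInt helper
are hypothetical, and the python generators would emit the length as a
literal:

#include <stddef.h>

struct FunctionSig {
    unsigned id;
    const char *name;
    size_t name_len;          // strlen(name), computed at generation time
    unsigned num_args;
    const char **arg_names;
};

// What the generated code might look like (values made up):
static const char *_glDrawArrays_args[3] = {"mode", "first", "count"};
static const FunctionSig _glDrawArrays_sig = {
    42, "glDrawArrays", 12, 3, _glDrawArrays_args
};

struct Writer {
    void writeUInt(size_t value);           // length prefix (assumed)
    void rawWrite(const void *p, size_t n); // raw bytes (assumed)

    void writeString(const char *str, size_t len) {
        writeUInt(len);   // no strlen needed when the sig caches it
        rawWrite(str, len);
    }
};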


At any rate, although there is always room for improvement, I believe
at this point there are more worthwhile areas to focus on -- e.g.,
handling big traces, as you mentioned before -- which will become more
common now that we can write traces to disk faster...

>> The exception code is already committed on master, for better or
>> worse.  It might be worth reviewing whether libgg does anything we
>> don't in our signal handling code yet, but that can be done any time.
>>
>> So the only thing on compression branch is the snappy format. So go for it.
>
> Awesome. Done. The code is in master :)
> Actually, something just hit me: the way I encode the length of the
> compressed segments is going to break if the endianness of the tracing
> machine is different from the retracing machine. Do we want our traces
> to be endianness independent? Because I definitely broke that =)

We can define that the chunk length is always encoded in the file as
little endian.
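
I.e., write and read the four length bytes explicitly, so the on-disk
byte order is fixed regardless of the host CPU -- roughly:

#include <cstdio>
#include <stdint.h>

static void writeLength(std::FILE *f, uint32_t length) {
    unsigned char b[4];
    b[0] = length & 0xff;          // least significant byte first
    b[1] = (length >> 8) & 0xff;
    b[2] = (length >> 16) & 0xff;
    b[3] = (length >> 24) & 0xff;
    std::fwrite(b, 1, 4, f);
}

static bool readLength(std::FILE *f, uint32_t *length) {
    unsigned char b[4];
    if (std::fread(b, 1, 4, f) != 4)
        return false;
    *length = (uint32_t)b[0]
            | ((uint32_t)b[1] << 8)
            | ((uint32_t)b[2] << 16)
            | ((uint32_t)b[3] << 24);
    return true;
}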

Jose

