[PATCH] mieq: Provide better adaptability and diagnostics during mieq overflow

Sun Oct 16 17:25:57 PDT 2011

On Sun, Oct 16, 2011 at 02:11:18PM -0700, Jeremy Huddleston wrote:
> 
> On Oct 16, 2011, at 8:44 AM, Keith Packard wrote:
> 
> > On Sat, 15 Oct 2011 23:31:35 -0700, Jeremy Huddleston <jeremyhu at apple.com> wrote:
> > 
> >> Yes.  This patch was motivated by some server bugs I read through in
> >> bugzilla today.  Apparently some DRI drivers can hog the CPU for a
> >> while, delaying processing.  In one of the reports, a user reported
> >> that setting the queue size to 4096 actually solved the issue for
> >> them.
> > 
> > One might suggest a better plan would be to fix the obvious problem :-)
> 
> I agree.  The problem is that the (sometimes) singular backtrace doesn't
> always point to the problem, and the current log message makes people
> blame mieq.  I don't think my patch addresses the second point, but it
> should help the first.

When we print the backtrace, we can also print a message that the culprit is
most likely 7 layers up the backtrace. It won't do that much, but it'll make
it easier to understand for those users who actually read it.

> > How much memory does a queue of length 4096 take these days?
> 
> Well in the normal case, this will actually *reduce* memory usage since
> the default size is 128 rather than 512.  Only systems with problems will
> grow to 4096, and it was reported that the increased queue actually helps
> (although honestly I don't see why that would be the case).

It's easy to queue up a few hundred events before realising that the server
isn't responding. It takes longer to queue up 4096, so my guess is the
human factor: user notices everything is hung, user stops, scratches head,
possibly makes and/or consumes beverage, whatever is stuck finishes and
voila - everything works again.

I do wonder though, does the increased queue size really help? Assume that
you have an issue that gets unstuck after 10 seconds. After that time,
suddenly all the queued events are replayed, possibly on a severely changed
screen. Wouldn't it be better to drop the events and let the users redo the
input whenever the screen is up-to-date again?

Increasing sampling and growing the queue are two separate issues that can
be implemented independently. I think only the first is really useful.
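
To make the "drop rather than grow" option concrete, here is a rough sketch
of a fixed-size ring buffer that discards new events once it is full and
just counts how many were lost. This is not mieq's actual code: every name
in it (sketch_queue, sketch_enqueue, SKETCH_QUEUE_SIZE, ...) is made up for
illustration, and the capacity of 512 is only borrowed from the number
mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_SIZE 512   /* hypothetical fixed capacity, never grown */

/* Hypothetical stand-in for an input event. */
struct sketch_event {
    int type;
    int x, y;
};

struct sketch_queue {
    struct sketch_event events[SKETCH_QUEUE_SIZE];
    size_t head;            /* index of the oldest queued event */
    size_t count;           /* number of events currently queued */
    unsigned long dropped;  /* events discarded because the queue was full */
};

/*
 * Queue an event. If the queue is full, the new event is dropped and
 * counted instead of enlarging the queue. Returns 1 if queued, 0 if dropped.
 */
static int sketch_enqueue(struct sketch_queue *q, struct sketch_event ev)
{
    assert(q != NULL);
    if (q->count == SKETCH_QUEUE_SIZE) {
        q->dropped++;
        return 0;
    }
    q->events[(q->head + q->count) % SKETCH_QUEUE_SIZE] = ev;
    q->count++;
    return 1;
}

/* Remove the oldest event into *out. Returns 1 on success, 0 if empty. */
static int sketch_dequeue(struct sketch_queue *q, struct sketch_event *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->events[q->head];
    q->head = (q->head + 1) % SKETCH_QUEUE_SIZE;
    q->count--;
    return 1;
}
```

With something like this, stale input never piles up beyond the fixed
capacity, and the dropped counter could feed the log message once whatever
was stuck finishes.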

Cheers,
  Peter